Page History

...

The basic logic is that during the full load phase, when creating a table, the system will use MongoDB's Client to read the first record from the specified collection. This record will be used to parse its schema and obtain the field names and attributes for table creation.
In the full load and incremental processing phase, the RichCdcMultiplexRecordEventParser#parseSchemaChange() method is reused.
For methods that directly produce data based on RichEventParser, a LinkedHashMap is maintained in memory. The key of the map represents the field name, and the value represents the field type. Each data record will undergo a "map.get" operation to check if the field exists in the map and if its type matches the current type. If the types do not match, the field and its new type will be added to the map, and the record of this change will be sent to the output stream for further processing through the side output stream for updateSchema operations.
This system supports two ways to synchronize MongoDB data into Paimon:
Scenario 1: When you explicitly know which fields to use. In this case, the system supports the retrieval of data from a single table and nested datasets.
Scenario 2: When the business requirements change. In this case, the system supports the synchronization of data from a single table and the entire database.
Image RemovedImage Added
Additional information: MongoDB CDC is implemented based on MongoDB's Change Streams, and the format of the real-time stream data obtained is as follows:

...

Page tree

Versions Compared

Old Version 6

New Version 7

Key