Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The main work in Flink and Table Store are as followed

Table Store


ComponentMain Work


MetaService
  1. Manage the relationship between ETL job and table in Table Store , including source table, sink table.
  2. Manage the finished timestamp barrier of each table in Table Store
  3. Interaction between Flink and MetaService, such as register ETL job, get consistency version of table and ect.



Table Store

Catalog

  1. Register source and sink table with ETL job id.
  2. Create table based on a consistency version from MetaService

Source and SplitEnumerator
  1. SplitEnumerator managers the snapshot, split and timestamp barrier for specific table.
  2. Source read data and timestamp barrier from split, broadcast timestamp barrier
  3. Notify MetaService to update the completed timestamp barrier for tables.
  4. Notify MetaService to cleanup the information of the terminated ETL job.
Sink
  1. Write data to table store and commit data with timestamp barrier



Flink

Timestamp Barrier MechanismThe detailed and main work is in the above table
Planner
  1. Register job to MetaService to create relationship between source and sink tables.
  2. Create table based on a consistency version from MetaService 
JobManager
  1. Add a listener and call back it when the job ends

Constraint

The current FLIP design has two constraints and it may continue to improve in the future

  1. Multiple jobs are not supported to write to the same table concurrently
  2. ETL topology does not support cycles

MetaService needs to detect these situations and report errors when ETL jobs are registered.

The Next Step

This is an overall FLIP for data consistency in streaming and batch ETL. Next, we would like to create FLIP for each functional module with detailed design. For example:

  1. Timestamp Barrier Coordination and Generation
  2. Timestamp Barrier Checkpoint and Recovery
  3. Timestamp Barrier Replay Data Implementation
  4. Timestamp Barrier Alignment and Computation In Operator
  5. Introduce MetaService module and implement source/sink in Table Store and etc
  6. Job and Table management in MetaService such as exception handling, data revision and etc

Rejected Alternatives

...