Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Stateless operator. The operator processes every input record and output the result which it just does before. It does not need to align data with Timestamp Barrier , and when it receives Timestamp Barrier , it should broadcast the barrier to downstream tasks. 
  • Stateful operator and temporal operator. Records in a same Timestamp Barrier  are out of order, stateful and temporal operators should align them according to their timestamp field. The operators will execute computation when they collect all the timestamp barrier, and broadcast it downstream tasks. There's a sequence relationship between timestamp barriers, and records between timestamp barriers are ordered. It means that the operators compute and output results for a timestamp barrier based on the result of a previous timestamp barrier.
  • Sink operator. Sink streaming output results to Table Store , and commit the results when it collect collects all the timestamp barrier. The source of downstream ETL job can prefetch data from Table Store , but should produce data after the upstream sink committed.

...

  1. JobManager reads and manages snapshot and timestamp barrier from Table Store , when it collect collects all the timestamp barrier of table, it sends the barrier to source subtasks.

  2. Source Node processes splits of snapshots. When it receives timestamp barrier from JobManager, it broadcasts timestamp barrier after finishing specified splits.

...


Timestamp Barrier

Watermark

Generation

JobManager must coordinate all source subtasks and generate a unified timestamp barrier from System Time or Event Time for them

Each source subtask generate timestamp barrier(watermark event) from System Time or Event Time

Checkpoint

Store <checkpoint, timestamp barrier> when the timestamp barrier is generated, so that the job can recover the same timestamp barrier for the uncompleted checkpoint.

None

Replay data

Store <timestamp barrier, offset> for source when it broadcast broadcasts timestamp barrier, so that the source can replay the same data according to the same timestamp barrier.

None

Align data

Align data for stateful operator(aggregation, join and etc.) and temporal operator(window)

Align data for temporal operator(window)

Computation

Operator compute for a specific timestamp barrier based on the results of a previous timestamp barrier.

Window operator only computes results in the window range.

Output

Operator output or commit results when it collect collects all the timestamp barrierbarriers, including operators with data buffer or async operations.

Window operator support "emit" output

...