...
- Stateless operator. The operator processes every input record and output the result which it just does before. It does not need to align data with
Timestamp Barrier
, and when it receivesTimestamp Barrier
, it should broadcast the barrier to downstream tasks. - Stateful operator and temporal operator. Records in a same
Timestamp Barrier
are out of order, stateful and temporal operators should align them according to their timestamp field. The operators will execute computation when they collect all the timestamp barrier, and broadcast it downstream tasks. There's a sequence relationship between timestamp barriers, and records between timestamp barriers are ordered. It means that the operators compute and output results for a timestamp barrier based on the result of a previous timestamp barrier. - Sink operator. Sink streaming output results to
Table Store
, and commit the results when it collect collects all the timestamp barrier. The source of downstream ETL job can prefetch data fromTable Store
, but should produce data after the upstream sink committed.
...
JobManager
reads and manages snapshot and timestamp barrier fromTable Store
, when it collect collects all the timestamp barrier of table, it sends the barrier to source subtasks.Source Node processes splits of snapshots. When it receives timestamp barrier from
JobManager
, it broadcasts timestamp barrier after finishing specified splits.
...
Timestamp Barrier | Watermark | |
Generation |
| Each source subtask generate timestamp barrier(watermark event) from System Time or Event Time |
Checkpoint | Store <checkpoint, timestamp barrier> when the timestamp barrier is generated, so that the job can recover the same timestamp barrier for the uncompleted checkpoint. | None |
Replay data | Store <timestamp barrier, offset> for source when it broadcast broadcasts timestamp barrier, so that the source can replay the same data according to the same timestamp barrier. | None |
Align data | Align data for stateful operator(aggregation, join and etc.) and temporal operator(window) | Align data for temporal operator(window) |
Computation | Operator compute for a specific timestamp barrier based on the results of a previous timestamp barrier. | Window operator only computes results in the window range. |
Output | Operator output or commit results when it collect collects all the timestamp barrierbarriers, including operators with data buffer or async operations. | Window operator support "emit" output |
...