...
For non-source tasks, if it receives the trigger of checkpoint n, then all its preceding tasks must be finished before checkpoint n and won’t send it the corresponding barriers. Therefore, we would first bookkeep the checkpoint n and after we received EndOfPartition from all the channels, we then report the snapshot for the checkpoint n. If the task received more checkpoint triggers after receiving EndOfPartition from all the channels, then it would report the snapshots directly.
For non-source tasks that do not receive the direct trigger of the checkpoint, it will receive barriers from some channels, but may receive EndOfPartition from some other channels. In this case, the channel received by EndOfPartition would be viewed as receiving the corresponding barrier. Specifically, for aligned checkpoints, the channels received EndOfPartition would be viewed as aligned. For unaligned checkpoints, the buffers before the EndOfPartition message will be snapshotted
...
draw.io Diagram border true diagramName Figure.2 simpleViewer false width links auto tbstyle top diagramDisplayName lbox true diagramWidth 568 revision 2
...
A summary of all the cases is listed in the following:
Aligned Checkpoint | Unaligned Checkpoint | |
Received Trigger + all EndOfPartition | Wait till processed all barriers, and then report the snapshot | Wait till processed all EndOfPartition, and then report the snapshot. In this case we would take the snapshot the same as the aligned method. |
Do not receive Trigger + some barriers | Wait till received barrier or EndOfPartition from each channel, then report the snapshot. | When receiving the first barrier, start spilling the buffers before the barrier or EndOfParitition for each channel. |
Tasks Waiting For Checkpoint Complete Before Finish
...