Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Option 1: Reuse Tasks' FINISH status

The first option is to reuse the tasks' FINISH status managed bySchedulerby the Scheduler. This option avoid avoids additional overhead to maintain tasks' status. Since the tasks' status is modified in JobMaster's main thread, to avoid data inconsistent inconsistencies we would need to proxy the task to compute tasks to trigger also to the main thread. 

This option might has have one problem. During the time we compute the tasks to trigger and the trigger message reached reaching the TaskExecutor, the tasks might finish during this period, which make the trigger fail. If we do not deal with this situation, the checkpoint would fail after timeout since not all tasks report their snapshots. To deal with this problem, we would need to make CheckpointCoordinator listen to the FINISH status report of the tasks, and if one task finished before reporting snapshot, CheckpointCoordinator would re-compute the tasks to trigger and re-trigger the new sources after the tasks finished. Although this patch could solve this issue, if we met with the cascade cases unfortunately that the triggers keep fail due to tasks get finished, the JM might incurs high overhead. 

...