...
Comparison between Options
Criterion | Enumerate on Task | Enumerate on JobManager | Enumerate on SourceCoordinator |
---|---|---|---|
Encapsulation of Enumerator | Encapsulation in separate Task | Additional complexity in ExecutionGraph | New component SourceCoordinator |
Network Stack Changes | Significant changes. Some are more clear, like reconnecting. Some seem to break abstractions, like notifying tasks of downstream failures. | No Changes necessary | No Changes necessary |
Scheduler / Failover Region | Minor changes | No changes necessary | Minor changes |
Checkpoint alignment | No changes necessary (splits are data messages, naturally align with barriers) | Careful coordination between split assignment and checkpoint triggering. Might be simple if both actions are run in the single-threaded ExecutionGraph thread. | No changes necessary (splits are through RPC, naturally align with barriers) |
Watermarks | No changes necessary (splits are data messages, watermarks naturally flow) | Watermarks would go through ExecutionGraph | Watermarks would go through RPC |
Checkpoint State | No additional mechanism (only regular task state) | Need to add support for asynchronous non-metadata state on the JobManager / ExecutionGraph | Need to add support for asynchronous state on the SourceCoordinator |
Supporting graceful Enumerator recovery (avoid full restarts) | Network reconnects (like above), plus write-ahead of split | Tracking split assignment between checkpoints, plus | Tracking split assignment between checkpoints, plus write-ahead of split assignment between checkpoints |
Personal opinion from Stephan: If we find an elegant way to abstract the network stack changes, I would lean towards running the Enumerator in a Task, not on the JobManager.
...