THIS IS A TEST INSTANCE. ALL YOUR CHANGES WILL BE LOST!!!!
...
To ease the discussion, we divide the problem of approximate task-local recovery into three parts with each part only focusing on addressing a set of problems, as shown in the graph below.
- Milestone One: sink failure. Here a sink task stands for no consumers reading data from it. In this scenario, if a sink vertex fails, the sink is restarted from the last successfully completed checkpoint and data loss is expected. If a non-sink vertex fails, a regional failover strategy takes place. In milestone one, we focus on issues related to task failure handling and upstream reconnection.
- Milestone Two: downstream failure -- a failed task leads all tasks rooted from the failed tasks to restart. In milestone two, we focus on issues related to missing checkpoint barriers.
- Milestone Three: single task failure -- a failed task leads itself to restart. In milestone three, we focus on issues related to downstream reconnection and different types of failure detection/handling (TM, JM, network, and e.t.c).
...