Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

To ease the discussion, we divide the problem of approximate task-local recovery into three parts with each part only focusing on addressing a set of problems, as shown in the graph below.

View file
namegraph.pdf
height250
Image Removed



  • Milestone One: sink failure. Here a sink task stands for no consumers reading data from it. In this scenario, if a sink vertex fails, the sink is restarted from the last successfully completed checkpoint and data loss is expected. If a non-sink vertex fails, a regional failover strategy takes place. In milestone one, we focus on issues related to task failure handling and upstream reconnection.
  • Milestone Two: downstream failure -- a failed task leads all tasks rooted from the failed tasks to restart. In milestone two, we focus on issues related to missing checkpoint barriers.
  • Milestone Three: single task failure -- a failed task leads itself to restart. In milestone three, we focus on issues related to downstream reconnection and different types of failure detection/handling (TM, JM, network, and e.t.c).

...

2). Network failures between TM and JM that can potentially cause orphan tasks[ (details discussed in Section 2, Point1).

...

  1. ResultSubPartition is kept when the consumer exits (task fails), the consumer only releases the view.
  2. ResultSubPartition is released through ResultPartitionManager on
    1. consumption finished; 
    2. producer failure/cancellation; 
    3. Job exits
  3. Multiple views can be created on the same subpartition, but only one view exists at a time (No two views exist at the same time).

There are four more points worth to highlight:

Point1: Orphan/Zombie Tasks

It is possible that TM disconnects to JM, but both TM and JM are alive. In the sink failure case, TM fails the sink node, and JM redeploys sink to a different TM slot requested from RM.

However, it is possible that JM deploys a new sink before TM successfully fails the old one. We have to make sure only one view exists at any time, and the view is connected with the new attemptId/attemptNumber. In theory, the old execution has the possibility to connect after the new execution. This could happen, but rarely. If only one view is allowed at a time, creating ResultSubpartitionView for the newer execution attempt should prevent the creation of ResultSubpartitionView from the older execution attempt to succeed.

Compatibility, Deprecation, and Migration Plan

...