Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

In the sink failure case, only the sink vertex is expected to restart. The proposed changes are to extend RestartPipelinedRegionFailoverStrategy to restart a task if the task does not have any consumers and falls into the regular regional failover otherwise. Notice that RestartPipelinedRegionFailoverStrategy is untouched. The new strategy extends RestartPipelinedRegionFailoverStrategy, reuses most of the logic, and only overrides getTasksNeedingRestart.

2. Reconnects to Upstream

Currently, upstream tasks stay alive after downstream tasks fail. However, if the produced partition result type is PIPELINED, upstream tasks release produced partitions upon all of its subpartitions consumed (ReleaseOnConsumptionResultPartition). The failed (downstream) task cleans network resources when exiting, including InputChannel. Each channel maintains a ResultSubpartitionView, notifying the corresponding ResultSubpartition for consumption when releasing resources. 

This causes the restarted sink task not able to locate the input result partition since it has already been released.

The requirements for the ResultSubPartition are as follows:

  1. ResultSubPartition is kept when the consumer exits (task fails), the consumer only releases the view.
  2. ResultSubPartition is released through ResultPartitionManager on
    1. consumption finished; 
    2. producer failure/cancellation; 
    3. Job exits

Multiple views can be created on the same subpartition, but only one view exists at a time (No two views exist at the same time).


Compatibility, Deprecation, and Migration Plan

...