Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  1. ResultSubPartition is kept when the consumer exits (task fails), the consumer only releases the view.
  2. ResultSubPartition is released through ResultPartitionManager on
    1. consumption finished; 
    2. producer failure/cancellation; 
    3. Job exits
  3. Multiple views can be created on the same subpartition, but only one view exists at a time (No two views exist at the same time).

...

However, JM may deploy a new sink before TM successfully fails the old one. We have to make sure only one view exists at any time, and the view is connected with the new attemptId/attemptNumber. In theory, the old execution can connect after the new execution. In reality, it can happen, but rarely. If only one view is allowed at a time, creating ResultSubpartitionView for the newer execution attempt should prevent the creation of ResultSubpartitionView from the older execution attempt to succeed.

Point2: Network connection errors between TM

There are cases where sink tasks are not able to notify upstream tasks to release views due to netty connection errors between TMs (the notification channel is broken). The restarted sink will find the upstream subpartition having an old view that has not been released yet upon successful reconnection. Remember that only one view is allowed at a time, so the old view will be released before a new one can be created.

Point3: Remote or Local Input Channel

Reconnection to the upstream task is not very different whether the downstream and the upstream is connected through LocalInputChannel or RemoteInputChannel. The information TaskDeploymentDescriptor needed for the downstream task to reconnect to upstream is generated when the failed task is redeployed. TaskDeploymentDescriptor#InputGateDeploymentDescriptor describes how to read data from the upstream task.

Point4: Credits

Upstream credits are reset to the initial state as the downstream lost all input data after restarting.

The proposed changes are either:

  1. Change PipelinedSubPartition to fulfill the requirement, or
  2. Build a separate type of ResultSubPartition

Compatibility, Deprecation, and Migration Plan

...