Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Currently, if a task fails, it transits to the state ExecutionState.FAILED and notifyFinalState to TaskExecutor. TaskExecutor updateTaskExecutionState to JM through JobMasterGateway. Then the JM switches job state from JobStatus.RUNNING to JobStatus.RESTARTING . Then the and the Scheduler handles failure through ExecutionFailureHandler#getFailureHandlingResult. FailoverStrategy decides the set of task vertices to restart. The default failover strategy is RestartPipelinedRegionFailoverStrategy. For a streaming job, the region failover strategy restarts the entire job if its jobGraph does not contain disconnected subgraphs.

In the sink failure case, only the sink vertex is expected to restart. The proposed changes are to extend RestartPipelinedRegionFailoverStrategy to restart a task if the task does not have any consumers and falls into the regular regional failover otherwise. Notice that RestartPipelinedRegionFailoverStrategy is untouched. The new strategy extends RestartPipelinedRegionFailoverStrategy, reuses most of the logic, and only overrides getTasksNeedingRestart.

...