...

When a task fails, we can compute its index (executionIndex) in executionList from its executionAttemptID. The scheduler then processes the corresponding execution according to this executionIndex, as shown below.

To improve the failover logic, I will extend the class FailureHandlingResult with an additional member variable.

FailureHandlingResult class extension (Java):
public class FailureHandlingResult {
	@Nullable private final Integer executionIndex;
}
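
To make the intended flow more concrete, here is a minimal sketch of how the index lookup and the nullable executionIndex might fit together. Apart from executionIndex and executionList, every name here (ExecutionIndexLookup, Attempt, FailureResult, handleFailure) is hypothetical and stands in for the real scheduler types; attempt IDs are modelled as plain strings, which is an assumption.

```java
import java.util.List;
import java.util.OptionalInt;

/** Hypothetical sketch; these are illustrative stand-ins, not the actual scheduler API. */
final class ExecutionIndexLookup {

    /** Stand-in for an execution attempt, identified by a string attempt ID (assumption). */
    static final class Attempt {
        final String attemptId;

        Attempt(String attemptId) {
            this.attemptId = attemptId;
        }
    }

    /** Simplified result object that carries the nullable executionIndex. */
    static final class FailureResult {
        // null when the failure cannot be mapped to an entry in executionList
        private final Integer executionIndex;

        FailureResult(Integer executionIndex) {
            this.executionIndex = executionIndex;
        }

        OptionalInt getExecutionIndex() {
            return executionIndex == null
                    ? OptionalInt.empty()
                    : OptionalInt.of(executionIndex);
        }
    }

    /** Returns the position of the failed attempt in executionList, or -1 if it is absent. */
    static int indexOf(List<Attempt> executionList, String failedAttemptId) {
        for (int i = 0; i < executionList.size(); i++) {
            if (executionList.get(i).attemptId.equals(failedAttemptId)) {
                return i;
            }
        }
        return -1;
    }

    /** Builds a result whose executionIndex is set only when the attempt was found. */
    static FailureResult handleFailure(List<Attempt> executionList, String failedAttemptId) {
        int index = indexOf(executionList, failedAttemptId);
        return new FailureResult(index >= 0 ? Integer.valueOf(index) : null);
    }
}
```

In this sketch the caller checks getExecutionIndex() before deciding how to treat the failed execution; an empty value means the failure could not be attributed to a specific entry in executionList.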

Node blacklist

Most long-tail tasks are caused by cluster problems, so I must ensure that a speculative execution runs on a different node from the original execution.
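
As a rough illustration of the idea, the sketch below filters candidate nodes against a blacklist before a speculative execution is placed. The class NodeBlacklist and its methods are hypothetical, nodes are identified by hostname strings (an assumption), and entry expiry is deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a node blacklist; not the proposal's actual implementation. */
final class NodeBlacklist {

    // Hostnames of nodes that already run (or ran) a slow original execution.
    private final Set<String> blacklistedNodes = new HashSet<>();

    /** Marks a node so that speculative executions avoid it. */
    void add(String hostname) {
        blacklistedNodes.add(hostname);
    }

    boolean contains(String hostname) {
        return blacklistedNodes.contains(hostname);
    }

    /** Returns the candidates that are not blacklisted, preserving their order. */
    List<String> filterCandidates(List<String> candidateNodes) {
        List<String> allowed = new ArrayList<>();
        for (String node : candidateNodes) {
            if (!blacklistedNodes.contains(node)) {
                allowed.add(node);
            }
        }
        return allowed;
    }
}
```

Under these assumptions, the scheduler would add the node of the original execution to the blacklist when it detects a long-tail task and then choose the speculative execution's node only from filterCandidates(...).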

...