...
When a task fails, its index (executionIndex) in executionList can be calculated from its executionAttemptID. The scheduler then processes the corresponding execution according to the executionIndex, as shown below.
To improve the failover logic, I will extend the class FailureHandlingResult with an additional member variable.
```java
public class FailureHandlingResult {
    // ... existing members omitted ...

    // Index of the failed execution in executionList.
    @Nullable private final Integer executionIndex;
}
```
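The lookup of executionIndex described above could be sketched as follows. This is only an illustration under stated assumptions: the class ExecutionIndexLookup and the method findExecutionIndex are hypothetical names, and plain strings stand in for the real executionAttemptID type.

```java
import java.util.List;
import java.util.OptionalInt;

/** Hypothetical helper, not part of the proposal: finds a failed attempt in an execution list. */
final class ExecutionIndexLookup {

    private ExecutionIndexLookup() {}

    /**
     * Returns the position of the failed attempt in the list, or an empty result
     * if the attempt is not in the list. Strings stand in for the attempt ID type.
     */
    static OptionalInt findExecutionIndex(List<String> executionAttemptIds, String failedAttemptId) {
        int index = executionAttemptIds.indexOf(failedAttemptId);
        return index >= 0 ? OptionalInt.of(index) : OptionalInt.empty();
    }
}
```

An empty result corresponds to the case where no executionIndex is available, which is presumably why the new member variable is marked @Nullable.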
Blacklist of nodes
Most long-tail tasks are caused by cluster problems, so I must ensure that the speculative execution runs on a different node from the original execution.
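A minimal sketch of this placement rule, assuming nodes are identified by plain string IDs. The class NodeBlacklist and its methods are hypothetical and only illustrate how the node of the original execution can be excluded; they are not the scheduler's actual API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical node blacklist used when placing a speculative execution. */
final class NodeBlacklist {

    private final Set<String> blacklistedNodes = new HashSet<>();

    /** Excludes a node, for example the one running the slow original execution. */
    void add(String nodeId) {
        blacklistedNodes.add(nodeId);
    }

    boolean contains(String nodeId) {
        return blacklistedNodes.contains(nodeId);
    }

    /** Returns the first candidate node that is not blacklisted, or empty if none is left. */
    Optional<String> pickNode(List<String> candidateNodes) {
        return candidateNodes.stream()
                .filter(node -> !blacklistedNodes.contains(node))
                .findFirst();
    }
}
```

In this sketch, the node of the original execution is added to the blacklist before a node is picked for the speculative execution. If every candidate is blacklisted, pickNode returns an empty result and the speculative execution is not placed.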
...