Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

When speculative execution is enabled, a SpeculativeScheduler(which extends AdaptiveBatchScheduler) will be used for task scheduling. SpeculativeScheduler will listen on slow tasks detected by SlowTaskDetector. It will create and deploy speculative executions for the slow tasks. Nodes that slow tasks located on will be treated as slow nodes and get blacklistedblocked, so that speculative executions will not be deployed on them. Once any execution finishes, the remaining homogeneous tasks will be canceled, so that only one execution will be admitted as finished and only its output will be visible to downstream consumer tasks or in external sink services.

...

  • SpeculativeScheduler needs to be able to directly deploy an Execution, while AdaptiveBatchScheduler can only perform ExecutionVertex level deployment.
  • SpeculativeScheduler does not restart the ExecutionVertex if an execution fails when any other current execution is still making progress
  • SpeculativeScheduler listens on slow tasks. Once there are slow tasks, it will blacklist the block the slow nodes and deploy speculative executions of the slow tasks on other nodes.
  • Once any execution finishes, SpeculativeScheduler will cancel all the remaining executions of the same execution vertex.

...

Once notified about slow tasks, the SpeculativeScheduler will handle them as below:

  1. Blacklist Block nodes that the slow tasks locate on. To achieve this, the scheduler will notify locations of slow tasks with a SlowTaskException add the slow nodes to the BlacklistHandlerblocklist. The BlacklistHandler will use a SlowTaskBlacklistStrategy (see section below) to make decisions.block action will be MARK_BLOCKED so that future tasks will not be deployed to the slow node, while deployed tasks can keep running. (See FLIP-224 Blocklist Mechanism for more details)
  2. Create speculative executions for slow tasks until the current executions of each execution vertex reach the concurrency limit (defined via config jobmanager.adaptive-batch-scheduler.speculative.max-concurrent-executions)
  3. Deploy the newly created speculative executions
SlowTaskBlacklistStrategy

A BlacklistStrategy will be used in the BlacklistHandler to decide task managers/nodes to blacklist and actions to perform. (See FLIP-224 Blacklist Mechanism for more details).

...

Limitations

Batch jobs only

...