Status

...

Discussion thread

...

...

Jira

FLINK-28131

...

Release

1.16


Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

  • jobmanager.adaptive-batch-scheduler.speculative.max-concurrent-executions, defaults to "2". It controls how many executions of an ExecutionVertex (the original one plus any speculative ones) can run at the same time.
  • jobmanager.adaptive-batch-scheduler.speculative.block-slow-node-duration, defaults to "1 min". It controls how long an identified slow node is blocked.
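As a rough illustration of how these two options relate, the sketch below holds them as plain key/value strings and derives how many speculative executions a single ExecutionVertex could have. The class name, the helper methods, and the chosen values ("3" and "2 min") are assumptions made for the example; only the two option keys come from this page, and the sketch deliberately avoids Flink's own configuration classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class for illustration; not part of Flink.
class SpeculativeConfigSketch {
    static final String MAX_CONCURRENT_EXECUTIONS =
            "jobmanager.adaptive-batch-scheduler.speculative.max-concurrent-executions";
    static final String BLOCK_SLOW_NODE_DURATION =
            "jobmanager.adaptive-batch-scheduler.speculative.block-slow-node-duration";

    // Illustrative, non-default values, written as they might appear in a config file.
    static Map<String, String> exampleSettings() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(MAX_CONCURRENT_EXECUTIONS, "3");
        conf.put(BLOCK_SLOW_NODE_DURATION, "2 min");
        return conf;
    }

    // The limit includes the original execution, so at most (limit - 1)
    // speculative executions can run next to it.
    static int maxSpeculativeExecutions(Map<String, String> conf) {
        int limit = Integer.parseInt(conf.get(MAX_CONCURRENT_EXECUTIONS));
        return Math.max(0, limit - 1);
    }

    public static void main(String[] args) {
        Map<String, String> conf = exampleSettings();
        conf.forEach((key, value) -> System.out.println(key + ": " + value));
        System.out.println("speculative executions per vertex: "
                + maxSpeculativeExecutions(conf));
    }
}
```

With a limit of "3", one original execution and up to two speculative executions of the same ExecutionVertex may run concurrently.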

...

Currently, AdaptiveBatchScheduler does not support jobs with PIPELINED data exchanges. As a result, speculative execution does not support PIPELINED data exchanges either. Requiring all data exchanges to be BLOCKING also simplifies things: in this case each ExecutionVertex is its own pipelined region and can be speculated individually. Otherwise, all ExecutionVertices of one pipelined region would need to perform speculative execution together.
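The relationship between exchange types and pipelined regions can be sketched with a small union-find over a simplified graph model. This is not Flink's region-building code; the class, the Edge type, and the integer vertex ids are hypothetical, and the only rule encoded is that vertices connected by a PIPELINED exchange end up in the same region.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model; not Flink's actual implementation.
class PipelinedRegionSketch {
    enum ExchangeType { BLOCKING, PIPELINED }

    // A data exchange between two vertices, identified by integer ids.
    static final class Edge {
        final int from;
        final int to;
        final ExchangeType type;

        Edge(int from, int to, ExchangeType type) {
            this.from = from;
            this.to = to;
            this.type = type;
        }
    }

    // Groups vertices 0..vertexCount-1 into regions: only PIPELINED edges merge regions.
    static List<List<Integer>> regions(int vertexCount, List<Edge> edges) {
        int[] parent = new int[vertexCount];
        for (int i = 0; i < vertexCount; i++) {
            parent[i] = i;
        }
        for (Edge e : edges) {
            if (e.type == ExchangeType.PIPELINED) {
                parent[find(parent, e.from)] = find(parent, e.to);
            }
        }
        Map<Integer, List<Integer>> byRoot = new TreeMap<>();
        for (int v = 0; v < vertexCount; v++) {
            byRoot.computeIfAbsent(find(parent, v), k -> new ArrayList<>()).add(v);
        }
        return new ArrayList<>(byRoot.values());
    }

    // Union-find lookup with path halving.
    static int find(int[] parent, int v) {
        while (parent[v] != v) {
            parent[v] = parent[parent[v]];
            v = parent[v];
        }
        return v;
    }

    public static void main(String[] args) {
        List<Edge> allBlocking = Arrays.asList(
                new Edge(0, 1, ExchangeType.BLOCKING),
                new Edge(1, 2, ExchangeType.BLOCKING));
        List<Edge> mixed = Arrays.asList(
                new Edge(0, 1, ExchangeType.PIPELINED),
                new Edge(1, 2, ExchangeType.BLOCKING));
        System.out.println(regions(3, allBlocking)); // [[0], [1], [2]]
        System.out.println(regions(3, mixed));       // [[0, 1], [2]]
    }
}
```

With only BLOCKING exchanges every vertex forms its own region, so each one can be speculated independently. A single PIPELINED exchange merges its two endpoints into one region, which would then have to be speculated as a unit.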

This also means that 

Speculative execution of sources and sinks is disabled by default

...

The web UI does not show all the concurrent executions of each ExecutionVertex/subtask. It only shows the one with the fastest progress.

User-defined functions must not be affected by their speculative instances

When a user-defined function and its speculative instances run concurrently, they must not affect each other. Examples of such interference include:

  • access to the same exclusive resources
  • overwriting output in external services when that output is written as a side effect, i.e. not via Flink sinks
  • competition for data ingestion, which includes cases where
    • user-defined source functions compete with each other
    • data ingestion happens as a side effect, i.e. not via Flink sources
  • ...

If concurrent instances can affect each other, the result may be task failures or, even worse, data inconsistency. Speculative execution should therefore not be enabled in such cases.
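The output-overwriting case can be illustrated with a small simulation. The external store, the key scheme, and all names below are hypothetical and do not use any Flink API. The sketch only shows why a side-effect write keyed by subtask index collides when the original and a speculative attempt of the same subtask both run.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation for illustration; no Flink API is involved.
class SideEffectCollisionSketch {
    // Stand-in for an external service written as a side effect, not via a Flink sink.
    static final class ExternalStore {
        final Map<String, String> data = new HashMap<>();
        int overwrites = 0;

        void put(String key, String value) {
            if (data.put(key, value) != null) {
                overwrites++; // an earlier attempt's output was replaced
            }
        }
    }

    // Key derived from the subtask index only, so all attempts of a subtask share it.
    static String keyBySubtask(int subtaskIndex) {
        return "part-" + subtaskIndex;
    }

    // Runs the given number of concurrent attempts of subtask 0 and counts overwrites.
    static int simulate(int attempts) {
        ExternalStore store = new ExternalStore();
        for (int attempt = 0; attempt < attempts; attempt++) {
            store.put(keyBySubtask(0), "written by attempt " + attempt);
        }
        return store.overwrites;
    }

    public static void main(String[] args) {
        System.out.println("overwrites without speculation: " + simulate(1));
        System.out.println("overwrites with one speculative attempt: " + simulate(2));
    }
}
```

A single attempt never overwrites its own output, while one additional speculative attempt already replaces what the other attempt wrote. This is the kind of interference that makes a job unsuitable for speculative execution.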

Compatibility, Deprecation, and Migration Plan

...