
Status

Current state: Accepted


Discussion thread: ... /9om3ttjhtwxqnfl2c3jpwqof443mr9s9
JIRA: FLINK-13921
Vote thread

Release: 1.10


Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Currently, the way Flink decides which `RestartStrategy` to use is quite convoluted. The reason is that the configuration mechanism evolved over time and we wanted to keep it backwards compatible. As a result, we currently have the following behaviour:

  • If the config option `restart-strategy` is configured, then Flink uses this `RestartStrategy` (so far so simple)
  • If the config option `restart-strategy` is not configured, then 
    • If `restart-strategy.fixed-delay.attempts` or `restart-strategy.fixed-delay.delay` are defined, then instantiate `FixedDelayRestartStrategy(restart-strategy.fixed-delay.attempts, restart-strategy.fixed-delay.delay)`
    • If `restart-strategy.fixed-delay.attempts` and `restart-strategy.fixed-delay.delay` are not defined, then
      • If checkpointing is disabled, then choose `NoRestartStrategy`
      • If checkpointing is enabled, then choose `FixedDelayRestartStrategy(Integer.MAX_VALUE, "0 s")`
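
The decision tree above can be sketched as a small, self-contained function. This is only an illustration of the rules listed here, not Flink's actual implementation: the class name, the `resolve` helper, the string return values, and the `<default>` placeholder for an unset option are all assumptions made for the example.

```java
import java.util.Map;

// Hypothetical sketch of the CURRENT resolution rules; not Flink code.
public class CurrentRestartStrategyResolution {

    // Returns a textual description of the strategy that would be chosen.
    static String resolve(Map<String, String> config, boolean checkpointingEnabled) {
        // 1. An explicitly configured `restart-strategy` always wins.
        String strategy = config.get("restart-strategy");
        if (strategy != null) {
            return strategy;
        }

        // 2. Legacy path: either fixed-delay option alone enables fixed-delay restarts.
        String attempts = config.get("restart-strategy.fixed-delay.attempts");
        String delay = config.get("restart-strategy.fixed-delay.delay");
        if (attempts != null || delay != null) {
            return "fixed-delay(" + orDefault(attempts) + ", " + orDefault(delay) + ")";
        }

        // 3. Nothing configured: fall back based on checkpointing.
        return checkpointingEnabled
                ? "fixed-delay(" + Integer.MAX_VALUE + ", 0 s)"
                : "none";
    }

    // Placeholder for "use the option's default value" (assumption for this sketch).
    private static String orDefault(String value) {
        return value != null ? value : "<default>";
    }

    public static void main(String[] args) {
        // Only the legacy `attempts` option is set, checkpointing is disabled.
        Map<String, String> legacy = Map.of("restart-strategy.fixed-delay.attempts", "3");
        System.out.println(resolve(legacy, false));
    }
}
```

The second branch is the implicit one: a job gets a fixed-delay restart strategy without `restart-strategy` ever being set.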

Proposed Changes

I would like to simplify the configuration by removing the "If `restart-strategy.fixed-delay.attempts` or `restart-strategy.fixed-delay.delay` are defined, then" condition. That way, the logic would be the following:

...

That way we retain the user friendliness that jobs restart if the user enabled checkpointing and we make it clear that any `restart-strategy.fixed-delay.xyz` setting will only be respected if `restart-strategy` has been set to `fixed-delay`.
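
As a rough sketch of the simplified logic, assuming it is simply the existing decision tree with the legacy fixed-delay branch removed: the class name, the `resolve` helper, the string return values, and the `<default>` placeholder are hypothetical and only serve the illustration.

```java
import java.util.Map;

// Hypothetical sketch of the PROPOSED resolution rules; not Flink code.
public class ProposedRestartStrategyResolution {

    // Returns a textual description of the strategy that would be chosen.
    static String resolve(Map<String, String> config, boolean checkpointingEnabled) {
        String strategy = config.get("restart-strategy");
        if (strategy != null) {
            // The fixed-delay options are only read when fixed-delay was selected explicitly.
            if (strategy.equals("fixed-delay")) {
                String attempts = config.getOrDefault(
                        "restart-strategy.fixed-delay.attempts", "<default>");
                String delay = config.getOrDefault(
                        "restart-strategy.fixed-delay.delay", "<default>");
                return "fixed-delay(" + attempts + ", " + delay + ")";
            }
            return strategy;
        }

        // No `restart-strategy` set: the fixed-delay options are ignored entirely.
        return checkpointingEnabled
                ? "fixed-delay(" + Integer.MAX_VALUE + ", 0 s)"
                : "none";
    }

    public static void main(String[] args) {
        // Only the legacy `attempts` option is set, checkpointing is disabled.
        Map<String, String> legacy = Map.of("restart-strategy.fixed-delay.attempts", "3");
        System.out.println(resolve(legacy, false));
    }
}
```

Under these assumptions, a setup that sets only `restart-strategy.fixed-delay.attempts` and has checkpointing disabled no longer restarts; this is the case a migration would need to check.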

Compatibility, Deprecation, and Migration Plan

This simplification would, however, change Flink's behaviour and might break existing setups. Since we introduced `RestartStrategies` with Flink 1.0.0 and deprecated the prior configuration mechanism, which enables restarting if either the `attempts` or the `delay` has been set, I think that the number of broken jobs should be minimal, if not zero.

Test Plan

`RestartStrategyFactory.createRestartStrategy` needs to be tested since it is currently not covered by tests.


Rejected Alternatives

None