Status

Current state: Accepted

Discussion thread: https://lists.apache.org/thread.html/9602b342602a0181fcb618581f3b12e692ed2fad98c59fd6c1caeabd@%3Cdev.flink.apache.org%3E

JIRA: FLINK-13884

Released:

Motivation

A user survey on the default restart delay showed that the current default of "0 s" is not optimal. In practice, Flink users tend to set it to a non-zero value (e.g. "10 s") to prevent restart storms originating from overloaded external systems.

Proposed Changes

Set the default restart delay of the FixedDelayRestartStrategy ("restart-strategy.fixed-delay.delay") and of the FailureRateRestartStrategy ("restart-strategy.failure-rate.delay") to "1 s". A delay of "1 s" should prevent restart storms originating from causes outside of Flink (e.g. overloaded external systems) while still being short enough to have no noticeable effect on most Flink deployments.

Compatibility, Deprecation, and Migration Plan

Changing the default restart delay will affect all Flink deployments that rely on the previous default value. We intend to add a release note to make users aware of this change when upgrading to the next Flink version. Moreover, the new default restart delay of "1 s" should not noticeably increase the restart time for most Flink jobs.
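
Deployments that depend on the previous behavior could pin the delay explicitly instead of relying on the default. The fragment below is a minimal sketch, assuming the options are set in the cluster configuration file (flink-conf.yaml, which this page does not name); the two keys and the "0 s" value are the ones discussed above.

```yaml
# Sketch only: pins the restart delay to the previous default of "0 s".
# Assumption: these options live in flink-conf.yaml.

# Applies when the fixed-delay restart strategy is in use.
restart-strategy.fixed-delay.delay: 0 s

# Applies when the failure-rate restart strategy is in use.
restart-strategy.failure-rate.delay: 0 s
```

Only the key for the restart strategy that a job actually uses needs to be set. Jobs that already configure a delay explicitly are unaffected by the new default.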

Test Plan

Should not need additional testing.

Rejected Alternatives

None
