Discussion thread: https://lists.apache.org/thread/3oq2gfto9okmcgmq5wqf4zn35wko0jbf
Vote thread
JIRA

Release: 1.10

Motivation

A user survey about the default value of the restart delay showed that the current default of "0 s" is not optimal. In practice, Flink users tend to set it to a non-zero value (e.g. "10 s") to prevent restart storms originating from overloaded external systems.

Proposed Changes

Set the default restart delay of the FixedDelayRestartStrategy ("restart-strategy.fixed-delay.delay") and of the FailureRateRestartStrategy ("restart-strategy.failure-rate.delay") to "1 s". A delay of "1 s" should prevent restart storms originating from causes outside of Flink (e.g. overloaded external systems) while still being short enough to have no noticeable effect on most Flink deployments.

Compatibility, Deprecation, and Migration Plan

Changing the default restart delay will affect all Flink deployments that rely on the previous default value. We intend to add a release note to make users aware of this change when they upgrade to the next Flink version. Moreover, the new default restart delay of "1 s" should not noticeably increase the restart time for most Flink jobs.
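
For deployments that depend on the previous behavior, one possible migration step is to set the delay explicitly rather than rely on the default. The fragment below is a minimal sketch, assuming the settings live in flink-conf.yaml and that the old value of "0 s" is what should be kept. The two keys are the ones named in this proposal.

```yaml
# flink-conf.yaml (sketch): pin the restart delay instead of relying on the default.
# Only the key that matches the configured restart strategy takes effect.

# For the FixedDelayRestartStrategy:
restart-strategy.fixed-delay.delay: 0 s

# For the FailureRateRestartStrategy:
restart-strategy.failure-rate.delay: 0 s
```

Setting the value explicitly keeps the behavior stable across upgrades, whichever default a given Flink version ships with.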

Test Plan

This change should not need additional testing.

Rejected Alternatives

None