...

Page properties

Document the state by adding a label to the FLIP page with one of "discussion", "accepted", "released", "rejected".

Discussion thread: https://lists.apache.org/thread/tws8p5lrzjwd71kcdjzznly97ywdjp8l
Vote thread: here (<- link to https://lists.apache.org/list.html?dev@flink.apache.org)
JIRA: here (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX)
Release: <Flink Version>


...

Unaligned checkpoints have been in our codebase for a long time already. They have proven to be stable and reliable, and they solve a lot of potential problems. Especially with the option to time out aligned checkpoints to unaligned, there seem to be very few reasons to keep using aligned checkpoints by default. Enabling unaligned checkpoints by default would make adoption of Flink easier, especially for new users. Instead of first deploying Flink with the current default configuration, encountering problems during back pressure, searching online for a solution, and only then enabling unaligned checkpoints, new users wouldn't have to do anything.

Public Interfaces

None of the public interfaces will be changed. Only the default values of org.apache.flink.streaming.api.environment.ExecutionCheckpointingOptions#ENABLE_UNALIGNED and org.apache.flink.streaming.api.environment.ExecutionCheckpointingOptions#ALIGNED_CHECKPOINT_TIMEOUT would change.

...

  • enable unaligned checkpoints by default
  • change the aligned checkpoint timeout from 0ms to 5s 
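
For users who want to pin the behaviour explicitly rather than rely on defaults, the two proposed values would roughly correspond to the configuration fragment below. The key names are an assumption based on the Flink configuration documentation for the versions that ship ALIGNED_CHECKPOINT_TIMEOUT. They are not stated in this proposal and should be verified against the Flink version in use.

```yaml
# Assumed configuration keys for ENABLE_UNALIGNED and ALIGNED_CHECKPOINT_TIMEOUT.
# Verify the exact names for your Flink version before relying on them.
execution.checkpointing.unaligned: true
execution.checkpointing.aligned-checkpoint-timeout: 5s
```

Setting the first key to false would restore the current default of always using aligned checkpoints.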

Compatibility, Deprecation, and Migration Plan

These settings should make the change completely transparent for most users. In particular, jobs running without back pressure, or with only slight back pressure, would be unaffected, because their checkpoints would still complete as aligned before the 5s timeout is reached. Only jobs with noticeable back pressure would switch to using unaligned checkpoints.
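
The effect of the two settings on an individual checkpoint can be illustrated with a small decision model. This is only a sketch under simplifying assumptions, not Flink's implementation: `CheckpointSettings`, `checkpoint_mode` and the `alignment_delay` parameter are hypothetical names introduced for illustration, and the delay stands in for how long back pressure holds up the checkpoint barriers.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class CheckpointSettings:
    """Hypothetical holder for the two options discussed in this proposal."""
    unaligned_enabled: bool
    aligned_checkpoint_timeout: timedelta


# Current defaults versus the defaults proposed here.
CURRENT_DEFAULTS = CheckpointSettings(False, timedelta(0))
PROPOSED_DEFAULTS = CheckpointSettings(True, timedelta(seconds=5))


def checkpoint_mode(settings: CheckpointSettings, alignment_delay: timedelta) -> str:
    """Return "aligned" or "unaligned" for one checkpoint in this simplified model."""
    if not settings.unaligned_enabled:
        # Unaligned checkpoints are switched off, so the timeout is irrelevant.
        return "aligned"
    if settings.aligned_checkpoint_timeout == timedelta(0):
        # A zero timeout means checkpoints start as unaligned right away.
        return "unaligned"
    if alignment_delay > settings.aligned_checkpoint_timeout:
        # Back pressure delayed the barriers past the timeout: fall back to unaligned.
        return "unaligned"
    return "aligned"


if __name__ == "__main__":
    for delay in (timedelta(milliseconds=200), timedelta(seconds=30)):
        print(delay, checkpoint_mode(PROPOSED_DEFAULTS, delay))
```

In this model a job whose barriers are delayed by 200ms keeps taking aligned checkpoints under the proposed defaults, while a job delayed by 30s falls back to unaligned ones.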

This would help most of the jobs that are experiencing some back pressure. There are some edge cases, such as jobs with very large parallelism and relatively small state, where the user doesn't need checkpoints to complete in a timely manner and the back pressure is not large enough to cause checkpoint timeouts. In such a scenario the change to unaligned checkpoints will significantly increase state size without any benefit. However, I expect such cases to be far less common than the jobs that would benefit from enabling unaligned checkpoints.

Another thing to consider is that we currently do not support job upgrades and Flink minor version upgrades with unaligned checkpoints, so users would have to be guided to use savepoints in those cases.

This change would have to be clearly visible in the release notes.

...