Discussion thread: here (<- link to https://lists.apache.org/list.html?dev@flink.apache.org)
Vote thread: here (<- link to https://lists.apache.org/list.html?dev@flink.apache.org)
JIRA: here (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX)
Release: <Flink Version>

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Unaligned checkpoints have been in the codebase for a long time and have proven stable and reliable, solving many potential problems. Especially now that aligned checkpoints can time out and fall back to unaligned ones, there seems to be no reason to keep aligned checkpoints as the default. Changing the default would make adopting Flink easier, especially for new users, who most likely first deploy Flink with the default aligned checkpoints, run into problems under back pressure, search online for a solution, and only then enable unaligned checkpoints.

Public Interfaces

No public interfaces will be changed. Only the default values of org.apache.flink.streaming.api.environment.ExecutionCheckpointingOptions#ENABLE_UNALIGNED and org.apache.flink.streaming.api.environment.ExecutionCheckpointingOptions#ALIGNED_CHECKPOINT_TIMEOUT will change.

Proposed Changes

I'm proposing to:

  • enable unaligned checkpoints by default
  • change the aligned checkpoint timeout from 0ms to 5s 

These settings should make the change completely transparent for most users. In particular, jobs running with little or no back pressure would be unaffected. Only jobs under noticeable back pressure would switch to unaligned checkpoints.
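The interaction between the two defaults can be sketched as a small decision function. This is a simplified model for illustration only, not Flink's actual implementation: the class `AlignedTimeoutSketch`, the helper `checkpointMode`, and the `alignmentTime` parameter (how long barrier alignment would take under the current back pressure) are all hypothetical names introduced here.

```java
import java.time.Duration;

/** Simplified model of the proposed defaults; not Flink's actual implementation. */
final class AlignedTimeoutSketch {

    enum Mode { ALIGNED, UNALIGNED }

    // Defaults proposed in this FLIP.
    static final boolean PROPOSED_UNALIGNED_ENABLED = true;
    static final Duration PROPOSED_ALIGNED_TIMEOUT = Duration.ofSeconds(5);

    /**
     * Hypothetical helper: decides how a checkpoint completes, given how long
     * barrier alignment would take under the current back pressure.
     */
    static Mode checkpointMode(boolean unalignedEnabled,
                               Duration alignedTimeout,
                               Duration alignmentTime) {
        if (!unalignedEnabled) {
            // The timeout is irrelevant when unaligned checkpoints are disabled.
            return Mode.ALIGNED;
        }
        if (alignedTimeout.isZero()) {
            // A zero timeout means checkpoints start unaligned right away.
            return Mode.UNALIGNED;
        }
        // Start aligned; switch to unaligned only if alignment exceeds the timeout.
        return alignmentTime.compareTo(alignedTimeout) > 0 ? Mode.UNALIGNED : Mode.ALIGNED;
    }

    public static void main(String[] args) {
        Duration lightBackPressure = Duration.ofMillis(200);
        Duration heavyBackPressure = Duration.ofSeconds(30);

        // prints "ALIGNED": alignment finishes well within the 5s timeout
        System.out.println(checkpointMode(
                PROPOSED_UNALIGNED_ENABLED, PROPOSED_ALIGNED_TIMEOUT, lightBackPressure));
        // prints "UNALIGNED": alignment would take longer than the 5s timeout
        System.out.println(checkpointMode(
                PROPOSED_UNALIGNED_ENABLED, PROPOSED_ALIGNED_TIMEOUT, heavyBackPressure));
    }
}
```

Under this model, a job whose alignment completes within 5s behaves exactly as it does with aligned checkpoints today, which is why the change should go unnoticed for jobs with little back pressure.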

This would help most jobs that experience some back pressure. There are some edge cases, such as jobs with very large parallelism and relatively small state, where the user does not care whether checkpoints complete in a timely manner and the back pressure is not large enough to cause checkpoint timeouts. For those users, the change to unaligned checkpoints will significantly increase the checkpointed state size without any benefit. However, I expect such cases to be far rarer than the jobs that would benefit from enabling unaligned checkpoints.

Compatibility, Deprecation, and Migration Plan

The change should be transparent for most users: jobs running with little or no back pressure are unaffected, and only jobs under noticeable back pressure would switch to unaligned checkpoints. The exception is the edge case described under Proposed Changes (very large parallelism, relatively small state, no need for timely checkpoints), where the checkpointed state size will increase significantly without any benefit. No public interfaces are changed or deprecated, and users who prefer the previous behaviour can set both options explicitly to their old values (unaligned checkpoints disabled, aligned checkpoint timeout of 0ms).

Test Plan

None.

Rejected Alternatives

None.
