Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.


Page properties


Status

Current state: Under Discussion

Discussion thread: -

...

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyFLINK-25276

...

Release1.15

...


Table of Contents

This is a follow up on https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership which this is heavily relying on and as such please check FLIP-193 before for the require context

...

It will be possible to request each savepoint independently (via CLI) to be either in the canonical format (the current behaviour) or the native format. When native format is selected, the default configuration of the checkpoints will be used to determine whether the savepoint should be incremental or not. the `state.backend.incremental` setting will decide the type of native format snapshot and will take effect for both checkpoints and savepoints (with native type).

Semantic

As defined in FLIP-193, incremental savepoints won’t be allowed to refer to any pre-existing files used in previous checkpoints and Flink won’t be allowed to rely on the existence of any newly created files as part of that incremental savepoint. This is because savepoints are owned by the user, while checkpoints are owned by Flink.

...

As part of this FLIP we would like to finally explicitly document what user can do with checkpoints and savepoints. Currently there are a number of things that are not officially supported, but we know users are doing that. So let's start with the status quo:

Pre-existing

Yes

Yes (unofficially)/Maybe (untested)

No (but could be done)

Very difficult/impossible


Canonical Savepoint

Aligned Checkpoint

Unaligned Checkpoint

Statebackend change




Self-contained and relocatable


State Processor API(reading)

(???



State Processor API(writing)


Schema evolution


untested?

untested?

Flink

major

minor (1.x → 1.y) version upgrade


(???)


Flink

minor

bug/patch (1.14.x → 1.14.y) version upgrade


(???)

(???)

Job full upgrade

Arbitrary job upgrade (changed graph shape/record types)




Job upgrade w/o changing graph shape and record types




Rescaling




Questions:

What about RocksDB upgrades? If we bump RocksDB version between Flink versions, do we support recovering from a native format snapshot (incremental checkpoint)?

Proposal

...


Canonical Savepoint

Native Savepoint

Aligned Checkpoint

Unaligned Checkpoint

Statebackend change





Self-contained and relocatable



State Processor API

(

???

reading)


(???)
Schema evolution

(???)

(???)

(???)

Flink major version upgrade

Flink minor version upgrade

(change)

(change)

Job full upgrade

Job upgrade w/o changing graph shape and record types

Rescaling

Main aim of the first proposal is to unify guarantees between two types of savepoints and two types of checkpoints. The only difference between native and canonical savepoint should be the ability to change statebackend, and officially there would be no difference between aligned and unaligned checkpoints. Hence we would simplify the documentation, as we could avoid documenting the distinction between unaligned and aligned checkpoints.

Proposal 2

Canonical Savepoint

Native Savepoint

Aligned Checkpoint

Unaligned Checkpoint

Statebackend change


State Processor API
(???)(???
(writing)



Schema evolution



(???)

(???)

(???)

Flink major

Flink minor (1.x → 1.y) version upgrade



(change)


Flink

minor

bug/patch (1.14.x → 1.14.y) version upgrade



(change)

(change)

Job full upgrade

Arbitrary job upgrade (changed graph shape/record types)



(change)

Job upgrade w/o changing graph shape and record types



(change)(change)

Rescaling





The main aim of this proposal is to actually document what we can easily provide, based on the fact that native savepoint and aligned checkpoints would be virtually the same thing. The disadvantage of this proposal is that we would need to document the distinction between aligned and unaligned checkpoints.

...

Support of the incremental savepoint is subject to configured statebackend providing such feature. For example, without FLIP-151, HashMapStatebackend would not be able to provide incremental savepoints.the default configuration

This FLIP-203 does not intend to define guarantees for usage of State Processor API and Schema evolution with native format checkpoints or savepoints. This is intended as a follow up work in the future.

Compatibility, Deprecation, and Migration Plan

...

  1. In Flink 1.15 `--native` savepoint mode is added, but `--canonical` is kept the default.
  2. In Flink 1.16 `--native` will become the new default.

Rejected Alternatives

Rejected proposal for checkpoint guarantees 


Canonical Savepoint

Native Savepoint

Aligned Checkpoint

Unaligned Checkpoint

Statebackend change





Self-contained and relocatable



State Processor API


(???)

(???)


Schema evolution


(???)

(???)

(???)

Flink minor (1.x → 1.y) version upgrade





Flink bug/patch (1.14.x → 1.14.y) version upgrade



(change)

(change)

Arbitrary job upgrade (changed graph shape/record types)





Job upgrade w/o changing graph shape and record types





Rescaling





Main aim of the first proposal was to unify guarantees between two types of savepoints and two types of checkpoints. The only difference between native and canonical savepoint should be the ability to change statebackend, and officially there would be no difference between aligned and unaligned checkpoints. Hence we would simplify the documentation, as we could avoid documenting the distinction between unaligned and aligned checkpoints.

It was rejected because native savepoints and checkpoints are basically the same thing, so there is not much sense in artificially decreasing aligned checkpoint guarantees.