...
Savepoints in native format should have the same properties as canonical savepoints when it comes to being self-contained and relocatable. All duplicated and newly created files should be located in the user-specified target directory, and all paths should be encoded as relative paths with respect to the metadata file.
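To illustrate why relative paths make a snapshot relocatable, here is a minimal sketch. It is not Flink code: the class `RelativeStatePaths`, its methods, and the directory and file names are hypothetical and only show the idea of encoding each file location relative to the metadata file and resolving it again after the directory has been moved.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not a Flink class): shows relative-path encoding only.
final class RelativeStatePaths {

    // Encodes a state file location relative to the directory of the metadata file.
    static String encode(Path metadataFile, Path stateFile) {
        Path metadataDir = metadataFile.toAbsolutePath().normalize().getParent();
        return metadataDir.relativize(stateFile.toAbsolutePath().normalize()).toString();
    }

    // Resolves a previously encoded relative path against the current metadata location.
    static Path resolve(Path metadataFile, String relativePath) {
        Path metadataDir = metadataFile.toAbsolutePath().normalize().getParent();
        return metadataDir.resolve(relativePath).normalize();
    }

    public static void main(String[] args) {
        // Illustrative paths; the snapshot is written to one directory...
        Path original = Paths.get("/savepoints/sp-1/_metadata");
        String encoded = encode(original, Paths.get("/savepoints/sp-1/state-file-1"));

        // ...and later copied elsewhere. Only the metadata location changes.
        Path moved = Paths.get("/backup/sp-1/_metadata");
        System.out.println(resolve(moved, encoded));
    }
}
```

Because nothing in the encoded path refers to the original directory, copying the whole target directory is enough to move the snapshot.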
Checkpoint vs savepoint guarantees
As part of this FLIP, we would like to finally document explicitly what users can do with checkpoints and savepoints. Currently, a number of operations are not officially supported, although we know users rely on them. So let's start with the status quo:
Pre-existing
Yes | Yes (unofficially)/Maybe (untested) | No (but could be done) | Very difficult/impossible |
Canonical Savepoint | Aligned Checkpoint | Unaligned Checkpoint | |
Statebackend change | |||
State Processor API | (???) | ||
Schema evolution | untested? | untested? | |
Flink major version upgrade | (???) | ||
Flink minor version upgrade | (???) | (???) | |
Job full upgrade | |||
Job upgrade w/o changing graph shape and record types | |||
Rescaling |
Questions:
What about RocksDB upgrades? If we bump the RocksDB version between Flink versions, do we support recovering from a native-format snapshot (incremental checkpoint)?
Proposal 1
Canonical Savepoint | Native Savepoint | Aligned Checkpoint | Unaligned Checkpoint | |
Statebackend change | ||||
State Processor API | (???) | (???) | ||
Schema evolution | (???) | (???) | (???) | |
Flink major version upgrade | (???) | |||
Flink minor version upgrade | (change) | (change) | ||
Job full upgrade | ||||
Job upgrade w/o changing graph shape and record types | ||||
Rescaling |
The main aim of the first proposal is to unify the guarantees across the two types of savepoints and across the two types of checkpoints. The only difference between native and canonical savepoints should be the ability to change the state backend, and officially there would be no difference between aligned and unaligned checkpoints. This would simplify the documentation, since we would not need to document the distinction between aligned and unaligned checkpoints.
Proposal 2
Canonical Savepoint | Native Savepoint | Aligned Checkpoint | Unaligned Checkpoint | |
Statebackend change | ||||
State Processor API | (???) | (???) | ||
Schema evolution | (???) | (???) | (???) | |
Flink major version upgrade | (change) | |||
Flink minor version upgrade | (change) | (change) | ||
Job full upgrade | (change) | |||
Job upgrade w/o changing graph shape and record types | (change) | (change) | ||
Rescaling |
The main aim of this proposal is to document what we can easily provide, given that native savepoints and aligned checkpoints would be virtually the same thing. The disadvantage of this proposal is that we would need to document the distinction between aligned and unaligned checkpoints.
Code changes
Passing the selected savepoint type from the CLI, through the `CheckpointCoordinator`, down to the state backends on the `StreamTask` doesn’t seem to be an issue.
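The plumbing described above could look roughly like the following sketch. Every name in it (`SavepointFormat`, `SketchCoordinator`, `SketchTask`, `SketchStateBackend`) is hypothetical and stands in for the real Flink components. Defaulting to the canonical format when no type is given is likewise an assumption made for this illustration, not something this section specifies.

```java
import java.util.Locale;

// Hypothetical enum for the savepoint type selected on the CLI.
enum SavepointFormat {
    CANONICAL,
    NATIVE;

    // Assumption: a missing argument falls back to the canonical format.
    static SavepointFormat fromCliArg(String arg) {
        return arg == null ? CANONICAL : valueOf(arg.trim().toUpperCase(Locale.ROOT));
    }
}

// Stand-in for a state backend that picks its snapshot strategy from the format.
interface SketchStateBackend {
    String snapshot(SavepointFormat format);
}

// Stand-in for the task: forwards the format to its state backend unchanged.
final class SketchTask {
    private final SketchStateBackend backend;

    SketchTask(SketchStateBackend backend) {
        this.backend = backend;
    }

    String triggerSavepoint(SavepointFormat format) {
        return backend.snapshot(format);
    }
}

// Stand-in for the coordinator: parses the CLI value once and passes it down.
final class SketchCoordinator {
    private final SketchTask task;

    SketchCoordinator(SketchTask task) {
        this.task = task;
    }

    String triggerSavepoint(String cliArg) {
        return task.triggerSavepoint(SavepointFormat.fromCliArg(cliArg));
    }
}
```

In this sketch the type is parsed once at the entry point and then carried as a typed value, so the lower layers never have to interpret CLI strings.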
...