Authors: Kostas Kloudas, Aljoscha Krettek, Konstantin Knauf, Yu Li
Status
...
Page properties | |
---|---|
|
...
...
...
|
...
JIRA:
...
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
...
The following matrix summarizes which features savepoints and checkpoints currently support.
Y | Yes, the feature is supported |
N | No, the feature is not supported |
M | Supported but not in all cases |
| | user-controlled | incremental | self-contained | side-effects | recovery | rescaling | unified format |
|---|---|---|---|---|---|---|---|
| savepoints | Y | N | Y | Y | Y | Y | Y |
| checkpoints | M | Y | Y | Y | Y | M | N |
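
To make the matrix easier to query, it can be encoded as plain data. The following is a minimal sketch, not part of Flink: the names `Support`, `MATRIX`, `support`, and `differences` are assumptions made for illustration, and the values are copied from the matrix as shown.

```python
from enum import Enum


class Support(Enum):
    YES = "Y"    # the feature is supported
    NO = "N"     # the feature is not supported
    MIXED = "M"  # supported, but not in all cases


# Column order of the matrix above.
FEATURES = (
    "user-controlled",
    "incremental",
    "self-contained",
    "side-effects",
    "recovery",
    "rescaling",
    "unified format",
)

# One letter per feature, in FEATURES order (hypothetical encoding).
MATRIX = {
    "savepoints": "YNYYYYY",
    "checkpoints": "MYYYYMN",
}


def support(kind: str, feature: str) -> Support:
    """Look up the support level of `feature` for `kind`."""
    return Support(MATRIX[kind][FEATURES.index(feature)])


def differences() -> list:
    """List the features on which savepoints and checkpoints differ."""
    return [
        f for f in FEATURES
        if support("savepoints", f) != support("checkpoints", f)
    ]
```

Calling `differences()` on this data lists the features where the two snapshot types currently diverge, which is the asymmetry this proposal addresses.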
Illustrative Shortcomings
...
The main change that this FLIP proposes is:
All state snapshots (checkpoints and savepoints) are simply SNAPSHOTS from Flink’s perspective, and they are all eligible for RECOVERY.
Snapshots
Given the above:
...
- The complexity of explaining savepoints/checkpoints and their corner cases disappears, since a single matrix can list all available formats with their strengths and limitations.
- Users can mix and match snapshot formats based on their needs. For example, a user can stop a job with an incremental user-induced snapshot, as described in FLIP-45, and use that snapshot to upgrade the cluster. This makes it possible to stop the job quickly (incremental snapshot) even for jobs with large state, where taking a full snapshot could be prohibitively slow.
- Savepoints and checkpoints will evolve together, since most future discussion will concern the evolution of the formats, which are available to both.
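
The mix-and-match idea in the list above can be illustrated with a small sketch in which a user states their needs and a matching snapshot format is selected. Everything here is hypothetical: `SnapshotFormat`, `pick_format`, the format names, and the property values are illustrative placeholders, not Flink APIs and not properties stated in this FLIP.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SnapshotFormat:
    """Hypothetical descriptor of one snapshot format."""
    name: str
    incremental: bool
    supports_rescaling: bool


def pick_format(
    formats: Sequence[SnapshotFormat],
    *,
    need_fast_stop: bool = False,
    need_rescaling: bool = False,
) -> Optional[SnapshotFormat]:
    """Return the first format that satisfies every stated need, or None."""
    for fmt in formats:
        # Assumption: only an incremental snapshot is fast enough for large state.
        if need_fast_stop and not fmt.incremental:
            continue
        if need_rescaling and not fmt.supports_rescaling:
            continue
        return fmt
    return None


# Placeholder values for illustration only; not taken from the FLIP.
FORMATS = [
    SnapshotFormat("full", incremental=False, supports_rescaling=True),
    SnapshotFormat("incremental", incremental=True, supports_rescaling=False),
]
```

With these placeholder values, `pick_format(FORMATS, need_fast_stop=True)` selects the incremental format, which corresponds to the stop-and-upgrade example above. A request that no format satisfies returns `None`, so the limitation is explicit rather than hidden in a corner case.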
Rejected Alternatives
Savepoints with no Side-Effects
...