With this FLIP, I propose to unify checkpoints and savepoints by allowing savepoints to be triggered automatically.

Proposed Changes

Operation           Manual Savepoints (Flink 1.1)   Automatic with Checkpoints (FLIP-10)   Periodic Savepoints (FLIP-10)
Trigger savepoint   Manual                          Automatic                              Automatic
Dispose savepoint   Manual                          Automatic                              Manual

Automatic with Checkpoints

Persistent Checkpoints

The checkpoint coordinator keeps a fixed-size FIFO queue of retained completed checkpoints (the current default size is 1). A checkpoint is discarded when it is removed from this queue. I propose to store these retained checkpoints as savepoints. This means that if a job fails permanently, the user will have a savepoint available to restore from. When a newer checkpoint completes, older savepoints are discarded automatically, just like regular checkpoints.
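The retention behaviour described above can be sketched as a bounded FIFO queue whose evicted entries are discarded. This is a minimal illustration under stated assumptions, not Flink's implementation: the class names `PersistedCheckpoint` and `RetainedCheckpoints`, the savepoint paths, and `discard()` (which here only sets a flag instead of deleting state from storage) are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a completed checkpoint that is persisted as a savepoint. */
final class PersistedCheckpoint {
    final long checkpointId;
    final String savepointPath; // hypothetical location of the persisted state
    boolean discarded = false;

    PersistedCheckpoint(long checkpointId, String savepointPath) {
        this.checkpointId = checkpointId;
        this.savepointPath = savepointPath;
    }

    /** Hypothetical: a real store would delete the savepoint's state here. */
    void discard() {
        discarded = true;
    }
}

/** Fixed-size FIFO queue of retained checkpoints; entries evicted from the queue are discarded. */
final class RetainedCheckpoints {
    private final int maxRetained;
    private final Deque<PersistedCheckpoint> queue = new ArrayDeque<>();

    RetainedCheckpoints(int maxRetained) {
        if (maxRetained < 1) {
            throw new IllegalArgumentException("must retain at least one checkpoint");
        }
        this.maxRetained = maxRetained;
    }

    /** Appends a newly completed checkpoint, then discards the oldest ones beyond capacity. */
    void addCompleted(PersistedCheckpoint checkpoint) {
        queue.addLast(checkpoint);
        while (queue.size() > maxRetained) {
            queue.removeFirst().discard();
        }
    }

    /** The savepoint a user would restore from after a permanent job failure (null if empty). */
    PersistedCheckpoint latest() {
        return queue.peekLast();
    }

    /** Snapshot of the currently retained checkpoints, oldest first. */
    List<PersistedCheckpoint> retained() {
        return new ArrayList<>(queue);
    }
}
```

With a queue size of 1 (the current default), completing checkpoint 2 evicts and discards checkpoint 1, so `latest()` always refers to the most recent completed checkpoint, which is the savepoint available for a restore.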

As an example, consider the following scenario: a job runs smoothly until it hits a bad record that it cannot handle. Currently, the job tries to recover, hits the bad record again, and keeps failing. With the proposed change, a recent checkpoint is stored as a savepoint, so the user can update the program to handle bad records and restore from that savepoint.

...
