...

Older checkpoints are removed by the JobManager (JM) once they are no longer referenced (existing functionality).

Alternatively, compaction can be done incrementally. For that, each snapshot should have (for each map):
- vStart: minimum version of the map needed to restore using this snapshot
- vTaken: version of the map at which it was taken
Upon finalization, all snapshots that are no longer needed for restore (i.e. oldSnapshot.vTaken < newSnapshot.vStart for each map) can be dropped. This tracking can be done in SharedStateRegistry by adding an ordered mapping between StateMaps and their versions, per key group.
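
The drop rule above can be illustrated with a minimal Java sketch. All type and method names here (MapVersions, Snapshot, canDrop, droppable) are hypothetical and are not existing Flink classes; the String map id is likewise only an assumption for illustration. The sketch also assumes that a map absent from the older snapshot creates no dependency on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these types are illustrative and are not existing Flink classes.
final class SnapshotCompactionSketch {

    /** Version range of one state map within a snapshot. */
    static final class MapVersions {
        final long vStart; // minimum map version needed to restore using this snapshot
        final long vTaken; // map version at which the snapshot was taken

        MapVersions(long vStart, long vTaken) {
            this.vStart = vStart;
            this.vTaken = vTaken;
        }
    }

    /** A snapshot with the version range of every map it covers, keyed by a hypothetical map id. */
    static final class Snapshot {
        final long checkpointId;
        final Map<String, MapVersions> maps;

        Snapshot(long checkpointId, Map<String, MapVersions> maps) {
            this.checkpointId = checkpointId;
            this.maps = maps;
        }
    }

    /** Returns true if, for each map of newer, older.vTaken < newer.vStart. */
    static boolean canDrop(Snapshot older, Snapshot newer) {
        for (Map.Entry<String, MapVersions> e : newer.maps.entrySet()) {
            MapVersions old = older.maps.get(e.getKey());
            // Assumption: a map missing from the older snapshot creates no dependency on it.
            if (old != null && old.vTaken >= e.getValue().vStart) {
                return false;
            }
        }
        return true;
    }

    /** Returns the retained snapshots that can be dropped once finalized is complete. */
    static List<Snapshot> droppable(List<Snapshot> retained, Snapshot finalized) {
        List<Snapshot> result = new ArrayList<>();
        for (Snapshot s : retained) {
            if (canDrop(s, finalized)) {
                result.add(s);
            }
        }
        return result;
    }
}
```

For example, with a single map and a new snapshot whose vStart is 4, an old snapshot with vTaken = 3 can be dropped, while one with vTaken = 5 must be kept.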

vStart is calculated as the minimum version across all map entries (to support journaling, each entry records the version at which it was last fully snapshotted).
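
As a rough sketch of this calculation in Java: the Entry class and its lastFullSnapshotVersion field are hypothetical stand-ins for the per-entry version described above, and returning the current map version for an empty map is an assumption made for this example only.

```java
import java.util.Collection;

// Hypothetical sketch: Entry and its field are illustrative, not existing Flink classes.
final class VStartSketch {

    /** A map entry that remembers the map version at which it was last fully snapshotted. */
    static final class Entry {
        final long lastFullSnapshotVersion;

        Entry(long lastFullSnapshotVersion) {
            this.lastFullSnapshotVersion = lastFullSnapshotVersion;
        }
    }

    /**
     * Computes vStart as the minimum lastFullSnapshotVersion across all entries.
     * Assumption: an empty map depends on nothing older than its current version.
     */
    static long vStart(Collection<Entry> entries, long currentVersion) {
        long min = currentVersion;
        for (Entry e : entries) {
            min = Math.min(min, e.lastFullSnapshotVersion);
        }
        return min;
    }
}
```

A single entry that has not been fully snapshotted for a long time therefore holds vStart back, which in turn keeps older snapshots alive.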

In the first version, the simplest solution seems preferable. 

Proposed Changes

  1. The following (~20) classes to be refactored to allow extension (extracting methods, adding hooks, etc.):
    1. (Heap-) KeyedStateBackend, SnapshotStrategy, RestoreOperation
    2. (CopyOnWrite-) Map, Entry, State, Snapshots
    3. RegisteredKeyValueStateBackendMetaInfo
  2. Incremental versions of the following (~10) classes to be added:
    1. CoW StateMap, StateMapSnapshot, StateTable, StateTableSnapshot
    2. StateMapEntry
    3. KeyedStateBackend, RestoreOperation, SnapshotStrategy
    4. RegisteredKeyValueStateBackendMetaInfo
    5. StateSnapshotRestore
  3. The following new classes to be added (~10):
    1. RemovalLog
    2. For each state type: Diff, DiffSerializer, Journal, JournalFactory
    3. StateSnapshotKeyGroupReaderV7
  4. The following classes to be updated:
    1. StateTableByKeyGroupReaders: add new version
    2. Fs- and MemoryStateBackend will have additional settings to construct incremental backend versions

...