Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Add FLIP-151 vs FLIP-158

...

In the first version, the simplest solution seems preferable. 

Relationship to FLIP-158

FLIP-158 proposes a similar idea but in a more generic way.

However, this generality comes at cost, as with FLIP-151 Flink is able to:

1. "Squash" the changes made to the same key. For example, if some counter was changed 10 times then FLIP-151 will send only the last value (this allows to send AND store less data compared to FLIP-158).

2. Keep in memory only the changed keys and not the values (this allows to reduce memory AND latency (caused by serialization + copying on every update) compared to FLIP-158)


On the other hand, integration with FLIP-158 would add "background" materialization capability, eliminating periodic slow checkpoints or more complex way to achieve this. For such an integration

  1. notify ChangelogStateBackend that sending changes is not needed
  2. on checkpoint, ask HeapBackend to create an incremental checkpoint
  3. on materialization, ask for a full snapshot

This integration is out of scope of this FLIP.

Proposed Changes

  1. The following (~20) classes to be refactored to allow extension (extracting methods, adding hooks, etc.):
    1. (Heap-) KeyedStateBackend, SnapshotStrategy, RestoreOperation
    2. (CopyOnWrite-) Map, Entry, State, Snapshots
    3. RegisteredKeyValueStateBackendMetaInfo
  2. Incremental versions of the following (~10) classes to be added:
    1. HashMapStateBackend
    2. CoW StateMap, StateMapSnapshot, StateTable, StateTableSnapshot
    3. StateMapEntry
    4. KeyedStateBackend, RestoreOperation, SnapshotStrategy,
    5. RegisteredKeyValueStateBackendMetaInfo
    6. StateSnapshotRestore
  3. The following new classes to be added (~10):
    1. RemovalLog
    2. For each state type: Diff, DiffSerializer, Journal, JournalFactory
    3. StateSnapshotKeyGroupReaderV7
  4. The following classes to be updated:
    1. StateTableByKeyGroupReaders: add new version
    2. Fs- and MemoryStateBackend will have additional settings to construct incremental backend versions

...