...
In the first version, the simplest solution seems preferable.
Relationship to FLIP-158
FLIP-158 proposes a similar idea but in a more generic way.
However, this generality comes at cost, as with FLIP-151 Flink is able to:
1. "Squash" the changes made to the same key. For example, if some counter was changed 10 times then FLIP-151 will send only the last value (this allows to send AND store less data compared to FLIP-158).
2. Keep in memory only the changed keys and not the values (this allows to reduce memory AND latency (caused by serialization + copying on every update) compared to FLIP-158)
On the other hand, integration with FLIP-158 would add "background" materialization capability, eliminating periodic slow checkpoints or more complex way to achieve this. For such an integration
- notify ChangelogStateBackend that sending changes is not needed
- on checkpoint, ask HeapBackend to create an incremental checkpoint
- on materialization, ask for a full snapshot
This integration is out of scope of this FLIP.
Proposed Changes
- The following (~20) classes to be refactored to allow extension (extracting methods, adding hooks, etc.):
- (Heap-) KeyedStateBackend, SnapshotStrategy, RestoreOperation
- (CopyOnWrite-) Map, Entry, State, Snapshots
- RegisteredKeyValueStateBackendMetaInfo
- Incremental versions of the following (~10) classes to be added:
- HashMapStateBackend
- CoW StateMap, StateMapSnapshot, StateTable, StateTableSnapshot
- StateMapEntry
- KeyedStateBackend, RestoreOperation, SnapshotStrategy,
- RegisteredKeyValueStateBackendMetaInfo
- StateSnapshotRestore
- The following new classes to be added (~10):
- RemovalLog
- For each state type: Diff, DiffSerializer, Journal, JournalFactory
- StateSnapshotKeyGroupReaderV7
- The following classes to be updated:
- StateTableByKeyGroupReaders: add new version
- Fs- and MemoryStateBackend will have additional settings to construct incremental backend versions
...