Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Apache Flink's durability story is a mystery to many users. One of the most common recurring questions from users comes from not understanding the relationship between state, state backends, and snapshots. Some of this confusion can be abated with learning material. Still, the question is so pervasive that we believe Flink's user APIs should be better communicate with what different components are responsible.

Some frequent misconceptions we havve heard over the past several years include:

1) "We use RocksDB because we don't need fault tolerance."
2) "We don't use RocksDB because we don't want to manage an external database."
3) Believing RocksDB is reading and writing directly with S3 or HDFS (vs. local disk)
4) Believing FsStateBackend spills to disk or has anything to do with the local filesystem
5) Pointing RocksDB at network-attached storage, believing that the state backend needs to be fault-tolerant

This question from the ml is very representative of where users are struggling[1]. Many of these questions were not from new users but from organizations that were in production! The current state backend abstraction is to complex for many of our users. What all these questions have in common is misunderstanding the relationship between how data is stored locally on TMs vs how checkpoints make that state durable.

The FLIP aims to address two core issues; inconsistent naming and intermixing of unrelated configurations/responsibilitiesactively help users by allowing them to reason about state backends separately from checkpoint durability. In the future, a state backend only defines where and how state is stored locally on the TM while checkpoint storage defines where and how checkpoints are stored for recovery.

Naming

Flink ships with three state backends out of the box: MemoryStateBackend, FsStateBackend, and RocksDBStateBackend. MemoryStateBackend and FsStateBackend are named based on where they write out their checkpoints. However, they both use the same in-memory data structure to store state locally. RocksDBStateBackend, on the other hand, is named based on where it stores data locally while it also snapshots to a durable filesystem. There is an additional common question about how to manage RocksDB. Users tend to misunderstand that RocksDB is embedded and does not need to be managed explicitly.

...

While two methods will be removed from StateBackend, externally defined state backends will be able to migrate by merely adding `implements CheckpointStorage` to their implementations. Again, this will be documented in the release notes.


 1 http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/State-Storage-Questions-td37919.html