...

1) "We use RocksDB because we don't need fault tolerance."
2) "We don't use RocksDB because we don't want to manage an external database."
3) Believing RocksDB reads from and writes to S3 or HDFS directly (vs. local disk)
4) Believing FsStateBackend spills to disk or has anything to do with the local filesystem
5) Pointing RocksDB at network-attached storage, believing that the state backend needs to be fault-tolerant

This question from the user mailing list is representative of where users are struggling [1]. Many of these questions came not from new users but from organizations already running in production! The current state backend abstraction is too complex for many of our users. What all these questions have in common is a misunderstanding of the relationship between how data is stored locally on TaskManagers (TMs) and how checkpoints make that state durable.
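The distinction can be made concrete with a toy model. The sketch below does not use any Flink API: `LocalState` and `DurableCheckpoints` are hypothetical stand-ins for working state on a TaskManager and for a remote checkpoint store such as HDFS or S3. It only illustrates that working state is local and lost on failure, while recovery relies entirely on the last checkpointed copy.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these classes are Flink APIs.
class LocalStateDemo {

    // Working state: lives on the TaskManager (heap or local disk) and is lost on failure.
    static final class LocalState {
        private final Map<String, Long> counts = new HashMap<>();

        void increment(String key) {
            counts.merge(key, 1L, Long::sum);
        }

        long get(String key) {
            return counts.getOrDefault(key, 0L);
        }

        // Copy of the current state, taken at checkpoint time.
        Map<String, Long> snapshot() {
            return new HashMap<>(counts);
        }

        void restore(Map<String, Long> snapshot) {
            counts.clear();
            counts.putAll(snapshot);
        }

        void simulateCrash() {
            counts.clear();
        }
    }

    // Durable storage: stands in for a remote filesystem such as HDFS or S3.
    static final class DurableCheckpoints {
        private final Map<Long, Map<String, Long>> byId = new HashMap<>();

        void persist(long checkpointId, Map<String, Long> snapshot) {
            byId.put(checkpointId, snapshot);
        }

        Map<String, Long> load(long checkpointId) {
            return byId.get(checkpointId);
        }
    }

    static long run() {
        LocalState state = new LocalState();
        DurableCheckpoints checkpoints = new DurableCheckpoints();

        state.increment("a");
        state.increment("a");
        checkpoints.persist(1L, state.snapshot()); // checkpoint 1 records a=2

        state.increment("a");                      // a=3 locally, not yet checkpointed
        state.simulateCrash();                     // local state is gone

        state.restore(checkpoints.load(1L));       // recover from the durable copy
        return state.get("a");
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 2
    }
}
```

The update made after the checkpoint is lost, which is why the local store itself does not need to be fault-tolerant: only the checkpoint location does.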

...

```java
/**
 * CheckpointStorage defines how checkpoint snapshots are persisted for fault tolerance.
 * Various implementations store their checkpoints in different fashions and have different
 * requirements and availability guarantees.
 *
 * <p>For example, JobManagerCheckpointStorage stores checkpoints in the memory of the JobManager.
 * It is lightweight and without additional dependencies but is not highly available
 * and only supports small state sizes. This checkpoint storage policy is convenient for
 * local testing and development.
 *
 * <p>FileSystemCheckpointStorage stores checkpoints in a filesystem. For systems like
 * HDFS, NFS drives, S3, and GCS, this storage policy supports large state sizes,
 * on the order of many terabytes, while providing a highly available foundation
 * for stateful applications. This checkpoint storage policy is recommended for most
 * production deployments.
 */
public interface CheckpointStorage extends java.io.Serializable {

  CompletedCheckpointStorageLocation resolveCheckpoint(String externalPointer);

  CheckpointStorageAccess createCheckpointStorage(JobID jobId);
}
```

...

While two methods will be removed from StateBackend, externally defined state backends will be able to migrate by merely adding `implements CheckpointStorage` to their implementations. Again, this will be documented in the release notes.
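As a rough illustration of that migration path, the sketch below shows a hypothetical externally defined backend, `MyCustomStateBackend`, that already provides both methods and only gains the `implements CheckpointStorage` clause. To keep the example compilable on its own, `JobID`, `CompletedCheckpointStorageLocation`, `CheckpointStorageAccess`, and `CheckpointStorage` are redeclared here as simplified stand-ins; they are assumptions for illustration and not the real Flink types, and `checkpointDirectory()` in particular is invented for this sketch.

```java
import java.io.Serializable;

// Simplified stand-ins so the sketch compiles alone; NOT the real Flink classes.
final class JobID implements Serializable {
    final String value;

    JobID(String value) {
        this.value = value;
    }
}

interface CompletedCheckpointStorageLocation {
    String getExternalPointer();
}

interface CheckpointStorageAccess {
    // Hypothetical accessor, used only to make the result observable.
    String checkpointDirectory();
}

interface CheckpointStorage extends Serializable {
    CompletedCheckpointStorageLocation resolveCheckpoint(String externalPointer);

    CheckpointStorageAccess createCheckpointStorage(JobID jobId);
}

// Hypothetical externally defined backend. It already had both methods;
// the only change is the added "implements CheckpointStorage".
// A real backend would also keep implementing StateBackend.
class MyCustomStateBackend implements CheckpointStorage {
    private final String baseDir;

    MyCustomStateBackend(String baseDir) {
        this.baseDir = baseDir;
    }

    @Override
    public CompletedCheckpointStorageLocation resolveCheckpoint(String externalPointer) {
        return () -> externalPointer;
    }

    @Override
    public CheckpointStorageAccess createCheckpointStorage(JobID jobId) {
        return () -> baseDir + "/" + jobId.value;
    }
}

class MigrationDemo {
    public static void main(String[] args) {
        CheckpointStorage storage = new MyCustomStateBackend("/tmp/checkpoints");
        System.out.println(
                storage.createCheckpointStorage(new JobID("job-1")).checkpointDirectory());
        // prints /tmp/checkpoints/job-1
    }
}
```

Because the method signatures are unchanged, the existing method bodies carry over untouched in this sketch; only the type declaration changes.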


[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/State-Storage-Questions-td37919.html

...