Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Status

...

Page properties

Document the state by adding a label to the FLIP page with one of "discussion", "accepted", "released", "rejected".

Discussion thread

...

...

...

JIRA: None create yet

...

thread/5wylbxworyjfqnd9vtt3p3o3ntws121n
JIRA

Jira
serverASF JIRA
columnIdsissuekey,summary,issuetype,created,updated,duedate,assignee,reporter,priority,status,resolution
columnskey,summary,type,created,updated,due,assignee,reporter,priority,status,resolution
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyFLINK-25402

Release
1.15


Motivation

Currently, Flink processes (JobManager and TaskManager) can only persist information via Flink's HA services. The stored information is usually cluster-scoped, meaning that it is currently not easily possible to store process specific information. Moreover, for storing large blobs Flink currently needs to go to a remote object store from which it needs to download these blobs when restarting. If the Flink processes were restarted on the same node or on a node where the same volume was mounted to, then we could simply use the local disk/mounted volume to store process specific information. For example, we could use this volume to store local state. This would allow us to use local state when recovering from a process failure. Flink could also store the cached blobs on this volume which would avoid having to download them again from the BlobServer.

...

Additionally, we probably have to add logic that can detect aborted/corrupted blob uploads and removes them. This has to work for the BlobServer as well as for the BlobStore. So far this was never a problem because every process restart would use a fresh blob storage directory for the BlobServer and overwrite any blobs stored in the BlobStore.

Luckily we can use the already existing MessageDigest for checking for corrupted files. However, we might have to add logic that checks this value before accessing blobs in the BlobServer and BlobStore and invalidates corrupted entries.

...