Release: 1.4

Status

Current state: "Completed"

Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-19-Improved-BLOB-storage-architecture-td18092.html

JIRA: -

...


Motivation

The current architecture around the BLOB server and cache components has grown through successive patches and has issues with concurrency ([FLINK-6380]), cleanup, and inconsistent or currently unused APIs ([FLINK-6329], [FLINK-6008]). These issues make future integration with FLIP-6, and extensions such as offloading oversized RPC messages ([FLINK-6046]), difficult. We therefore propose the improved architecture described below, which tackles these issues, provides some cleanup, and enables further BLOB server use cases.

...

[Diagram: blob-store-architecture]

BlobServer

  • offers file upload and download facilities based on jobId and BlobKey
  • local store (file system): read/write access, using "<path>/<jobId>/<BlobKey>"
  • HA store: read/write access for high availability, using "<path>/<jobId>/<BlobKey>"
  • responsible for cleanup of local and HA storage
  • uploads go to the local store first, then to the HA store (possibly in parallel); an upload is acknowledged only after both writes have finished
  • downloads will be served from local storage only
  • on recovery (HA): download the needed files from the HA store to the local store and take over cleanup responsibility for all other files on the path, including orphaned files (see below)
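
The storage rules above can be sketched roughly as follows. This is a minimal in-memory model, not Flink's actual BlobServer: the class name BlobServerSketch, its methods, and the use of maps in place of the file system and the HA store are all assumptions made for illustration. The upload is written sequentially here; the parallel variant mentioned above would still have to wait for both writes before acknowledging.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the BlobServer storage rules; not Flink's real classes. */
final class BlobServerSketch {
    private final String localPath;
    private final String haPath;

    // Assumption: plain maps stand in for the local file system and the HA store.
    final Map<String, byte[]> localStore = new HashMap<>();
    final Map<String, byte[]> haStore = new HashMap<>();

    BlobServerSketch(String localPath, String haPath) {
        this.localPath = localPath;
        this.haPath = haPath;
    }

    /** Builds the "<path>/<jobId>/<BlobKey>" location used by both stores. */
    static String location(String path, String jobId, String blobKey) {
        return path + "/" + jobId + "/" + blobKey;
    }

    /** Upload: local store first, then HA store; returning is the acknowledgement. */
    void put(String jobId, String blobKey, byte[] data) {
        localStore.put(location(localPath, jobId, blobKey), data.clone());
        haStore.put(location(haPath, jobId, blobKey), data.clone());
    }

    /** Download: served from the local store only; returns null if the BLOB is absent. */
    byte[] get(String jobId, String blobKey) {
        byte[] data = localStore.get(location(localPath, jobId, blobKey));
        return data == null ? null : data.clone();
    }
}
```

In this model a BLOB that exists only in the HA store is not visible to get(); it first has to be copied into the local store, which is what the recovery step does.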

...

During recovery, the JobManager (or the Dispatcher for FLIP-6) will:

  • fetch all jobs to recover
  • download their BLOBs lazily and increase reference counts appropriately (at the JobManager only after successful job submission)
  • put every other (i.e. orphaned) file in the configured storage path into staged cleanup
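
The recovery steps above can be illustrated with a small sketch. It assumes, purely for illustration, that files are identified by a "<jobId>/<BlobKey>" string, that reference counts are kept per file, and that staged cleanup is a simple set; the class RecoverySketch and its fields are hypothetical and do not appear in the proposal. Nothing is downloaded here, which mirrors the lazy download: only the bookkeeping is updated.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical bookkeeping for the recovery steps; not Flink's real classes. */
final class RecoverySketch {
    // Assumption: one reference count per "<jobId>/<BlobKey>" file.
    final Map<String, Integer> refCounts = new HashMap<>();
    // Files scheduled for staged cleanup (sorted only for stable iteration order).
    final Set<String> stagedCleanup = new TreeSet<>();

    /**
     * @param jobsToRecover maps each job ID to the BLOB keys that job needs
     * @param storedFiles   all "<jobId>/<BlobKey>" entries found under the storage path
     */
    void recover(Map<String, Set<String>> jobsToRecover, Set<String> storedFiles) {
        Set<String> referenced = new HashSet<>();
        for (Map.Entry<String, Set<String>> job : jobsToRecover.entrySet()) {
            for (String blobKey : job.getValue()) {
                String file = job.getKey() + "/" + blobKey;
                // The BLOB itself is fetched lazily later; only the count is raised now.
                refCounts.merge(file, 1, Integer::sum);
                referenced.add(file);
            }
        }
        for (String file : storedFiles) {
            // Anything no recovered job refers to is treated as orphaned.
            if (!referenced.contains(file)) {
                stagedCleanup.add(file);
            }
        }
    }
}
```

Keeping orphaned files in a separate staged set, rather than deleting them immediately, matches the wording "staged cleanup" above; how and when the staged files are actually deleted is left open in this sketch.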

...