You are viewing an old version of this page. View the current version.

Compare with Current View Page History

Version 1

Definition

A log of def~instant-actions that are performed on a def~table, ordered by def~instant-time.

Design details


At its core, Hudi maintains a timeline of all def~instant-action performed on the def~table at different instants of time that helps provide instantaneous views of the def~table, while also efficiently supporting retrieval of data in the order in which it was written. The timeline is akin to a redo/transaction log, found in databases, and consists of a set of def~timeline-instants. Hudi guarantees that the actions performed on the timeline are atomic & timeline consistent based on the instant time. Timeline is implemented as a set of files under the `.hoodie` def~metadata-folder directly under the def~table-basepath. Specifically, while the most recent instants are maintained as individual files, the older instants are archived to the def~timeline-archival folder, to bound the number of files, listed by writers and queries. 

Design decisions


Key Instant action types performed include:

  • COMMITS - `action type` which denotes an atomic write of a batch of records into a def~table (see def~commit).
  • CLEANS - `action type` which denotes a background activity that gets rid of older versions of files in the def~table, that are no longer needed.
  • DELTA_COMMIT - `action type` which denotes an atomic write of a batch of records into a def~merge-on-read (MOR) def~table-type of def~table, where some/all of the data could be just written to delta logs (see def~commit).
  • COMPACTION - `action type` which denotes a background activity to reconcile differential data structures within Hudi e.g: merging updates from delta log files onto def~base-files columnar file formats. Internally, compaction manifests as a special def~commit on the timeline (see def~timeline)
  • ROLLBACK - `action type` denotes that a def~timeline of `instant action type` commit/delta commit was unsuccessful & rolled back, removing any partial files produced during such a write
  • SAVEPOINT - `action type` marks certain file groups as “saved”, such that cleaner will not delete them. It helps restore the def~table to a point on the timeline, in case of disaster/data recovery scenarios.

Any given instant can be in one of the following states:

  • REQUESTED - Denotes an action has been scheduled, but has not initiated
  • INFLIGHT - Denotes that the action is currently being performed
  • COMPLETED - Denotes completion of an action on the timeline

Design decisions

  1. #todo

Related concepts

  1. def~table
  2. instant state
  3. def~instant-action
  4. def~instant-time
  5. def~commit
  6. file format

Status (draft)


  • No labels