Definition
An item in the `Hudi` ingestion processing timeline
Design details
At its core, Hudi maintains a timeline of all def~instant-action performed on the def~table at different instants of time. This timeline helps provide instantaneous views of the def~table, while also efficiently supporting retrieval of data in the order of arrival. A Hudi `timeline instant` consists of the following components: an action type, an instant time, and a state.
Design decisions
- Hudi guarantees that the actions performed on the timeline are atomic & timeline consistent based on the instant time.
Key Instant action types performed include:
COMMITS
- `action type` which denotes an atomic write of a batch of records into a def~table (see def~commit).
CLEANS
- `action type` which denotes a background activity that gets rid of older versions of files in the def~table that are no longer needed.
DELTA_COMMIT
- `action type` which denotes an atomic write of a batch of records into a def~merge-on-read (MOR) def~table-type of def~table, where some/all of the data could be just written to delta logs (see def~commit).
COMPACTION
- `action type` which denotes a background activity to reconcile differential data structures within Hudi, e.g. merging updates from delta log files onto def~base-files columnar file formats. Internally, compaction manifests as a special def~commit on the timeline (see def~timeline).
ROLLBACK
- `action type` denotes that a def~timeline of `instant action type` commit/delta commit was unsuccessful & rolled back, removing any partial files produced during such a write.
SAVEPOINT
- `action type` marks certain file groups as “saved”, such that cleaner will not delete them. It helps restore the def~table to a point on the timeline, in case of disaster/data recovery scenarios.
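To make the list above concrete, here is a minimal Python sketch of an instant as an (instant time, action type) pair, with a helper that returns the data-writing instants in order of arrival. The names `ActionType`, `Instant`, and `writes_in_arrival_order` are hypothetical and are not part of Hudi's API. The sketch also assumes instant times are fixed-width timestamp strings, so that sorting them as strings matches chronological order.

```python
from dataclasses import dataclass
from enum import Enum


class ActionType(Enum):
    # Illustrative names mirroring the action types listed above (not Hudi's API).
    COMMIT = "commit"
    CLEAN = "clean"
    DELTA_COMMIT = "delta_commit"
    COMPACTION = "compaction"
    ROLLBACK = "rollback"
    SAVEPOINT = "savepoint"


@dataclass(frozen=True)
class Instant:
    # Assumption: a fixed-width timestamp string such as "20240101093000".
    instant_time: str
    action: ActionType


def writes_in_arrival_order(instants):
    """Return instants that wrote a batch of records, ordered by instant time."""
    write_actions = {ActionType.COMMIT, ActionType.DELTA_COMMIT}
    return sorted(
        (i for i in instants if i.action in write_actions),
        key=lambda i: i.instant_time,
    )
```

Given a list of `Instant` objects in any order, `writes_in_arrival_order` drops background actions such as cleans and returns only the commits and delta commits, oldest first.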
Any given instant can be in one of the following states:
REQUESTED
- Denotes an action has been scheduled, but has not yet been initiated
INFLIGHT
- Denotes that the action is currently being performed
COMPLETED
- Denotes completion of an action on the timeline
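The three states above read naturally as a forward-only progression. The sketch below models that reading in Python. It assumes an instant moves from REQUESTED to INFLIGHT to COMPLETED and never backwards, which is an interpretation of the descriptions above and not a statement of Hudi's implementation. `State` and `advance` are hypothetical names.

```python
from enum import Enum


class State(Enum):
    REQUESTED = 1   # scheduled, not yet started
    INFLIGHT = 2    # currently being performed
    COMPLETED = 3   # finished on the timeline


# Assumed forward-only transitions; COMPLETED has no successor.
_NEXT_STATE = {
    State.REQUESTED: State.INFLIGHT,
    State.INFLIGHT: State.COMPLETED,
}


def advance(state):
    """Return the state that follows `state`, or raise if it is terminal."""
    if state not in _NEXT_STATE:
        raise ValueError(f"{state.name} is a terminal state")
    return _NEXT_STATE[state]
```

Calling `advance(State.REQUESTED)` yields `State.INFLIGHT`, and calling `advance` on `State.COMPLETED` raises `ValueError`, which keeps a completed instant from being reopened in this model.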
Design decisions
- #todo