Definition
A storage type where a dataset's commits are merged into the dataset when it is read, viewed, or queried.
This can be seen as "delayed ingestion": "compaction" is deferred and happens on demand.
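To make the "delayed ingestion" idea concrete, here is a minimal Python sketch under simplifying assumptions: a base file is modeled as a dict of record key to record, and each uncompacted commit is an ordered list of (key, record) changes. The helpers `read_merged` and `compact` are hypothetical and are not Hudi APIs. Queries merge pending commits at read time; compaction runs the same merge once, on demand, and keeps the result, so a query returns the same data before and after compaction.

```python
from typing import Dict, List, Tuple

# Hypothetical model, not Hudi's API: a base file maps record key -> record,
# and each uncompacted commit is an ordered list of (key, record) changes.
Record = Dict[str, object]
Base = Dict[str, Record]
Delta = List[Tuple[str, Record]]


def read_merged(base: Base, deltas: List[Delta]) -> Base:
    """Merge uncompacted commits into the base view at query time."""
    view = dict(base)  # copy, so a read never modifies the stored base
    for delta in deltas:  # commits are applied oldest first
        for key, record in delta:
            view[key] = record  # a later commit overrides an earlier value
    return view


def compact(base: Base, deltas: List[Delta]) -> Tuple[Base, List[Delta]]:
    """Run the same merge once, on demand, and keep the result as the new base."""
    return read_merged(base, deltas), []


base = {"k1": {"v": 1}, "k2": {"v": 2}}
deltas = [[("k1", {"v": 10})], [("k3", {"v": 3}), ("k1", {"v": 11})]]

before = read_merged(base, deltas)           # query before compaction
new_base, remaining = compact(base, deltas)  # delayed, on-demand compaction
after = read_merged(new_base, remaining)     # query after compaction
print(before == after)  # True
```

The sketch ignores deletes and ordering fields; it only illustrates that the read-time merge and the deferred compaction produce the same view.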
Design details
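In the def~merge-on-read (MOR) storage model, there are 2 logical components: one that ingests data (both inserts and updates) into the dataset, referred to as the writer, and another that creates compacted views, referred to as the compactor.
At a high level, the def~merge-on-read (MOR) writer goes through the same stages as the Copy-On-Write writer when ingesting data. The key difference is that updates are appended to the latest log (delta) file belonging to the latest file slice, without merging. For inserts, Hudi supports 2 modes: inserts to log files, used for datasets whose log files are indexable (for example, with a global index), and inserts to parquet files, used for datasets whose log files are not indexable (for example, with a bloom index embedded in parquet files). In the parquet mode, Hudi treats writing new records in the same way as inserting to Copy-On-Write files.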
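The update path of the writer can be sketched as follows. This is a simplified Python model, not Hudi's implementation: `FileSlice` and `FileGroup` are hypothetical classes in which a slice holds an optional base file plus a list of log files (each a list of log blocks), and the hypothetical `append_updates` helper adds an update batch as a new block to the latest log file of the latest slice, leaving the base file untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Record = Dict[str, object]
LogBlock = List[Tuple[str, Record]]  # (record key, record) pairs


@dataclass
class FileSlice:
    # Hypothetical model: an optional base file plus log files, each a list of blocks.
    base_file: Optional[Dict[str, Record]] = None
    log_files: List[List[LogBlock]] = field(default_factory=list)


@dataclass
class FileGroup:
    # Hypothetical grouping of all slices that share one file id.
    file_id: str
    slices: List[FileSlice] = field(default_factory=list)

    def latest_slice(self) -> FileSlice:
        if not self.slices:
            self.slices.append(FileSlice())  # start a slice with no base file yet
        return self.slices[-1]


def append_updates(group: FileGroup, updates: LogBlock) -> None:
    """Append updates to the latest log file of the latest slice, without merging."""
    file_slice = group.latest_slice()
    if not file_slice.log_files:
        file_slice.log_files.append([])  # open the first log file of this slice
    file_slice.log_files[-1].append(list(updates))  # add one new log block


group = FileGroup("f1", [FileSlice(base_file={"k1": {"v": 1}})])
append_updates(group, [("k1", {"v": 2})])
append_updates(group, [("k1", {"v": 3})])
print(group.slices[0].base_file)  # {'k1': {'v': 1}}
```

Both updates to `k1` end up as separate log blocks while the base file still holds the old value; reconciling them is left to the read-time merge or to compaction.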
As in the case of Copy-On-Write, the input tagged records are partitioned so that all upserts destined for a `file id` are grouped together. This upsert batch is written as one or more log blocks to log files. Hudi allows clients to control log file sizes (see [Storage Configs](../configurations)).
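A minimal Python sketch of this step, assuming tagged records arrive as hypothetical `(file_id, record_key, record)` tuples. `group_by_file_id` forms one upsert batch per file id, and `write_batch` splits a batch into log blocks, rolling over to a new log file when the current one is full. The limits `max_block_records` and `max_blocks_per_file` are hypothetical count-based stand-ins; Hudi's actual controls are the size-based options in Storage Configs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, object]
Tagged = Tuple[str, str, Record]  # (file_id, record_key, record) after index tagging
Batch = List[Tuple[str, Record]]


def group_by_file_id(tagged: List[Tagged]) -> Dict[str, Batch]:
    """Group tagged upserts so that each file id gets a single upsert batch."""
    batches: Dict[str, Batch] = defaultdict(list)
    for file_id, key, record in tagged:
        batches[file_id].append((key, record))
    return dict(batches)


def write_batch(batch: Batch, log_files: List[List[Batch]],
                max_block_records: int, max_blocks_per_file: int) -> List[List[Batch]]:
    """Write one batch as log blocks, rolling to a new log file when the current one is full."""
    for start in range(0, len(batch), max_block_records):
        block = batch[start:start + max_block_records]
        if not log_files or len(log_files[-1]) >= max_blocks_per_file:
            log_files.append([])  # roll over to a new log file
        log_files[-1].append(block)
    return log_files


tagged = [("f1", "k1", {"v": 1}), ("f2", "k2", {"v": 2}),
          ("f1", "k3", {"v": 3}), ("f1", "k4", {"v": 4})]
batches = group_by_file_id(tagged)
logs = write_batch(batches["f1"], [], max_block_records=2, max_blocks_per_file=1)
print(len(logs))  # 2
```

With these limits, the three upserts for `f1` become two blocks, and the second block does not fit in the first log file, so a second log file is opened.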
The WriteClient API is the same for both Copy-On-Write and def~merge-on-read (MOR) writers. With def~merge-on-read (MOR), several rounds of data writes result in the accumulation of one or more log files. All these log files, together with the base parquet file (if one exists), constitute a `file slice`, which represents one complete version of the file.
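The file slice described above can be modeled as follows. This is a hypothetical Python sketch, not Hudi's implementation: for simplicity each round of writes adds a new log file (real writers may append to an existing one), and `read_version` materializes the complete version by overlaying all log blocks, in order, on the base file when one exists. A slice without a base file is still a complete version.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Record = Dict[str, object]
LogBlock = List[Tuple[str, Record]]  # (record key, record) pairs


@dataclass
class FileSlice:
    """Hypothetical model of a file slice: optional base file plus log files."""
    base_file: Optional[Dict[str, Record]] = None
    log_files: List[List[LogBlock]] = field(default_factory=list)

    def write_round(self, updates: LogBlock) -> None:
        # Simplification: each round of writes produces a new log file.
        self.log_files.append([list(updates)])

    def read_version(self) -> Dict[str, Record]:
        """Materialize the complete version: base (if any) plus all log blocks, in order."""
        view: Dict[str, Record] = dict(self.base_file or {})
        for log_file in self.log_files:
            for block in log_file:
                for key, record in block:
                    view[key] = record  # later log entries win
        return view


log_only = FileSlice()  # a slice whose base parquet file does not exist yet
log_only.write_round([("k1", {"v": 1})])
log_only.write_round([("k1", {"v": 2}), ("k2", {"v": 5})])
print(log_only.read_version())  # {'k1': {'v': 2}, 'k2': {'v': 5}}

with_base = FileSlice(base_file={"k1": {"v": 0}, "k3": {"v": 7}})
with_base.write_round([("k1", {"v": 4})])
print(with_base.read_version())  # {'k1': {'v': 4}, 'k3': {'v': 7}}
```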