Definition
A def~table-type where a def~table's def~commits are merged into the def~table when read / viewed / queried. (#todo improve)
This can be seen as "delayed ingestion": "compaction" is deferred and happens on demand.
#todo improve to summarize semantics relative to def~commits lifecycle, before and after
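The definition above can be illustrated with a minimal sketch. It is a toy model only: the record layout (`key`, `value`) and the function name `merge_on_read` are assumptions made for illustration and are not Hudi APIs. Committed updates sit in log blocks next to an unchanged base file, and the merge happens only when the data is read.

```python
# Toy model of merge-on-read. All names here are hypothetical, not Hudi APIs.

def merge_on_read(base_records, log_blocks):
    """Build the latest view by overlaying logged upserts on the base records.

    log_blocks is ordered oldest to newest, so a later write to the same key
    replaces an earlier one.
    """
    view = {rec["key"]: rec for rec in base_records}
    for block in log_blocks:
        for rec in block:
            view[rec["key"]] = rec
    return sorted(view.values(), key=lambda rec: rec["key"])


# Base file contents, as written by an earlier commit or compaction.
base = [{"key": "a", "value": 1}, {"key": "b", "value": 2}]

# Two later commits, each appended as a log block without rewriting the base.
logs = [
    [{"key": "b", "value": 20}],
    [{"key": "c", "value": 3}, {"key": "b", "value": 21}],
]

snapshot = merge_on_read(base, logs)
```

Reading `base` alone gives an older but cheaper view, while `snapshot` reflects every commit at the cost of merging at query time.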
Design details
Excerpt |
---|
In the Merge-On-Read storage model, there are 2 logical components - one for ingesting data (both inserts/updates) into the dataset and another for creating compacted views. The former is hereby referred to as `Writer`. At a high level, the Merge-On-Read writer goes through the same stages as the def~copy-on-write (COW) writer in ingesting data. The key difference here is that updates are appended to the latest log (delta) file belonging to the latest file slice without merging. For inserts, Hudi supports 2 modes: 1.
embedded in parquet files. Hudi treats writing new records in the same way as inserting to Copy-On-Write files. As in the case of def~copy-on-write (COW), the input tagged records are partitioned such that all upserts destined to a def~file-id are grouped together. This upsert-batch is written as one or more log-blocks to def~log-files. Hudi allows clients to control log file sizes (see [Storage Configs](../configurations)). This table type is the most versatile and advanced, offering much flexibility for writing (the ability to specify different compaction policies, absorb bursty write traffic, etc.) and querying (e.g., trading off data freshness against query performance). At the same time, it can involve a learning curve for mastering it operationally. |
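The write path described in the excerpt can be sketched roughly as follows. This is an illustrative model under stated assumptions, not Hudi's implementation: the function names, the in-memory layout (a dict mapping a file id to a list of log files, each a list of log blocks) and the two limits `max_block_records` and `max_file_records` are all hypothetical, and sizes are counted in records rather than bytes to keep the example short.

```python
from collections import defaultdict

# Hypothetical sketch of a merge-on-read writer. None of these names are Hudi APIs.

def group_by_file_id(tagged_records):
    """Group (file_id, record) pairs so all upserts for one file id stay together."""
    groups = defaultdict(list)
    for file_id, record in tagged_records:
        groups[file_id].append(record)
    return dict(groups)


def append_upserts(log_files, tagged_records, max_block_records=2, max_file_records=4):
    """Append each upsert batch as one or more log blocks, without merging.

    log_files maps file_id -> list of log files; each log file is a list of blocks.
    A new log file is started when the current one would exceed max_file_records.
    """
    for file_id, batch in group_by_file_id(tagged_records).items():
        files = log_files.setdefault(file_id, [[]])
        for start in range(0, len(batch), max_block_records):
            block = batch[start:start + max_block_records]
            current_size = sum(len(b) for b in files[-1])
            if files[-1] and current_size + len(block) > max_file_records:
                files.append([])  # roll over to a new log file
            files[-1].append(block)
    return log_files


log_files = {}
append_upserts(log_files, [("f1", "r1"), ("f2", "r2"), ("f1", "r3"), ("f1", "r4")])
after_first_commit = {fid: [list(f) for f in files] for fid, files in log_files.items()}
append_upserts(log_files, [("f1", "r5"), ("f1", "r6")])
```

After the first call, file id `f1` holds one log file with two blocks and `f2` holds one log file with a single block; the second call no longer fits under the per-file limit, so it rolls `f1` over to a second log file. In Hudi itself the corresponding limit is expressed in bytes through the storage configs, for example `hoodie.logfile.max.size`.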
Kind of
...
Related concepts
...