Definition
A storage model / table type where commitA def~table-type where a def~table's def~commits are merged when read (#todo improve)into def~table when read / viewed / queried.
This can be seen as "delayed ingestion": "compaction" happens delayed, on demand.
#todo improve to summarize semantics relative to def~commits lifecycle, before and after
Design details
Excerpt |
---|
In the Merge-On-Read storage model, there are 2 logical components - one for ingesting data (both inserts/updates) into the dataset and another for creating compacted views. The former is hereby referred to as `Writer` while the later is referred as `Compactor`this def~table-type, records written to the def~table, are quickly first written to def~log-files, which are at a later time merged with the def~base-file, using a def~compaction action on the timeline. Various def~query-types can be supported depending on whether the query reads the merged snapshot or the change stream in the logs or the un-merged base-file alone. At a high level, Merge-On-Read Writer 135860486 writer goes through same stages as Copy On Write def~copy-on-write (COW) writer in ingesting data. The key difference here is that updates are appended to latest log (delta) file belonging to the latest file slice without merging. For inserts, Hudi supports 2 modes:1.
eg bloom index This table type is the most versatile, highly advanced and offers much flexibility for writing (ability specify different compaction policies, absorb bursty write traffic etc) and querying (e.g: tradeoff data freshness and query performance). At the same time, it can involve a learning curve for mastering it operationally. |
Kind of
...
Related concepts
...