Definition
A storage def~table-type where a def~dataset def~table's def~commits are merged into def~dataset def~table when read / viewed / queried.
...
#todo improve to summarize semantics relative to def~commits lifecycle, before and after
Design details
Excerpt |
---|
In the 135860486 storage model, there are 2 logical components : this def~table-type, records written to the def~table, are quickly first written to def~log-files, which are at a later time merged with the def~base-file, using a def~compaction action on the timeline. Various def~query-types can be supported depending on whether the query reads the merged snapshot or the change stream in the logs or the un-merged base-file alone. At a high level, 135860486 writer goes through same stages as def~copy-on-write (COW) writer in ingesting data. The key difference here is that updates are appended to latest log (delta) file belonging to the latest file slice without merging. For inserts, Hudi supports 2 modes:
embedded in parquer files. Hudi treats writing new records in the same way as inserting to def~copy-on-write files. This table type is the most versatile, highly advanced and offers much flexibility for writing (ability specify different compaction policies, absorb bursty write traffic etc) and querying (e.g: tradeoff data freshness and query performance). At the same time, it can involve a learning curve for mastering it operationally. |
Kind of
- storage def~table-type