Definition

Controls how def~tables are exposed to queries.

Excerpt

Hudi supports the following views of stored data:

  • read optimized view: Queries on this view see the latest snapshot of the dataset as of a given commit or compaction action. This view exposes only the base/columnar files in the latest file slices to the queries and guarantees the same columnar query performance as a non-Hudi columnar dataset.
  • incremental view: Queries on this view see only the new data written to the dataset since a given commit/compaction. This view effectively provides change streams to enable incremental data pipelines.
  • realtime view: Queries on this view see the latest snapshot of the dataset as of a given delta commit action. This view provides near-real-time data (a latency of a few minutes) by merging the base and delta files of the latest file slice on the fly.
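The difference between the read optimized and realtime views on a single file slice can be illustrated with a toy model. This is a minimal sketch, not Hudi's implementation: the `FileSlice` class and both view functions are hypothetical, instant times are assumed to be fixed-width strings (so string comparison matches time order), and deletes are not modeled.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified model of one file slice (not Hudi's real classes).
@dataclass
class FileSlice:
    # Records in the columnar base file, keyed by record key.
    base_records: dict
    # Delta (log) records appended after the base file, in write order,
    # as (record_key, value, delta_commit_time) tuples.
    log_records: list = field(default_factory=list)


def read_optimized_view(fs: FileSlice) -> dict:
    """Read only the base file; updates still sitting in delta files are not visible."""
    return dict(fs.base_records)


def realtime_view(fs: FileSlice, as_of: str) -> dict:
    """Merge base and delta records on the fly, up to the given delta commit.

    Log entries are applied in write order, so a later update to the same
    key overwrites an earlier one.
    """
    merged = dict(fs.base_records)
    for key, value, delta_commit in fs.log_records:
        if delta_commit <= as_of:
            merged[key] = value
    return merged


if __name__ == "__main__":
    fs = FileSlice(
        base_records={"k1": "a", "k2": "b"},
        log_records=[("k1", "a2", "002"), ("k3", "c", "003")],
    )
    print(read_optimized_view(fs))   # {'k1': 'a', 'k2': 'b'}
    print(realtime_view(fs, "002"))  # {'k1': 'a2', 'k2': 'b'}
    print(realtime_view(fs, "003"))  # {'k1': 'a2', 'k2': 'b', 'k3': 'c'}
```

In this model, compacting the file slice would fold the log records into a new base file, after which the read optimized view catches up with the realtime view.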

The following table summarizes the trade-offs between the different views.

...

Given such a flexible and comprehensive data layout and a rich def~timeline, Hudi supports three different ways of querying a def~table, depending on its def~table-type:

  • Snapshot Query
      • def~copy-on-write (COW): The query is performed on the latest def~base-files across all def~file-slices in a given def~table or def~table-partition and sees records written up to the latest def~commit action.
      • def~merge-on-read (MOR): The query is performed by merging the latest def~base-file and its def~log-files across all def~file-slices in a given def~table or def~table-partition and sees records written up to the latest def~delta-commit action.
  • Incremental Query
      • def~copy-on-write (COW): The query is performed on the latest def~base-file, within a given range of start and end def~instant-times (called the incremental query window), fetching only records that were written during this window by means of the def~hoodie-special-columns.
      • def~merge-on-read (MOR): The query is performed on the latest def~file-slice within the incremental query window, reading records out of base blocks, log blocks, or both, depending on the window itself.
  • Read Optimized Query
      • def~copy-on-write (COW): Same as the snapshot query.
      • def~merge-on-read (MOR): The query accesses only the def~base-file, providing data as of the last def~compaction action performed on a given def~file-slice. In general, how fresh (up to date) the queried data is depends on the def~compaction-policy.
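For a copy-on-write table, the incremental query described above amounts to filtering rows on the `_hoodie_commit_time` metadata column, one of Hudi's special columns, which records the instant time of the commit that last wrote each record. The sketch below is hypothetical and not Hudi's query engine: rows are plain dictionaries, instant times are fixed-width strings, and the window is assumed to exclude the start instant and include the end instant.

```python
# Hypothetical rows read from the latest base files of a copy-on-write table.
# Each row carries the _hoodie_commit_time metadata column: the instant time
# of the commit that last wrote that record.
ROWS = [
    {"_hoodie_record_key": "k1", "_hoodie_commit_time": "001", "value": "a"},
    {"_hoodie_record_key": "k2", "_hoodie_commit_time": "003", "value": "b2"},
    {"_hoodie_record_key": "k3", "_hoodie_commit_time": "004", "value": "c"},
]


def incremental_query(rows, start_instant, end_instant):
    """Return only the rows written in the window (start_instant, end_instant].

    Assumes fixed-width instant-time strings, so lexicographic comparison
    matches chronological order.
    """
    return [
        row
        for row in rows
        if start_instant < row["_hoodie_commit_time"] <= end_instant
    ]


if __name__ == "__main__":
    changed = incremental_query(ROWS, "002", "003")
    print([row["_hoodie_record_key"] for row in changed])  # ['k2']
```

Because each base file in this model keeps a single version per record key, a record updated several times is returned at most once.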






Related concepts

  1. def~read-optimized-query
  2. def~incremental-query
  3. def~snapshot-query
  4. def~timeline
  5. def~table
  6. def~commit
  7. def~table-

...

Related concepts

  1. timeline instant
  2. dataset
  3. commit
  4. storage type