Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Copy On Write - This table type enables clients to append/upsert data on columnar file formats, currently parquet. Any new data that is written to the Hudi dataset using COW table type, will write new parquet files. Updating an existing set of rows will result in a rewrite of the entire parquet files that collectively contain the affected rows being updated. Hence, all writes to such datasets are limited by parquet writing performance, the larger the parquet file, the higher is the time taken to ingest the data.
Merge On Read - This table type enables clients to append/upsert data on log based file formats, currently avro. Any new data that is written to the Hudi dataset using

...

MOR table type, will write new log/delta files that internally store the data as avro encoded bytes. A compaction process (configured as inline or asynchronous) will convert log file format to columnar file format (parquet). Two different InputFormats expose 2 different views of this data, HoodieInputFormat exposes columnar parquet reading performance while HoodieRealTimeInputFormat exposes columnar and/or log reading performance respectively. Updating an existing set of rows will result in either a) a companion log/delta file for an existing base parquet file generated from a previous compaction or b) an update written to a log/delta file in case no compaction ever happened for it. Hence, all writes to such datasets are limited by avro/log file writing performance, much faster than parquet. Although, there is a higher cost to pay to read log/delta files vs columnar (parquet) files.

More details can be found here.

...