General 


What is the difference between copy-on-write (COW) and merge-on-read (MOR) storage types?

Copy On Write - This table type lets clients ingest data in columnar file formats, currently Parquet. Any new data written to a Hudi dataset with the COW table type produces new Parquet files. Updating an existing set of rows rewrites, in full, every Parquet file that contains any of the affected rows. Hence, all writes to such datasets are bound by Parquet writing performance: the larger the Parquet file, the longer it takes to ingest the data.
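The rewrite cost described above can be illustrated with a toy model. This is only a sketch, not Hudi code: each "file" is a plain Python dict standing in for one Parquet file, and the function name cow_upsert and the row layout are hypothetical, chosen for illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
FileGroup = Dict[str, Row]  # record key -> row; stands in for one Parquet file


def cow_upsert(files: List[FileGroup], batch: Dict[str, Row]) -> Tuple[List[FileGroup], int]:
    """Apply a batch copy-on-write style (toy model, hypothetical helper).

    Returns the new list of files and the number of rows physically written.
    """
    rows_written = 0
    new_files: List[FileGroup] = []
    remaining = dict(batch)  # keys not yet matched to an existing file

    for f in files:
        # Pull out the batch records whose keys already live in this file.
        touched = {k: remaining.pop(k) for k in list(remaining) if k in f}
        if touched:
            # Any update forces a rewrite of the whole file, untouched rows included.
            rewritten = {**f, **touched}
            rows_written += len(rewritten)
            new_files.append(rewritten)
        else:
            # Files without affected rows are left as they are.
            new_files.append(f)

    if remaining:
        # Brand-new keys go into a new file.
        new_files.append(remaining)
        rows_written += len(remaining)

    return new_files, rows_written


files = [
    {"a": {"v": 1}, "b": {"v": 2}, "c": {"v": 3}},
    {"d": {"v": 4}},
]
batch = {"b": {"v": 20}, "e": {"v": 5}}  # one update, one insert
new_files, rows_written = cow_upsert(files, batch)
print(rows_written)  # 4 rows written for a 2-record batch
```

In this model a single updated row causes all three rows of its file to be written again, which is why larger files make updates slower.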

...

More details can be found here.

How do I choose a storage type for my workload?

[Diagram: TableTypeChoiceFlowDiagram]

...

More details on the trade-offs between COW and MOR storage types can be found here.

How do I use Hudi to avoid creating tons of small files?

HoodieWriteConfig exposes knobs to allow for such flexibility. 
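As a rough illustration of what such knobs control, the sketch below models file sizing for inserts. It assumes the two option keys hoodie.parquet.max.file.size and hoodie.parquet.small.file.limit from Hudi's storage configuration (verify them against the Hudi version in use); the values are illustrative only, and plan_inserts is a hypothetical helper, not Hudi's actual packing logic.

```python
from typing import Dict, List, Tuple

MB = 1024 * 1024

# Option keys assumed from Hudi's storage configs; the values are illustrative only.
hudi_options = {
    "hoodie.parquet.max.file.size": str(120 * MB),
    "hoodie.parquet.small.file.limit": str(100 * MB),
}


def plan_inserts(
    file_sizes: List[int], incoming: int, options: Dict[str, str]
) -> Tuple[Dict[int, int], List[int]]:
    """Toy planner (hypothetical): top up small files before opening new ones.

    Returns a map of file index -> bytes added, and the sizes of new files.
    """
    max_size = int(options["hoodie.parquet.max.file.size"])
    small_limit = int(options["hoodie.parquet.small.file.limit"])

    top_ups: Dict[int, int] = {}
    remaining = incoming
    for idx, size in enumerate(file_sizes):
        if remaining == 0:
            break
        if size < small_limit:
            # A file below the small-file limit absorbs inserts up to the max size.
            take = min(max_size - size, remaining)
            top_ups[idx] = take
            remaining -= take

    new_files: List[int] = []
    while remaining > 0:
        # Whatever is left goes into new files, each capped at the max size.
        chunk = min(max_size, remaining)
        new_files.append(chunk)
        remaining -= chunk

    return top_ups, new_files


top_ups, new_files = plan_inserts([30 * MB, 110 * MB], 200 * MB, hudi_options)
```

With these illustrative values, the 30 MB file is treated as small and grows toward the maximum size, the 110 MB file is left alone, and only the leftover data opens a new file.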

...

HoodieDeltaStreamer users

HoodieWriteClient users


Performance 


Deployment