...

This section gives the background needed to understand the design. Readers are expected to be familiar with the basic Hudi concepts described here.

Per Record Hudi Metadata

[Figure: layout of records in Hudi]


The above figure shows the layout of records in Hudi. Each record has 5 Hudi metadata fields:

...

Onboard Hudi for new partitions alone:


[Figure: example table with historical non-Hudi partitions alongside newer Hudi partitions]

Apache Hudi partitions can coexist with non-Hudi partitions. The Apache Hudi query engine integration is carefully implemented to handle queries that span both kinds of partitions. This lets users adopt Hudi for managing new partitions while leaving older partitions untouched. In the example above, historical partitions from Jan 1 2010 to Nov 30 2019 are in non-Hudi format, while newer partitions starting from Dec 1 2019 support Apache Hudi capabilities. Because the historical partitions are not managed by Apache Hudi, none of the primitives provided by Apache Hudi work on the data in those partitions. For append-only datasets (like a table built by reading mobile or time-series data from Kafka), this approach works well.
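
The split described above can be illustrated with a minimal sketch. It is not Hudi code and calls no Hudi API: the names HUDI_CUTOFF, is_hudi_managed and split_partitions are hypothetical, and the cutoff date is simply the one from the example. It only shows how date partitions would be classified as Hudi-managed or historical.

```python
from datetime import date

# Assumption: cutoff taken from the example above. Partitions dated on or
# after Dec 1 2019 are Hudi-managed; earlier ones stay in non-Hudi format.
HUDI_CUTOFF = date(2019, 12, 1)


def is_hudi_managed(partition_date, cutoff=HUDI_CUTOFF):
    """Return True if the partition falls on or after the onboarding cutoff."""
    return partition_date >= cutoff


def split_partitions(partition_dates, cutoff=HUDI_CUTOFF):
    """Split partition dates into (historical non-Hudi, Hudi-managed) lists."""
    historical = [d for d in partition_dates if not is_hudi_managed(d, cutoff)]
    hudi = [d for d in partition_dates if is_hudi_managed(d, cutoff)]
    return historical, hudi


if __name__ == "__main__":
    partitions = [date(2010, 1, 1), date(2019, 11, 30), date(2019, 12, 1)]
    historical, hudi = split_partitions(partitions)
    print("non-Hudi partitions:", historical)
    print("Hudi partitions:", hudi)
```

Hudi primitives would apply only to the partitions in the second list; a query spanning both lists would read the historical partitions as ordinary non-Hudi data.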

...