
Hudi provides efficient upserts by consistently mapping a def~record-key + def~partition-path combination to a def~file-id via an indexing mechanism. This mapping between record key and file group/file id never changes once the first version of a record has been written to a file group. In short, the mapped file group contains all versions of a group of records. Hudi currently provides two index choices, def~bloom-index and def~hbase-index (with a few more in the works), to map a record key to the file id to which it belongs. This speeds up upserts significantly, because Hudi does not have to scan every record in the table. Hudi indices can be classified by their ability to look up records across partitions.


A `global` index does not need partition information to find the file-id for a record key, whereas a `non-global` index does.
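
The difference can be illustrated with a minimal in-memory sketch. This is not Hudi's implementation or API: the class names `NonGlobalIndex` and `GlobalIndex`, the method names `lookup` and `record_first_write`, and the sample keys, partition paths, and file ids are all hypothetical, chosen only to show which inputs each kind of lookup requires and that a mapping stays fixed after the first write.

```python
from typing import Dict, Optional, Tuple


class NonGlobalIndex:
    """Toy non-global index (hypothetical): record keys are scoped to a partition path."""

    def __init__(self) -> None:
        # (partition_path, record_key) -> file_id
        self._file_ids: Dict[Tuple[str, str], str] = {}

    def record_first_write(self, record_key: str, partition_path: str, file_id: str) -> str:
        # setdefault keeps an existing mapping, so the file id never changes
        # after the first version of a record has been written.
        return self._file_ids.setdefault((partition_path, record_key), file_id)

    def lookup(self, record_key: str, partition_path: str) -> Optional[str]:
        # The partition path is required to find the file id.
        return self._file_ids.get((partition_path, record_key))


class GlobalIndex:
    """Toy global index (hypothetical): record keys are resolved across all partitions."""

    def __init__(self) -> None:
        # record_key -> (partition_path, file_id)
        self._locations: Dict[str, Tuple[str, str]] = {}

    def record_first_write(self, record_key: str, partition_path: str, file_id: str) -> Tuple[str, str]:
        # As above, the first recorded location wins.
        return self._locations.setdefault(record_key, (partition_path, file_id))

    def lookup(self, record_key: str) -> Optional[Tuple[str, str]]:
        # The record key alone is enough; no partition information is needed.
        return self._locations.get(record_key)


# Illustrative usage with made-up keys, partitions, and file ids.
non_global = NonGlobalIndex()
non_global.record_first_write("key-1", "2020/01/01", "file-a")
non_global.record_first_write("key-1", "2020/01/01", "file-b")  # ignored: mapping is already fixed

global_index = GlobalIndex()
global_index.record_first_write("key-1", "2020/01/01", "file-a")
```

In this sketch, `non_global.lookup("key-1", "2020/01/01")` finds the file id, while the same key queried under a different partition path finds nothing. `global_index.lookup("key-1")` resolves the record without being told the partition.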


