...

  • Suppose the commit time associated with this upsert operation is “C1”. C1 is necessarily greater than the “BOOTSTRAP_COMMIT” (001000000000).
  • Assuming Bloom Index, the index lookup happens directly on the Hudi skeleton files. Let’s say the Hudi skeleton file with file id “h1” has all the records.
  • In the description that follows, a “regular” Hudi file is a Hudi parquet file that holds the per-record Hudi metadata columns, the original columns and the bloom index in a single file. For a Copy-On-Write table, the writing phase identifies that the latest file-slice for file id “h1” was generated by bootstrap, because it carries the special bootstrap commit time. It then reads the original external file stored under the original root location “/user/hive/warehouse/fact_events”. The Hudi Merge Handle reads this external file and the metadata-only Hudi file in parallel, stitches their records together, and merges them with the incoming batch of records to create a “regular” Hudi file with a brand new version for file id “h1”.
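
The Copy-On-Write flow above can be sketched roughly as follows. This is an illustrative model only, not Hudi's Merge Handle: the function names (`is_bootstrap_slice`, `stitch`, `merge_file_slice`) are hypothetical, rows are plain dictionaries rather than parquet records, and the sketch assumes that the skeleton file and the external file hold their rows in the same order, so they can be paired by position. It also assumes commit times are compared as strings. `_hoodie_record_key` and `_hoodie_commit_time` stand in for the per-record Hudi metadata columns.

```python
from typing import Dict, Iterator, List

# Special bootstrap commit time quoted in the text.
BOOTSTRAP_COMMIT = "001000000000"

Row = Dict[str, str]


def is_bootstrap_slice(commit_time: str) -> bool:
    """A file-slice written by bootstrap carries the special commit time."""
    return commit_time == BOOTSTRAP_COMMIT


def stitch(skeleton_rows: List[Row], external_rows: List[Row]) -> Iterator[Row]:
    """Pair each metadata-only row with its original row (assumed same order)."""
    if len(skeleton_rows) != len(external_rows):
        raise ValueError("skeleton and external files must have the same row count")
    for meta, data in zip(skeleton_rows, external_rows):
        yield {**meta, **data}


def merge_file_slice(
    skeleton_rows: List[Row],
    external_rows: List[Row],
    incoming: Dict[str, Row],
    new_commit: str,
) -> List[Row]:
    """Build the rows of a "regular" file for the new version of one file id."""
    # Assumption: commit times are ordered by plain string comparison.
    if not new_commit > BOOTSTRAP_COMMIT:
        raise ValueError("upsert commit must be later than BOOTSTRAP_COMMIT")

    # Start from the stitched view of the bootstrapped file-slice.
    merged: Dict[str, Row] = {}
    for row in stitch(skeleton_rows, external_rows):
        merged[row["_hoodie_record_key"]] = row

    # Apply the incoming batch: updated or new records get the new commit time,
    # untouched records keep the metadata they already had.
    for key, payload in incoming.items():
        base = merged.get(key, {"_hoodie_record_key": key})
        merged[key] = {**base, **payload, "_hoodie_commit_time": new_commit}

    return list(merged.values())
```

For example, calling `merge_file_slice` with two skeleton rows for keys “k1” and “k2”, two matching external rows, and an incoming batch that updates only “k2” yields two full rows: “k1” unchanged with the bootstrap commit time, and “k2” carrying the new column values and the commit time C1.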

...