...

As described on the concepts page, a “file slice” refers to one complete snapshot of a logical Hudi file. It contains one base file and one or more delta files. Conceptually, we encapsulate the bootstrap index information at the file-slice (data-file) level, so a file slice can provide the external location where the original columns reside.

In Hudi, an implementation abstraction called the file system view maps physical files to file slices. This abstraction will also annotate file slices with bootstrap index entries (skeleton file to external file) so that higher layers can handle external files in a consistent way.
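The mapping described above can be illustrated with a minimal sketch. This is not Hudi's actual file system view API: the names `FileSlice`, `build_file_slices`, and `external_file`, as well as the dictionary-based inputs and file names, are hypothetical and chosen only to show how a file slice might carry its bootstrap index entry.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FileSlice:
    """Hypothetical model of one snapshot of a logical file."""
    file_id: str
    base_file: str
    delta_files: List[str] = field(default_factory=list)
    # Populated only when the base file is a skeleton file that has a
    # bootstrap index entry; points at the external file holding the
    # original columns.
    external_file: Optional[str] = None


def build_file_slices(
    base_files: Dict[str, str],
    delta_files: Dict[str, List[str]],
    bootstrap_index: Dict[str, str],
) -> Dict[str, FileSlice]:
    """Group physical files into file slices keyed by file id.

    base_files:      file id -> base (or skeleton) file name
    delta_files:     file id -> delta file names
    bootstrap_index: skeleton file name -> external file location
    """
    slices: Dict[str, FileSlice] = {}
    for file_id, base in base_files.items():
        slices[file_id] = FileSlice(
            file_id=file_id,
            base_file=base,
            delta_files=sorted(delta_files.get(file_id, [])),
            # Annotate the slice with its bootstrap entry, if any.
            external_file=bootstrap_index.get(base),
        )
    return slices


# Assumed example inputs: "f1" was bootstrapped, "f2" is a regular file.
view = build_file_slices(
    base_files={"f1": "skeleton_f1.parquet", "f2": "base_f2.parquet"},
    delta_files={"f1": ["f1.log.2", "f1.log.1"]},
    bootstrap_index={"skeleton_f1.parquet": "/external/events/part-0001.parquet"},
)
```

In this sketch a higher layer only needs to check whether `external_file` is set on a slice to decide if it must also read the original columns from the external location, which is the kind of uniform handling the annotation is meant to allow.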

With this model, if we need to update a batch of records “1-K” in old partitions of the newly bootstrapped table “fact_events_hudi”, the following steps are performed:

...