...

def~table-type

Copy On Write Table

def~copy-on-write (COW)





Merge On Read Table

def~merge-on-read (MOR)




Writing

Write Operations

...

  • The small file handling feature in Hudi profiles the incoming workload and distributes inserts to existing def~file-group rather than creating new file groups, since creating new file groups can lead to small files.
  • The writer employs a cache of the def~timeline, so that as long as the Spark cluster is not spun up every time, subsequent def~write-operations never list DFS directly to obtain the list of def~file-slices in a given def~table-partition.
  • Users can also tune the size of the def~base-file as a fraction of def~log-files and the expected compression ratio, so that a sufficient number of inserts are grouped into the same file group, ultimately resulting in well-sized base files.
  • Intelligently tuning the bulk insert parallelism can again result in nicely sized initial file groups. It is in fact critical to get this right, since file groups, once created, cannot be deleted, only expanded as explained before.
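
The first and last points above can be illustrated with a minimal sketch. This is not Hudi's actual implementation: the names `FileGroup`, `assign_inserts`, `bulk_insert_parallelism`, and all parameters are hypothetical, and the sketch assumes a fixed average record size and that each bulk insert task writes roughly one file group.

```python
import math
from dataclasses import dataclass


@dataclass
class FileGroup:
    # Hypothetical stand-in for a file group and the size of its latest base file.
    file_id: str
    size_bytes: int


def assign_inserts(file_groups, num_inserts, avg_record_bytes,
                   small_file_limit, max_file_size):
    """Route inserts to existing small file groups before opening new ones.

    Returns (assignments, new_groups): a dict mapping file_id to the number of
    inserts routed to it, and the count of new file groups still required.
    """
    assignments = {}
    remaining = num_inserts
    # Fill the smallest files first so they grow toward the target size.
    for fg in sorted(file_groups, key=lambda g: g.size_bytes):
        if remaining == 0:
            break
        if fg.size_bytes >= small_file_limit:
            continue  # Not considered small; leave it alone.
        capacity = (max_file_size - fg.size_bytes) // avg_record_bytes
        take = min(capacity, remaining)
        if take > 0:
            assignments[fg.file_id] = take
            remaining -= take
    # Whatever does not fit into small files goes into new file groups.
    records_per_new_file = max(1, max_file_size // avg_record_bytes)
    new_groups = math.ceil(remaining / records_per_new_file)
    return assignments, new_groups


def bulk_insert_parallelism(total_input_bytes, target_file_bytes):
    """Rough heuristic (an assumption): one write task per target-sized file."""
    return max(1, math.ceil(total_input_bytes / target_file_bytes))


if __name__ == "__main__":
    groups = [FileGroup("f1", 20), FileGroup("f2", 90), FileGroup("f3", 50)]
    print(assign_inserts(groups, num_inserts=20, avg_record_bytes=10,
                         small_file_limit=80, max_file_size=100))
    print(bulk_insert_parallelism(total_input_bytes=1050, target_file_bytes=100))
```

In this toy run, 13 of the 20 inserts land in the two small file groups and only one new file group is opened. Too low a bulk insert parallelism would produce oversized initial files, and too high a value would produce many small file groups that later writes must then fill.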

Querying

def~query-type
<WIP>

Snapshot Queries

def~snapshot-query
<WIP>

Incremental Queries

def~incremental-query
<WIP>

Read Optimized Queries

def~read-optimized-query
<WIP>