...

Excerpt Include
def~write-operation

Compaction

Excerpt Include
def~compaction
<WIP>

Cleaning

Excerpt Include
def~cleaning
<WIP>

File Sizing

Hudi also performs several key storage management functions on the data stored in a Hudi dataset. A key aspect of storing data on DFS is managing file sizes and counts and reclaiming storage space. For example, HDFS is infamous for its handling of small files, which exerts memory/RPC pressure on the NameNode and can potentially destabilize the entire cluster. In general, query engines perform much better on adequately sized columnar files, since they can effectively amortize the cost of obtaining column statistics and so on. Even on some cloud data stores, there is often a cost to listing directories with a large number of small files.
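To make the file-count concern concrete, here is a minimal back-of-the-envelope sketch. It is not part of Hudi; the helper name `file_count`, the 1 TiB dataset size, and the average file sizes are all hypothetical values chosen for illustration, not Hudi defaults.

```python
# Hypothetical illustration: how average file size drives the number of files
# (and therefore the number of objects a NameNode or directory listing must handle).

MB = 1024 ** 2
TB = 1024 ** 4


def file_count(dataset_bytes: int, avg_file_bytes: int) -> int:
    """Return the number of files needed to hold dataset_bytes at the given average size."""
    if avg_file_bytes <= 0:
        raise ValueError("avg_file_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return -(-dataset_bytes // avg_file_bytes)


if __name__ == "__main__":
    dataset = 1 * TB  # assumed dataset size, for illustration only
    for avg in (1 * MB, 16 * MB, 128 * MB):
        print(f"{avg // MB:>4} MB files -> {file_count(dataset, avg):>9,} files")
```

Under these assumed numbers, the same 1 TiB of data occupies 1,048,576 files at a 1 MiB average but only 8,192 files at 128 MiB, which is why keeping files adequately sized reduces both metadata pressure and listing cost.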

Here are some ways to efficiently manage the storage of your Hudi datasets.

...

Read Optimized Queries

<WIP>

Hive Integration

<WIP>