...

  • Partition pruning 
  • File pruning
    • Some data file formats store metadata that includes range information for certain columns (for Parquet, this metadata is stored in the file footer).
    • As part of query planning, the range information is read from all data files.
    • Irrelevant data files are then pruned based on the query predicates and the available range information.
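
The file pruning steps above can be illustrated with a minimal sketch. It is not Hudi or Parquet code: `FileStats`, `may_contain`, `prune_files`, the file names, and the `ts` column are all hypothetical, and the per-file ranges are assumed to have already been read from the file footers. The predicate is a simple closed interval on one column.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class FileStats:
    """Hypothetical holder for the per-column (min, max) ranges of one data file."""
    path: str
    ranges: Dict[str, Tuple[Any, Any]] = field(default_factory=dict)


def may_contain(stats: FileStats, column: str, lo: Any, hi: Any) -> bool:
    """Return True if the file could hold rows with lo <= column <= hi."""
    if column not in stats.ranges:
        # No range information for this column: the file cannot be skipped safely.
        return True
    col_min, col_max = stats.ranges[column]
    # Two closed intervals overlap when each one starts before the other ends.
    return col_min <= hi and col_max >= lo


def prune_files(files: List[FileStats], column: str, lo: Any, hi: Any) -> List[FileStats]:
    """Keep only the files whose range overlaps the predicate interval."""
    return [f for f in files if may_contain(f, column, lo, hi)]


# Illustrative data only; these files and ranges are made up.
files = [
    FileStats("a.parquet", {"ts": (0, 99)}),
    FileStats("b.parquet", {"ts": (100, 199)}),
    FileStats("c.parquet", {"ts": (200, 299)}),
    FileStats("d.parquet"),  # no statistics available
]

# Predicate: ts BETWEEN 150 AND 220
kept = [f.path for f in prune_files(files, "ts", 150, 220)]
print(kept)
```

Note that every file's range still has to be read before anything can be pruned, so the planning cost grows with the number of data files.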

Partition pruning typically puts the burden on users to select the partitions where the data may exist. The file pruning approach is expensive and does not scale when a large number of partitions and data files must be scanned. We therefore propose a new solution that stores additional information in the Hudi metadata table to implement a data skipping index. The goals of the data skipping index are to provide:

...