Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.


Info
titlePlease Read!!

Most of Hudi content is now hosted on the project site or the Github repo.  This wiki is not updated/maintained actively.

This wiki space hosts 

...



If you are looking for documentation on using Apache Hudi, please visit the project site or engage with our community

Technical documentation

How-to blogs

...

...

Design documents/RFCs

RFCs are the way to propose large changes to Hudi and the RFC Process details how to go about driving one from proposal to completion.  Anyone can initiate a RFC. Please note that if you are unsure of whether a feature already exists or if there is a plan already to implement a similar one, always start a discussion thread on the dev mailing list before initiating a RFC. This will help everyone get the right context and optimize everyone’s usage of time.Below is a list of RFCs 

Children Display
pageRFC Process

Community Management

Roadmap

Below is a depiction of what's to come and how its sequenced

...

This is a rough roadmap (non exhaustive list) of what's to come in each of the areas for Hudi.

Writing data & Indexing 

  • Improving indexing speed for time-ordered keys/small updates
    • leverage parquet record indexes,
    • serving bloom filters/ranges from timeline server/consolidate metadata
    • Indexing the log file, moving closer to scalable 1-min ingests
  • Improving indexing speed for uuid-keys/large update spreads
    • global/hash based index to faster point-in-time lookup
  • Incrementalize & standardize all metadata operations e.g cleaning based on timeline metadata
  • Auto tuning 
    • Auto tune bloom filter entries based on records
    • Partitioning based on historical workload trend
    • Determination of compression ratio

Reading data

...

...

Storage 

  • ORC Support
  • Support for collapsing and splitting file groups 
  • Custom strategies for data clustering
  • Columnar stats collection to power better query planning
  • Object storage

Usability 

  • Painless migration of historical data, with safe experimentation
  • Hudi on Flink
  • Hudi for ML/Feature stores

Metadata Management

...