...

This is a rough, non-exhaustive roadmap of what's to come in each of these areas for Hudi.

Writing data & Indexing 

  • Improving indexing speed for time-ordered keys/small updates
    • Leverage Parquet record indexes
    • Serve bloom filters/ranges from the timeline server; consolidate metadata
    • Support for indexing Parquet records to improve speed
    • Index the log files, moving closer to scalable 1-minute ingests
  • Overhaul of 
  • Improving indexing speed for UUID keys/large update spreads
    • Global/hash-based index for faster point-in-time lookups
  • Incrementalize & standardize all metadata operations, e.g., incremental cleaning based on timeline metadata
  • Auto-tuning
    • Auto-tune bloom filter entries based on the number of records
    • Partitioning based on historical workload trends
    • Determination of compression ratios
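Auto-tuning bloom filter entries amounts to sizing the filter from the expected record count per file and a target false-positive rate, rather than using a fixed default. A minimal sketch of that calculation, using the standard bloom-filter sizing formulas (the class and method names here are illustrative, not Hudi's actual API):

```java
// Hypothetical sketch of bloom-filter auto-tuning: given the expected number
// of record keys in a file and a target false-positive probability, compute
// the optimal filter size (in bits) and the optimal number of hash functions.
public class BloomFilterSizing {

    // Optimal number of bits: m = -n * ln(p) / (ln 2)^2
    static long optimalNumBits(long numEntries, double fpp) {
        return (long) Math.ceil(-numEntries * Math.log(fpp)
                / (Math.log(2) * Math.log(2)));
    }

    // Optimal number of hash functions: k = (m / n) * ln 2
    static int optimalNumHashes(long numEntries, long numBits) {
        return Math.max(1, (int) Math.round(
                (double) numBits / numEntries * Math.log(2)));
    }

    public static void main(String[] args) {
        long entries = 60_000;   // records expected in one file (assumed figure)
        double fpp = 0.000001;   // target false-positive rate (assumed figure)
        long bits = optimalNumBits(entries, fpp);
        int hashes = optimalNumHashes(entries, bits);
        System.out.println(bits + " bits, " + hashes + " hash functions");
    }
}
```

The point of tuning is that a filter sized for a default record count wastes space on small files and produces excess false positives (and therefore wasted file reads during index lookup) on large ones.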

Reading data

  • Incremental Pull natively via Spark Datasource
  • Real-time view support on Presto
  • Hardening incremental pull via the real-time view
  • Real-time view performance/memory footprint reduction
  • Support for Streaming style batch programs via Beam/Structured Streaming integration
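The core of incremental pull is selecting only the commits on the timeline that are newer than the consumer's last-seen instant, so downstream jobs process just the changed records. A minimal sketch of that selection logic, independent of any query engine (the `Commit` type and method names are illustrative, not Hudi's actual API):

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch of incremental-pull semantics: given a table's commit
// timeline, return only the commits after a consumer's checkpoint instant.
public class IncrementalPull {

    // A commit on the timeline: an instant time plus the files it touched.
    record Commit(String instantTime, List<String> touchedFiles) {}

    // Instant times are timestamp strings (e.g. "20240101120000"), so
    // lexicographic comparison matches chronological order.
    static List<Commit> commitsSince(List<Commit> timeline, String beginInstant) {
        return timeline.stream()
                .filter(c -> c.instantTime().compareTo(beginInstant) > 0)
                .collect(Collectors.toList());
    }
}
```

Exposing this natively through the Spark Datasource would let a job pass its checkpoint instant as a read option instead of hand-filtering commit metadata.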

...

  • Standalone timeline server to handle DFS listings and timeline requests
  • Consolidated filesystem metadata for query planning 
    • The Hudi timeline is a log; compacting it yields a snapshot of the table's metadata
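The log-to-snapshot idea above can be sketched as a simple replay: fold the ordered file-level actions recorded by each commit into the table's current file listing, which is exactly what consolidated metadata would store for query planning. The action/field names below are illustrative, not Hudi's actual metadata format:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of "timeline as a log": replaying commit actions (file adds/removes)
// in order produces a snapshot of the files currently part of the table,
// avoiding a recursive DFS listing at query-planning time.
public class TimelineSnapshot {

    // One file-level action recorded on the timeline; type is "add" or "remove".
    record Action(String type, String file) {}

    static Set<String> compact(List<Action> timeline) {
        Set<String> files = new LinkedHashSet<>();
        for (Action a : timeline) {
            if (a.type().equals("add")) {
                files.add(a.file());
            } else {
                files.remove(a.file());
            }
        }
        return files; // the snapshot: files visible in the latest table state
    }
}
```

Because the snapshot is just a compaction of the log, it can be rebuilt (or incrementally advanced) from any point on the timeline.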