Info: Most Hudi content is now hosted on the project site or the GitHub repo. This wiki is not actively updated or maintained.
This wiki space hosts
If you are looking for documentation on using Apache Hudi, please visit the project site or engage with our community.
...
How-to blogs
- How to manually register Hudi tables into Hive via Beeline?
- Ingesting Database changes via Sqoop/Hudi
- De-Duping Kafka Events With Hudi DeltaStreamer
Design documents/RFCs
RFCs are the way to propose large changes to Hudi, and the RFC Process details how to drive one from proposal to completion. Anyone can initiate an RFC. If you are unsure whether a feature already exists or whether a similar one is already planned, start a discussion thread on the dev mailing list before initiating an RFC. This gives everyone the right context and saves everyone time.
The RFCs are listed as child pages of the RFC Process page.
Community Management
- Apache Hudi - Release Guide (Pre Graduation)
- Apache Hudi Community Bi-Weekly Sync
- Committer On-boarding Guide
- Community Support
Roadmap
Under construction, early 2021 unveiling
Writing data & Indexing
- Improving indexing speed for time-ordered keys/small updates
- Leverage Parquet record indexes
- Serving bloom filters/ranges from timeline server/consolidate metadata
- Indexing the log file, moving closer to scalable 1-min ingests
- Improving indexing speed for UUID keys/large update spreads
- Global/hash-based index for faster point-in-time lookup
- Incrementalize & standardize all metadata operations, e.g., cleaning based on timeline metadata
- Auto tuning
- Auto tune bloom filter entries based on records
- Partitioning based on historical workload trend
- Determination of compression ratio
Reading data
...
...
Storage
- ORC Support
- Support for collapsing and splitting file groups
- Custom strategies for data clustering
- Columnar stats collection to power better query planning
- Object storage
Usability
- Painless migration of historical data, with safe experimentation
- Hudi on Flink
- Hudi for ML/Feature stores
Metadata Management
...