
Please Read!

Most Hudi content is now hosted on the project site or the GitHub repo. This wiki is not actively updated or maintained.

This wiki space hosts


If you are looking for documentation on using Apache Hudi (Incubating), please visit the project site or engage with our community.

...

How-to blogs

...

Design documents

...

RFCs

RFCs are the way to propose large changes to Hudi. The RFC Process page details how to drive one from proposal to completion.

List below

Anyone can initiate an RFC. If you are unsure whether a feature already exists or whether a similar one is already planned, always start a discussion thread on the dev mailing list before initiating an RFC. This helps everyone get the right context and makes better use of everyone's time.

Community Management

Apache Hudi (incubating) - Release Guide
Apache Hudi Community Weekly Sync
Committer On-boarding Guide

Roadmap

Below is a depiction of what's to come and how it's sequenced.

...

This is a rough, non-exhaustive roadmap of what's to come in each area of Hudi.

Writing data & Indexing

Improving indexing speed for time-ordered keys/small updates
- Leverage Parquet record indexes
- Serve bloom filters/ranges from the timeline server; consolidate metadata
- Index the log file, moving closer to scalable 1-minute ingests
Improving indexing speed for UUID keys/large update spreads
- Global/hash-based index for faster point-in-time lookups
Incrementalize and standardize all metadata operations, e.g., cleaning based on timeline metadata
Auto-tuning
- Auto-tune bloom filter entries based on records
- Partitioning based on historical workload trends
- Determination of compression ratio

Reading data

Native incremental pull via the Spark Datasource
Realtime view support on Presto
Hardening incremental pull via the Realtime view
Realtime view performance improvements and memory footprint reduction
Support for streaming-style batch programs via Beam/Structured Streaming integration

Storage

ORC Support
Support for collapsing and splitting file groups
Custom strategies for data clustering
Columnar stats collection to power better query planning

Usability

Painless migration of historical data, with safe experimentation
Hudi on Flink
Hudi for ML/Feature stores

Metadata Management

...

Community Support
