...

  1. HIP-1: CSV Source Support for Delta Streamer
  2. HIP-2: ORC Storage in Hudi
  3. HIP-3: Timeline Service with Incremental File System View Syncing
  4. HIP-4: Faster Hive incremental pull queries

Roadmap

<WIP>

...

This is a rough roadmap (a non-exhaustive list) of what's to come in each area of Hudi, to give a general idea of where we are headed.

Writing data & Indexing 

  • Support for indexing Parquet records to improve speed
  • Indexing the log file, moving closer to scalable 1-min ingests
  • Overhaul of 
  • Incrementalizing cleaning based on timeline metadata

Reading data

  • Incremental Pull natively via Spark Datasource
  • Real-time view support on Presto
  • Hardening incremental pull via Realtime view
  • Support for streaming-style batch programs via Beam/Structured Streaming integration

Storage 

  • ORC Support
  • Support for collapsing and splitting file groups 
  • Custom strategies for data clustering
  • Columnar stats collection to power better query planning

Usability 

  • Painless migration of historical data, with safe experimentation
  • Hudi on Flink
  • Hudi for ML/Feature stores

Metadata Management

  • Standalone timeline server to serve DFS listings

...