
Issue Management

Actual issue tracking happens in Apache JIRA; we use this page to keep ourselves grounded.

  1. File every issue as an "Issue", not as a "Sub task" (sub-tasks cannot be added to Epics).
  2. Attach issues to an Epic whenever possible, so they do not get scattered around (see 1.0 Epics).
  3. Keep issues unassigned unless you are about to begin working on them.
  4. Issues must be tagged with Fix Version/s: 1.0.0 to show up on the board.
  5. If you have a PR up, make sure the JIRA is in the "Review" state and fill in the "Reviewers" field with the people your review is blocked on.
  6. Vinoth Chandar will move issues from 1.0.0 to 1.1.0 if they do not seem important.

See the Roadmap to visualize which epics are in which phase.

Sync Meeting Format

Daily at 7pm PST; ping Vinoth Chandar to be added.

  • Report status and planned next steps, and call out any blockers/discussion items (1 min each, max).
  • Update this execution planner; decide whether we need to change course or adjust plans.
  • Discuss blockers; live jams to resolve issues within the bounds of the meeting.

Execution Phase 1 (Aug 15-Oct 31)

Focus: Spark, Flink (for Non-Blocking (NB) Concurrency Control)
Legend: (green star) in progress/on track; (red star) blocked; (star) in progress/slipping; (blue star) not started


  • (Vinoth) Identify and land all critical outstanding PRs (those that fix critical issues or move us forward on the 1.0 path)
  • (Sagar & Vinoth & Danny) Land storage format 1.0 
    • (Vinoth) (star) Scope this epic tightly: https://issues.apache.org/jira/browse/HUDI-6242
    • (Sagar) (green star) Make all the agreed-upon format changes.
    • (Ethan) (green star) Standardize serialization of log blocks and timeline meta files.
    • (Sagar) (star) Base file format can differ within file groups.
    • (Sagar) (green star) No Java classes show up in table properties.
    • (Danny) (green star) Introduce transition time into the active timeline.
    • (Danny) (green star) Remove log block append for multiple commits.
    • (Danny) (green star) Introduce new completion-time-based file slicing.
  • Design:
    • (Sagar) (green star) Multi-table transactions?
    • (Lin) Keys: UUIDs vs. what we do today.
    • (Vinoth) (star) Put up a 1.0 tech specs doc.
    • (Vinoth) (star) OCC/Time-Travel Read (+Write)
    • (Vinoth/Danny) (star) Time-Travel read on NB CC & finalize NB CC design
    • (Danny) TrueTime API implementation for Hudi (wait based, or filesystem/stateless based)
    • (Vinoth/Shawn) Cloud native storage layout design (Udit's RFC-60)
    • (Sagar/Vinoth) (green star) Logical partitioning/Index Functions API (Java, Native) and its integration into Spark/Presto/Trino. (HUDI-512)
    • (Vinoth) (red star) Are we happy with the DT <> MT sync mechanism? Does it need to be revisited? (HUDI-2461 + other issues with Flink OCC)
  • Implementation
    • (Lin) (green star) Finalize the RFC-46/RecordMerger API and cross-platform support; only invoked for hoodie.merge.mode=custom? (complete HUDI-3217)
    • (Ethan) (star) Implement MoR snapshot query (positional/key-based updates, deletes), partial updates, and custom merges on the new File Format code path.
    • (Danny) Implement non-blocking CC for Spark, at parity with what Flink has.
    • (Lin) (blue star) Implement a uniform way to read incremental data files based on the new timeline (https://issues.apache.org/jira/browse/HUDI-2750)
    • (Ethan) (star) Implement writers for positional updates, deletes, partial updates, and ordering-field-based merging.
    • (Ethan) (star) Implement engine-agnostic FileGroup read APIs across Spark/Hive.
    • (Ethan/Lin?) (blue star) Implement different query types in the new FileGroup reader for Spark.
    • (Sagar) (green star) Async indexer is in final shape (complete HUDI-2488)
    • (Sagar) Existing Optimistic Concurrency Control is in final shape (complete HUDI-1456)
    • (Danny) (green star) Land LSM Timeline in well-tested, performant shape (HUDI-309)
    • (Danny) (green star) Flink/Non-blocking CC (HUDI-6640, HUDI-6495)

Execution Phase 2 (Nov 1-Nov 30)

  • Pre-work
    • (Vinoth/Balaji) (blue star) Land all relevant PRs
  • APIs: (https://issues.apache.org/jira/browse/HUDI-4141)
  • Design
    • (Vinoth) (blue star) General-purpose, global timeline (no active vs. archived distinction) (HUDI-309)
    • (Vinoth) (blue star) Non-blocking concurrency control/clustering + updates, inserts + inserts for Spark + Flink.
    • (Vinoth) (blue star) Spark SQL statements to complete the DB vision (Vinoth has a list ???)
    • (Vinoth) (blue star) Lance file format + storing blobs/images (needs an epic)
    • (Vinoth) (blue star) Redesign Hudi MT as an internal partition of the data table, exposing "files" metadata alone outside (HUDI-2461 etc.)
    • (Vinoth) (blue star) Backwards-compatibility testing: can a 1.0 reader read the 0.x format? Reader/writer/table versions?
  • Implementation
    • (Sagar/Jon) (blue star) Schema evolution and version tracking in MT.
    • (Sagar/Jon) Schema on read support 
    • (??) MT <> DT redesign 
    • (Lin) (red star) Land Parquet keyed lookup code (???)
    • (blue star) MT/RLI on Parquet base files
    • (???) (blue star) Introduce TrueTime API or equivalent, to explain the foundations more clearly. (reuse HUDI-3057)
    • (Danny) Follow-ups on LSM Timeline.
    • (Vinoth) (star) Implement a DataFrame-based write path; take the HoodieData abstraction to completion, with end-to-end row writing for Spark? All write operations work with rows end-to-end (HUDI-4857)
    • (Danny) (blue star) Change Timeline/FileSystemView to support snapshot, incremental, CDC, time-travel queries correctly based on completion time
    • (Sagar) (blue star) Secondary indexes (Bloom, RLI, VectorIndex, ..) on Spark read/write path. (HUDI-3907, HUDI-4128)
    • (Sagar) Meta Sync to Glue/HMS with reduced storage/API overhead (HUDI-2519, HUDI-5108, HUDI-6488), seamless inc query, cdc query, ro/rt experience
    • (Lin) SQL experience for timeline, metadata. (HUDI-6498)
    • [Rajesh???] Parquet Rewriting at Page Level for Spark Rows (Writer perf) (HUDI-4790)
    • Minimize configs and cleanup defaults (https://issues.apache.org/jira/browse/HUDI-1239)
  • Open/Risk Items:

GA Phase (Dec 1-Dec 31) (marked 1.1.0 for now)

  • Release (if still pending!)
  • Docs
  • Examples
  • Bundles & Packages (HUDI-3529)
  • Site updates
  • Deprecate/clean up cWiki

Below the line (marked 1.1.0 for now)

Comments

  1. RFC-46 + Row Writing + New Format Spark R/W 

    Exit Criteria 

    • Spark MoR Snapshot, Incremental, ReadOptimized, CDC, TimeTravel queries on new storage format.
    • Positional update, delete, partial update, event_time-based merge, and custom merger support on read and write paths.
    • Finalize the RecordMerger API for use with Java, Python, and other languages.
    • Engine-agnostic FileGroupReader "internal" API, replacing Spark and Hive reads.
    • (Stretch) End-to-end DataFrame writing for Spark, GPU acceleration via Rapids, etc. The FileGroup writer API is deprioritized.


  2. EPIC HUDI-3217 

    • (15pt P0) Finalize the RecordMerger API for use with Java, Python, and other languages.

    EPIC HUDI-6243

    • (17pt) Engine-agnostic FileGroupReader "internal" API, replacing Spark and Hive reads.

    EPIC HUDI-6722

    • (22pt) Positional update, delete, partial update, event_time-based merge, and custom merger support on read and write paths.

    EPIC HUDI-6243

    • (27pt) Spark MoR Snapshot, Incremental, ReadOptimized, CDC, TimeTravel queries on new storage format.