Quick Links
Issue Management
Actual issue tracking is in Apache JIRA! We use this page to ground ourselves.
- Please file any issue as an "Issue" and not as a "Sub-task" (sub-tasks cannot be added to Epics).
- Attach issues to an Epic wherever possible, so they do not scatter around (see 1.0 Epics).
- Keep issues unassigned, unless you are about to begin working on them.
- Issues must be tagged with Fix Version/s: 1.0.0 to show up on the board.
- If you have a PR up, please ensure the JIRA issue is in the "Review" state and set the "Reviewers" field to whoever your review is blocked on.
- Vinoth Chandar will move issues from 1.0.0 to 1.1.0 if they do not seem important.
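The board-tagging rule above can be captured in a JIRA saved filter. A sketch in JQL, assuming the board is driven by a filter like this (the exact filter backing the 1.0 board is not stated here):

```
project = HUDI AND issuetype = "Issue" AND fixVersion = "1.0.0" ORDER BY status ASC
```

Issues filed as Sub-tasks, or missing the Fix Version/s tag, would silently fall outside such a filter, which is why the rules above matter.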
See the Roadmap to visualize which epics are in which phase.
Sync Meeting Format
Daily at 7pm PST; ping Vinoth Chandar to be added.
- Report status, planned next steps, and call out any blockers/discussion items (1 min max each)
- Update this execution planner, see if we need to change course, adjust plans
- Discuss blockers; live-jam to resolve issues within the bounds of the meeting.
Execution Phase 1 (Aug 15-Oct 31)
Focus: Spark, Flink (for NB Concurrency Control)
- Status legend: In progress/on track, Blocked, In progress/slipping, Not started
- (Vinoth) Identify & land all critical outstanding PRs (that solve critical issues, take us forward in our 1.0 path)
- (Vinoth) to identify. https://github.com/apache/hudi/pulls?q=is%3Apr+is%3Aopen+label%3Arelease-1.0.0
- (Sagar) Move master to 1.0.0
- (Sagar & Vinoth & Danny) Land storage format 1.0
- (Vinoth) Scope this epic tight. https://issues.apache.org/jira/browse/HUDI-6242
- (Sagar) Make all the agreed upon format changes described here.
- (Ethan) Standardization of serialization - log blocks, timeline meta files.
- (Sagar) Base file format can be different within file groups
- (Sagar) No Java classes show up in table properties.
- (Danny) Introduce transition time into the active timeline
- (Danny) Remove log block append for multiple commits
- (Danny) Introduce new completion-time-based file slicing
- Design:
- (Sagar) Multi-table transactions? ( )
- (Lin) Keys: UUIDs vs. what we do today.
- (Vinoth) Put up a 1.0 tech specs doc
- (Vinoth) OCC/Time-Travel Read (+Write)
- (Vinoth/Danny) Time-Travel read on NB CC & finalize NB CC design
- (Danny) TrueTime API implementation for Hudi (wait based, or filesystem/stateless based)
- (Vinoth/Shawn) Cloud native storage layout design (Udit's RFC-60)
- (Sagar/Vinoth) Logical partitioning/Index Functions API (Java, Native) and its integration into Spark/Presto/Trino. (HUDI-512)
- (Sagar/Vinoth) Schema Evolution and version tracking in MT.
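The wait-based TrueTime option mentioned above can be pictured with a toy generator. A minimal sketch, assuming a bounded cross-process clock skew; this is NOT the actual Hudi API, and the class/config names here are hypothetical:

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative wait-based TrueTime-style generator (hypothetical, not Hudi's API).
// Idea: after taking a timestamp, wait out the assumed max clock skew so that any
// timestamp generated afterwards on another process is guaranteed to be larger.
class WaitBasedTime {
    private final long maxSkewMs;                      // assumed skew bound across writers
    private final AtomicLong lastIssued = new AtomicLong(0);

    WaitBasedTime(long maxSkewMs) {
        this.maxSkewMs = maxSkewMs;
    }

    // Returns a timestamp that is globally monotonic once the wait completes.
    long newInstantTime() {
        long t = Math.max(System.currentTimeMillis(), lastIssued.get() + 1);
        lastIssued.set(t);
        try {
            Thread.sleep(maxSkewMs);                   // wait out the uncertainty window
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t;
    }
}
```

The alternative called out in the bullet (filesystem/stateless based) would instead derive ordering from storage, e.g. completion-time of the instant file, rather than waiting.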
- Implementation
- (Lin) Finalize RFC-46/RecordMerger API, cross-platform support, only invoked for hoodie.merge.mode=custom? (complete HUDI-3217)
- (Ethan) Implement MoR snapshot query (positional/key-based updates, deletes), partial updates, and custom merges on the new File Format code path.
- (Lin) Implement a uniform way to read incremental data files based on new timeline (https://issues.apache.org/jira/browse/HUDI-2750)
- (Ethan) Implement writers for positional updates, deletes, partial updates, ordering field-based merging.
- (Ethan) Implement engine agnostic FileGroup Read APIs across Spark/Hive
- (Ethan/Lin?) Implement different query types in the new FileGroup reader for Spark
- (Vinoth) Implement DataFrame based write path; Take HoodieData abstraction to completion and end-end row writing for Spark? All write operations work with rows end-end (HUDI-4857)
- (Sagar) Async indexer is in final shape (complete HUDI-2488)
- (Sagar) Secondary indexes (Bloom, RLI, VectorIndex, ..) on Spark read/write path. (HUDI-3907, HUDI-4128)
- (Sagar) Existing Optimistic Concurrency Control is in final shape (complete HUDI-1456)
- (Lin) Land Parquet keyed lookup code (???)
- (Danny) Land LSM Timeline in well-tested, performant shape (HUDI-309)
- (Danny) Flink/Non-blocking CC (HUDI-6640, HUDI-6495)
- (Danny) Change Timeline/FileSystemView to support snapshot, incremental, CDC, time-travel queries correctly.
- (???) Introduce TrueTime API or equivalent, to explain the foundations more clearly. (reuse HUDI-3057)
- <what other code refactoring to burn down?> (HUDI-2261, HUDI-6243, HUDI-3614, HUDI-4444, HUDI-4756)
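The merge-mode item above can be made concrete with a toy merger. A minimal sketch under assumed shapes; this is NOT the actual RFC-46 HoodieRecordMerger signature, and all names here are hypothetical. The point is the contract a custom merger (hoodie.merge.mode=custom) would fulfil, versus the default ordering-field-based behavior:

```java
import java.util.Optional;

// Hypothetical record shape for the sketch.
class Rec {
    final long orderingValue;   // e.g. an event-time / precombine field
    final String payload;
    Rec(long orderingValue, String payload) {
        this.orderingValue = orderingValue;
        this.payload = payload;
    }
}

// Hypothetical merger contract: decide how an incoming record combines with
// the existing record for the same key. An empty result would mean "delete".
interface MergerSketch {
    Optional<Rec> merge(Rec older, Rec newer);
}

// Ordering-field-based merging: the record with the larger ordering value wins,
// regardless of arrival order.
class OrderingFieldMerger implements MergerSketch {
    @Override
    public Optional<Rec> merge(Rec older, Rec newer) {
        return Optional.of(newer.orderingValue >= older.orderingValue ? newer : older);
    }
}
```

A custom merge mode would swap in a user-supplied implementation of this contract (e.g. field-level partial merging), which is why it needs to be invocable uniformly across engines.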
- (Sagar) Open/Risk Items:
- (Sagar) Are we happy with how log compaction is implemented? (https://issues.apache.org/jira/browse/HUDI-3580)
- (Vinoth) Should we retain virtual keys support? https://issues.apache.org/jira/browse/HUDI-2235
Execution Phase 2 (Nov 1-Nov 30)
- Pre-work
- (Vinoth) Land all relevant PRs
- APIs: (https://issues.apache.org/jira/browse/HUDI-4141)
- FileGroup APIs in Java
- Rust/C++ APIs for Timeline, Metadata, FileGroup Read/Write (https://issues.apache.org/jira/browse/HUDI-6486)
- Internal APIs/Abstractions/Code Refactoring (https://issues.apache.org/jira/browse/HUDI-6243)
- HUDI-43
- HoodieSchema? https://issues.apache.org/jira/browse/HUDI-6499
- Design
- (Vinoth) General purpose, global timeline (no active vs archived distinction) (HUDI-309, )
- (Vinoth) Non-blocking concurrency control/clustering + updates, inserts + inserts for Spark + Flink.
- (Vinoth) Spark SQL statements to complete the DB vision. (Vinoth has a list. ???)
- (Vinoth) Lance file format + storing blobs/images. (Needs an epic)
- Implementation
- Multi-table transaction
- MT/RLI on Parquet base files
- Follow ups on LSM Timeline.
- Minimize configs and cleanup defaults (https://issues.apache.org/jira/browse/HUDI-1239)
- Meta Sync to Glue/HMS with reduced storage/API overhead (HUDI-2519, HUDI-5108, HUDI-6488), seamless inc query, cdc query, ro/rt experience
- Broader Performance improvements (HUDI-3249)
- SQL experience for timeline, metadata. (HUDI-6498)
- [???] Parquet Rewriting at Page Level for Spark Rows (Writer perf) (HUDI-4790)
- Introduce HudiStorage APIs to abstract out Hadoop FileSystem. (HUDI-6497)
- Open/Risk Items:
- (Ethan/Danny) _hoodie_operation metafield; Spark/Flink interop.
- (Vinoth) Are we happy with the DT <> MT sync mechanism? Does this need to be revisited? (HUDI-2461 + other issues with Flink OCC)
GA Phase (Dec 1 - Dec 31) (Marked 1.1.0 for now)
- Release (if still pending!)
- Docs
- Examples
- Bundles & Packages (HUDI-3529)
- Site updates
- Deprecate/Cleanup cWiki
Below the line (Marked 1.1.0 for now)
- Unstructured Hudi table.
- Implement Non blocking CC for Spark...
- Encoding updates as deletes + inserts. (HUDI-6490)
- Native HFile reader/writer in Hudi. (VC: This was punted since we'd default to Parquet based MDT)
- Streaming Performance: optimize the current upsert DAG on MetadataIndex (hybrid of RLI, Bloom Index, ....)
- Column family use-case (sparse rows on wide tables??)
- Cool new indexes
- Spatial Index
- Search/Lucene Index
- Bitmap Index
- Hive Storage Handler
- MT integration across Presto, Trino (HUDI-4552, HUDI-4394)
- Presto : Snapshot, Incremental, Time Travel, CDC queries (on MT) (https://issues.apache.org/jira/browse/HUDI-3210)
- Trino: (repeat above https://issues.apache.org/jira/browse/HUDI-2687)
- Demos
- Killer dbt demo (https://issues.apache.org/jira/browse/HUDI-6586)
- Dev Hygiene
- Tests