...
- When filing an issue, attach it to an EPIC whenever possible, so it does not get scattered. (See 1.0 Epics.)
- Keep issues unassigned unless you are about to begin working on them.
- Issues must be tagged with `Fix Version/s: 1.0.0` to show up on the board.
- Vinoth Chandar will move issues from `1.0.0` to `1.1.0` if they do not seem important.
- Pending project management tasks:
- (Vinoth) to create a "roadmap" in JIRA
- (Vinoth) to go through each Epic in depth and clean up the tasks themselves.
- (Vinoth) to scout for R.M.
Execution Phase 1 (Aug 15-Sept 15)
- (Vinoth) Identify & land all critical outstanding PRs (those that solve critical issues or move us forward on the 1.0 path)
- (Vinoth) to identify.
- (Sagar) Move `master` to 1.0.0
- (Ethan & Vinoth & Danny) Land storage format 1.0 (Complete)
- (Vinoth) Put up a 1.0 tech specs doc
- Make all format changes described here: https://issues.apache.org/jira/browse/HUDI-6242
- Standardize serialization for log blocks and timeline meta files.
- Change Timeline/FileSystemView to support snapshot, incremental, CDC, and time-travel queries correctly.
- Changes to support multiple base file formats within each file group.
- Ensure no Java class names show up in table properties (HUDI-5761).
- (Danny) Introduce transition time into the active timeline
- (Danny) Land the LSM timeline in well-tested, performant shape (HUDI-309, HUDI-6626; this needs an epic ASAP???)
- Design:
- (Sagar) Multi-table transactions? (VC: we have a strawman, but it needs an RFC to validate correctness across phantom reads, self-joins, nested queries, and isolation levels.)
- (Lin) Keys: UUIDs vs. what we do today.
- (Danny???) Time-travel read (+write) (resolve HUDI-4500, HUDI-4677 and similar; address branch/merge use cases)
- (Ethan???) Logical partitioning / Index Functions API (Java, native) and its integration into Spark/Presto/Trino (HUDI-512)
- (Shawn) Cloud-native storage layout design (Udit's RFC-60)
- (Sagar + ???) Schema evolution and version tracking in MT.
- (Vinoth) Lance file format + storing blobs/images.
- Implementation
- (Sagar) RFC-46/RecordMerger API: is this our final choice? Cross-platform? Only for `hoodie.merge.mode=custom`? (complete HUDI-3217)
- (Sagar) Async indexer is in final shape (complete HUDI-2488)
- (Lin) Land Parquet keyed lookup code (???)
- (Danny) Flink / non-blocking CC (HUDI-5672, HUDI-6640, HUDI-6495)
- (???) Parquet rewriting at page level for Spark rows (writer perf) (HUDI-4790)
- (Ethan) Implement MoR snapshot query (positional/key-based updates and deletes), partial updates, and custom merges on the new file format code path.
- (Ethan) Implement writers for positional updates, deletes, partial updates, and ordering-field-based merging.
- Existing Optimistic Concurrency Control is in final shape (complete HUDI-1456)
- Implement a uniform way to fetch incremental data files based on the new timeline (https://issues.apache.org/jira/browse/HUDI-2750)
- <what other code refactorings are there to burn down?> (HUDI-2261, HUDI-6243, HUDI-3614, HUDI-4444, HUDI-4756)
- (Sagar) Open/risk items:
- `_hoodie_operation` metafield; Spark/Flink interop.
- Are we happy with the DT <> MT sync mechanism? Does it need to be revisited? (HUDI-2461 + other issues)
- Are we happy with how log compaction is implemented? (https://issues.apache.org/jira/browse/HUDI-3580)
- Should we retain virtual keys support? (https://issues.apache.org/jira/browse/HUDI-2235)
...