...
- Pre-work
  - (Vinoth) Land all relevant PRs
  - (Vinoth) APIs (https://issues.apache.org/jira/browse/HUDI-4141)
    - FileGroup APIs in Java
    - Rust/C++ APIs for Timeline, Metadata, FileGroup Read/Write (https://issues.apache.org/jira/browse/HUDI-6486)
    - Internal APIs/Abstractions/Code Refactoring (https://issues.apache.org/jira/browse/HUDI-6243)
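The "FileGroup APIs in Java" item refers to first-class APIs over Hudi's file-group layout, where each file group holds a chain of file slices and each slice pairs one base file with the log files written on top of it. As a rough illustration only — the class and method names below are hypothetical sketches, not the actual HUDI-4141 API — a minimal model might look like:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the file-group layout: a FileGroup accumulates
// file slices; each slice pairs one base file with its log files.
class FileSlice {
    final String baseInstantTime;  // instant that created the base file
    final String baseFilePath;
    final List<String> logFilePaths = new ArrayList<>();

    FileSlice(String baseInstantTime, String baseFilePath) {
        this.baseInstantTime = baseInstantTime;
        this.baseFilePath = baseFilePath;
    }
}

class FileGroup {
    final String partitionPath;
    final String fileId;
    private final List<FileSlice> slices = new ArrayList<>();

    FileGroup(String partitionPath, String fileId) {
        this.partitionPath = partitionPath;
        this.fileId = fileId;
    }

    void addSlice(FileSlice slice) {
        slices.add(slice);
    }

    // Readers merge the latest slice: its base file plus its log files.
    FileSlice latestSlice() {
        return slices.get(slices.size() - 1);
    }
}
```

A snapshot reader built on such an API would resolve the latest slice per file group and merge the base file with its log files; an incremental reader would filter slices by instant time.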
- Design
  - (Vinoth) General-purpose, global timeline (no active vs. archived distinction) (HUDI-309, HUDI-6698)
  - (Vinoth) Non-blocking concurrency control: clustering + updates, inserts + inserts, for Spark + Flink.
  - (Vinoth) Spark SQL statements to complete the DB vision. (Vinoth has a list. ???)
  - (Vinoth) Lance file format + storing blobs/images. (Needs an epic)
  - (Vinoth) Backwards compatibility testing. Can a 1.0 reader read the 0.x format?
  - (Vinoth)
    - Multi-table transactions
    - MT/RLI on Parquet base files
  - Follow-ups on LSM Timeline (HUDI-6698)
  - Minimize configs and clean up defaults (https://issues.apache.org/jira/browse/HUDI-1239)
  - Meta Sync to Glue/HMS with reduced storage/API overhead (HUDI-2519, HUDI-5108, HUDI-6488); seamless incremental query, CDC query, RO/RT experience
  - Broader performance improvements (HUDI-3249)
  - SQL experience for timeline, metadata (HUDI-6498)
  - [???] Parquet rewriting at page level for Spark Rows (writer perf) (HUDI-4790)
  - Introduce HudiStorage APIs to abstract out Hadoop FileSystem (HUDI-6497)
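The HudiStorage item above is about programming against a narrow storage interface so the Hadoop FileSystem dependency sits behind a single adapter. The sketch below is a hypothetical illustration of that shape — the interface name matches the roadmap item, but the methods and the in-memory implementation (standing in for a Hadoop-backed adapter) are assumptions, not the HUDI-6497 design:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: engine/table code depends only on this interface,
// while a Hadoop-backed adapter (or any other backend) implements it.
interface HudiStorage {
    byte[] readAll(String path);
    void write(String path, byte[] data);
    boolean exists(String path);
    List<String> listByPrefix(String prefix);
}

// In-memory implementation, standing in for a Hadoop FileSystem adapter.
class InMemoryStorage implements HudiStorage {
    private final Map<String, byte[]> files = new HashMap<>();

    public byte[] readAll(String path) {
        return files.get(path);
    }

    public void write(String path, byte[] data) {
        files.put(path, data);
    }

    public boolean exists(String path) {
        return files.containsKey(path);
    }

    public List<String> listByPrefix(String prefix) {
        return files.keySet().stream()
                .filter(p -> p.startsWith(prefix))
                .sorted()
                .collect(Collectors.toList());
    }
}
```

The payoff of this shape is that timeline and metadata code can be tested against the in-memory backend, and non-Hadoop backends (object stores, local FS) plug in without touching callers.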
- Implementation
- Open/Risk Items:
  - (Ethan/Danny) _hoodie_operation metafield; Spark/Flink interop.
  - (Vinoth) Are we happy with the DT <> MT sync mechanism? Does it need to be revisited? (HUDI-2461 + other issues with Flink OCC)
  - (Sagar) Are we happy with how log compaction is implemented? (https://issues.apache.org/jira/browse/HUDI-3580)
  - (Vinoth) Should we retain virtual keys support? (https://issues.apache.org/jira/browse/HUDI-2235)
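The virtual-keys question above is a trade-off between storage and read cost: with materialized keys the record key lives in a meta column, while with virtual keys it is recomputed from data fields on every read via a key generator. As a rough illustration — the interface and class names here are hypothetical, not Hudi's actual key-generator API — the recompute path looks like:

```java
import java.util.Map;

// Hypothetical sketch: with virtual keys, the record key is derived from
// data fields at read time instead of being stored in a meta column.
interface KeyGenerator {
    String recordKey(Map<String, String> row);
}

// Builds a composite key such as "id=42:ts=100" from the configured fields.
class CompositeKeyGenerator implements KeyGenerator {
    private final String[] fields;

    CompositeKeyGenerator(String... fields) {
        this.fields = fields;
    }

    public String recordKey(Map<String, String> row) {
        StringBuilder sb = new StringBuilder();
        for (String f : fields) {
            if (sb.length() > 0) {
                sb.append(':');
            }
            sb.append(f).append('=').append(row.get(f));
        }
        return sb.toString();
    }
}
```

Dropping virtual keys would save this per-row recompute on reads at the cost of storing the key column; keeping them saves storage but ties every reader to the key-generator configuration.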
...