...
- Pre-work
- (Vinoth/Balaji)
Land all relevant PRs
- (Vinoth/Balaji)
- APIs: (https://issues.apache.org/jira/browse/HUDI-4141)
-
External APIs in Java for metadata, timeline, file groups r/w
-
Internal APIs/Abstractions/Code Refactoring (https://issues.apache.org/jira/browse/HUDI-6243)
-
- Design
- (Vinoth)
General purpose, global timeline (no active vs archived distinction) (HUDI-309, HUDI-6698)
- (Vinoth)
Non-blocking concurrency control/clustering + updates, inserts + inserts for Spark + Flink.
- (Vinoth)
Spark SQL statements to complete the DB vision. (Vinoth has a list. ???)
- (Vinoth)
Lance file format + storing blobs/images. (Needs an epic)
- (Vinoth)
Redesign Hudi MT as an internal partition of the data table, exposing "files" metadata alone outside (HUDI-2461 etc)
- (Vinoth)
Backwards compatibility testing. Can a 1.0 reader read the 0.x format? Reader/writer/table version?
- (Vinoth)
- Implementation
- (Sagar/Jon)
Schema Evolution and version tracking in MT. (HUDI-6778)
- (Sagar/Jon) Schema on read support
- (??) MT <> DT redesign
- (Lin)
Land Parquet keyed lookup code (???)
-
MT/RLI on Parquet base files
- (???)
Introduce TrueTime API or equivalent, to explain the foundations more clearly. (reuse HUDI-3057)
- (Danny) Follow-ups on LSM Timeline. (HUDI-6698)
- (Vinoth)
Implement DataFrame-based write path; take the HoodieData abstraction to completion and end-to-end row writing for Spark? All write operations work with rows end-to-end (HUDI-4857)
- (Danny)
Change Timeline/FileSystemView to support snapshot, incremental, CDC, time-travel queries correctly based on completion time
- (Sagar)
Secondary indexes (Bloom, RLI, VectorIndex, ..) on Spark read/write path. (HUDI-3907, HUDI-4128)
- (Sagar) Meta Sync to Glue/HMS with reduced storage/API overhead (HUDI-2519, HUDI-5108, HUDI-6488), seamless inc query, cdc query, ro/rt experience
- (Lin) SQL experience for timeline, metadata. (HUDI-6498)
- [Rajesh???] Parquet Rewriting at Page Level for Spark Rows (Writer perf) (HUDI-4790)
- Minimize configs and clean up defaults (https://issues.apache.org/jira/browse/HUDI-1239)
- (Sagar/Jon)
- Open/Risk Items:
- (Ethan/Danny)
_hoodie_operation metafield. Spark/Flink interop.
- (Sagar)
Are we happy with how log compaction is implemented? (https://issues.apache.org/jira/browse/HUDI-3580)
- (Vinoth)
Should we retain virtual keys support? (https://issues.apache.org/jira/browse/HUDI-2235)
...