...

  •  Pre-work
    •  (Vinoth/Balaji) (blue star) Land all relevant PRs
  •  APIs: (https://issues.apache.org/jira/browse/HUDI-4141)
  •  Design
    •  (Vinoth) (blue star) General-purpose, global timeline with no active vs. archived distinction (HUDI-309, HUDI-6698)
    •  (Vinoth) (blue star) Non-blocking concurrency control for clustering + updates and inserts + inserts, on both Spark and Flink.
    •  (Vinoth) (blue star) Spark SQL statements to complete the database vision (Vinoth has a list; ???).
    •  (Vinoth) (blue star) Lance file format support + storing blobs/images (needs an epic).
    •  (Vinoth) (blue star) Redesign the Hudi MT as an internal partition of the data table, exposing only the "files" metadata outside (HUDI-2461, etc.)
    •  (Vinoth) (blue star) Backwards compatibility testing: can a 1.0 reader read the 0.x format? How should reader/writer/table versions be handled?
  •  Implementation
    •  (Sagar/Jon) (blue star) Schema evolution and version tracking in the MT (HUDI-6778)
    •  (Sagar/Jon) Schema-on-read support
    •  (??) MT <> DT redesign
    •  (Lin) (red star) Land Parquet keyed lookup code (???)
    •  (blue star) MT/RLI on Parquet base files
    •  (???) (blue star) Introduce a TrueTime API or equivalent to explain the foundations more clearly (reuse HUDI-3057).
    •  (Danny) Follow-ups on the LSM timeline (HUDI-6698)
    •  (Vinoth) (star) Implement a DataFrame-based write path: take the HoodieData abstraction to completion and support end-to-end row writing for Spark? Goal: all write operations work with rows end-to-end (HUDI-4857)
    •  (Danny) (blue star) Change Timeline/FileSystemView to correctly support snapshot, incremental, CDC, and time-travel queries based on completion time
    •  (Sagar) (blue star) Secondary indexes (Bloom, RLI, VectorIndex, ...) on the Spark read/write path (HUDI-3907, HUDI-4128)
    •  (Sagar) Meta Sync to Glue/HMS with reduced storage/API overhead (HUDI-2519, HUDI-5108, HUDI-6488), plus a seamless incremental query, CDC query, and RO/RT experience
    •  (Lin) SQL experience for the timeline and metadata (HUDI-6498)
    •  (Rajesh???) Parquet rewriting at the page level for Spark rows, for writer performance (HUDI-4790)
    •  Minimize configs and clean up defaults (https://issues.apache.org/jira/browse/HUDI-1239)
  •  Open/Risk Items:

...