...

  • Report status, planned next steps, call out any blockers/discussion items (1 min each max)
  • Update this execution planner, see if we need to change course, adjust plans
  • Discuss blockers; live jams to resolve issues within the bounds of the meeting.

Execution Phase 1 (Aug 15-Oct 31)

Focus: Spark, Flink (for NB Concurrency Control)
(green star) - In progress/on track (red star) - blocked (star) - In progress/slipping (blue star) - Not started

...

  • (Vinoth) Identify & land all critical outstanding PRs (that solve critical issues, take us forward in our 1.0 path)
  • (Sagar & Vinoth & Danny) Land storage format 1.0
    •  (Vinoth) (star) Put up a 1.0 tech specs doc (HUDI-6706)
    •  (Vinoth) (star) Scope this epic tight. https://issues.apache.org/jira/browse/HUDI-6242
    •  (Sagar) (green star) Make all the agreed-upon format changes described here. (HUDI-6776)
    •  (Ethan) (green star) Standardization of serialization - log blocks, timeline meta files. (HUDI-6824, HUDI-6825, HUDI-6826, HUDI-6850)
    •  (Sagar) (star) Base file format can be different within file groups (HUDI-6821)
    •  (Sagar) (green star) No Java classes show up in table properties. (HUDI-6780)
    •  (Danny) (green star) Introduce transition time into the active timeline (HUDI-1623, HUDI-6775)
    •  (Danny) (green star) Remove log block append for multiple commits (HUDI-6742)
    •  (Danny) (green star) Introduce new completion-time-based file slicing (HUDI-6642, HUDI-6743)
  • Design:
    •  (Sagar) (green star) Multi-table transactions? (HUDI-6709)
    •  (Lin) Keys: UUIDs vs. what we do today. (HUDI-6701)
    •  (Vinoth) (star) Put up a 1.0 tech specs doc (HUDI-6706)
    •  (Vinoth) (star) OCC/Time-Travel Read (+Write) (HUDI-4677)
    •  (Vinoth/Danny) (star) Time-Travel read on NB CC & finalize NB CC design
    •  (Danny) TrueTime API implementation for Hudi (wait based, or filesystem/stateless based)
    •  (Vinoth/Shawn) Cloud native storage layout design (Udit's RFC-60)
    •  (Sagar/Vinoth) (green star) Logical partitioning/Index Functions API (Java, Native) and its integration into Spark/Presto/Trino. (HUDI-512)
    •  (Sagar/Vinoth) (blue star) Schema Evolution and version tracking in MT. (HUDI-6778)
    •  (red star) Are we happy with the DT <> MT sync mechanism? Does this need to be revisited? (HUDI-2461 + other issues with Flink OCC)
  • Implementation
    •  (Lin) (green star) Finalize RFC-46/RecordMerger API, cross-platform support, only invoked for hoodie.merge.mode=custom ? (complete HUDI-3217) (HUDI-6702, HUDI-6765, HUDI-6784, HUDI-5249, HUDI-5807, HUDI-6767)
    •  (Ethan) (star) Implement MoR snapshot query (positional/key based updates, deletes), partial updates, custom merges on new File Format code path. (HUDI-6796, HUDI-6797, HUDI-6798, HUDI-6801)
    •  (Danny) Implement non-blocking CC for Spark. Parity with what Flink has.
    •  (Lin) (blue star) Implement a uniform way to read incremental data files based on new timeline (https://issues.apache.org/jira/browse/HUDI-2750)
    •  (Ethan) (star) Implement writers for positional updates, deletes, partial updates, ordering field-based merging. (HUDI-6653, HUDI-6799, HUDI-6795, HUDI-6800)
    •  (Ethan) (star) Implement engine-agnostic FileGroup Read APIs across Spark/Hive (HUDI-6800, HUDI-6785)
    •  (Ethan/Lin?) (blue star) Implement different query types in new FileGroup reader for Spark (HUDI-6786, HUDI-6789, HUDI-6790, HUDI-6791, HUDI-6792, HUDI-6793, HUDI-6794, HUDI-6802)
    •  (Vinoth) (star) Implement DataFrame based write path; Take HoodieData abstraction to completion and end-end row writing for Spark? All write operations work with rows end-end (HUDI-4857)
    •  (Sagar) (green star) Async indexer is in final shape (complete HUDI-2488)
    •  (Sagar) (blue star) Secondary indexes (Bloom, RLI, VectorIndex, ..) on Spark read/write path. (HUDI-3907, HUDI-4128)
    •  (Sagar) Existing Optimistic Concurrency Control is in final shape (complete HUDI-1456)
    •  (Lin) (red star) Land Parquet keyed lookup code (???)
    •  (Danny) (green star) Land LSM Timeline in well-tested, performant shape (HUDI-309)
    •  (Danny) (green star) Flink/Non-blocking CC (HUDI-6640, HUDI-6495 )
    •  (Danny) (blue star) Change Timeline/FileSystemView to support snapshot, incremental, CDC, time-travel queries correctly.
    •  (???) (blue star) Introduce TrueTime API or equivalent, to explain the foundations more clearly. (reuse HUDI-3057)

Execution Phase 2 (Nov 1-Nov 30)

...

      • )
      •  Introduce HudiStorage APIs to abstract out Hadoop FileSystem. (HUDI-6497)
  •  Design
    •  (Vinoth) (blue star) General purpose, global timeline (no active vs archived distinction) (HUDI-309, HUDI-6698)
    •  (Vinoth) (blue star) Non-blocking concurrency control/clustering + updates, inserts + inserts for Spark + Flink.
    •  (Vinoth) (blue star) Spark SQL statements to complete DB vision. (Vinoth has a list. ???)
    •  (Vinoth) (blue star) Lance file format + storing blobs/images. (Needs an epic)
    •  (Vinoth) (blue star) Redesign Hudi MT as an internal partition of the data table, exposing "files" metadata alone outside (HUDI-2461 etc)
    •  (Vinoth) (blue star) Backwards compatibility testing. 1.0 reader can read 0.x format? reader/writer/table version?
  •  Implementation
    •  

...

...

Execution Phase 2 (Sept 15-Oct 30)

  •  APIs: (https://issues.apache.org/jira/browse/HUDI-4141)
    •  (blue star) FileGroup APIs in Java
    •  (blue star) Internal APIs/Abstractions/Code Refactoring (https://issues.apache.org/jira/browse/HUDI-6243) HUDI-43
  •  Design
    •  (Vinoth) (blue star) General purpose, global timeline (no active vs archived distinction) (HUDI-309, HUDI-6698)
    •  (Sagar/Jon) (blue star) Schema Evolution and version tracking in MT. (HUDI-6778)
    •  (Sagar/Jon) Schema on read support 
    •  (Vinoth) (blue star) Non-blocking concurrency control/clustering + updates, inserts + inserts for Spark + Flink.
    •  (Vinoth) (blue star) Spark SQL statements to complete DB vision. (Vinoth has a list. ???)
    •  (Vinoth) (blue star) Lance file format + storing blobs/images. (Needs an epic)
  •  Implementation
    •  (blue star) Multi-table transaction
    •  (??) (blue star) MT <> DT redesign
    •  (Lin) (red star) Land Parquet keyed lookup code (???)
    •  (blue star) MT/RLI on Parquet base files
    •  (???) (blue star) Introduce TrueTime API or equivalent, to explain the foundations more clearly. (reuse HUDI-3057)
    •  (Danny) Follow-ups on LSM Timeline. (HUDI-6698)
    •  Minimize configs and cleanup defaults (https://issues.apache.org/jira/browse/HUDI-1239)
    •  (Vinoth) (star) Implement DataFrame based write path; Take HoodieData abstraction to completion and end-end row writing for Spark? All write operations work with rows end-end (HUDI-4857)
    •  (Danny) (blue star) Change Timeline/FileSystemView to support snapshot, incremental, CDC, time-travel queries correctly based on completion time
    •  (Sagar) (blue star) Secondary indexes (Bloom, RLI, VectorIndex, ..) on Spark read/write path. (HUDI-3907, HUDI-4128)
    •  (Sagar)   Meta Sync to Glue/HMS with reduced storage/API overhead (HUDI-2519, HUDI-5108, HUDI-6488), seamless inc query, cdc query, ro/rt experience
    •  Broader Performance improvements (HUDI-3249)
    •  (Lin) SQL experience for timeline, metadata. (HUDI-6498)
    •  [Rajesh???] Parquet Rewriting at Page Level for Spark Rows (Writer perf) (HUDI-4790)
    •  Introduce HudiStorage APIs to abstract out Hadoop FileSystem. (HUDI-6497)
  •  Open/Risk Items:
    •  (Ethan/Danny) (red star) _hoodie_operation metafield. Spark/Flink interop.
    •  (Sagar)(blue star) Are we happy with how log compaction is implemented? (https://issues.apache.org/jira/browse/HUDI-3580)
    •  (Vinoth) (red star) Are we happy with the DT <> MT sync mechanism? Does this need to be revisited? (HUDI-2461 + other issues with Flink OCC)

...

GA Phase (Dec 1-Dec 31) (Marked 1.1.0 for now)

  •  Release (if still pending!)
  •  Docs
  •  Examples
  •  Bundles & Packages (HUDI-3529)
  •  Site updates
  •  Deprecate/Cleanup cWiki

...