Issue Management

Actual issue tracking is in Apache JIRA!  We use this page to ground ourselves.

  1. File every issue as an "Issue", not as a "Sub task" (sub tasks cannot be added to Epics).
  2. Attach issues to an Epic wherever possible, so they do not scatter around (see 1.0 Epics).
  3. Keep issues unassigned unless you are about to begin working on them.
  4. An issue must be tagged with Fix Version/s: 1.0.0 to show up on the board.
  5. If you have a PR up, ensure the JIRA is in the "Review" state and mark the "Reviewers" field with whoever your review is blocked on.
  6. Vinoth Chandar will move issues from 1.0.0 to 1.1.0 if they do not seem important.

Pending project management tasks:
    •  Vinoth to create a "roadmap" in JIRA
    •  Vinoth to go into each Epic deeply, clean up tasks themselves.
    •  Vinoth to scout for R.M.

Roadmap to visualize which epics are in what phase. 

Sync Meeting Format

Daily 7pm PST, ping Vinoth Chandar to be added 

  • Report status, planned next steps, call out any blockers/discussion items (1 min each max)
  • Update this execution planner, see if we need to change course, adjust plans
  • Discuss blockers; live jams to resolve issues within the bounds of the meeting.

Execution Phase 1 (Aug 15-Oct 31)

Focus: Spark, Flink (for NB Concurrency Control)
Legend: (green star) In progress/on track | (red star) Blocked | (star) In progress/slipping | (blue star) Not started


  • (Vinoth) Identify & land all critical outstanding PRs (that solve critical issues, take us forward in our 1.0 path)
  • (Ethan Sagar & Vinoth & Danny) Land storage format 1.0
    •  (Complete)
    •  [Vinoth] Put up a 1.0 tech specs doc
    •  (Vinoth) (star) Scope this epic tight. https://issues.apache.org/jira/browse/HUDI-6242
    •  (Sagar) (green star) Make all the agreed upon format changes described here. (HUDI-6776)
    •  (Ethan) (green star) Standardization of serialization - log blocks, timeline meta files.
    •  Change Timeline/FileSystemView to support snapshot, incremental, CDC, time-travel queries correctly.
    •  Changes to make multiple base file formats within each file group.
    •  Jira: HUDI-6824, HUDI-6825, HUDI-6826, HUDI-6850
    •  (Sagar) (star) Base file format can be different within file groups. (HUDI-6821)
    •  (Sagar) (green star) No Java classes show up in table properties. (HUDI-5761, HUDI-6780)
    •  (Danny) (green star) Introduce transition time into the active timeline
    •  [Danny] Land LSM Timeline in well-tested, performant shape (HUDI-309, HUDI-6626, this needs an epic ASAP???)
    •  Jira: HUDI-1623, HUDI-6775
    •  (Danny) (green star) Remove log block append for multiple commits. (HUDI-6742)
    •  (Danny) (green star) Introduce new completion-time-based file slicing. (HUDI-6642, HUDI-6743)
  • Design:
    •  (Sagar) (green star) Multi-table transactions? (HUDI-6709) (VC: we have a strawman, but it needs an RFC to validate correctness across phantom reads, self-joins, nested queries, and isolation levels.)
    •  (Lin) Keys: UUIDs vs. what we do today. (HUDI-6701)
    •  (Vinoth) (star) Put up a 1.0 tech specs doc. (HUDI-6706)
    •  (Vinoth) (star) OCC/Time-Travel Read (+Write) (resolve HUDI-4500, HUDI-4677 and similar, address branch/merge use-cases)
    •  (Vinoth/Danny) (star) Time-Travel read on NB CC & finalize NB CC design
    •  (Danny) TrueTime API implementation for Hudi (wait based, or filesystem/stateless based)
    •  (Vinoth/Shawn) Cloud native storage layout design (Udit's RFC-60)
    •  (Sagar/Vinoth) (green star) Logical partitioning/Index Functions API (Java, Native) and its integration into Spark/Presto/Trino. (HUDI-512)
    •  [Shawn] Cloud native storage layout design (Udit's RFC-60)
    •  [Sagar + ???] Schema Evolution and version tracking in MT.
    •  [Vinoth] Lance file format + storing blobs/images.
    •  (Vinoth) (red star) Are we happy with the DT <> MT sync mechanism? Does this need to be revisited? (HUDI-2461 + other issues with Flink OCC)
  • Implementation
    •  (Lin) (green star) Finalize RFC-46/RecordMerger API: is this our final choice? Cross-platform support? Only invoked for hoodie.merge.mode=custom? (complete HUDI-3217)
    •  [Sagar] Async indexer is in final shape (complete HUDI-2488)
    •  [Lin] Land Parquet keyed lookup code (???)
    •  [Danny] Flink/Non-blocking CC (HUDI-5672, HUDI-6640, HUDI-6495 )
    •  [???] Parquet Rewriting at Page Level for Spark Rows (Writer perf) (HUDI-4790)
    •  Jira: HUDI-6702, HUDI-6765, HUDI-6784, HUDI-5249, HUDI-5807, HUDI-6767
    •  (Ethan) (star) Implement MoR snapshot query (positional/key based updates, deletes), partial updates, custom merges on new File Format code path.
    •  [Ethan] Implement writers for positional updates, deletes, partial updates, ordering field based merging.
    •  Existing Optimistic Concurrency Control is in final shape (complete HUDI-1456)
    •  Jira: HUDI-6796, HUDI-6797, HUDI-6798, HUDI-6801
    •  (Danny) Implement non-blocking CC for Spark, at parity with what Flink has.
    •  (Lin) (blue star) Implement a uniform way to fetch incremental data files based on the new timeline (https://issues.apache.org/jira/browse/HUDI-2750)
    •  <what are some other code refactorings to burn down?> (HUDI-2261, HUDI-6243, HUDI-3614, HUDI-4444, HUDI-4756)
  • (Sagar) Open/Risk Items:

Execution Phase 2 (Sept 15-Oct 30)

    •  (Ethan) (star) Implement writers for positional updates, deletes, partial updates, ordering field-based merging. (HUDI-6653, HUDI-6795, HUDI-6800)
    •  (Ethan) (star) Implement engine-agnostic FileGroup Read APIs across Spark/Hive. (HUDI-6785)
    •  (Ethan/Lin?) (blue star) Implement different query types in the new FileGroup reader for Spark. (HUDI-6786, HUDI-6789, HUDI-6790, HUDI-6792, HUDI-6793, HUDI-6794, HUDI-6802)
    •  (Sagar) (green star) Async indexer is in final shape (complete HUDI-2488)
    •  (Sagar) Existing Optimistic Concurrency Control is in final shape (complete HUDI-1456)
    •  (Danny) (green star) Land LSM Timeline in well-tested, performant shape (HUDI-309)
    •  (Danny) (green star) Flink/Non-blocking CC (HUDI-6640, HUDI-6495)

Execution Phase 2 (Nov 1-Nov 30)

  •  Pre-work
    •  (Vinoth/Balaji) (blue star) Land all relevant prs
  •  APIs: (https://issues.apache.org/jira/browse/HUDI-4141)
  •  Design
    •  (Vinoth) (blue star) General purpose, global timeline (no active vs archived distinction) (HUDI-309, HUDI-6698)
    •  (Vinoth) (blue star) Non-blocking concurrency control/clustering + updates, inserts + inserts for Spark + Flink.
    •  (Vinoth) (blue star) Spark SQL statements to complete DB vision. (vinoth has a list. ???)
  •  Implementation
    •  (Sagar/Jon) (blue star) Schema Evolution and version tracking in MT. (HUDI-6778)
    •  (Sagar/Jon) Schema on read support 
    •  (??) MT <> DT redesign 
    •  (Lin) (red star) Land Parquet keyed lookup code (???)
    •  (blue star) MT/RLI on Parquet base files
    •  (???) (blue star) Introduce TrueTime API or equivalent, to explain the foundations more clearly. (reuse HUDI-3057)
    •  (Danny) Follow-ups on LSM Timeline. (HUDI-6698)
    •  (Vinoth) (star) Implement DataFrame based write path; Take HoodieData abstraction to completion and end-end row writing for Spark? All write operations work with rows end-end (HUDI-4857)
    •  (Danny) (blue star) Change Timeline/FileSystemView to support snapshot, incremental, CDC, time-travel queries correctly based on completion time
    •  (Sagar) (blue star) Secondary indexes (Bloom, RLI, VectorIndex, ..) on Spark read/write path. (HUDI-3907, HUDI-4128)
    •  (Sagar) Meta Sync to Glue/HMS with reduced storage/API overhead (HUDI-2519, HUDI-5108, HUDI-6488), seamless inc query, cdc query, ro/rt experience
    •  Broader Performance improvements (HUDI-3249)
    •  Encoding updates as deletes + inserts. (HUDI-6490)
    •  (Lin) SQL experience for timeline, metadata. (HUDI-6498)
    •  Introduce HudiStorage APIs to abstract out Hadoop FileSystem. (HUDI-6497)

...

GA Phase (Dec 1-Dec 31) (Marked 1.1.0 for now)

  •  Release (if still pending!)
  •  Docs
  •  Examples
  •  Bundles & Packages (HUDI-3529)
  •  Site updates
  •  Deprecate/Cleanup cWiki

Below the line (Marked 1.1.0 for now)