...

  • (Vinoth) Identify & land all critical outstanding PRs (that solve critical issues, take us forward in our 1.0 path)
  • (Sagar & Vinoth & Danny) Land storage format 1.0 
    •  (Vinoth) Put up a 1.0 tech specs doc
    •  (Sagar) Make all format changes described in https://issues.apache.org/jira/browse/HUDI-6242
    •  (Sagar) Standardization of serialization - log blocks, timeline meta files.
    •  (Sagar) Change Timeline/FileSystemView to support snapshot, incremental, CDC, time-travel queries correctly.
    •  (Danny) Introduce TrueTime API or equivalent, to explain the foundations more clearly. (reuse HUDI-3057)
    •  (Sagar) Changes to support multiple base file formats within each file group.
    •  (Sagar) No Java classes show up in table properties. HUDI-5761
    •  (Danny) Introduce transition time into the active timeline (HUDI-1623)
    •  (Danny) Land LSM Timeline in well-tested, performant shape (HUDI-309, HUDI-6626, HUDI-6698)
  • Design:
    •  (Sagar) Multi-table transactions? (HUDI-6709)
    •  (Lin) Keys: UUIDs vs. what we do today. (HUDI-6701)
    •  (Vinoth???) Time-Travel Read (+Write) (address branch/merge use-cases) (HUDI-4677)
    •  (Danny) TrueTime API implementation for Hudi (wait based, or filesystem/stateless based)
    •  (Vinoth/Shawn) Cloud native storage layout design (Udit's RFC-60)
    •  (Ethan???) Logical partitioning/Index Functions API (Java, Native) and its integration into Spark/Presto/Trino. (HUDI-512)
    •  (Sagar + ???) Schema Evolution and version tracking in MT.
  • Implementation
    •  (Ethan/Lin) Finalize RFC-46/RecordMerger API: is this our final choice? Cross-platform support? Only invoked for hoodie.merge.mode=custom? (complete HUDI-3217)
    •  (Ethan) Implement MoR snapshot query (positional/key based updates, deletes), partial updates, custom merges on new File Format code path.
    •  (Lin) Implement a uniform way to fetch incremental data files based on new timeline (https://issues.apache.org/jira/browse/HUDI-2750)
    •  (Ethan) Implement writers for positional updates, deletes, partial updates, ordering field based merging.
    •  (Ethan) Implement engine-agnostic FileGroup Read/Write APIs across Spark/Hive
    •  (Ethan) Implement DataFrame-based write path; take HoodieData abstraction to completion and end-to-end row writing for Spark? All write operations work with rows end-to-end (HUDI-4857)
    •  (Sagar) Async indexer is in final shape (complete HUDI-2488)
    •  (Sagar) Existing Optimistic Concurrency Control is in final shape (complete HUDI-1456)
    •  (Lin) Land Parquet keyed lookup code (???)
    •  (Danny) Flink/Non-blocking CC (HUDI-6640, HUDI-6495)
    •  [???] Parquet Rewriting at Page Level for Spark Rows (Writer perf) (HUDI-4790)
    •  <What other code refactoring is there to burn down?> (HUDI-2261, HUDI-6243, HUDI-3614, HUDI-4444, HUDI-4756)
  • (Sagar) Open/Risk Items:

...