...

In this RFC, we propose a feature that allows concurrent writes to a Hudi table. Users will be able to run multiple writer jobs in parallel, each writing to non-overlapping files. If two writers issue concurrent writes to the same file, the first one to commit succeeds. The following guarantees of Hudi's single-writer model are NOT provided in parallel writing mode:

  1. Unique records across multiple writers during inserts
    1. If multiple writers write to different files in parallel, Hudi cannot guarantee uniqueness of keys across partitions, unlike in the single-writer model. Ensuring unique keys is left to the user.
  2. Read serializability across partitions
    1. Different writers writing to different files can finish at varying times and therefore commit their data in any order, so Hudi cannot guarantee read serializability.
  3. Global index support (only HbaseIndex)
    1. A global index (e.g. HbaseIndex) requires a key to be unique across all partitions in the table, so Hudi cannot support it for tables that require parallel writing, due to constraint (1) above. Note that GlobalSimpleIndex works fine, since it first partitions based on HoodieKey and then checks the record key per partition.
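
The "first one to commit will succeed" rule for overlapping files can be illustrated with a small sketch. This is not Hudi's API: `Timeline`, `begin`, and `try_commit` are hypothetical names, and the in-memory list merely stands in for the table's commit timeline. A real implementation would also need to make the check-and-commit step atomic, for example by holding a lock.

```python
from dataclasses import dataclass, field


@dataclass
class Timeline:
    """Hypothetical in-memory stand-in for a table's commit timeline."""
    # Each entry is (writer name, frozenset of file ids that commit touched).
    commits: list = field(default_factory=list)

    def begin(self) -> int:
        # A writer records how many commits existed when it started.
        return len(self.commits)

    def try_commit(self, writer: str, start: int, files: set) -> bool:
        # Conflict check: reject if any commit that landed after this
        # writer started touched one of the same files.
        # Assumption: callers serialize this method (e.g. with a lock).
        for _, committed_files in self.commits[start:]:
            if committed_files & files:
                return False
        self.commits.append((writer, frozenset(files)))
        return True


timeline = Timeline()

# Writers A and B start concurrently and both touch file "f1".
start_a = timeline.begin()
start_b = timeline.begin()
a_ok = timeline.try_commit("A", start_a, {"f1", "f2"})  # commits first
b_ok = timeline.try_commit("B", start_b, {"f1", "f3"})  # overlaps with A on "f1"

# Writer C started at the same point but touches only its own file.
c_ok = timeline.try_commit("C", start_b, {"f4"})
```

In this sketch, writer A's commit is accepted, writer B's is rejected because it overlaps with A on `f1`, and writer C's is accepted because its files do not overlap with any commit made since it started. Note that the check looks only at file ids, which is why it cannot detect two writers inserting the same record key into different files, as described in (1) above.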

...