Proposers
Approvers
- @<approver1 JIRA username> : [APPROVED/REQUESTED_INFO/REJECTED]
- @<approver2 JIRA username> : [APPROVED/REQUESTED_INFO/REJECTED]
Status
Current state:
Current State | |
---|---|
UNDER DISCUSSION | |
IN PROGRESS | |
ABANDONED | |
COMPLETED | |
INACTIVE | |
Discussion thread: TODO
JIRA: TODO
Released: <Hudi Version>
Abstract
Hudi tables support many operations, including the very popular upsert(). To support upserts, Hudi depends on an indexing scheme to route each incoming record to the correct file.
Currently, Hudi's index implementation is pluggable and provides two options:
...
- 2. Multiple file groups per bucket: this is useful if the data being written is skewed or grows a lot.
Comparison
Criterion | Pattern 1 | Pattern 2 |
---|---|---|
Number of file groups per bucket | 1 | >1 |
Implementation complexity | simple | complex |
Can avoid data skew when writing | no | yes |
Good support for data growth | bad | great |
This proposal will implement pattern 1.
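To make pattern 1 concrete, here is a minimal sketch of routing records when each bucket maps to exactly one file group. It is an illustration under stated assumptions, not Hudi's implementation: the CRC32 hash, the fixed bucket count, and the names `bucket_id`, `file_group_id`, and `route` are all hypothetical.

```python
import zlib
from collections import defaultdict


def bucket_id(record_key: str, num_buckets: int) -> int:
    """Map a record key to a bucket using a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(record_key.encode("utf-8")) % num_buckets


def file_group_id(partition: str, bucket: int) -> str:
    """Pattern 1: one file group per bucket, so the bucket number identifies it.

    The naming scheme here is hypothetical.
    """
    return f"{partition}/bucket-{bucket:04d}"


def route(record_keys, partition: str, num_buckets: int) -> dict:
    """Group incoming record keys by the file group they should be written to."""
    routed = defaultdict(list)
    for key in record_keys:
        routed[file_group_id(partition, bucket_id(key, num_buckets))].append(key)
    return dict(routed)


if __name__ == "__main__":
    keys = ["uuid-1", "uuid-2", "uuid-3", "uuid-1"]
    for group, members in sorted(route(keys, "2021/06/01", 4).items()):
        print(group, members)
```

Because the target file group is computed directly from the key, an upsert for an existing key lands in the same file group without any lookup. The trade-off matches the comparison above: with a fixed bucket count and one file group per bucket, a skewed key distribution or large data growth cannot be absorbed by adding file groups.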
...