Proposers
Approvers
- Vinoth Chandar : [APPROVED/REQUESTED_INFO/REJECTED]
- ...
Status
Current state:
Current State | |
---|---|
UNDER DISCUSSION | |
IN PROGRESS | |
ABANDONED | |
COMPLETED | |
INACTIVE | |
Discussion thread: here
JIRA:
Released: 0.6.x
Abstract
This RFC proposes a way to handle failed records in the writer path.
Background
Records that fail in the writer path need to be handled properly so that the failures can be investigated.
Implementation
Error record
Define an avro schema for error records
{ "type": "record", "namespace": "org.apache.hudi.common", "name": "ErrorRecord", "fields": [ { "name": "uid", "type": "string" }, { "name": "ts", "type": "string" }, { "name": "schema", "type": ["null", "string"], "default": null }, { "name": "record", "type": ["null", "string"], "default": null }, { "name": "message", "type": ["null", "string"], "default": null } ] }
- `uid`: uuid for the error record
- `ts`: creation unix timestamp for the error record
- `schema`: original schema for the record if any
- `record`: original serialized record in json if any
- `message`: additional message
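As a rough illustration of the field semantics above, the following sketch mirrors the schema in a plain Java class. The class name `ErrorRecordSketch` and the factory method `of` are hypothetical and not part of Hudi, and the use of epoch seconds for `ts` is an assumption, since the proposal does not state the unit.

```java
import java.time.Instant;
import java.util.Objects;
import java.util.UUID;

/** Hypothetical in-memory mirror of the ErrorRecord Avro schema; not a Hudi class. */
final class ErrorRecordSketch {
    final String uid;      // required: UUID for the error record
    final String ts;       // required: creation Unix timestamp, a string per the schema
    final String schema;   // optional: original schema of the record, null if unknown
    final String record;   // optional: original record serialized as JSON, null if unavailable
    final String message;  // optional: additional message, null if none

    private ErrorRecordSketch(String uid, String ts, String schema, String record, String message) {
        // The two non-nullable schema fields must always be present.
        this.uid = Objects.requireNonNull(uid, "uid");
        this.ts = Objects.requireNonNull(ts, "ts");
        this.schema = schema;
        this.record = record;
        this.message = message;
    }

    /** Builds an error record with a fresh UUID and the current time. */
    static ErrorRecordSketch of(String schema, String record, String message) {
        String uid = UUID.randomUUID().toString();
        // Assumption: epoch seconds; the proposal does not specify the unit.
        String ts = Long.toString(Instant.now().getEpochSecond());
        return new ErrorRecordSketch(uid, ts, schema, record, message);
    }

    public static void main(String[] args) {
        ErrorRecordSketch e = of(null, "{\"id\": 1}", "schema mismatch");
        System.out.println(e.uid + " " + e.ts + " " + e.message);
    }
}
```

The three nullable fields follow the `["null", "string"]` unions in the schema, so a writer can still emit an error record when the original schema or payload cannot be recovered.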
Errors table
Users can choose to use local error tables or a global one, depending on their preferences.
Local error table
Users may set `hoodie.write.error.table.global=false` (the default) on Hudi writers so that failed records are written to a local Hudi table alongside the target table.
Global error table
Users may set `hoodie.write.error.table.global=true` on Hudi writers so that failed records are written to a global Hudi table.
Configurations
key | description | default |
---|---|---|
hoodie.write.error.table.enabled | set to true to activate error table handling feature | false |
hoodie.write.error.table.suffix | suffix for local error table name, stored alongside the target table. If the Hudi table is "foo", errored records will be saved to "foo_errors" at the same base dir as configured via `hoodie.base.path` | "_errors" |
hoodie.write.error.table.global | set to true to use global errors table | false |
hoodie.write.error.table.global.base.path | base path for global error table | same as `hoodie.base.path` |
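One possible reading of how these keys interact is sketched below with `java.util.Properties`. The class `ErrorTableConfigSketch` and its method names are hypothetical and not Hudi APIs. The proposal does not name the global error table, so the sketch resolves only its base path.

```java
import java.util.Properties;

/** Hypothetical resolver for the proposed error table settings; not a Hudi class. */
final class ErrorTableConfigSketch {
    static final String ENABLED = "hoodie.write.error.table.enabled";
    static final String SUFFIX = "hoodie.write.error.table.suffix";
    static final String GLOBAL = "hoodie.write.error.table.global";
    static final String GLOBAL_BASE_PATH = "hoodie.write.error.table.global.base.path";
    static final String BASE_PATH = "hoodie.base.path";

    /** The feature is off unless explicitly enabled. */
    static boolean isEnabled(Properties p) {
        return Boolean.parseBoolean(p.getProperty(ENABLED, "false"));
    }

    /** Local error tables are the default. */
    static boolean isGlobal(Properties p) {
        return Boolean.parseBoolean(p.getProperty(GLOBAL, "false"));
    }

    /** Local error table name: target table name plus the configured suffix. */
    static String localErrorTableName(Properties p, String targetTable) {
        return targetTable + p.getProperty(SUFFIX, "_errors");
    }

    /** Base path of the error table; the global path falls back to hoodie.base.path. */
    static String errorTableBasePath(Properties p) {
        String basePath = p.getProperty(BASE_PATH);
        return isGlobal(p) ? p.getProperty(GLOBAL_BASE_PATH, basePath) : basePath;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty(BASE_PATH, "/data/hudi");      // example value only
        p.setProperty(ENABLED, "true");
        System.out.println(localErrorTableName(p, "foo") + " under " + errorTableBasePath(p));
    }
}
```

With only `hoodie.base.path` and the enable flag set, the sketch resolves the local table `foo_errors` under the same base path, matching the defaults in the table above.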
Write path
Start by hooking into the following methods:
org.apache.hudi.client.HoodieWriteClient#postWrite
org.apache.hudi.client.HoodieWriteClient#completeCompaction
CLI support
- Consider adding CLI support for easy inspection
Metrics
- Emit a count metric for the number of failed records
Rollout/Adoption Plan
- Turn on this feature by setting `hoodie.write.error.table.enabled=true`
- Default to false for smooth roll-out
Test Plan
- Functional test cases to cover both local and global cases.