Proposers
Approvers
- @<approver1 JIRA username> : [APPROVED/REQUESTED_INFO/REJECTED]
- @<approver2 JIRA username> : [APPROVED/REQUESTED_INFO/REJECTED]
- ...
Status
Current state:
Current State | |
---|---|
UNDER DISCUSSION | |
IN PROGRESS | |
ABANDONED | |
COMPLETED | |
INACTIVE | |
Discussion thread: here
JIRA:
Released: 0.6.x
Abstract
This RFC proposes a way to handle records that fail in the writer path.
Background
Records that fail during a write should be captured and persisted in a consistent way so that the failures can be investigated afterwards.
Implementation
Error record
Define an Avro schema for error records.
error record schema
{ "type": "record", "namespace": "org.apache.hudi.common", "name": "ErrorRecord", "fields": [ { "name": "uid", "type": "string" }, { "name": "ts", "type": "string" }, { "name": "schema", "type": ["null", "string"], "default": null }, { "name": "record", "type": ["null", "string"], "default": null }, { "name": "message", "type": ["null", "string"], "default": null } ] }
- `uid`: UUID of the error record
- `ts`: Unix timestamp at which the error record was created
- `schema`: original schema of the record, if any
- `record`: original record serialized as JSON, if any
- `message`: additional message
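To illustrate how the fields above might be populated, here is a minimal plain-Java sketch. `ErrorRecordSketch` and its factory method are hypothetical names used only for illustration and are not existing Hudi classes. The sketch also assumes `ts` holds epoch seconds, which the schema does not state.

```java
import java.time.Instant;
import java.util.UUID;

// Hypothetical plain-Java mirror of the ErrorRecord schema; not an existing Hudi class.
final class ErrorRecordSketch {
  final String uid;      // UUID of the error record
  final String ts;       // creation Unix timestamp, kept as a string per the schema
  final String schema;   // original schema, or null if unavailable
  final String record;   // original record serialized as JSON, or null if unavailable
  final String message;  // additional message, or null

  private ErrorRecordSketch(String uid, String ts, String schema, String record, String message) {
    this.uid = uid;
    this.ts = ts;
    this.schema = schema;
    this.record = record;
    this.message = message;
  }

  // Assumption: ts is expressed in epoch seconds.
  static ErrorRecordSketch of(String schema, String record, String message) {
    return new ErrorRecordSketch(
        UUID.randomUUID().toString(),
        Long.toString(Instant.now().getEpochSecond()),
        schema,
        record,
        message);
  }
}
```

The three nullable fields map to the `["null", "string"]` unions in the schema, so a caller that has no schema or payload at hand can pass `null` for them.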
Errors table
- Maintain an internal Hudi table named `errors` within the `.hoodie/` directory
- Partition the table by the `ts` field at daily intervals
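As a rough sketch of daily partitioning, the helper below derives a partition path from `ts`. The class name, the `yyyy/MM/dd` layout, the use of UTC, and the epoch-seconds interpretation of `ts` are all assumptions made for illustration; the proposal does not fix any of them.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; not part of Hudi.
final class ErrorPartitionSketch {
  // Assumed partition layout: one directory level each for year, month, and day.
  private static final DateTimeFormatter DAILY = DateTimeFormatter.ofPattern("yyyy/MM/dd");

  // Maps the string `ts` field (assumed epoch seconds) to a daily partition path in UTC.
  static String dailyPartition(String ts) {
    long epochSeconds = Long.parseLong(ts);
    LocalDate day = Instant.ofEpochSecond(epochSeconds).atZone(ZoneOffset.UTC).toLocalDate();
    return DAILY.format(day);
  }
}
```

Under these assumptions, `dailyPartition("1577836800")` yields `2020/01/01`, and every record created within the same UTC day lands in the same partition.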
Write path
Start by adding the handling to the following methods:
org.apache.hudi.client.HoodieWriteClient#postWrite
org.apache.hudi.client.HoodieWriteClient#completeCompaction
CLI support
- Consider adding CLI support for easy inspection
Metrics
- Emit a count metric for the number of failed records
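A count metric of this kind could be backed by a simple thread-safe counter, as sketched below. `FailedRecordCounterSketch` is a hypothetical stand-in and does not reflect how Hudi's metrics reporters are actually wired.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counter; a real implementation would register with the writer's metrics system.
final class FailedRecordCounterSketch {
  private final AtomicLong failed = new AtomicLong();

  // Adds the number of records that failed in one write or compaction.
  void recordFailures(long n) {
    if (n < 0) {
      throw new IllegalArgumentException("failure count must be non-negative");
    }
    failed.addAndGet(n);
  }

  // Returns the total number of failed records seen so far.
  long count() {
    return failed.get();
  }
}
```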
Rollout/Adoption Plan
- Add a writer config, `hoodie.write.handle.failed.records`, to turn on this handling
- Default it to `false` for a smooth rollout
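The opt-in behavior can be sketched with plain `java.util.Properties`. The class and method names are hypothetical, and the real writer config would presumably go through Hudi's own config classes rather than raw properties.

```java
import java.util.Properties;

// Hypothetical reader for the proposed flag; not an existing Hudi config class.
final class FailedRecordConfigSketch {
  static final String KEY = "hoodie.write.handle.failed.records";

  // Returns true only when the flag is explicitly set to "true"; absent means disabled.
  static boolean isEnabled(Properties props) {
    return Boolean.parseBoolean(props.getProperty(KEY, "false"));
  }
}
```

Because the default is `false`, existing writers keep their current behavior until the flag is set explicitly.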
Test Plan
TODO