 RFC - 20 : handle failed records

Proposers

Approvers

  • @<approver1 JIRA username> : [APPROVED/REQUESTED_INFO/REJECTED]
  • @<approver2 JIRA username> : [APPROVED/REQUESTED_INFO/REJECTED]
  • ...

Status

Current state

  • UNDER DISCUSSION (tick)
  • IN PROGRESS
  • ABANDONED
  • COMPLETED
  • INACTIVE

Discussion thread: here

Released: 0.6.x

Abstract

This RFC proposes a way to handle failed records in the writer path.

Background

Failed records should be captured properly so that they can be investigated afterwards.

Implementation

Error record

Define an Avro schema for error records:

error record schema
{
  "type": "record",
  "namespace": "org.apache.hudi.common",
  "name": "ErrorRecord",
  "fields": [
    {
      "name": "uid",
      "type": "string"
    },
    {
      "name": "ts",
      "type": "string"
    },
    {
      "name": "schema",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "record",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "message",
      "type": ["null", "string"],
      "default": null
    }
  ]
}
  • `uid`: UUID of the error record
  • `ts`: Unix timestamp at which the error record was created
  • `schema`: original schema of the record, if any
  • `record`: original record serialized as JSON, if any
  • `message`: additional message
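
To make the field semantics concrete, here is a minimal sketch that fills the five fields for one failed record. It deliberately uses a plain map instead of an Avro record so that it runs without extra dependencies. The class `ErrorRecordSketch` and its `build` method are hypothetical names, not part of Hudi, and storing `ts` as epoch seconds is an assumption, since the schema only declares it as a string.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical helper, not part of Hudi: builds the ErrorRecord fields as a plain map. */
final class ErrorRecordSketch {

  private ErrorRecordSketch() {}

  /**
   * @param schema  original schema as a string, or null if unknown
   * @param record  original record serialized as JSON, or null if unavailable
   * @param message additional message, or null
   */
  static Map<String, String> build(String schema, String record, String message) {
    Map<String, String> fields = new LinkedHashMap<>();
    // Required fields: always populated.
    fields.put("uid", UUID.randomUUID().toString());
    // Assumption: `ts` is epoch seconds rendered as a string.
    fields.put("ts", Long.toString(Instant.now().getEpochSecond()));
    // Optional fields: null mirrors the ["null", "string"] unions in the schema.
    fields.put("schema", schema);
    fields.put("record", record);
    fields.put("message", message);
    return fields;
  }

  public static void main(String[] args) {
    Map<String, String> error = build(null, "{\"id\": 1}", "schema validation failed");
    System.out.println(error);
  }
}
```

The three nullable fields map directly onto the optional unions in the schema, so a failure that happens before the record can be serialized can still be logged with only `uid`, `ts`, and `message`.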

Errors table

  • Maintain an internal Hudi table named `errors` within the `.hoodie/` directory
  • Partition the table by day, based on the `ts` field
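
As an illustration of daily partitioning, the sketch below derives a partition path from a `ts` value. It assumes `ts` holds epoch seconds as a string, that days are cut in UTC, and that the path layout is `yyyy/MM/dd`; none of these is fixed by this RFC. `ErrorPartitionSketch` is a hypothetical name.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper, not part of Hudi: maps an error record's `ts` to a daily partition. */
final class ErrorPartitionSketch {

  // Assumption: UTC day boundaries and a yyyy/MM/dd layout.
  private static final DateTimeFormatter DAILY =
      DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

  private ErrorPartitionSketch() {}

  /** @param ts epoch seconds as a string (assumed encoding of the `ts` field) */
  static String dailyPartition(String ts) {
    long epochSeconds = Long.parseLong(ts);
    return DAILY.format(Instant.ofEpochSecond(epochSeconds));
  }

  public static void main(String[] args) {
    // 1593561600 is 2020-07-01T00:00:00Z
    System.out.println(dailyPartition("1593561600")); // prints 2020/07/01
  }
}
```

All error records created on the same UTC day land in the same partition, which keeps inspection of a given day's failures to a single directory.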

Write path

Start with the following methods:

  • org.apache.hudi.client.HoodieWriteClient#postWrite
  • org.apache.hudi.client.HoodieWriteClient#completeCompaction

CLI support

  • Consider adding CLI support for easy inspection

Metrics

  • Emit a count metric for the number of failed records
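
A count metric of this kind can be as simple as a thread-safe counter that the writer increments per batch. The sketch below is a stand-in for whatever metrics reporter is actually used; `FailedRecordCounterSketch` is a hypothetical name and does not refer to an existing Hudi class.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical counter, not part of Hudi: tracks the number of failed records. */
final class FailedRecordCounterSketch {

  private final AtomicLong failedRecords = new AtomicLong();

  /** Adds the number of records that failed in one write batch. */
  void recordFailures(long count) {
    if (count < 0) {
      throw new IllegalArgumentException("count must be non-negative: " + count);
    }
    failedRecords.addAndGet(count);
  }

  /** Returns the total number of failed records seen so far. */
  long total() {
    return failedRecords.get();
  }

  public static void main(String[] args) {
    FailedRecordCounterSketch counter = new FailedRecordCounterSketch();
    counter.recordFailures(3);
    counter.recordFailures(2);
    System.out.println(counter.total()); // prints 5
  }
}
```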

Rollout/Adoption Plan

  • Add a writer config, `hoodie.write.handle.failed.records`, to turn on this handling
  • Default it to `false` for a smooth roll-out
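
The opt-in behaviour could be read roughly as follows. This sketch uses plain `java.util.Properties` rather than Hudi's own config classes; only the key name comes from this RFC, and `FailedRecordsConfigSketch` is a hypothetical name.

```java
import java.util.Properties;

/** Hypothetical helper, not part of Hudi: reads the proposed opt-in flag. */
final class FailedRecordsConfigSketch {

  static final String HANDLE_FAILED_RECORDS = "hoodie.write.handle.failed.records";
  // Off by default, so existing writers are unaffected until they opt in.
  static final String DEFAULT_HANDLE_FAILED_RECORDS = "false";

  private FailedRecordsConfigSketch() {}

  static boolean isEnabled(Properties props) {
    return Boolean.parseBoolean(
        props.getProperty(HANDLE_FAILED_RECORDS, DEFAULT_HANDLE_FAILED_RECORDS));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(isEnabled(props)); // prints false
    props.setProperty(HANDLE_FAILED_RECORDS, "true");
    System.out.println(isEnabled(props)); // prints true
  }
}
```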

Test Plan

TODO






