RFC - 20 : Handle failed records

Proposers

Approvers

  • @<approver1 JIRA username> : [APPROVED/REQUESTED_INFO/REJECTED]
  • @<approver2 JIRA username> : [APPROVED/REQUESTED_INFO/REJECTED]
  • ...

Status

Current state: UNDER DISCUSSION

Discussion thread: here

Released: 0.6.x

Abstract

This RFC proposes a way to handle failed records in the writer path.

Background

Records that fail during a write should be handled properly, so that the failures can be investigated afterwards.

Implementation

Error record

Define an Avro schema for error records:

error record schema
{
  "type": "record",
  "namespace": "org.apache.hudi.common",
  "name": "ErrorRecord",
  "fields": [
    {
      "name": "uid",
      "type": "string"
    },
    {
      "name": "ts",
      "type": "string"
    },
    {
      "name": "schema",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "record",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "message",
      "type": ["null", "string"],
      "default": null
    }
  ]
}
  • `uid`: UUID of the error record
  • `ts`: Unix timestamp at which the error record was created
  • `schema`: original schema of the record, if any
  • `record`: original record serialized as JSON, if any
  • `message`: additional message
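
As a rough illustration of how a writer might populate these fields, the sketch below builds one error record as a plain dictionary using only the Python standard library. The helper name `build_error_record` and its parameters are hypothetical and not part of Hudi. Writing `ts` as a string of Unix seconds is one possible reading of the `string` type in the schema, not something the RFC specifies.

```python
import json
import time
import uuid
from typing import Any, Dict, Optional


def build_error_record(
    record: Optional[Dict[str, Any]] = None,
    schema: Optional[Dict[str, Any]] = None,
    message: Optional[str] = None,
) -> Dict[str, Optional[str]]:
    """Build a dict shaped like the ErrorRecord schema (hypothetical helper)."""
    return {
        "uid": str(uuid.uuid4()),        # random UUID identifying this error record
        "ts": str(int(time.time())),     # assumption: Unix seconds, stored as a string
        "schema": json.dumps(schema) if schema is not None else None,
        "record": json.dumps(record) if record is not None else None,
        "message": message,
    }


# Example: a record that failed validation, with no schema attached.
failed = {"id": 42, "name": None}
error = build_error_record(record=failed, message="null value for required field 'name'")
```

The nullable fields (`schema`, `record`, `message`) are left as `None` when nothing is available, which corresponds to the `null` default in the schema.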

Errors table

Users can choose to use local error tables or a global one, depending on their preferences.

Local error table

Users may apply `hoodie.write.error.table.global=false` (default) to Hudi writers so that failed records are written to a local Hudi table alongside the target table.

Global error table

Users may apply `hoodie.write.error.table.global=true` to Hudi writers such that the failed records will be written to a global Hudi table.

Configurations

  • `hoodie.write.error.table.enabled`: set to true to activate the error table handling feature. Default: false
  • `hoodie.write.error.table.suffix`: suffix for the local error table name, stored alongside the target table. If the Hudi table is "foo", errored records will be saved to "foo_errors" at the same base dir as configured via `hoodie.base.path`. Default: "_errors"
  • `hoodie.write.error.table.global`: set to true to use the global error table. Default: false
  • `hoodie.write.error.table.global.base.path`: base path for the global error table. Default: same as `hoodie.base.path`
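
To make the interaction between these keys and their defaults concrete, here is a minimal sketch of how a writer could resolve where failed records go. Only the configuration keys and defaults come from this RFC. The function `resolve_error_table` and the `ErrorTableTarget` type are hypothetical. Because the RFC does not name the global table, the sketch returns `None` for its table name rather than guessing one.

```python
from typing import Dict, NamedTuple, Optional

ENABLED = "hoodie.write.error.table.enabled"
SUFFIX = "hoodie.write.error.table.suffix"
GLOBAL = "hoodie.write.error.table.global"
GLOBAL_BASE_PATH = "hoodie.write.error.table.global.base.path"
BASE_PATH = "hoodie.base.path"


class ErrorTableTarget(NamedTuple):
    base_path: str
    table_name: Optional[str]  # None for the global table: the RFC does not name it


def resolve_error_table(options: Dict[str, str], table_name: str) -> Optional[ErrorTableTarget]:
    """Return the error table target, or None when the feature is disabled."""
    if options.get(ENABLED, "false").lower() != "true":
        return None
    if options.get(GLOBAL, "false").lower() == "true":
        # The global base path falls back to hoodie.base.path when unset.
        return ErrorTableTarget(options.get(GLOBAL_BASE_PATH, options[BASE_PATH]), None)
    # Local table: target table name plus suffix, at the configured base path.
    return ErrorTableTarget(options[BASE_PATH], table_name + options.get(SUFFIX, "_errors"))


opts = {BASE_PATH: "/data/tables", ENABLED: "true"}
local_target = resolve_error_table(opts, "foo")
global_target = resolve_error_table({**opts, GLOBAL: "true"}, "foo")
```

With the options above, `local_target` names the table "foo_errors", while `global_target` reuses `/data/tables` because no global base path was set.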


Write path

The implementation would start from the following methods:

  • org.apache.hudi.client.HoodieWriteClient#postWrite

  • org.apache.hudi.client.HoodieWriteClient#completeCompaction
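
The kind of step that could run at these points can be sketched as follows. This is a language-neutral illustration written in Python, not the `HoodieWriteClient` API: `handle_failed_records`, `WriteOutcome`, and the `write_errors` callback are all hypothetical names. It separates failed outcomes from successful ones, hands the failures to an error table writer only when the feature is enabled, and returns the failure count so that it could feed a metric.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# (record key, error message or None when the record was written successfully)
WriteOutcome = Tuple[str, Optional[str]]


def handle_failed_records(
    outcomes: Iterable[WriteOutcome],
    error_table_enabled: bool,
    write_errors: Callable[[List[WriteOutcome]], None],
) -> int:
    """Route failed outcomes to the error table writer and return their count."""
    failures = [outcome for outcome in outcomes if outcome[1] is not None]
    if error_table_enabled and failures:
        write_errors(failures)
    return len(failures)


# Example: collect failures in a list that stands in for the error table.
captured: List[WriteOutcome] = []
failed_count = handle_failed_records(
    [("key1", None), ("key2", "schema mismatch"), ("key3", None)],
    error_table_enabled=True,
    write_errors=captured.extend,
)
```

Returning the count even when the error table is disabled keeps the failed-record metric available independently of the feature flag. That is a design choice of this sketch, not something the RFC states.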

CLI support

  • Consider adding CLI support for easy inspection

Metrics

  • Emit a count metric for the number of failed records

Rollout/Adoption Plan

  • Turn this feature on with the configuration `hoodie.write.error.table.enabled=true`
  • The default is false, for a smooth roll-out

Test Plan

  • Functional test cases covering both the local and the global error table.






