...

Discussion thread: here

JIRA: here

The proposal discussed in this KIP is implemented in this pull request.

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

Several new behaviors for handling and reporting errors are introduced, and all must be configured in the individual connector configurations.

Retry on Failure

Connect will attempt to retry the failed operation for a configurable total duration, starting with a fixed delay of 300ms and backing off exponentially between retries. The retry duration and the maximum backoff can be configured using the following new properties:

| Configuration Name | Description | Default Value | Domain |
|---|---|---|---|
| errors.retry.timeout | The total duration a failed operation will be retried for. | 0 | [-1, 0, 1, ... Long.MAX_VALUE], where -1 means infinite duration. |
| errors.retry.delay.max.ms | The maximum duration between two consecutive retries (in milliseconds). Jitter will be added to the delay once this limit is reached to prevent any thundering herd issues. | 60000 | [1, ... Long.MAX_VALUE] |
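The backoff described above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the class `RetryBackoffSketch` and its methods are hypothetical names, and the way jitter is applied (a uniformly random delay between 1 ms and the maximum) is an assumption, since the text only says that jitter is added once the maximum is reached.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for illustration only; not a class from Kafka Connect.
final class RetryBackoffSketch {
    // Fixed starting delay named in the text.
    static final long INITIAL_DELAY_MS = 300L;

    // Delay to wait before the given retry attempt (attempt 1 is the first retry).
    // maxDelayMs plays the role of errors.retry.delay.max.ms.
    static long delayMs(int attempt, long maxDelayMs) {
        if (attempt < 1 || maxDelayMs < 1) {
            throw new IllegalArgumentException("attempt and maxDelayMs must be >= 1");
        }
        // Cap the shift so that repeated doubling cannot overflow a long.
        int shift = Math.min(attempt - 1, 30);
        long exponential = INITIAL_DELAY_MS << shift;
        if (exponential < maxDelayMs) {
            return exponential;
        }
        // Assumption: once the cap is reached, pick a random delay in [1, maxDelayMs]
        // so that many tasks do not retry in lockstep.
        return ThreadLocalRandom.current().nextLong(maxDelayMs) + 1;
    }

    // Whether another retry is allowed. timeoutMs plays the role of
    // errors.retry.timeout: -1 means retry forever, 0 means never retry.
    static boolean shouldRetry(long elapsedMs, long timeoutMs) {
        return timeoutMs == -1 || elapsedMs < timeoutMs;
    }
}
```

Under these assumptions and the default maximum of 60000, the first eight retries wait 300, 600, 1200, ... 38400 ms, and every later retry waits a random duration of at most 60000 ms.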

...

| Config Option | Description | Default Value | Domain |
|---|---|---|---|
| errors.tolerance.limit | Fail the task if we exceed specified number of errors overall. | 0 | [-1, 0, 1, ... Long.MAX_VALUE], where a value of -1 means infinite failures will be tolerated. |
| errors.tolerance.rate.limit | Fail the task if we exceed specified number of errors in the observed duration. | 0 | [-1, 0, 1, ... Long.MAX_VALUE], where a value of -1 means infinite failures will be tolerated in the observed window. |
| errors.tolerance.rate.duration | The duration of the window for which we will monitor errors. | minute | minute, hour, day |
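One way to read the two limits together is sketched below. It is only an illustration under stated assumptions: `ErrorToleranceSketch` is a hypothetical class, the observed window is treated as a sliding window over error timestamps (the text does not say whether the window slides or resets), and the window length is passed in milliseconds rather than as minute, hour, or day.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical tracker for illustration only; not a class from Kafka Connect.
final class ErrorToleranceSketch {
    private final long totalLimit; // role of errors.tolerance.limit (-1 = unlimited)
    private final long rateLimit;  // role of errors.tolerance.rate.limit (-1 = unlimited)
    private final long windowMs;   // window length, e.g. 60000 for "minute"
    private long totalErrors = 0;
    // Timestamps of errors that are still inside the observed window.
    private final Deque<Long> recent = new ArrayDeque<>();

    ErrorToleranceSketch(long totalLimit, long rateLimit, long windowMs) {
        this.totalLimit = totalLimit;
        this.rateLimit = rateLimit;
        this.windowMs = windowMs;
    }

    // Records one error observed at nowMs.
    // Returns true if the task may continue, false if it should fail.
    boolean recordError(long nowMs) {
        totalErrors++;
        boolean totalOk = totalLimit == -1 || totalErrors <= totalLimit;
        if (rateLimit == -1) {
            // Unlimited rate: no need to track timestamps at all.
            return totalOk;
        }
        recent.addLast(nowMs);
        // Assumption: sliding window; drop errors that have aged out of it.
        while (!recent.isEmpty() && nowMs - recent.peekFirst() >= windowMs) {
            recent.removeFirst();
        }
        boolean rateOk = recent.size() <= rateLimit;
        return totalOk && rateOk;
    }
}
```

With both limits left at their default of 0, the first call to `recordError` returns false, which corresponds to failing the task on the first error.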

...

For sink connectors, we will write the original record (from the Kafka topic the sink connector is consuming from) which caused the failure to a configurable Kafka topic.

| Config Option | Description | Default Value | Domain |
|---|---|---|---|
| errors.deadletterqueue.topic.name | The name of the dead letter queue topic. If not set, this feature will be disabled. | "" | A valid Kafka topic name |

Metrics

The following new metrics will monitor the number of failures, and the behavior of the response handler. Specifically, the following set of counters:

...

# disable retries on failure
errors.retry.timeout=0

# do not log errors and their contexts
errors.log.enable=false

# do not record errors in a dead letter queue topic
errors.dlq.enable=false

# Fail on first failure
errors.tolerance.rate.limit=0

Example 2: Record and Skip

...

# retry for at most 10 minutes, waiting up to 30 seconds between consecutive failures
errors.retry.timeout=600000
errors.retry.delay.max.ms=30000

# log error context along with application logs, but do not include configs and messages
errors.log.enable=true
errors.log.include.configs=false
errors.log.include.messages=false

# produce error context into the Kafka topic
errors.deadletterqueue.topic.name=my-connector-errors

# Tolerate all errors.
errors.tolerance.limit=-1
errors.tolerance.rate.limit=-1
errors.tolerance.rate.duration=minute

...