...
Discussion thread: here
JIRA: here
The proposal discussed in this KIP is implemented in this pull request.
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
...
Several new behaviors for handling and reporting errors are introduced, and all must be configured in the individual connector configurations.
Retry on Failure
Connect will attempt to retry the failed operation for a configurable total duration, starting with a fixed delay of 300ms and backing off exponentially between retries. The retry timeout and backoff can be configured using the following new properties:
Configuration Name | Description | Default Value | Domain |
---|---|---|---|
errors.retry.timeout | The total duration a failed operation will be retried for (in milliseconds). | 0 | [-1, 0, 1, ... Long.MAX_VALUE], where -1 means infinite duration. |
errors.retry.delay.max.ms | The maximum delay between two consecutive retries (in milliseconds). Jitter will be added to the delay once this limit is reached to prevent any thundering herd issues. | 60000 | [1, ... Long.MAX_VALUE] |
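The retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the class `RetryBackoffSketch` and its methods are hypothetical names, and the jitter strategy (a uniformly random delay no larger than the cap, applied once the cap is reached) is an assumption, since the text does not specify how jitter is computed.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for illustration; not a class in the Connect framework.
final class RetryBackoffSketch {
    // Fixed starting delay mentioned in the text.
    static final long INITIAL_DELAY_MS = 300L;

    // Delay before retry number `attempt` (0-based): 300ms, 600ms, 1200ms, ...
    static long delayMs(int attempt, long maxDelayMs) {
        // Cap the shift so the left shift cannot overflow a long.
        long raw = INITIAL_DELAY_MS << Math.min(attempt, 30);
        if (raw < maxDelayMs) {
            return raw;
        }
        // Assumption: once the cap is reached, jitter means a uniform delay in [1, maxDelayMs].
        return ThreadLocalRandom.current().nextLong(maxDelayMs) + 1;
    }

    // Retries `operation` until it succeeds or the total retry budget is spent.
    // timeoutMs follows errors.retry.timeout: 0 disables retries, -1 retries forever.
    static <T> T callWithRetry(Callable<T> operation, long timeoutMs, long maxDelayMs)
            throws Exception {
        long start = System.currentTimeMillis();
        int attempt = 0;
        while (true) {
            try {
                return operation.call();
            } catch (Exception e) {
                long elapsed = System.currentTimeMillis() - start;
                boolean infinite = timeoutMs == -1;
                if (!infinite && elapsed >= timeoutMs) {
                    throw e; // retry budget exhausted, surface the last failure
                }
                long delay = delayMs(attempt++, maxDelayMs);
                if (!infinite) {
                    // Never sleep past the end of the retry budget.
                    delay = Math.min(delay, timeoutMs - elapsed);
                }
                Thread.sleep(delay);
            }
        }
    }
}
```

With the default cap of 60000 ms, the un-jittered delays in this sketch are 300, 600, 1200, ... up to 38400 ms, after which every delay is drawn at random below the cap.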
...
Config Option | Description | Default Value | Domain |
---|---|---|---|
errors.tolerance.limit | Fail the task if we exceed the specified number of errors overall. | 0 | [-1, 0, 1, ... Long.MAX_VALUE], where a value of -1 means infinite failures will be tolerated. |
errors.tolerance.rate.limit | Fail the task if we exceed the specified number of errors in the observed duration. | 0 | [-1, 0, 1, ... Long.MAX_VALUE], where a value of -1 means infinite failures will be tolerated in the observed window. |
errors.tolerance.rate.duration | The duration of the window for which we will monitor errors. | minute | minute, hour, day |
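To make the interaction of the two limits concrete, here is a small sketch of how a task might track them. It is only an illustration under assumptions: the class `ErrorToleranceSketch` is a hypothetical name, and the rate limit is modeled as a sliding window over error timestamps, which the text does not prescribe (a fixed, tumbling window would be an equally plausible reading).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical tracker for illustration; not a class in the Connect framework.
final class ErrorToleranceSketch {
    private final long totalLimit; // errors.tolerance.limit, -1 means unlimited
    private final long rateLimit;  // errors.tolerance.rate.limit, -1 means unlimited
    private final long windowMs;   // errors.tolerance.rate.duration, in milliseconds
    private long totalErrors = 0;
    private final Deque<Long> recentErrors = new ArrayDeque<>();

    ErrorToleranceSketch(long totalLimit, long rateLimit, long windowMs) {
        this.totalLimit = totalLimit;
        this.rateLimit = rateLimit;
        this.windowMs = windowMs;
    }

    // Records one error observed at nowMs.
    // Returns true if the task may continue, false if it should fail.
    boolean recordError(long nowMs) {
        totalErrors++;
        boolean totalOk = totalLimit == -1 || totalErrors <= totalLimit;

        boolean rateOk = true;
        if (rateLimit != -1) {
            recentErrors.addLast(nowMs);
            // Assumption: sliding window, so drop errors older than the window.
            while (!recentErrors.isEmpty() && nowMs - recentErrors.peekFirst() >= windowMs) {
                recentErrors.removeFirst();
            }
            rateOk = recentErrors.size() <= rateLimit;
        }
        return totalOk && rateOk;
    }
}
```

With both limits left at their default of 0, the first recorded error already exceeds the limits and the task fails, which matches the fail-fast default.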
...
For sink connectors, we will write the original record (from the Kafka topic the sink connector is consuming from) which caused the failure to a configurable Kafka topic.
Config Option | Description | Default Value | Domain |
---|---|---|---|
errors.deadletterqueue.topic.name | The name of the dead letter queue topic. If not set, this feature will be disabled. | "" | A valid Kafka topic name |
Metrics
The following new metrics will monitor the number of failures, and the behavior of the response handler. Specifically, the following set of counters:
...
    # disable retries on failure
    errors.retry.timeout=0

    # do not log errors and their contexts
    errors.log.enable=false

    # do not record errors in a dead letter queue topic
    errors.dlq.enable=false

    # Fail on first failure
    errors.tolerance.rate.limit=0
Example 2: Record and Skip
...
    # retry for at most 10 minutes, waiting up to 30 seconds between consecutive failures
    errors.retry.timeout=600000
    errors.retry.delay.max.ms=30000

    # log error context along with application logs, but do not include configs and messages
    errors.log.enable=true
    errors.log.include.configs=false
    errors.log.include.messages=false

    # produce error context into the Kafka topic
    errors.deadletterqueue.topic.name=my-connector-errors

    # Tolerate all errors.
    errors.tolerance.limit=-1
    errors.tolerance.rate.limit=-1
    errors.tolerance.rate.duration.ms=60000
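As a rough illustration of how such a configuration could be checked against the documented domains before a connector starts, the sketch below parses properties text and validates three of the numeric options. The class `ErrorConfigSketch` and its methods are hypothetical names introduced here; Connect's real configuration validation is not shown in this document and may work differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical validator for illustration; not part of the Connect framework.
final class ErrorConfigSketch {
    // Parses connector configuration text in java.util.Properties format.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Reads a long-valued property, falling back to the documented default when absent.
    static long longProp(Properties props, String key, long defaultValue) {
        String value = props.getProperty(key);
        return value == null ? defaultValue : Long.parseLong(value.trim());
    }

    // Throws IllegalArgumentException if a value is outside its documented domain.
    static void validate(Properties props) {
        long timeout = longProp(props, "errors.retry.timeout", 0L);
        long maxDelay = longProp(props, "errors.retry.delay.max.ms", 60000L);
        long tolerance = longProp(props, "errors.tolerance.limit", 0L);
        if (timeout < -1) {
            throw new IllegalArgumentException("errors.retry.timeout must be >= -1");
        }
        if (maxDelay < 1) {
            throw new IllegalArgumentException("errors.retry.delay.max.ms must be >= 1");
        }
        if (tolerance < -1) {
            throw new IllegalArgumentException("errors.tolerance.limit must be >= -1");
        }
    }
}
```

Options that are not set fall back to the defaults from the tables, so an empty configuration validates and keeps retries disabled.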
...