Status

Current state: "Accepted"

Discussion thread: part 1: here, part 2: here

JIRA: KAFKA-9800

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

The maximum amount of time in milliseconds to wait when retrying a request to the broker that has repeatedly failed. If provided, the backoff per client will increase exponentially for each failed request, up to this maximum. To prevent all clients from being synchronized upon retry, a randomization factor of 0.2 will be applied to the backoff, resulting in a random range between 20% below and 20% above the computed value. If retry.backoff.ms is set to be higher than retry.backoff.max.ms, then retry.backoff.max.ms will be used as a constant backoff from the beginning without any exponential increase.

This config would default to 1000 ms if retry.backoff.ms is not set explicitly. Otherwise, it will default to the same value as retry.backoff.ms. The starting backoff will be derived from retry.backoff.ms (which defaults to 100 ms). If retry.backoff.ms is set to be greater than retry.backoff.max.ms, then retry.backoff.max.ms will be used as the backoff and we will log a warning. The formula to calculate the backoff is as follows:

MIN(retry.backoff.max.ms, (retry.backoff.ms * 2**(failures - 1)) * random(0.8, 1.2))

"random" in this case is a function that applies a random "jitter" factor, making the result up to 20% higher or lower than the computed value. The backoff will keep increasing with each failure until it hits the retry.backoff.max.ms value.

...

Compatibility, Deprecation, and Migration Plan

For users who have not set retry.backoff.ms explicitly, the default behavior will change so that the backoff will grow up to 1000 ms. For users who have set retry.backoff.ms explicitly, the behavior will remain the same, as they could have specific requirements.

This KIP will not deal with changing retry backoff behavior for other Kafka clients such as Kafka Connect and Kafka Streams. However, since Kafka Connect also utilizes ConsumerNetworkClient and Metadata, those aspects of the Kafka Connect framework will be affected. Other places that utilize the retry.backoff.ms configuration will not be affected, notably the classes responsible for rebalancing. Therefore, Connect worker and consumer group rebalancing will not be affected.

This KIP intends to replace the old static retry backoff behavior in Kafka clients with a more dynamic retry backoff behavior. While the behavior is replaced, the configuration is not: retry.backoff.ms will still be utilized in calculating the exponential retry backoff (as long as retry.backoff.ms < retry.backoff.max.ms), as detailed in the "Public Interfaces" section of this KIP.
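
The defaulting rule for users who have and have not set retry.backoff.ms explicitly can be sketched in Java as follows. This is a sketch under stated assumptions: the class RetryBackoffDefaultsSketch and its methods are hypothetical, and plain java.util.Properties stands in for the client configuration. It does not show how the Kafka clients actually resolve their configs.

```java
import java.util.Properties;

// Illustrative only: not Kafka's config classes; names are assumptions.
final class RetryBackoffDefaultsSketch {
    static final long DEFAULT_RETRY_BACKOFF_MS = 100L;
    static final long DEFAULT_RETRY_BACKOFF_MAX_MS = 1000L;

    // retry.backoff.ms, falling back to its default of 100 ms.
    static long effectiveBaseMs(Properties props) {
        String base = props.getProperty("retry.backoff.ms");
        return base == null ? DEFAULT_RETRY_BACKOFF_MS : Long.parseLong(base.trim());
    }

    // retry.backoff.max.ms: an explicit value wins; otherwise 1000 ms when
    // retry.backoff.ms is unset, or the same value as retry.backoff.ms when it is set.
    static long effectiveMaxMs(Properties props) {
        String max = props.getProperty("retry.backoff.max.ms");
        if (max != null) {
            return Long.parseLong(max.trim());
        }
        String base = props.getProperty("retry.backoff.ms");
        return base == null ? DEFAULT_RETRY_BACKOFF_MAX_MS : Long.parseLong(base.trim());
    }

    // True when a warning should be logged: base is greater than max.
    static boolean shouldWarn(Properties props) {
        return effectiveBaseMs(props) > effectiveMaxMs(props);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("retry.backoff.ms", "500");
        // Only retry.backoff.ms is set, so the maximum follows it and the
        // backoff stays static at that value.
        System.out.println("base=" + effectiveBaseMs(props) + " max=" + effectiveMaxMs(props));
    }
}
```

Under this rule, a user who set only retry.backoff.ms keeps a static backoff, because the maximum resolves to the same value and the exponential term is always capped by it.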

Rejected Alternatives

  1. Default retry.backoff.max.ms to the same value as retry.backoff.ms so that existing behavior is always maintained: rejected for reasons explained in the compatibility section. If a user isn't aware of this feature and doesn't set it, they wouldn't benefit from the exponential backoff. Since we generally don't expect users to change these settings, it's important to give good defaults out of the box for all Kafka clients.
  2. Default retry.backoff.max.ms to be 1000 ms unconditionally: rejected for reasons explained in the compatibility section. While we generally don't expect users to change these settings, we would still like to give them the option to do so and to tune these settings to suit their Kafka environment.