...

Instantiating a new client can fail fatally if the bootstrap server cannot be resolved, whether because of misconfiguration or a transient network issue such as slow DNS. This is suboptimal for several reasons, one being that the ConfigException thrown in this case does not accurately reflect the root cause of the problem. Allowing a grace period for retries before ultimately failing would make the client more resilient and increase the chances of successful initialization.

Proposed Changes

This KIP proposes changing the bootstrap behavior of the NetworkClient by moving the logic from the constructor to the first poll() call. This change ensures that the client doesn't fail at startup due to issues like misconfiguration or network disruptions and allows for retries upon subsequent poll() invocations. The proposed updates include introducing a new configuration option for timing out the bootstrapping process, a new exception type for handling bootstrap-related issues, and additional logging to aid in diagnosing bootstrapping failures.

...

  • Client Constructor: The constructor will only parse the bootstrap configuration.

  • NetworkClient:

    • Bootstrapping will now occur in the poll method before attempting to update the metadata. This includes resolving the addresses and bootstrapping the metadata.
    • An error message will be logged in the event of a failed bootstrap process.
    • If the timeout is exceeded, a non-retriable BootstrapConnectionException will be thrown.
  • Consumer, Producer, and Admin Clients: The bootstrap code will be changed.
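
The behavior described in the list above could be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual NetworkClient code: the class LazyBootstrapSketch, the Resolver hook, and the injected clock are hypothetical names introduced only for illustration, and the BootstrapConnectionException defined here is a stand-in for the exception type this KIP proposes.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical stand-in for the exception type proposed in this KIP.
class BootstrapConnectionException extends RuntimeException {
    BootstrapConnectionException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Simplified, hypothetical model of a client; this is not the real NetworkClient.
class LazyBootstrapSketch {
    // Hypothetical hook so that resolution can be exercised without real DNS.
    interface Resolver {
        List<InetAddress> resolve(String host) throws UnknownHostException;
    }

    private final String bootstrapHost;
    private final long timeoutMs;
    private final Resolver resolver;
    private final LongSupplier clockMs;
    private long firstAttemptMs = -1;
    private List<InetAddress> addresses;

    LazyBootstrapSketch(String bootstrapHost, long timeoutMs,
                        Resolver resolver, LongSupplier clockMs) {
        // The constructor only records configuration; nothing is resolved here.
        this.bootstrapHost = bootstrapHost;
        this.timeoutMs = timeoutMs;
        this.resolver = resolver;
        this.clockMs = clockMs;
    }

    // Returns true once bootstrapped, false if this attempt failed but a later
    // poll() may retry, and throws once the timeout has elapsed.
    boolean poll() {
        if (addresses != null) {
            return true;
        }
        long now = clockMs.getAsLong();
        if (firstAttemptMs < 0) {
            firstAttemptMs = now;
        }
        try {
            addresses = resolver.resolve(bootstrapHost);
            return true;
        } catch (UnknownHostException e) {
            // Log every failed attempt to aid diagnosis.
            System.err.println("Bootstrap failed for " + bootstrapHost + ": " + e.getMessage());
            if (now - firstAttemptMs >= timeoutMs) {
                throw new BootstrapConnectionException(
                        "Could not bootstrap within " + timeoutMs + " ms", e);
            }
            return false;
        }
    }
}
```

In a real client the resolver would presumably wrap something like InetAddress.getAllByName. With a timeout of 0, this sketch throws on the first failed attempt, which mirrors failing immediately at startup.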

...

bootstrap.connection.timeout.ms

The proposed configuration specifies the maximum amount of time clients can spend trying to resolve the bootstrap server's IP address and establish a connection to it. If this does not succeed within the configured time, a BootstrapConnectionException will be thrown.

Note that the default value for this configuration option is open for discussion. It can be set to 0, which is the same as the current behavior of exiting upon the first failure.

Type: long
Default: 300000 (5 minutes)
Valid Values: 0 - LONG_MAX
Importance: high
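
As an illustration of how the option might be read and validated, the following minimal sketch parses it from a plain java.util.Properties object. The class and method names are hypothetical, the key is the one proposed above rather than an existing Kafka configuration, and the default of 300000 ms is taken from the table even though it is still open for discussion.

```java
import java.util.Properties;

// Hypothetical helper; not part of the Kafka client API.
class BootstrapTimeoutConfigSketch {
    // Configuration key as proposed in this KIP.
    static final String KEY = "bootstrap.connection.timeout.ms";
    // Proposed default of 5 minutes (open for discussion).
    static final long DEFAULT_MS = 300_000L;

    // Returns the configured timeout, falling back to the proposed default.
    static long bootstrapTimeoutMs(Properties props) {
        String raw = props.getProperty(KEY);
        long value = (raw == null) ? DEFAULT_MS : Long.parseLong(raw.trim());
        if (value < 0) {
            // Valid values range from 0 up to the maximum long value.
            throw new IllegalArgumentException(KEY + " must be >= 0, got " + value);
        }
        return value;
    }
}
```

Setting the property to "0" in this sketch yields a timeout of 0, which corresponds to failing on the first unsuccessful attempt.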

...