...

Remove the DNS lookup from the client constructor and delegate it to the NetworkClient poll method, so that clients no longer attempt to resolve the bootstrap addresses on startup.
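A minimal sketch of what "parse and validate only" could look like in the constructor, assuming the bootstrap config is a comma-separated list of host:port entries. The class and method names (BootstrapParser, parseBootstrapServers) are hypothetical and are not the existing Kafka helper. The key point is that java.net.InetSocketAddress.createUnresolved builds an address without a DNS lookup, so an unresolvable host name is accepted here and only fails later, during poll.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

public class BootstrapParser {

    // Hypothetical helper: parses "host:port[,host:port...]" and checks the format only.
    // createUnresolved performs no DNS lookup, so resolution is deferred to poll().
    public static List<InetSocketAddress> parseBootstrapServers(String config) {
        List<InetSocketAddress> addresses = new ArrayList<>();
        for (String entry : config.split(",")) {
            String url = entry.trim();
            if (url.isEmpty()) {
                continue; // tolerate stray commas such as "a:9092,,b:9092"
            }
            int colon = url.lastIndexOf(':');
            if (colon <= 0 || colon == url.length() - 1) {
                throw new IllegalArgumentException("Invalid bootstrap address (expected host:port): " + url);
            }
            String host = url.substring(0, colon);
            if (host.startsWith("[") && host.endsWith("]")) {
                host = host.substring(1, host.length() - 1); // bracketed IPv6 literal, e.g. [::1]:9092
            }
            int port;
            try {
                port = Integer.parseInt(url.substring(colon + 1));
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("Invalid port in bootstrap address: " + url, e);
            }
            if (port < 1 || port > 65535) {
                throw new IllegalArgumentException("Port out of range in bootstrap address: " + url);
            }
            addresses.add(InetSocketAddress.createUnresolved(host, port));
        }
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("No bootstrap addresses configured");
        }
        return addresses;
    }

    public static void main(String[] args) {
        for (InetSocketAddress a : parseBootstrapServers("broker-1.example.com:9092, broker-2.example.com:9092")) {
            System.out.println(a.getHostString() + ":" + a.getPort() + " unresolved=" + a.isUnresolved());
        }
    }
}
```

Malformed entries (missing port, bad port) still fail fast in the constructor, while a well-formed but unresolvable host is left for the poll path to handle.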

...

  • Client Constructor: Only parse the bootstrap config and validate its format; no DNS resolution takes place.

  • Bootstrap Connection Timeout: A timeout configuration for connecting to the bootstrap server.
  • NetworkClient:

    • Bootstrapping now occurs in the poll method, before any attempt to update the metadata. This includes resolving the addresses and bootstrapping the metadata. If DNS resolution fails, poll logs the failure and throws a NetworkException.

    • Logs an error message when the bootstrap process fails.

    • If the bootstrap timeout is exceeded, throws a BootstrapConnectionException, which is non-retriable.
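The NetworkClient behavior listed above can be sketched as a small state machine that poll would run before any metadata update. This is an illustration under stated assumptions, not the proposed implementation: BootstrapState, ensureBootstrapped, Resolver, and Clock are hypothetical names, the Kafka exceptions are replaced by local stand-ins, bootstrapping counts as successful once any one address resolves, and the timeout clock starts at the first attempt. With this choice, a timeout of 0 fails on the very first unsuccessful attempt.

```java
import java.net.InetSocketAddress;
import java.util.List;
import java.util.logging.Logger;

public class BootstrapSketch {

    // Stand-ins for Kafka's exception types so the sketch compiles without the Kafka jar.
    public static class KafkaException extends RuntimeException {
        public KafkaException(String message) { super(message); }
    }
    public static class NetworkException extends KafkaException {
        public NetworkException(String message) { super(message); }
    }
    public static class BootstrapConnectionException extends KafkaException {
        public BootstrapConnectionException(long timeoutMs) {
            super("Unable to establish a connection to the bootstrap server in " + timeoutMs + "ms.");
        }
    }

    // Hypothetical hooks: a resolver reporting whether an address resolved,
    // and a clock so the timeout can be driven deterministically.
    public interface Resolver { boolean resolve(InetSocketAddress address); }
    public interface Clock { long millis(); }

    // Hypothetical bootstrap step run at the start of every poll.
    public static class BootstrapState {
        private static final Logger LOG = Logger.getLogger(BootstrapState.class.getName());

        private final List<InetSocketAddress> addresses;
        private final long timeoutMs;
        private final Resolver resolver;
        private final Clock clock;
        private boolean started = false;
        private long startMs;
        private boolean bootstrapped = false;

        public BootstrapState(List<InetSocketAddress> addresses, long timeoutMs, Resolver resolver, Clock clock) {
            this.addresses = addresses;
            this.timeoutMs = timeoutMs;
            this.resolver = resolver;
            this.clock = clock;
        }

        public boolean isBootstrapped() { return bootstrapped; }

        // No-op once bootstrapping has succeeded; otherwise retries resolution.
        public void ensureBootstrapped() {
            if (bootstrapped) return;
            long now = clock.millis();
            if (!started) { started = true; startMs = now; } // assumption: timer starts at first attempt
            for (InetSocketAddress address : addresses) {
                if (resolver.resolve(address)) {
                    bootstrapped = true; // the metadata bootstrap would follow here
                    return;
                }
            }
            LOG.severe("Failed to resolve any bootstrap address: " + addresses);
            if (now - startMs >= timeoutMs) {
                throw new BootstrapConnectionException(timeoutMs); // non-retriable: give up
            }
            throw new NetworkException("DNS resolution failed for bootstrap addresses " + addresses);
        }
    }
}
```

Injecting the resolver and clock keeps the retry and timeout logic testable without real DNS, which also lines up with the NetworkClient items in the test plan.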

Configuration Change

bootstrap.connection.timeout.ms

The maximum amount of time a client may spend resolving the bootstrap addresses and trying to establish a connection to the bootstrap server. If this time is exceeded, a BootstrapConnectionException is thrown.

Note: the default value is up for discussion. A value of 0 preserves the current behavior: the client fails on the first unsuccessful attempt, without retrying.

Type: long
Default: 300000 (5 minutes)
Valid Values: 0 to Long.MAX_VALUE
Importance: high
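To make the proposed setting concrete, here is a minimal sketch of reading it from client properties with the default and range given above. The key bootstrap.connection.timeout.ms is the name proposed in this document; the class, constants, and helper method are hypothetical, and a real client would presumably declare the setting through its own configuration definitions rather than a hand-written parser.

```java
import java.util.Properties;

public class BootstrapTimeoutConfig {

    // Proposed client setting (name, default, and range as listed above).
    public static final String BOOTSTRAP_CONNECTION_TIMEOUT_MS_CONFIG = "bootstrap.connection.timeout.ms";
    public static final long DEFAULT_BOOTSTRAP_CONNECTION_TIMEOUT_MS = 300_000L; // 5 minutes

    // Hypothetical helper: read the value, apply the default, enforce 0 to Long.MAX_VALUE.
    public static long bootstrapConnectionTimeoutMs(Properties props) {
        String raw = props.getProperty(BOOTSTRAP_CONNECTION_TIMEOUT_MS_CONFIG);
        if (raw == null) {
            return DEFAULT_BOOTSTRAP_CONNECTION_TIMEOUT_MS;
        }
        long value;
        try {
            value = Long.parseLong(raw.trim()); // values above Long.MAX_VALUE fail to parse
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(BOOTSTRAP_CONNECTION_TIMEOUT_MS_CONFIG + " must be a long, got: " + raw, e);
        }
        if (value < 0) {
            throw new IllegalArgumentException(BOOTSTRAP_CONNECTION_TIMEOUT_MS_CONFIG + " must be >= 0, got: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker-1.example.com:9092");
        props.setProperty(BOOTSTRAP_CONNECTION_TIMEOUT_MS_CONFIG, "60000"); // give up after 1 minute
        System.out.println(bootstrapConnectionTimeoutMs(props));
    }
}
```

Setting the value to "0" in this sketch yields a timeout of 0, which corresponds to the fail-on-first-attempt behavior described in the note.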

New Error Type

Name: BootstrapConnectionException extends KafkaException

Message: "Unable to establish a connection to the bootstrap server in {}ms."

Type: Non-retriable.
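One plausible shape for the new error type, sketched with a local stand-in for KafkaException so it compiles on its own. The {} in the message above is a log-style placeholder; the sketch fills it by string concatenation. Marking it non-retriable is assumed here to mean simply not extending Kafka's RetriableException; the enclosing class name BootstrapErrors is hypothetical.

```java
public class BootstrapErrors {

    // Stand-in for org.apache.kafka.common.KafkaException so the sketch compiles on its own.
    public static class KafkaException extends RuntimeException {
        public KafkaException(String message) { super(message); }
    }

    // Proposed error type. It extends KafkaException directly rather than a retriable
    // subtype, signalling to callers that retrying the same operation will not help.
    public static class BootstrapConnectionException extends KafkaException {
        public BootstrapConnectionException(long timeoutMs) {
            super("Unable to establish a connection to the bootstrap server in " + timeoutMs + "ms.");
        }
    }
}
```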

Compatibility, Deprecation, and Migration Plan

  • Client Behaviors

    • Clients won’t attempt to resolve the bootstrap addresses upon initialization.

    • Clients won’t exit fatally if DNS resolution fails.

    • KafkaConsumer: Users must keep polling to retry the lookup if it fails.

    • KafkaAdminClient: Users must resend the request if it fails.

    • KafkaProducer: No user action is needed; the sender loop already polls continuously, so the lookup is retried automatically.

    Exception Handling

    • Failed DNS resolution results in a NetworkException.

Case Study

KafkaConsumer

Case 1: Unable to connect to the bootstrap (For example: misconfiguration)

  1. The user creates a consumer with an invalid bootstrap address. The client is instantiated without an error.
  2. The user invokes assign() and starts calling poll().
  3. poll() returns an empty ConsumerRecords and logs an error.
  4. The user keeps polling for the configured bootstrap timeout.
  5. The client throws a BootstrapConnectionException.

Case 2: Transient Network Issue (For example: transient DNS failure)

  1. The user creates a consumer with a valid bootstrap address whose DNS lookup temporarily fails. The client is instantiated without an error.
  2. The user invokes assign() and starts calling poll().
  3. poll() returns an empty ConsumerRecords and logs an error.
  4. The user keeps polling, and the address resolves successfully after x ms.
  5. poll() starts returning ConsumerRecords.
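Both consumer cases can be illustrated from the user's side with a stand-in for the consumer. In this sketch FakeConsumer, pollUntilRecords, and the counting of time in poll calls are all hypothetical simplifications, not the KafkaConsumer API. It shows the intended pattern: an empty result while bootstrapping is not an error, so the loop keeps polling (Case 2), while BootstrapConnectionException is non-retriable and is allowed to propagate (Case 1).

```java
import java.util.Collections;
import java.util.List;

public class ConsumerRetrySketch {

    // Stand-in for the proposed exception.
    public static class BootstrapConnectionException extends RuntimeException {
        public BootstrapConnectionException(String message) { super(message); }
    }

    // Hypothetical stand-in for a consumer; "time" is counted in poll calls.
    // Returns nothing until the address resolves on call number resolvesOnPoll
    // (never, if resolvesOnPoll <= 0) and throws once timeoutPolls calls have failed.
    public static class FakeConsumer {
        private final int resolvesOnPoll;
        private final int timeoutPolls;
        private int polls = 0;

        public FakeConsumer(int resolvesOnPoll, int timeoutPolls) {
            this.resolvesOnPoll = resolvesOnPoll;
            this.timeoutPolls = timeoutPolls;
        }

        public List<String> poll() {
            polls++;
            if (resolvesOnPoll > 0 && polls >= resolvesOnPoll) {
                return List.of("record-" + polls); // bootstrap succeeded, data flows
            }
            if (polls > timeoutPolls) {
                throw new BootstrapConnectionException("bootstrap timeout expired");
            }
            return Collections.emptyList(); // the real client would also log an error here
        }
    }

    // User-side loop: keep polling through empty results; the fatal error propagates.
    public static List<String> pollUntilRecords(FakeConsumer consumer, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            List<String> records = consumer.poll();
            if (!records.isEmpty()) {
                return records;
            }
        }
        return Collections.emptyList();
    }
}
```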

KafkaProducer

Case 1: Unable to connect to the bootstrap (For example: misconfiguration)

  1. The user instantiates a new producer; the producer creates and starts a sender thread.
  2. On the first runOnce(), client.poll() is invoked, which attempts to bootstrap the client. The failure is logged.
  3. The user tries to produce a message, but because of the misconfiguration, send() keeps failing in waitOnMetadata until the bootstrap timeout expires.
  4. Once the bootstrap timeout expires, a BootstrapConnectionException is thrown.

Case 2: Transient Network Issue (For example: transient DNS failure)

  1. The user creates a producer; the sender loop keeps trying to connect to the bootstrap server.
  2. Several send() calls fail with a TimeoutException (from waitOnMetadata).
  3. The user retries, and the send eventually succeeds.

AdminClient

Case 1: Unable to connect to the bootstrap (For example: misconfiguration)

  1. The user instantiates a new admin client.
  2. The AdminClientRunnable thread invokes NetworkClient.poll.  The client cannot connect to the bootstrap server; however, the loop continues to poll the NetworkClient.
  3. Meanwhile, users can make 
  4. Eventually, the bootstrap timeout is exhausted and a BootstrapConnectionException is thrown.

Case 2: Transient Network Issue (For example: transient DNS failure)

  1. The user instantiates a new admin client.
  2. The AdminClientRunnable thread invokes NetworkClient.poll.  The client cannot connect to the bootstrap server; however, the loop continues to poll the NetworkClient.

Test Plan

  1. NetworkClient

    1. Test DNS resolution on the initial poll

    2. Test that the correct exception type is thrown (NetworkException on DNS failure, BootstrapConnectionException on timeout)

  2. Existing clients (Consumer, Producer, AdminClient)

    1. Test successful bootstrapping upon retrying

...