...

  • Client Constructor: Only parse the bootstrap config and validate its format; do not resolve the addresses.

  • Bootstrap Connection Timeout: A timeout configuration for connecting to the bootstrap server.
  • NetworkClient:

    • Bootstrapping should now occur in the poll method before attempting to update the metadata. This includes resolving the addresses and bootstrapping the metadata.

    • Log an error message when the bootstrap process fails.

    • If the timeout is exceeded, throw a BootstrapConnectionException, which is non-retriable.
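The NetworkClient behavior listed above can be sketched roughly as follows. This is a minimal illustration rather than the actual NetworkClient code: LazyBootstrapper, AddressResolver, and ensureBootstrapped are hypothetical names, BootstrapConnectionException is declared locally as a stand-in for the exception proposed here, and the caller passes in the current time so the timeout logic can be exercised without waiting.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Stand-in for the exception proposed in this document (hypothetical definition).
class BootstrapConnectionException extends RuntimeException {
    BootstrapConnectionException(String message) { super(message); }
}

// Hypothetical resolver abstraction; InetAddress::getAllByName matches this shape.
interface AddressResolver {
    InetAddress[] resolve(String host) throws UnknownHostException;
}

// Illustrative stand-in for the bootstrap step that would run at the start of poll().
class LazyBootstrapper {
    private final String host;
    private final long timeoutMs;
    private final AddressResolver resolver;
    private long firstAttemptMs = -1L;   // time of the first bootstrap attempt
    private InetAddress[] addresses;     // null until resolution succeeds

    LazyBootstrapper(String host, long timeoutMs, AddressResolver resolver) {
        this.host = host;
        this.timeoutMs = timeoutMs;
        this.resolver = resolver;
    }

    // Returns true once the bootstrap address is resolved, false if the caller
    // should poll again, and throws once the bootstrap timeout has elapsed.
    boolean ensureBootstrapped(long nowMs) {
        if (addresses != null) {
            return true;
        }
        if (firstAttemptMs < 0) {
            firstAttemptMs = nowMs;
        }
        try {
            addresses = resolver.resolve(host);
            return true;
        } catch (UnknownHostException e) {
            // The failure is logged instead of failing the client immediately.
            System.err.println("Bootstrap attempt failed for " + host + ": " + e.getMessage());
        }
        if (nowMs - firstAttemptMs >= timeoutMs) {
            throw new BootstrapConnectionException(
                "Unable to bootstrap from " + host + " within " + timeoutMs + " ms");
        }
        return false;
    }
}
```

With real DNS, the resolver argument could be InetAddress::getAllByName. Keeping the resolver behind an interface is only a choice made for this sketch, so that transient and permanent failures can be simulated.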

Configuration Change

bootstrap.connection.timeout.ms
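As an illustration of how a client might set this option, the following sketch builds a plain java.util.Properties object. The key is the one proposed here, so client releases without this change may not recognize it. The helper name clientProperties and the values in the usage note are placeholders, not recommended defaults.

```java
import java.util.Properties;

class BootstrapTimeoutConfigExample {
    // Config key proposed in this document; it may be unknown to existing client releases.
    static final String BOOTSTRAP_CONNECTION_TIMEOUT_MS = "bootstrap.connection.timeout.ms";

    // Builds client properties with the bootstrap servers and the proposed timeout.
    static Properties clientProperties(String bootstrapServers, long bootstrapTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty(BOOTSTRAP_CONNECTION_TIMEOUT_MS, Long.toString(bootstrapTimeoutMs));
        return props;
    }
}
```

For example, clientProperties("broker.example.com:9092", 30000L) yields properties that could be passed to a client constructor; both the host and the 30000 ms value are arbitrary.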

...

  • Client Behaviors

    • Clients won’t attempt to resolve the bootstrap addresses upon initialization.

    • Clients won’t exit fatally if DNS resolution fails.

    • KafkaConsumer: Users must poll to retry the lookup if it fails.

    • KafkaAdminClient: Users will need to resend the request if it fails.

    • KafkaProducer: The sender loop should already be polling continuously.

  • Exception Handling

    • Failed DNS resolution will result in a NetworkException.

Case Study

...

To help illustrate the proposed changes, we provide some examples of how clients might behave in different scenarios.

KafkaConsumer

Case 1: Unable to connect to the bootstrap (For example: misconfiguration)

...

Suppose the user instantiates a KafkaConsumer with an invalid bootstrap config.

...

When the user invokes assign() and starts poll(), the poll() method will continue to return empty ConsumerRecords and log a warning message.

The user can continue to retry for the configured duration. After the bootstrap timeout expires, the client will throw a BootstrapConnectionException.

Case 2: Transient Network Issue (For example: transient DNS failure)

...

Now, suppose the user instantiates a KafkaConsumer with a valid bootstrap config, but there is a transient network issue, such as slow DNS resolution.

When the user starts poll(), the poll() method will return empty ConsumerRecords and log a warning message.

The user can continue to retry, and the network issue will be resolved after some time. The KafkaConsumer will then continue to function normally.
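From the application's point of view, the two consumer cases can be sketched with a simple poll loop. This is only an illustration under stated assumptions: PollingConsumer is a hypothetical stand-in for KafkaConsumer.poll(Duration), records are modeled as strings, and BootstrapConnectionException is declared locally because it is the exception proposed here rather than an existing class.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Stand-in for the exception proposed in this document (hypothetical definition).
class BootstrapConnectionException extends RuntimeException {
    BootstrapConnectionException(String message) { super(message); }
}

// Hypothetical stand-in for KafkaConsumer.poll(Duration); records are plain strings here.
interface PollingConsumer {
    List<String> poll(Duration timeout);
}

class ConsumerLoopSketch {
    // Polls up to maxPolls times and collects whatever records arrive.
    static List<String> drain(PollingConsumer consumer, int maxPolls) {
        List<String> received = new ArrayList<>();
        for (int i = 0; i < maxPolls; i++) {
            // BootstrapConnectionException is deliberately not caught here:
            // it is non-retriable, so it propagates to the caller.
            List<String> batch = consumer.poll(Duration.ofMillis(100));
            if (batch.isEmpty()) {
                // Possibly still bootstrapping (Case 2); polling again is the retry.
                continue;
            }
            received.addAll(batch);
        }
        return received;
    }
}
```

In Case 2 the loop sees a few empty batches and then records. In Case 1 the loop ends when the exception surfaces from poll().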

KafkaProducer

Case 1: Unable to connect to the bootstrap (For example: misconfiguration)

  1. The user instantiates the client with an invalid bootstrap config.
  2. As the sender thread starts running, a WARN message is logged.
  3. If the user tries to produce messages:
    1. Because of the misconfiguration, send() will fail on waitOnMetadata until the bootstrap timeout expires.
    2. Depending on the configuration, the producer callback may be completed with TimeoutException until the bootstrap timeout runs out.
  4. A WARN message will be logged every time the sender tries to bootstrap.
  5. Eventually, a BootstrapConnectionException is thrown.

Case 2: Transient Network Issue (For example: transient DNS failure)

  1. The user instantiates the client.
  2. As the sender thread starts running, a WARN message is logged upon trying to bootstrap the client.
  3. Suppose the network issue is resolved before the user tries to produce a message. Only the WARN messages will be logged.
  4. If the user tries to produce a message before the issue is resolved:
    1. A WARN message will be logged every time the NetworkClient tries to make a connection.
    2. Eventually, the send callback will be completed with TimeoutException if the network issue persists.
    3. The send is completed normally if the issue is resolved before exhausting max.block.ms.

AdminClient

Case 1: Unable to connect to the bootstrap (For example: misconfiguration)

  1. The user instantiates a new admin client.
  2. The AdminClientRunnable thread invokes NetworkClient.poll. The client cannot connect to the bootstrap server; however, the loop continues to poll the NetworkClient.
  3. If the user makes admin client API calls, the calls will not complete while the client cannot bootstrap:
    1. If the API call times out before the bootstrap timeout expires, a TimeoutException will be thrown upon invoking .get().
    2. If the bootstrap timeout expires, a BootstrapConnectionException will be thrown.
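The two outcomes in this case depend on which deadline is reached first. The sketch below models that with a CompletableFuture standing in for the admin client's result future. It is an assumption-laden illustration: AdminCallOutcomeSketch, pendingCall, and failureCause are hypothetical names, java.util.concurrent.TimeoutException is used in place of Kafka's own TimeoutException, and BootstrapConnectionException is declared locally as the exception proposed here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

// Stand-in for the exception proposed in this document (hypothetical definition).
class BootstrapConnectionException extends RuntimeException {
    BootstrapConnectionException(String message) { super(message); }
}

class AdminCallOutcomeSketch {
    // Models an API call issued while bootstrap never succeeds: the future fails
    // with whichever deadline is reached first (ties go to the bootstrap timeout).
    static CompletableFuture<String> pendingCall(long apiTimeoutMs, long bootstrapTimeoutMs) {
        CompletableFuture<String> result = new CompletableFuture<>();
        if (apiTimeoutMs < bootstrapTimeoutMs) {
            result.completeExceptionally(
                new TimeoutException("API call timed out after " + apiTimeoutMs + " ms"));
        } else {
            result.completeExceptionally(
                new BootstrapConnectionException("Bootstrap timed out after " + bootstrapTimeoutMs + " ms"));
        }
        return result;
    }

    // Calls get() the way a user would and returns the underlying failure, or null on success.
    static Throwable failureCause(CompletableFuture<String> call) {
        try {
            call.get();
            return null;
        } catch (ExecutionException e) {
            return e.getCause();   // get() wraps the failure in an ExecutionException
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return e;
        }
    }
}
```

In this model the failure reaches the user as the cause of the ExecutionException thrown by get(), so callers that need to tell the two cases apart would inspect getCause().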

Case 2: Transient Network Issue (For example: transient DNS failure)

  1. The user instantiates a new admin client.
  2. The AdminClientRunnable thread invokes NetworkClient.poll. The client cannot connect to the bootstrap server; however, the loop continues to poll the NetworkClient.
  3. If an API call times out before the connection issue is resolved, a TimeoutException will be thrown upon invoking .get() for the API call results.
  4. Eventually, the API calls should go through.

Test Plan

  1. NetworkClient

    1. Test DNS resolution upon its initial poll

    2. Test if the right exception type is thrown

  2. Existing clients (Consumer, Producer, AdminClient)

    1. Test successful bootstrapping upon retrying

...