Status

Current state: Under discussion.

Discussion thread: here (not started yet)

JIRA: here

Motivation

Currently, clients fail if they cannot bootstrap upon starting, and a common reason is that DNS resolution of the bootstrap servers fails. The application owner then needs to either implement retry logic or manually restart the application. This is inconvenient and hard to handle because:

  1. Failed DNS resolution throws a ConfigException, which is not descriptive of the actual problem: the message is fine, but the exception type is misleading because the failure might not be a configuration problem. The developer cannot tell the two cases apart without parsing and matching the error message.

  2. It can take minutes before the bootstrap server is registered with the DNS server, so it is reasonable to let clients keep retrying in the meantime.
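To make the first point concrete, the following is a minimal sketch of the kind of retry loop an application owner might write today around client construction. It is self-contained: `ConfigException` is a hypothetical stand-in for Kafka's class, the supplier stands in for a client constructor, and the matched message fragment is an assumption rather than a guaranteed Kafka string.

```java
import java.util.function.Supplier;

final class ConstructorRetryWorkaround {
    // Hypothetical stand-in for Kafka's ConfigException.
    static final class ConfigException extends RuntimeException {
        ConfigException(String message) { super(message); }
    }

    // Assumed message fragment. The application has to copy whatever text its
    // Kafka version actually uses, which is the fragility described above.
    static final String DNS_FAILURE_HINT = "No resolvable bootstrap urls";

    // Retries construction only when the ConfigException "looks like" a DNS failure.
    static <T> T createWithRetry(Supplier<T> constructor, int maxAttempts, long backoffMs) {
        for (int attempt = 1; ; attempt++) {
            try {
                return constructor.get();
            } catch (ConfigException e) {
                String msg = e.getMessage();
                boolean looksLikeDns = msg != null && msg.contains(DNS_FAILURE_HINT);
                // A genuine misconfiguration has the same exception type, so the
                // message text is the only way to tell the two cases apart.
                if (!looksLikeDns || attempt >= maxAttempts) {
                    throw e;
                }
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```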

Public Interfaces

  • Users will no longer get a ConfigException upon failed DNS resolution; instead, they can catch a NetworkException and retry.
  • Several WARN-level log messages around failed DNS resolution will be removed.
  • Bootstrap DNS resolution will happen on the first poll instead of in the client constructor.
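Under this proposal, an application-side retry loop could look like the sketch below, assuming poll() surfaces the bootstrap failure as a NetworkException. `NetworkException` and `Pollable` are hypothetical stand-ins so the example compiles on its own; they are not the Kafka classes. Unlike catching a ConfigException today, nothing depends on the wording of the error message.

```java
final class PollRetryLoop {
    // Hypothetical stand-in for Kafka's NetworkException.
    static final class NetworkException extends RuntimeException {
        NetworkException(String message) { super(message); }
    }

    // Hypothetical minimal view of a client that exposes poll().
    interface Pollable<R> {
        R poll();
    }

    // Polls until a result arrives, treating NetworkException (e.g. a failed
    // bootstrap DNS lookup under this proposal) as retriable.
    // Returns null if every attempt fails.
    static <R> R pollUntilReady(Pollable<R> client, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return client.poll();
            } catch (NetworkException e) {
                // The exception type alone says "retry"; no message parsing needed.
                System.err.println("poll attempt " + attempt + " failed: " + e.getMessage());
            }
        }
        return null;
    }
}
```

A real application would typically add a backoff between attempts; it is omitted here to keep the sketch short.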

Proposed Changes

Move the DNS resolution and metadata bootstrap logic out of the client constructor and delegate this task to the NetworkClient poll method. This means bootstrapping only happens when poll is invoked, and clients won't attempt to resolve DNS upon starting.
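On the constructor side, parsing and format validation need no network access. A minimal sketch, assuming bootstrap servers arrive as a comma-separated `host:port` list: each entry is checked and turned into an address with `InetSocketAddress.createUnresolved`, which performs no DNS lookup. `BootstrapConfigParser` is a hypothetical name, `IllegalArgumentException` stands in for Kafka's ConfigException, and IPv6 literals are not handled specially.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

final class BootstrapConfigParser {
    // Parses "host:port,host:port" into unresolved addresses; no DNS lookup happens here.
    static List<InetSocketAddress> parse(String bootstrapServers) {
        List<InetSocketAddress> addresses = new ArrayList<>();
        for (String raw : bootstrapServers.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = entry.lastIndexOf(':');
            if (colon <= 0 || colon == entry.length() - 1) {
                throw new IllegalArgumentException("Invalid bootstrap address (expected host:port): " + entry);
            }
            String host = entry.substring(0, colon);
            int port;
            try {
                port = Integer.parseInt(entry.substring(colon + 1));
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("Invalid port in bootstrap address: " + entry, e);
            }
            // createUnresolved rejects out-of-range ports without resolving the host.
            addresses.add(InetSocketAddress.createUnresolved(host, port));
        }
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("No bootstrap addresses given");
        }
        return addresses;
    }
}
```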

Changes

  • Client constructor: only parse the bootstrap config and validate its format.

  • NetworkClient:

    • Bootstrapping should now occur in the poll method before attempting to update the metadata. This includes resolving the addresses and bootstrapping the metadata.

    • Throws and logs a NetworkException if DNS resolution fails.
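The NetworkClient changes above can be sketched as follows, assuming a client that receives already-parsed, unresolved bootstrap addresses and resolves them lazily on its first poll(). `LazyBootstrapClient`, its `Resolver` hook, and the nested `NetworkException` are hypothetical names for illustration; the real NetworkClient also bootstraps metadata, which is elided here.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LazyBootstrapClient {
    // Hypothetical stand-in for Kafka's NetworkException.
    static final class NetworkException extends RuntimeException {
        NetworkException(String message) { super(message); }
    }

    // Pluggable resolver so DNS can be faked in tests; the default uses the JDK resolver.
    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    private final List<InetSocketAddress> bootstrapAddresses; // validated, still unresolved
    private final Resolver resolver;
    private List<InetSocketAddress> resolvedAddresses;        // null until bootstrapping succeeds

    LazyBootstrapClient(List<InetSocketAddress> bootstrapAddresses, Resolver resolver) {
        // Constructor: no DNS lookup, just keep the already-validated addresses.
        this.bootstrapAddresses = Collections.unmodifiableList(new ArrayList<>(bootstrapAddresses));
        this.resolver = resolver;
    }

    LazyBootstrapClient(List<InetSocketAddress> bootstrapAddresses) {
        this(bootstrapAddresses, InetAddress::getAllByName);
    }

    boolean isBootstrapped() {
        return resolvedAddresses != null;
    }

    // Bootstrap first if needed, then continue with the usual work.
    void poll() {
        if (!isBootstrapped()) {
            bootstrap();
        }
        // Metadata update and network I/O would follow here (omitted).
    }

    private void bootstrap() {
        List<InetSocketAddress> resolved = new ArrayList<>();
        for (InetSocketAddress address : bootstrapAddresses) {
            try {
                for (InetAddress ip : resolver.resolve(address.getHostString())) {
                    resolved.add(new InetSocketAddress(ip, address.getPort()));
                }
            } catch (UnknownHostException e) {
                // This host did not resolve; keep trying the remaining bootstrap entries.
            }
        }
        if (resolved.isEmpty()) {
            String message = "Failed to resolve any bootstrap address: " + bootstrapAddresses;
            System.err.println("WARN " + message);
            // Retriable: the next poll() attempts the lookup again.
            throw new NetworkException(message);
        }
        resolvedAddresses = resolved;
    }
}
```

Keeping the resolver pluggable is a design choice for testability: the "resolution on initial poll" and "retry succeeds" cases in the test plan can be exercised with a fake resolver that fails once and then succeeds, without touching real DNS.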

Compatibility, Deprecation, and Migration Plan

  • Client Behaviors

    • Clients won’t attempt to resolve the bootstrap addresses upon initialization.

    • Clients won’t exit fatally if DNS resolution fails.

    • KafkaConsumer: Users will need to call poll again to retry the lookup if it fails.

    • KafkaAdminClient: Users will need to resend the request if it fails.

    • KafkaProducer: The sender loop should already be polling continuously.

  • Exception Handling

    • Failed DNS resolution will result in a NetworkException.

Test Plan

  1. NetworkClient

    1. Test that bootstrap DNS resolution happens on the initial poll, not in the constructor

    2. Test that a NetworkException is thrown when DNS resolution fails

  2. Existing clients (Consumer, Producer, AdminClient)

    1. Test that bootstrapping succeeds on a retry after an initial DNS failure

Rejected Alternatives

  1. Allow the application owner to specify a retry period, after which the clients would fail. The default would be 0s, making retry an opt-in config.

    1. Pros: Allows users to have more control over how long to retry

    2. Cons: Requires a new config; client instantiation can block.

  2. No retry. Let the application owner handle the DNS resolution exception. This means we would still throw a DNSLookupException upon failing.

    1. Pros: No additional config is needed

    2. Cons: This is a behavioral change, and the application owner might need to rewrite the exception handling, i.e., the logic that catches the DNS failure.