Currently the Kafka clients resolve a symbolic hostname (in org.apache.kafka.clients.NetworkClient.initiateConnect()) using:

Code Block
languagejava
 new InetSocketAddress(String hostname, int port)

which picks only one IP address even if the DNS has multiple records for the hostname, as it in turn calls:

Code Block
languagejava
InetAddress.getAllByName(hostname)[0]
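
To observe this behavior on a given machine, a minimal sketch along the following lines prints the address that InetSocketAddress settles on next to everything the resolver returns. The class name ResolutionComparison, its helper methods and the port 9092 are illustrative assumptions and are not part of the Kafka code base.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

final class ResolutionComparison {
    /** The single address that new InetSocketAddress(hostname, port) ends up with (null if unresolved). */
    static InetAddress singleAddress(String hostname, int port) {
        return new InetSocketAddress(hostname, port).getAddress();
    }

    /** Every address the resolver returns for the hostname. */
    static InetAddress[] allAddresses(String hostname) {
        try {
            return InetAddress.getAllByName(hostname);
        } catch (UnknownHostException e) {
            // Rethrown unchecked only to keep this sketch short.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String hostname = args.length > 0 ? args[0] : "localhost";
        System.out.println("InetSocketAddress picked: " + singleAddress(hostname, 9092));
        for (InetAddress address : allAddresses(hostname)) {
            System.out.println("resolver returned: " + address);
        }
    }
}
```

Running it against a hostname with several A records should list all of them in the second part of the output, while the first line shows only one.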

For some environments where the broker hostnames are mapped by the DNS to multiple IPs, e.g. in clouds where the IPs point to external load balancers, it is desirable that the client, on failing to connect to one of the IPs, try the others before giving up on the connection.

Our use case is multiple load balancers fronting the Kafka cluster. The Kafka advertised listeners advertise hostnames that the DNS server maps to the IPs of all LBs, not just one.
If one LB isn't available but the client is able to use another IP for the same hostname (for example, it connects to the second LB), the service stays available.

Another case would be where brokers are fronted by two proxies in active/standby mode. This KIP would enable using a standby proxy for HA if connection to the active proxy fails.

Although this KIP and KIP-235 both deal with multiple DNS records, they address separate concerns.

Public Interfaces

Client configuration

Introduce a new configuration parameter:

enable.all.dns.ips = true / false
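
As a rough illustration of how a client might opt in, the sketch below builds a plain java.util.Properties object with the flag switched on. It assumes the parameter is named enable.all.dns.ips as proposed here. The class name ClientConfigSketch and the hostname kafka.example.com are placeholders, and no Kafka classes are involved.

```java
import java.util.Properties;

final class ClientConfigSketch {
    /** Builds client properties with the proposed flag switched on. */
    static Properties clientProperties(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Parameter proposed by this KIP; per the proposal it is disabled unless set.
        props.setProperty("enable.all.dns.ips", "true");
        return props;
    }

    public static void main(String[] args) {
        // kafka.example.com is a placeholder hostname.
        clientProperties("kafka.example.com:9092").list(System.out);
    }
}
```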

If the configuration parameter enable.all.dns.ips is set to true, the network client code will use

Code Block
languagejava
InetAddress.getAllByName(hostname)

to obtain all IPs for the broker hostnames from the DNS server.

The NIO client will attempt a connection to one of the IPs. If the connection is refused or times out and the DNS returned multiple IPs, then rather than failing the connection it will try to connect to the other IPs.

If all of them fail to connect, the client behaves as the current client does when its only resolved IP fails to connect:

  • at bootstrap, move to the next hostname
  • past bootstrap, retry the connection to the given node starting from the resolution of the hostname
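
The failover described above can be sketched roughly as follows. This is not the NetworkClient implementation: it uses a blocking java.net.Socket instead of NIO, and the names MultiIpConnectSketch, Connector and connectToAny, the port 9092 and the one-second timeout are all assumptions made for illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class MultiIpConnectSketch {
    /** Hypothetical hook so the connection attempt can be swapped out, e.g. in tests. */
    interface Connector {
        void connect(InetSocketAddress address) throws IOException;
    }

    /**
     * Tries each candidate in order and returns the first one that accepts a
     * connection, or empty if every attempt fails.
     */
    static Optional<InetSocketAddress> connectToAny(List<InetSocketAddress> candidates, Connector connector) {
        for (InetSocketAddress candidate : candidates) {
            try {
                connector.connect(candidate);
                return Optional.of(candidate);
            } catch (IOException e) {
                // Refused or timed out: move on to the next IP.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        List<InetSocketAddress> candidates = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName("localhost")) {
            candidates.add(new InetSocketAddress(address, 9092));
        }
        Connector blockingConnect = address -> {
            try (Socket socket = new Socket()) {
                socket.connect(address, 1000); // 1 s timeout, chosen arbitrarily
            }
        };
        System.out.println(connectToAny(candidates, blockingConnect)
                .map(a -> "connected via " + a)
                .orElse("all IPs failed; the caller falls back to the existing behavior"));
    }
}
```

An empty result corresponds to the case where every IP has been exhausted, at which point the caller would move to the next bootstrap hostname or re-resolve the node's hostname as listed above.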

Compatibility, Deprecation, and Migration Plan

  • What impact (if any) will there be on existing users?
    None with the default configuration. By default enable.all.dns.ips will be disabled, so there will be no impact.

Rejected Alternatives

Making the client use all IPs by default was rejected, as it may have impacted some users.