...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Currently, the initial socket connection timeout depends on the Linux kernel setting tcp_syn_retries. The timeout value is 2 ^ (tcp_syn_retries + 1) - 1 seconds. For the reasons below, we want to control the client-side socket timeout directly using configuration files.

  1. The default value of Linux tcp_syn_retries is 6, which gives a default timeout of 127 seconds. This is too long in some scenarios. For example, when the user specifies a list of N bootstrap-servers and no connection has yet been established between the client and the servers, the least loaded node provider will poll all the server nodes specified by the user. If M servers in the bootstrap-servers list are offline, the client may take (127 * M) seconds to connect to the cluster. In the worst case, when M = N - 1, the wait time can be several minutes.
  2. Currently, leastLoadedNode() provides a cached node according to the criteria below.
    1. Provide the connected node with the fewest in-flight requests.
    2. If no connected node exists, provide the connecting node with the largest index in the cached list of nodes.
    3. If no connected or connecting node exists, provide the disconnected node that respects the reconnect backoff and has the largest index in the cached list of nodes.

    If we do not 

  3. Though we could lower the default value of tcp_syn_retries, doing so would change system-level network behavior, which might cause other issues.
  4. Applications depending on KafkaAdminClient may want to reliably know and control the initial socket connect timeout, which helps them throw appropriate exceptions in their own layer.
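
To make the numbers in this section concrete, here is a minimal Java sketch of the timeout arithmetic. It only evaluates the formula 2 ^ (tcp_syn_retries + 1) - 1 quoted above and the (127 * M) worst case; the class and method names are hypothetical and not part of Kafka or the Linux kernel.

```java
public class SynTimeoutEstimate {

    // Seconds the kernel keeps retrying the initial SYN before giving up,
    // per the formula quoted above: 2 ^ (tcp_syn_retries + 1) - 1.
    public static long connectTimeoutSeconds(int tcpSynRetries) {
        return (1L << (tcpSynRetries + 1)) - 1;
    }

    // Worst-case wait when the client tries offlineServers unreachable
    // bootstrap servers one after another before reaching a live one.
    public static long worstCaseWaitSeconds(int tcpSynRetries, int offlineServers) {
        return connectTimeoutSeconds(tcpSynRetries) * offlineServers;
    }

    public static void main(String[] args) {
        // Default tcp_syn_retries = 6 gives 127 seconds per unreachable server.
        System.out.println(connectTimeoutSeconds(6));
        // Two offline servers out of three (M = N - 1) gives 254 seconds.
        System.out.println(worstCaseWaitSeconds(6, 2));
    }
}
```

With the default setting, two offline servers already cost more than four minutes before the client reaches the live one.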

Public Interfaces

We propose a new common client config

...

  1. This function iterates over all cached nodes and provides a node for the AdminClient to send the request to. We can add our timeout check inside this iteration, which will not degrade performance. If the connection establishment timeout is hit, the connection state will change to DISCONNECTED.
  2. Currently, when no active connection exists, the provider provides the node with the largest index in the node list among those that pass canConnect(), which checks whether the connection state is DISCONNECTED and whether the reconnect backoff has elapsed. This is not appropriate because we need to poll every node (probably round-robin). Consider the case where there are multiple DISCONNECTED nodes and the interval between two provide() invocations is greater than reconnect.backoff.ms. The provider may then return the same node every time. Thus, the provider should provide the node with the fewest failed attempts among all nodes passing the canConnect() check.
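
The two changes above can be sketched roughly as follows. This is an illustrative, self-contained Java model and not the actual Kafka client code: the class LeastFailedNodeProvider, the NodeInfo fields, and the connectTimeoutMs parameter are all hypothetical names chosen for the example, and ties between nodes with equal failure counts are simply broken by list order.

```java
import java.util.ArrayList;
import java.util.List;

public class LeastFailedNodeProvider {

    public enum State { CONNECTED, CONNECTING, DISCONNECTED }

    // Hypothetical per-node bookkeeping; not Kafka's real connection state class.
    public static final class NodeInfo {
        public final String id;
        public State state;
        public int inFlightRequests;
        public int failedAttempts;
        public long stateChangedMs; // time of the last state change

        public NodeInfo(String id, State state, int failedAttempts, long stateChangedMs) {
            this.id = id;
            this.state = state;
            this.failedAttempts = failedAttempts;
            this.stateChangedMs = stateChangedMs;
        }
    }

    private final List<NodeInfo> nodes;
    private final long connectTimeoutMs;   // assumed name for the proposed timeout
    private final long reconnectBackoffMs; // plays the role of reconnect.backoff.ms

    public LeastFailedNodeProvider(List<NodeInfo> nodes, long connectTimeoutMs, long reconnectBackoffMs) {
        this.nodes = nodes;
        this.connectTimeoutMs = connectTimeoutMs;
        this.reconnectBackoffMs = reconnectBackoffMs;
    }

    // Models the canConnect() check: DISCONNECTED and reconnect backoff elapsed.
    boolean canConnect(NodeInfo n, long nowMs) {
        return n.state == State.DISCONNECTED && nowMs - n.stateChangedMs >= reconnectBackoffMs;
    }

    // Returns the node to use next, or null if no node qualifies right now.
    public NodeInfo provide(long nowMs) {
        NodeInfo connected = null;
        NodeInfo connecting = null;
        NodeInfo candidate = null;
        for (NodeInfo n : nodes) {
            // Timeout check folded into the existing single pass over the nodes.
            if (n.state == State.CONNECTING && nowMs - n.stateChangedMs > connectTimeoutMs) {
                n.state = State.DISCONNECTED;
                n.stateChangedMs = nowMs;
                n.failedAttempts++;
            }
            if (n.state == State.CONNECTED) {
                if (connected == null || n.inFlightRequests < connected.inFlightRequests) {
                    connected = n;
                }
            } else if (n.state == State.CONNECTING) {
                connecting = n; // last match wins, i.e. the largest index
            } else if (canConnect(n, nowMs)
                    && (candidate == null || n.failedAttempts < candidate.failedAttempts)) {
                candidate = n; // fewest failed attempts instead of largest index
            }
        }
        if (connected != null) return connected;
        if (connecting != null) return connecting;
        return candidate;
    }

    public static void main(String[] args) {
        List<NodeInfo> nodes = new ArrayList<>();
        nodes.add(new NodeInfo("a", State.DISCONNECTED, 3, 0L));
        nodes.add(new NodeInfo("b", State.DISCONNECTED, 1, 0L));
        nodes.add(new NodeInfo("c", State.CONNECTING, 0, 0L));
        LeastFailedNodeProvider provider = new LeastFailedNodeProvider(nodes, 10_000L, 50L);
        System.out.println(provider.provide(100L).id);    // "c" is still connecting
        System.out.println(provider.provide(20_000L).id); // "c" timed out, "b" has the fewest failures
    }
}
```

Because a node's failure count grows each time its connection attempt times out, repeated provide() calls rotate through the DISCONNECTED nodes rather than returning the same one, which approximates the round-robin polling described above.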

...