...

A node remains in the "connecting" state until 2 ^ (tcp_syn_retries + 1) - 1 seconds have elapsed (127 seconds when tcp_syn_retries = 6), even if the requests bound to this node have timed out. As a result, leastLoadedNode() may keep returning the same node, and other nodes never get a chance to process any requests. For example, when the user specifies a list of N bootstrap-servers and no connection has yet been established between the client and the servers, the least loaded node provider polls all the server nodes specified by the user. If M servers in the bootstrap-servers list are offline, the client may take (127 * M) seconds to connect to the cluster. In the worst case, when M = N - 1, the wait time can be several minutes.
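The arithmetic above can be checked with a minimal sketch. The class and method names are hypothetical, and the value 6 for tcp_syn_retries is an assumption chosen because it reproduces the 127-second figure used in the text.

```java
// Hypothetical helper, not part of Kafka: reproduces the wait-time arithmetic from the text.
final class SynRetryWait {

    // Seconds a node can stay in the "connecting" state: 2^(tcpSynRetries + 1) - 1.
    static long connectWaitSeconds(int tcpSynRetries) {
        return (1L << (tcpSynRetries + 1)) - 1;
    }

    // Upper bound on the time spent on offline bootstrap servers if they are tried one after another.
    static long worstCaseSeconds(int tcpSynRetries, int offlineServers) {
        return connectWaitSeconds(tcpSynRetries) * offlineServers;
    }

    public static void main(String[] args) {
        int retries = 6; // assumed value; yields the 127 seconds quoted in the text
        System.out.println("per node: " + connectWaitSeconds(retries) + " s");
        System.out.println("M = 2 offline: " + worstCaseSeconds(retries, 2) + " s");
    }
}
```

With two offline servers the bound is already over four minutes, which is the "several minutes" case described above.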

Considering the potential approval of KIP-612, which proposes to throttle connection setup, we propose an exponential connection setup timeout to help the NetworkClient

  1. Detect the 

Public Interfaces

We propose two new common client configs

...

socket.connections.setup.timeout.max.ms: The maximum amount of time the client will wait for the initial socket connection to be established. If provided, the connection setup timeout increases exponentially with each consecutive connection failure, up to this maximum. To avoid connection storms, a randomization factor of 0.2 is applied to the timeout, resulting in a random value between 20% below and 20% above the computed value. Note that the maximum connection setup timeout is dominated by the request timeout configuration.
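The behavior described for this config can be sketched as follows. This is an illustration under stated assumptions, not the NetworkClient implementation: the class and method names are hypothetical, the growth multiplier of 2 is assumed, and applying the jitter after capping at the maximum is one possible ordering. Only the 0.2 randomization factor comes from the text.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of an exponential connection setup timeout with jitter.
final class ConnectionSetupTimeout {

    static final double JITTER = 0.2; // randomization factor stated in the text
    static final int MULTIPLIER = 2;  // assumed growth factor per consecutive failure

    // Timeout before jitter: initialMs * MULTIPLIER^failures, capped at maxMs.
    static long cappedTimeoutMs(long initialMs, long maxMs, int failures) {
        double grown = initialMs * Math.pow(MULTIPLIER, failures);
        return (long) Math.min(grown, (double) maxMs);
    }

    // Timeout with jitter: a random value within +/- 20% of the capped timeout.
    static long timeoutMs(long initialMs, long maxMs, int failures) {
        long capped = cappedTimeoutMs(initialMs, maxMs, failures);
        double factor = ThreadLocalRandom.current().nextDouble(1 - JITTER, 1 + JITTER);
        return (long) (capped * factor);
    }

    public static void main(String[] args) {
        long initialMs = 10_000L; // hypothetical initial timeout
        long maxMs = 127_000L;    // hypothetical maximum timeout
        for (int failures = 0; failures <= 5; failures++) {
            System.out.println(failures + " failures -> " + timeoutMs(initialMs, maxMs, failures) + " ms");
        }
    }
}
```

Because the jitter is applied after the cap in this sketch, a single timeout can exceed the configured maximum by up to 20%. Applying the cap last instead would keep every value at or below the maximum.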

The formula to calculate the latest connection setup timeout is as follows, where the random factor is to prevent connection storms:

...