Status

Current state: Discuss

Discussion thread: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Currently, the socket connection timeout depends on the Linux kernel setting tcp_syn_retries. The timeout value is 2 ^ (tcp_syn_retries + 1) - 1 seconds. For the reasons below, we want to control the client-side socket timeout directly through client configuration.

  1. The default value of Linux tcp_syn_retries is 6, so the default timeout is 2 ^ 7 - 1 = 127 seconds, which is too long in some scenarios.
  2. Currently, leastLoadedNode() provides a cached node according to the criteria below, applied in order.

    1. Provide the connected node with the fewest in-flight requests.
    2. If no connected node exists, provide the connecting node with the largest index in the cached list of nodes.
    3. If no connected or connecting node exists, provide the disconnected node that respects the reconnect backoff and has the largest index in the cached list of nodes.

A node remains in the "connecting" state until the system timeout expires and the socket is closed, even if the requests bound to this node have already timed out. So leastLoadedNode() might keep providing this same node, and other nodes won't get a chance to process any requests. For example, when the user specifies a list of N bootstrap servers and no connection has yet been established between the client and the servers, the least loaded node provider will poll all the server nodes specified by the user. If M servers in the bootstrap-servers list are offline, the client may take (127 * M) seconds to connect to the cluster. In the worst case, when M = N - 1, the wait time can be several minutes.
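The arithmetic above can be checked with a short sketch. The class and method names are hypothetical and exist only for illustration; the formula is the one stated in this section.

```java
public class SynTimeoutEstimate {
    // Connection timeout in seconds, using the formula from this section:
    // 2 ^ (tcp_syn_retries + 1) - 1.
    static long connectTimeoutSeconds(int tcpSynRetries) {
        return (1L << (tcpSynRetries + 1)) - 1;
    }

    // Wait time if each of the offline bootstrap servers is tried
    // until the kernel gives up on the connection attempt.
    static long worstCaseWaitSeconds(int tcpSynRetries, int offlineServers) {
        return connectTimeoutSeconds(tcpSynRetries) * offlineServers;
    }

    public static void main(String[] args) {
        System.out.println(connectTimeoutSeconds(6));   // prints 127
        System.out.println(worstCaseWaitSeconds(6, 2)); // prints 254
    }
}
```

With the default tcp_syn_retries of 6 and two offline servers (N = 3, M = 2), the client may already wait more than four minutes.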

Public Interfaces

We propose a new common client config:

connections.setup.timeout.ms: The maximum amount of time the client will wait for the initial socket connection to be established. If the connection is not established before the timeout elapses, the network client will close the socket channel. The default value will be 10 seconds.
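As a rough illustration, a client could set the proposed config alongside its other properties as sketched below. The class, the helper method, and the broker addresses are hypothetical; connections.setup.timeout.ms is the name proposed on this page and is not assumed to be recognized by any released client.

```java
import java.util.Properties;

public class ClientConfigSketch {
    // Config name as proposed in this document (an assumption until the proposal is accepted).
    static final String CONNECTIONS_SETUP_TIMEOUT_MS = "connections.setup.timeout.ms";

    // Builds client properties with an explicit connection setup timeout.
    static Properties clientProperties(String bootstrapServers, long setupTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty(CONNECTIONS_SETUP_TIMEOUT_MS, Long.toString(setupTimeoutMs));
        return props;
    }

    public static void main(String[] args) {
        // Placeholder broker addresses; 10000 ms matches the proposed default of 10 seconds.
        Properties props = clientProperties("broker1:9092,broker2:9092", 10_000L);
        System.out.println(props.getProperty(CONNECTIONS_SETUP_TIMEOUT_MS));
    }
}
```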


Proposed Changes

The new config will be a common client config. The NetworkClient will keep the config as a property.

I propose a lazy socket connection timeout. That is, the NetworkClient will check for and disconnect timed-out connections only in leastLoadedNode(). The reasons are:

  1. NetworkClient only cares about timing out a connecting node when it needs to send new requests.
  2. Node providers other than LeastLoadedNodeProvider specify which node to connect to, so connection state changes for those nodes should be handled at the upper level.
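The lazy check can be sketched as follows. This is a simplified model under stated assumptions, not the NetworkClient implementation: the class, its methods, and the reduced state set are hypothetical, and a connection is treated as timed out once the elapsed time reaches the configured timeout.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LazyConnectTimeoutSketch {
    enum State { DISCONNECTED, CONNECTING, READY }

    private final Map<String, State> states = new LinkedHashMap<>();
    // Start time of each connection attempt that is still in progress.
    private final Map<String, Long> connectStartMs = new LinkedHashMap<>();
    private final long setupTimeoutMs;

    LazyConnectTimeoutSketch(long setupTimeoutMs) {
        this.setupTimeoutMs = setupTimeoutMs;
    }

    void startConnecting(String nodeId, long nowMs) {
        states.put(nodeId, State.CONNECTING);
        connectStartMs.put(nodeId, nowMs);
    }

    void markReady(String nodeId) {
        states.put(nodeId, State.READY);
        connectStartMs.remove(nodeId);
    }

    State stateOf(String nodeId) {
        return states.getOrDefault(nodeId, State.DISCONNECTED);
    }

    // Intended to be called only from the node selection path (the "lazy" part):
    // no timer or background thread watches the connecting nodes.
    List<String> disconnectTimedOut(long nowMs) {
        List<String> timedOut = new ArrayList<>();
        for (Map.Entry<String, Long> e : connectStartMs.entrySet()) {
            if (nowMs - e.getValue() >= setupTimeoutMs) {
                timedOut.add(e.getKey());
            }
        }
        // Update state after the scan to avoid modifying the map while iterating.
        for (String nodeId : timedOut) {
            states.put(nodeId, State.DISCONNECTED);
            connectStartMs.remove(nodeId);
        }
        return timedOut;
    }
}
```

In this model a stuck node keeps its CONNECTING state until the next node selection, which is acceptable because the state only matters when a new request needs a node.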

Criterion 3 of leastLoadedNode() will also change. The revised criteria are:

  1. Provide the connected node with the fewest in-flight requests.
  2. If no connected node exists, provide the connecting node with the largest index in the cached list of nodes.
  3. If no connected or connecting node exists, provide the disconnected node that respects the reconnect backoff and has the fewest failed attempts. Consider the case where there are multiple DISCONNECTED nodes and the interval between two provide() invocations is greater than reconnect.backoff.ms. With the current criterion, the provider can return the same node every time. Thus, the provider should return the node with the fewest failed attempts among all nodes passing the canConnect() check.
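The three criteria above can be sketched as a single pass over the cached node list. This is an illustrative model, not the actual leastLoadedNode() code: the Node class and its fields are hypothetical, the canConnect flag stands in for the canConnect() check, and the list is scanned from index 0 so that the last connecting node seen is the one with the largest index.

```java
import java.util.List;

public class LeastLoadedNodeSketch {
    enum State { DISCONNECTED, CONNECTING, READY }

    static final class Node {
        final String id;
        final State state;
        final int inFlight;        // in-flight requests (meaningful for READY nodes)
        final int failedAttempts;  // failed connection attempts so far
        final boolean canConnect;  // true if the reconnect backoff has elapsed

        Node(String id, State state, int inFlight, int failedAttempts, boolean canConnect) {
            this.id = id;
            this.state = state;
            this.inFlight = inFlight;
            this.failedAttempts = failedAttempts;
            this.canConnect = canConnect;
        }
    }

    // Returns the chosen node, or null if no node qualifies.
    static Node choose(List<Node> nodes) {
        Node ready = null;
        Node connecting = null;
        Node disconnected = null;
        for (Node n : nodes) {
            switch (n.state) {
                case READY:
                    // Criterion 1: fewest in-flight requests.
                    if (ready == null || n.inFlight < ready.inFlight) ready = n;
                    break;
                case CONNECTING:
                    // Criterion 2: the last one seen has the largest index.
                    connecting = n;
                    break;
                case DISCONNECTED:
                    // Criterion 3: fewest failed attempts among nodes allowed to reconnect.
                    if (n.canConnect
                            && (disconnected == null || n.failedAttempts < disconnected.failedAttempts)) {
                        disconnected = n;
                    }
                    break;
            }
        }
        if (ready != null) return ready;
        if (connecting != null) return connecting;
        return disconnected;
    }
}
```

Because ties on failed attempts keep the first candidate seen, a node that has just failed is passed over in favor of any node with fewer failures, which addresses the repeated-selection case described in criterion 3.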

Compatibility, Deprecation, and Migration Plan

No impact

Rejected Alternatives

  1. Use request.timeout.ms to time out the socket connection at the client level instead of the network client level.
    1. request.timeout.ms applies at the client/request level. We need a timeout at the NetworkClient level to control the connection states.
    2. The socket connection timeout should be shorter than the request timeout, so a separate config is preferable.
  2. Add a new connection state TIMEOUT besides DISCONNECTED, CONNECTING, CHECKING_API_VERSIONS, READY, and AUTHENTICATION_FAILED.
    1. We don't necessarily need to differentiate between the timeout and disconnected states.

