...

I propose a lazy socket connection timeout. That is, the NetworkClient will check for and disconnect timed-out connections only in leastLoadedNode().

  1. Usually, when a client sends a request, it asks the network client to send that request to a specific node. In these cases, connection.setup.timeout won’t matter much, because the client doesn’t want to try other nodes for that specific request. The request-level timeout is enough. The metadata fetcher refreshes the status of the nodes periodically, so the client will reassign a timed-out request to a different node accordingly.
  2. The consumer, producer, and AdminClient all use leastLoadedNode() for metadata fetches, where the connection setup timeout can play an important role. Unlike other requests, which can consult the metadata for node status, metadata requests can, in the worst case, only blindly choose a node to retry. We want to make sure the client can get the metadata smoothly and as soon as possible. As a result, we need this connection.setup.timeout.
  3. Implementing the timeout in NetworkClient.poll() or anywhere else might require an extra iteration over all nodes, which might degrade the network client's performance.
  4. NetworkClient only cares about timing out a connecting node when it needs to send new requests. NodeProviders other than LeastLoadedNodeProvider specify which node to connect to, so connection status changes for those should be handled at the upper level.
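
To make the lazy check in the points above concrete, here is a minimal Java sketch. Everything in it is a hypothetical stand-in rather than Kafka's actual NetworkClient code: the class `LazyTimeoutClient`, its `State` enum, and the methods `beginConnect`, `markReady`, and `stateOf` are invented for illustration. The selection rule (prefer the ready node with the fewest in-flight requests, otherwise fall back to a disconnected node) is also an assumption. The point is only that the timeout is evaluated during the iteration that leastLoadedNode() already performs, so no separate pass over the nodes is needed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a network client; not Kafka's NetworkClient. */
class LazyTimeoutClient {
    enum State { DISCONNECTED, CONNECTING, READY }

    /** Per-node connection bookkeeping (illustrative only). */
    private static final class Conn {
        State state = State.DISCONNECTED;
        long connectStartMs;
        int inFlight;
    }

    private final Map<String, Conn> conns = new LinkedHashMap<>();
    private final long setupTimeoutMs; // plays the role of connection.setup.timeout

    LazyTimeoutClient(long setupTimeoutMs, List<String> nodes) {
        this.setupTimeoutMs = setupTimeoutMs;
        for (String node : nodes) {
            conns.put(node, new Conn());
        }
    }

    void beginConnect(String node, long nowMs) {
        Conn c = conns.get(node);
        c.state = State.CONNECTING;
        c.connectStartMs = nowMs;
    }

    void markReady(String node, int inFlight) {
        Conn c = conns.get(node);
        c.state = State.READY;
        c.inFlight = inFlight;
    }

    State stateOf(String node) {
        return conns.get(node).state;
    }

    /**
     * Picks a node and, in the same pass, disconnects any node that has been
     * CONNECTING for longer than the setup timeout. This is the only place
     * where the timeout is checked.
     */
    String leastLoadedNode(long nowMs) {
        String best = null;
        int bestInFlight = Integer.MAX_VALUE;
        String fallback = null;
        for (Map.Entry<String, Conn> e : conns.entrySet()) {
            Conn c = e.getValue();
            // Lazy timeout: evaluated only while we are already iterating.
            if (c.state == State.CONNECTING && nowMs - c.connectStartMs > setupTimeoutMs) {
                c.state = State.DISCONNECTED;
            }
            if (c.state == State.READY && c.inFlight < bestInFlight) {
                best = e.getKey();
                bestInFlight = c.inFlight;
            } else if (c.state == State.DISCONNECTED && fallback == null) {
                fallback = e.getKey();
            }
        }
        // Assumed rule: a ready node wins; otherwise try a disconnected one.
        return best != null ? best : fallback;
    }
}
```

In this sketch a node stuck in CONNECTING is left alone until the next leastLoadedNode() call, which matches the idea that requests aimed at a specific node rely on the request-level timeout instead.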

The node-providing criterion 3 in leastLoadedNode() will also change since

...