...
- Associate a timestamp with each in-flight request and with each retry.
- poll() will go through the in-flight requests and expire any that have exceeded the request timeout.
- The timeout passed to poll() must account for the request timeout of in-flight requests, so that poll() returns in time to expire them.
The request timeout will be set to a reasonable value, say 60 seconds.
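A minimal sketch of this bookkeeping, assuming one tracker per connection and a caller-supplied clock that never goes backwards, so requests stay in send order and the oldest is always at the head. The names `InFlightRequest`, `InFlightTracker`, `expire`, and `pollTimeout` are hypothetical and are not part of the Kafka client API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical record of one request on the wire.
final class InFlightRequest {
    final int correlationId;
    final long sendTimeMs; // time stamp taken when the request (or its retry) was sent

    InFlightRequest(int correlationId, long sendTimeMs) {
        this.correlationId = correlationId;
        this.sendTimeMs = sendTimeMs;
    }
}

// Hypothetical per-connection tracker; not a real Kafka class.
final class InFlightTracker {
    private final long requestTimeoutMs;
    // Appended in send order; with a non-decreasing clock the oldest request is at the head.
    private final Deque<InFlightRequest> inFlight = new ArrayDeque<>();

    InFlightTracker(long requestTimeoutMs) {
        this.requestTimeoutMs = requestTimeoutMs;
    }

    // Record a send; a retry is simply re-added with a fresh time stamp.
    void add(int correlationId, long nowMs) {
        inFlight.addLast(new InFlightRequest(correlationId, nowMs));
    }

    // A response arrived, so the request is no longer in flight.
    void complete(int correlationId) {
        inFlight.removeIf(r -> r.correlationId == correlationId);
    }

    // Called from poll(): remove and return every request that has reached the request timeout.
    List<InFlightRequest> expire(long nowMs) {
        List<InFlightRequest> expired = new ArrayList<>();
        while (!inFlight.isEmpty() && nowMs - inFlight.peekFirst().sendTimeMs >= requestTimeoutMs) {
            expired.add(inFlight.pollFirst());
        }
        return expired;
    }

    // Cap the caller's poll() timeout so poll() wakes up no later than the earliest expiry.
    long pollTimeout(long requestedTimeoutMs, long nowMs) {
        if (inFlight.isEmpty()) {
            return requestedTimeoutMs;
        }
        long untilExpiryMs = inFlight.peekFirst().sendTimeMs + requestTimeoutMs - nowMs;
        return Math.max(0L, Math.min(requestedTimeoutMs, untilExpiryMs));
    }
}
```

Capping the poll() timeout at the earliest expiry is what lets a client blocked in poll() notice a timed-out request without waiting for unrelated network activity.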
Actions after request timeout
When the request timeout has been reached, we do the following:
- Refresh metadata.
- Disconnect and create a new TCP connection.
- Retry the request on the new TCP connection.
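The three steps can be sketched as a handler invoked for each expired request. The `ClientHooks` interface and its methods are hypothetical stand-ins for the client's internals, not real Kafka APIs, and `RecordingHooks` exists only to make the call order observable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hooks into the client; none of these are real Kafka APIs.
interface ClientHooks {
    void requestMetadataRefresh();
    void disconnect(String nodeId);
    void connect(String nodeId);
    void send(String nodeId, int correlationId);
}

final class RequestTimeoutHandler {
    private final ClientHooks hooks;

    RequestTimeoutHandler(ClientHooks hooks) {
        this.hooks = hooks;
    }

    // Applies the actions in the order the section lists them.
    void onRequestTimeout(String nodeId, int correlationId) {
        hooks.requestMetadataRefresh();    // pick up a new leader if the broker is down
        hooks.disconnect(nodeId);          // drop the possibly stuck TCP connection...
        hooks.connect(nodeId);             // ...and open a fresh one
        hooks.send(nodeId, correlationId); // retry the request on the new connection
    }
}

// Records every call so the sequence can be inspected.
final class RecordingHooks implements ClientHooks {
    final List<String> calls = new ArrayList<>();

    public void requestMetadataRefresh() { calls.add("metadata"); }
    public void disconnect(String nodeId) { calls.add("disconnect:" + nodeId); }
    public void connect(String nodeId) { calls.add("connect:" + nodeId); }
    public void send(String nodeId, int correlationId) { calls.add("send:" + nodeId + ":" + correlationId); }
}
```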
In most cases, the metadata refresh should pick up the new leader if a broker is down. If a broker is merely slow rather than down, reconnecting is unnecessary, but as long as the request timeout is set to a reasonable value, we should not see a dramatic increase in TCP connections.
Plan to deprecate TIMEOUT_CONFIG
To replace TIMEOUT_CONFIG with REPLICATION_TIMEOUT_CONFIG, we will do the following:
- In 0.8.2.2 or 0.8.3, we will add REPLICATION_TIMEOUT_CONFIG but keep TIMEOUT_CONFIG. If the user sets TIMEOUT_CONFIG, we will show a deprecation warning and use the TIMEOUT_CONFIG value to override REPLICATION_TIMEOUT_CONFIG.
- In 0.9, we will remove TIMEOUT_CONFIG from the configuration.
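A sketch of how the transitional release could resolve the effective value. The key strings `timeout.ms` and `replication.timeout.ms`, the default, and the `TimeoutConfigResolver` helper are assumptions for illustration; the section names only the constants, not their values.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of the Kafka code base.
final class TimeoutConfigResolver {
    // Assumed key strings for the two constants named in the plan.
    static final String TIMEOUT_CONFIG = "timeout.ms";
    static final String REPLICATION_TIMEOUT_CONFIG = "replication.timeout.ms";
    // Hypothetical default, used only for this sketch.
    static final long DEFAULT_REPLICATION_TIMEOUT_MS = 30_000L;

    // Returns the effective replication timeout; deprecation warnings are appended to `warnings`.
    static long resolveReplicationTimeoutMs(Map<String, String> props, List<String> warnings) {
        long value = DEFAULT_REPLICATION_TIMEOUT_MS;
        if (props.containsKey(REPLICATION_TIMEOUT_CONFIG)) {
            value = Long.parseLong(props.get(REPLICATION_TIMEOUT_CONFIG));
        }
        if (props.containsKey(TIMEOUT_CONFIG)) {
            warnings.add(TIMEOUT_CONFIG + " is deprecated; use " + REPLICATION_TIMEOUT_CONFIG + " instead");
            // Per the plan, a user-supplied TIMEOUT_CONFIG overrides REPLICATION_TIMEOUT_CONFIG.
            value = Long.parseLong(props.get(TIMEOUT_CONFIG));
        }
        return value;
    }
}
```

In 0.9, the TIMEOUT_CONFIG branch would simply be deleted, leaving REPLICATION_TIMEOUT_CONFIG as the only source of the value.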
Compatibility, Deprecation, and Migration Plan
...