Current state: Under Discussion
Discussion thread: here
JIRA: KAFKA-14768
Motivation
Sometimes an application's threads block for up to max.block.ms when sending records with KafkaProducer#send. In some cases this exhausts the threads of the whole system for that period.
When an application tries to reduce max.block.ms to shorten the blocking time, it finds that the value cannot be set lower than the time needed for the metadata fetch. Moreover, the metadata fetch is a heavy operation that takes a long time.
...
Furthermore, the metadata fetch only needs to be done once across the blocking calls to KafkaProducer#send. After the first fetch completes, the metadata is retrieved directly from the cache, and its timer is updated only on the network thread rather than on the user's thread.
So this KIP aims to let users reduce the blocking time by setting max.block.ms to the desired smaller value without worrying about the metadata fetch.
Public Interfaces
No public interface is changed. Only the internal implementation of a private method changes:
...
Add two new configuration items for the producer.
For the changes, refer to the example PR: https://github.com/apache/kafka/pull/13335/files
...
Producer configuration | Configuration item | Default value |
includeWaitTimeOnMetadataInMaxBlockTime | max.block.ms.include.metadata | false |
maxWaitTimeMsOnMetadata | max.block.metadata.ms | 60 seconds |
2. Code changes
By default, includeWaitTimeOnMetadataInMaxBlockTime is false, as listed in the table above and in the test matrix, and existing behavior is unchanged.
When the user sets includeWaitTimeOnMetadataInMaxBlockTime to true, KafkaProducer#send blocks for up to maxWaitTimeMsOnMetadata on the metadata fetch and for up to max.block.ms on the remaining operations.
Users who want this feature can upgrade the client and set the new configuration items.
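As an illustration of how the two proposed items might be set, here is a minimal sketch that builds producer properties with the values from case 4 of the test matrix. The keys max.block.ms.include.metadata and max.block.metadata.ms are the ones proposed in this KIP and are not recognized by released Kafka clients. The class name, the bootstrap address, and the serializer choice are placeholders chosen for the example.

```java
import java.util.Properties;

// Sketch only: builds the configuration, it does not create a producer.
class ProposedProducerConfigSketch {

    static Properties caseFourConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Existing config: upper bound on how long send() may block.
        props.put("max.block.ms", "1000");
        // Proposed in this KIP (value as in case 4 of the test matrix).
        props.put("max.block.ms.include.metadata", "true");
        // Proposed in this KIP: time budget for the metadata fetch alone.
        props.put("max.block.metadata.ms", "5000");
        return props;
    }

    public static void main(String[] args) {
        // Print every key/value pair so the resulting configuration can be inspected.
        caseFourConfig().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With a client that implements this KIP, these properties would be passed to the KafkaProducer constructor in the usual way.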
...
- What impact (if any) will there be on existing users?
No impact on existing users.
- If we are changing behavior how will we phase out the older behavior?
The older behavior is not changed.
- If we need special migration tools, describe them here.
None are needed.
- When will we remove the existing behavior?
There is no need to remove it.
We can test with the following test matrix:
Assuming the metadata fetch takes N seconds (2 < N < 5), we send a record with the test producer under different configurations.
Case \ Configuration | max.block.ms | includeWaitTimeOnMetadataInMaxBlockTime (max.block.ms.include.metadata) | maxWaitTimeMsOnMetadata (max.block.metadata.ms) |
case 1: success | 10 seconds | false (default, not set) | 60 seconds (default, not set) |
case 2: fail to send | 1 second | false (default, not set) | 60 seconds (default, not set) |
case 3: success | 10 seconds | true | 60 seconds (default, not set) |
case 4: success | 1 second | true | 5 seconds |
case 5: fail to send | 1 second | true | 1 second |
Cases 2 and 5 fail to send records. All the others succeed.
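The expected outcomes in the matrix can be reproduced with a small model of the time budgets. This is a simplified sketch, not the producer's actual code. It assumes the metadata fetch takes a fixed 3 seconds (one value of N in the stated range), that the remaining operations take negligible time, and that a value of true for max.block.ms.include.metadata gives the metadata fetch its own budget, as the matrix implies. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of the blocking budgets described by the test matrix.
class SendBlockingModel {

    // Returns true when the metadata fetch fits into the budget that applies to it.
    // Assumption: work after the metadata fetch takes negligible time.
    static boolean sendSucceeds(long maxBlockMs, boolean separateMetadataBudget,
                                long maxBlockMetadataMs, long metadataFetchMs) {
        // With a separate budget the fetch is bounded by max.block.metadata.ms,
        // otherwise it is bounded by max.block.ms.
        long metadataBudgetMs = separateMetadataBudget ? maxBlockMetadataMs : maxBlockMs;
        return metadataFetchMs < metadataBudgetMs;
    }

    // Evaluates the five cases of the matrix for a given metadata fetch time.
    static Map<String, Boolean> runMatrix(long metadataFetchMs) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        results.put("case 1", sendSucceeds(10_000, false, 60_000, metadataFetchMs));
        results.put("case 2", sendSucceeds(1_000, false, 60_000, metadataFetchMs));
        results.put("case 3", sendSucceeds(10_000, true, 60_000, metadataFetchMs));
        results.put("case 4", sendSucceeds(1_000, true, 5_000, metadataFetchMs));
        results.put("case 5", sendSucceeds(1_000, true, 1_000, metadataFetchMs));
        return results;
    }

    public static void main(String[] args) {
        // N = 3 seconds is an arbitrary choice inside the range 2 < N < 5.
        runMatrix(3_000).forEach((name, ok) ->
                System.out.println(name + ": " + (ok ? "success" : "fail to send")));
    }
}
```

Under these assumptions the model reports a failure only for cases 2 and 5, in line with the matrix.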
Rejected Alternatives
Alternative 1:
Provide a new method that completes the metadata fetch outside the control of max.block.ms, which the user should call before sending any record. For example, the user can call it before marking the service ready.
...