...
To summarize, it is difficult with the current API to tune the poll loop to avoid unexpected rebalances caused by processing overhead. In this KIP, we propose to solve this problem by giving users more control over the number of records returned in each call to poll(). In particular, we add a second argument to poll(), which controls the maximum number of records which will be returned. If the consumer fetches more records than the maximum provided in poll(), then it will keep the additional records until the next call to poll(). Below we describe the proposed API and provide a detailed design for its implementationa way to limit the number of messages returned by the poll() call.
Public Interfaces
Briefly list any new interfaces that will be introduced as part of this proposal or any existing interfaces that will be removed or changed. The purpose of this section is to concisely call out the public contract that will come along with this feature.
...
Binary log format
The network protocol and api behavior
Any class in the public packages under clientsConfiguration, especially client configuration
org/apache/kafka/common/serialization
org/apache/kafka/common
org/apache/kafka/common/errors
org/apache/kafka/clients/producer
org/apache/kafka/clients/consumer (eventually, once stable)
Monitoring
Command line tools and arguments
- Anything else that will likely break existing users in some way when they upgrade
Proposed Changes
In this KIP, we propose to solve this problem by giving users more control over the number of records returned in each call to poll(). In particular, we add a configuration parameter, max.poll.messages, which can be set on the configuration map given to KafkaConsumer constructor. If the consumer fetches more records than the maximum provided in max.poll.messages, then it will keep the additional records until the next call to poll(). Below we describe the proposed API and provide a detailed design for its implementation.
Clearly the new configuration option will also need to be added to the documentationDescribe the new thing you want to do in appropriate detail. This may be fairly extensive and have large subsections of its own. Or it may be a few sentences. Use judgement based on the scope of the change.
Compatibility, Deprecation, and Migration Plan
- What impact (if any) will there be on existing users?
- If we are changing behavior how will we phase out the older behavior?
- If we need special migration tools, describe them here.
- When will we remove the existing behavior?
Rejected Alternatives
There will be no impact on existing users.
Rejected Alternatives
An alternative way of limitting the number of messages would be to introduce a second argument to poll(), which controls the maximum number of records which will be returned. It would have the following advantages:
- It would be more explicit. A counterargument is that there are many configuration options already. Avoiding yet another would not make that much difference.
- It would allow the poll loop to dynamically change the max number of messages returned. However, this sounds like a non-requirement.
However, it would also have the following disadvantages:
- It would introduce yet another public method to would have to be maintained for a long time in the future.
- Introducing the max.poll.messages approach would allow us to make proper allocations at start-up instead of dynamically doing special logic in every poll() call.
It should also be mentioned that introducing a configuration parameter will not hinder us from introducing a second parameter to poll() in the future.
If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.