Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: removed proposed change from motivation, propose config flag and add second poll parameter to rejected alternatives

...

To summarize, it is difficult with the current API to tune the poll loop to avoid unexpected rebalances caused by processing overhead.   In this KIP, we propose to solve this problem by giving users more control over the number of records returned in each call to poll(). In particular, we add a second argument to poll(), which controls the maximum number of records which will be returned. If the consumer fetches more records than the maximum provided in poll(), then it will keep the additional records until the next call to poll(). Below we describe the proposed API and provide a detailed design for its implementationa way to limit the number of messages returned by the poll() call.

Public Interfaces

Briefly list any new interfaces that will be introduced as part of this proposal or any existing interfaces that will be removed or changed. The purpose of this section is to concisely call out the public contract that will come along with this feature.

...

  • Binary log format

  • The network protocol and api behavior

  • Any class in the public packages under clientsConfiguration, especially client configuration

    • org/apache/kafka/common/serialization

    • org/apache/kafka/common

    • org/apache/kafka/common/errors

    • org/apache/kafka/clients/producer

    • org/apache/kafka/clients/consumer (eventually, once stable)

  • Monitoring

  • Command line tools and arguments

  • Anything else that will likely break existing users in some way when they upgrade

Proposed Changes

In this KIP, we propose to solve this problem by giving users more control over the number of records returned in each call to poll(). In particular, we add a configuration parameter, max.poll.messages, which can be set on the configuration map given to KafkaConsumer constructor. If the consumer fetches more records than the maximum provided in max.poll.messages, then it will keep the additional records until the next call to poll(). Below we describe the proposed API and provide a detailed design for its implementation.

Clearly the new configuration option will also need to be added to the documentationDescribe the new thing you want to do in appropriate detail. This may be fairly extensive and have large subsections of its own. Or it may be a few sentences. Use judgement based on the scope of the change.

Compatibility, Deprecation, and Migration Plan

  • What impact (if any) will there be on existing users?
  • If we are changing behavior how will we phase out the older behavior?
  • If we need special migration tools, describe them here.
  • When will we remove the existing behavior?

Rejected Alternatives

There will be no impact on existing users.

Rejected Alternatives

An alternative way of limitting the number of messages would be to introduce a second argument to poll(), which controls the maximum number of records which will be returned. It would have the following advantages:

  • It would be more explicit. A counterargument is that there are many configuration options already. Avoiding yet another would not make that much difference.
  • It would allow the poll loop to dynamically change the max number of messages returned. However, this sounds like a non-requirement.

However, it would also have the following disadvantages:

  • It would introduce yet another public method to would have to be maintained for a long time in the future.
  • Introducing the max.poll.messages approach would allow us to make proper allocations at start-up instead of dynamically doing special logic in every poll() call.

It should also be mentioned that introducing a configuration parameter will not hinder us from introducing a second parameter to poll() in the future.

 

 If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.