...

Ideally, we would include records from each partition proportionally according to the number of assigned partitions. For example, suppose that the consumer has been assigned partitions A, B, and C. If max.poll.records is set to 300, then we would return 100 records from each partition. However, this is subject to the availability of the fetched data. If records are only available from A and B, then we would expect to return 150 records from each of them and none from C. In general, we can't make any guarantees about actual message delivery (Kafka itself does not return data proportionally in a fetch response for each requested partition), but we should give each partition a fair chance for consumption.
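
To make the proportional split concrete, here is a minimal, hypothetical Java sketch of the "ideal" allocation: each partition with fetched data gets an equal share of max.poll.records, and any budget a partition cannot use is redistributed among the partitions that still have data. The class name ProportionalSplit and the per-partition counts are illustrative only, not actual consumer internals.

    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class ProportionalSplit {
        public static void main(String[] args) {
            int maxPollRecords = 300;

            // Records currently available per partition (only A and B have fetched data).
            Map<String, Integer> available = new LinkedHashMap<>();
            available.put("A", 200);
            available.put("B", 200);
            available.put("C", 0);

            Map<String, Integer> toReturn = new LinkedHashMap<>();
            for (String tp : available.keySet()) {
                toReturn.put(tp, 0);
            }

            int remaining = maxPollRecords;
            boolean progress = true;
            while (remaining > 0 && progress) {
                // Partitions that still have fetched data we have not allotted yet.
                List<String> hungry = available.keySet().stream()
                        .filter(tp -> available.get(tp) > toReturn.get(tp))
                        .collect(Collectors.toList());
                if (hungry.isEmpty()) {
                    break;
                }
                // Split the remaining budget evenly among them
                // (here 300 / 2 = 150, since C has no data).
                int share = Math.max(1, remaining / hungry.size());
                progress = false;
                for (String tp : hungry) {
                    int take = Math.min(share,
                            Math.min(remaining, available.get(tp) - toReturn.get(tp)));
                    if (take > 0) {
                        toReturn.put(tp, toReturn.get(tp) + take);
                        remaining -= take;
                        progress = true;
                    }
                }
            }
            System.out.println(toReturn); // {A=150, B=150, C=0}
        }
    }

With all three partitions holding data, the same loop would yield 100 records from each, matching the ideal split described above.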

One way to achieve this would be to pull records from each partition in a round-robin fashion. Using the same example as above, we would first try to pull 100 messages from A, then 100 from B, and so on. We would have to keep track of which partition we left off at and begin there on the next call to avoid the starvation problem mentioned. This may require multiple passes over the partitions since not all of them will have data available. If no records were available from C, for example, then we would return to A. Alternatively, we could return after one pass with fewer records than are actually available, but this seems less than ideal.
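
A rough sketch of this round-robin variant is below, assuming a per-partition buffer of fetched records and a remembered index of where the previous poll() stopped. The names (RoundRobinDrain, fetched, nextPartition) are hypothetical and do not correspond to the actual consumer code.

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;

    public class RoundRobinDrain {
        private final List<String> partitions;      // assignment order
        private final List<Deque<String>> fetched;  // fetched-but-unreturned records per partition
        private int nextPartition = 0;              // where the previous poll left off

        RoundRobinDrain(List<String> partitions, List<Deque<String>> fetched) {
            this.partitions = partitions;
            this.fetched = fetched;
        }

        List<String> poll(int maxPollRecords) {
            int chunk = Math.max(1, maxPollRecords / partitions.size()); // e.g. 300 / 3 = 100
            List<String> result = new ArrayList<>();
            int emptyInARow = 0;
            // Keep cycling until the budget is used or a full pass finds no data anywhere.
            while (result.size() < maxPollRecords && emptyInARow < partitions.size()) {
                Deque<String> buffer = fetched.get(nextPartition);
                int taken = 0;
                while (taken < chunk && result.size() < maxPollRecords && !buffer.isEmpty()) {
                    result.add(buffer.pollFirst());
                    taken++;
                }
                emptyInARow = (taken == 0) ? emptyInARow + 1 : 0;
                nextPartition = (nextPartition + 1) % partitions.size();
            }
            return result;
        }

        public static void main(String[] args) {
            List<Deque<String>> buffers = new ArrayList<>();
            for (String tp : List.of("A", "B", "C")) {
                Deque<String> d = new ArrayDeque<>();
                for (int i = 0; i < 200; i++) d.add(tp + "-" + i);
                buffers.add(d);
            }
            RoundRobinDrain drain = new RoundRobinDrain(List.of("A", "B", "C"), buffers);
            System.out.println(drain.poll(300).size()); // 300, taken 100 at a time from A, B, and C
        }
    }

Note the outer loop may revisit partitions several times when some buffers are empty, which is the multiple-pass behavior mentioned above.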

A simpler approach is to pull as many records as possible from each partition in a similar round-robin fashion. In the same example, we would first try to pull all 300 records from A. If there was still space available, we would then pull whatever was left from B and so on. As before, we'd keep track of where we left off. This would not achieve the ideal balancing described above, but it still ensures that each partition gets consumed and requires only a single pass over the fetched records. We favor this approach for simplicity.
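
The sketch below illustrates this simpler, greedy variant under the same assumptions: drain each buffered partition as fully as the max.poll.records budget allows in a single pass, and remember the partition where the budget ran out so the next poll() resumes there. Again, the class and field names (GreedyDrain, fetched, nextPartition) are illustrative, not the actual implementation.

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;

    public class GreedyDrain {
        private final List<Deque<String>> fetched;  // one buffer per assigned partition
        private int nextPartition = 0;              // where the previous poll left off

        GreedyDrain(List<Deque<String>> fetched) {
            this.fetched = fetched;
        }

        List<String> poll(int maxPollRecords) {
            List<String> result = new ArrayList<>();
            int visited = 0;
            // Single pass: starting from where we left off, drain each partition as far
            // as the max.poll.records budget allows.
            while (visited < fetched.size() && result.size() < maxPollRecords) {
                Deque<String> buffer = fetched.get(nextPartition);
                while (!buffer.isEmpty() && result.size() < maxPollRecords) {
                    result.add(buffer.pollFirst());
                }
                if (buffer.isEmpty()) {
                    // Fully drained (or had nothing): move on to the next partition.
                    nextPartition = (nextPartition + 1) % fetched.size();
                    visited++;
                } else {
                    // Budget exhausted mid-partition: resume here on the next poll.
                    break;
                }
            }
            return result;
        }

        public static void main(String[] args) {
            List<Deque<String>> buffers = new ArrayList<>();
            for (String tp : List.of("A", "B", "C")) {
                Deque<String> d = new ArrayDeque<>();
                for (int i = 0; i < 200; i++) d.add(tp + "-" + i);
                buffers.add(d);
            }
            GreedyDrain drain = new GreedyDrain(buffers);
            System.out.println(drain.poll(300).size()); // 300: all 200 from A, then 100 from B
        }
    }

The trade-off is visible in the usage example: the first poll returns everything from A before touching B, so the split is uneven, but every partition is eventually reached and only one pass over the buffered records is needed.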

...