...

Instead of switching partitions on every batch creation, switch partitions every time partitioner.sticky.batch.size bytes have been produced to a partition.  Say we're producing to partition 1.  After 16KB has been produced to partition 1, we switch to partition 42.  After 16KB has been produced to partition 42, we switch to partition 3.  And so on.  We do this regardless of what happens with batching; we just count the bytes produced to a partition.  This way the distribution is both uniform (there could be a small temporary imbalance) and sticky even if linger.ms=0, because more consecutive records are directed to a partition, allowing it to create better batches.
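The byte-counting rule above can be sketched roughly as follows. This is only an illustration, not the producer's actual code: the class name, the method names, and the choice to pick the next partition uniformly at random among the other partitions are all assumptions made for the example. The 16KB threshold matches the example in the text.

```python
import random


class ByteCountingStickyPartitioner:
    """Hypothetical sketch: stay on one partition until a byte quota is produced to it."""

    def __init__(self, num_partitions, sticky_batch_size=16 * 1024, rng=None):
        if num_partitions <= 0:
            raise ValueError("num_partitions must be positive")
        self.num_partitions = num_partitions
        self.sticky_batch_size = sticky_batch_size
        self._rng = rng or random.Random()
        self._current = self._rng.randrange(num_partitions)
        self._produced = 0  # bytes produced to the current partition so far

    def partition(self, record_size):
        """Return the partition for a record of record_size bytes."""
        # Switch only on the byte count, regardless of how batches are formed.
        if self._produced >= self.sticky_batch_size:
            self._current = self._next_partition()
            self._produced = 0
        self._produced += record_size
        return self._current

    def _next_partition(self):
        # Assumption: choose uniformly among the partitions other than the current one.
        if self.num_partitions == 1:
            return self._current
        candidate = self._rng.randrange(self.num_partitions - 1)
        return candidate if candidate < self._current else candidate + 1
```

With 1KB records and the 16KB threshold, sixteen consecutive records land on the same partition before the switch, which is what gives the accumulator a chance to form larger batches even when linger.ms=0.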

...

The batching will continue until either an in-flight batch completes or we hit partitioner.sticky.batch.size bytes and move to the next partition.  This way it takes just 5 records to start batching, not 5 x number of partitions records, and the batching mode lasts longer because we keep batching while waiting for a request to complete.  This happens because once we have 5 batches in-flight, a new batch won't be sent out immediately; it keeps accumulating records until at least one in-flight batch completes.  With the current solution, it takes 5 x number of partitions records to have enough batches in-flight so that a new batch won't be sent immediately.  As the production rate accelerates, more records can be accumulated while 5 batches are already in-flight, so larger batches are used at higher production rates to sustain higher throughput.
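The "5 records versus 5 x number of partitions" comparison can be checked with a small counting model. It is a simplification built on stated assumptions: linger.ms=0, every partition on its own broker, a limit of 5 in-flight batches per broker, no request completing during the interval, the byte threshold not being reached within the first few records, and the current behaviour modelled as round-robin switching after every batch. The function name and the 30-partition figure are illustrative only.

```python
def records_before_batching(num_partitions, switch_on_every_batch, max_in_flight=5):
    """Count single-record batches sent before a record has to wait and accumulate."""
    in_flight = [0] * num_partitions  # in-flight batches per partition (one broker each)
    current = 0
    sent = 0
    while True:
        if in_flight[current] >= max_in_flight:
            # The next record cannot be sent immediately, so batching starts here.
            return sent
        in_flight[current] += 1
        sent += 1
        if switch_on_every_batch:
            # Modelled current behaviour: move to another partition after each batch.
            current = (current + 1) % num_partitions


sticky = records_before_batching(30, switch_on_every_batch=False)
per_batch = records_before_batching(30, switch_on_every_batch=True)
print(sticky, per_batch)
```

Under these assumptions the byte-counting approach reaches batching mode after 5 records, while switching on every batch needs 5 x 30 = 150 records to fill the in-flight limit on every broker.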

...