Status

Current state: Accepted (vote thread)

Discussion thread: here

JIRA: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

The sticky partitioner attempts to create this behavior in the partitioner. By “sticking” to a partition until the batch is full (or is sent when linger.ms is up), we can create larger batches and reduce latency in the system compared to the default partitioner. Even in the case where linger.ms is 0 and we send right away, we see improved batching and a decrease in latency. After a batch is sent, the sticky partition changes. Over time, the records should be spread out evenly among all the partitions.
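
As a rough illustration of this idea only (a sketch with hypothetical names such as StickyPartitionCache and nextPartition, not the actual Kafka implementation), the sticky choice can be kept in a small per-topic cache that only picks a new partition when told that a batch for the previous one has completed:

Code Block
languagejava
// Hypothetical sketch of a per-topic sticky-partition cache; names are illustrative only.
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;

public class StickyPartitionCache {
    private final Map<String, Integer> indexCache = new ConcurrentHashMap<>();

    /** Return the current sticky partition for the topic, choosing one if none is set yet. */
    public int partition(String topic, Cluster cluster) {
        Integer part = indexCache.get(topic);
        return part != null ? part : nextPartition(topic, cluster, -1);
    }

    /** Pick a new sticky partition, avoiding the one whose batch just completed when possible. */
    public int nextPartition(String topic, Cluster cluster, int prevPartition) {
        List<PartitionInfo> available = cluster.availablePartitionsForTopic(topic);
        int newPart;
        if (available.isEmpty()) {
            // No available partitions: fall back to a random partition of the topic.
            List<PartitionInfo> all = cluster.partitionsForTopic(topic);
            newPart = all.get(ThreadLocalRandom.current().nextInt(all.size())).partition();
        } else if (available.size() == 1) {
            newPart = available.get(0).partition();
        } else {
            do {
                newPart = available.get(ThreadLocalRandom.current().nextInt(available.size())).partition();
            } while (newPart == prevPartition);
        }
        indexCache.put(topic, newPart);
        return newPart;
    }
}

The key property is that partition() keeps returning the same value until nextPartition() is called, which is what produces larger batches.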

Netflix had a similar idea and created a sticky partitioner that selects a partition and sends all records to it for a given period of time before switching to a new partition.

...

The sticky partitioner will be part of the default partitioner, so there will not be a public interface directly.

There is one new method exposed on the Partitioner interface.


Code Block
languagejava
  /**
   * Executes right before a new batch will be created. For example, if a sticky partitioner is used,
   * this method can change the chosen sticky partition for the new batch.
   * @param topic The topic name
   * @param cluster The current cluster metadata
   * @param prevPartition The partition of the batch that was just completed
   */
  default public void onNewBatch(String topic, Cluster cluster, int prevPartition) {
  }


The method onNewBatch will execute code right before a new batch is created. The sticky partitioner will define this method to update the sticky partition. This includes changing the sticky partition even when the new batch is for a keyed record. Test results show that this change will not significantly affect latency in the keyed case.

The default implementation of this method will result in no change to the current partitioning behavior for other user-defined partitioners. If a user wants to implement a sticky partitioner in their own partitioner class, this method can be overridden.
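
For example, a user-defined partitioner could override it roughly as follows (a minimal, hypothetical sketch that sticks to one partition per topic for every record; it is not part of this KIP's code):

Code Block
languagejava
// Hypothetical user-defined partitioner: sticks to one partition per topic for every record
// and only switches when onNewBatch reports that a batch has been completed.
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;

public class MyStickyPartitioner implements Partitioner {
    private final Map<String, Integer> stickyPartitions = new ConcurrentHashMap<>();

    @Override
    public void configure(Map<String, ?> configs) {}

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        // Ignore the key and always return the current sticky partition for this topic.
        return stickyPartitions.computeIfAbsent(topic, t -> choosePartition(t, cluster));
    }

    @Override
    public void onNewBatch(String topic, Cluster cluster, int prevPartition) {
        // A batch for prevPartition was just completed, so move to a new sticky partition.
        stickyPartitions.put(topic, choosePartition(topic, cluster));
    }

    @Override
    public void close() {}

    private int choosePartition(String topic, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.availablePartitionsForTopic(topic);
        if (partitions.isEmpty())
            partitions = cluster.partitionsForTopic(topic);
        return partitions.get(ThreadLocalRandom.current().nextInt(partitions.size())).partition();
    }
}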

...

Change the behavior of the default partitioner in the case where no explicit partition is set and the key is null: choose the “sticky” partition for the given topic. The “sticky” partition is changed when the record accumulator allocates a new batch for the topic on that partition.

These changes will slightly modify the code path for records that have keys as well, but the changes will not affect latency significantly.
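
Putting the pieces together, the decision inside the default partitioner could look roughly like the following sketch (hedged: it reuses the hypothetical StickyPartitionCache sketched above and is not the exact Kafka source; the keyed branch keeps the existing murmur2 hashing):

Code Block
languagejava
// Condensed, hypothetical sketch of the proposed default behavior; reuses the illustrative
// StickyPartitionCache class sketched earlier and is not the exact Kafka source.
import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

public class SketchDefaultPartitioner implements Partitioner {
    private final StickyPartitionCache stickyPartitionCache = new StickyPartitionCache();

    @Override
    public void configure(Map<String, ?> configs) {}

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        if (keyBytes == null) {
            // Non-keyed, non-partitioned records: use the sticky partition for this topic.
            return stickyPartitionCache.partition(topic, cluster);
        }
        // Keyed records: hash the key over all partitions, as before.
        int numPartitions = cluster.partitionsForTopic(topic).size();
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void onNewBatch(String topic, Cluster cluster, int prevPartition) {
        // The accumulator is about to allocate a new batch: change the sticky partition.
        stickyPartitionCache.nextPartition(topic, cluster, prevPartition);
    }

    @Override
    public void close() {}
}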

A new partitioner called UniformStickyPartitioner will be created to allow sticky partitioning on all records, even those that have non-null keys. This will mirror how the RoundRobinPartitioner uses the round robin partitioning strategy for all records, including those with keys.
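
As a usage sketch (assuming the new class lives alongside the existing partitioners in the org.apache.kafka.clients.producer package), a producer would opt into this behavior for all records by setting partitioner.class:

Code Block
languagejava
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class UniformStickyExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Opt into sticky partitioning for all records, including keyed ones.
        props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG,
                "org.apache.kafka.clients.producer.UniformStickyPartitioner");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // ... send records as usual; each batch will stick to a single partition.
        }
    }
}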

Compatibility, Deprecation, and Migration Plan

  • No compatibility, deprecation, or migration plan is required.
  • Users can continue to use their own partitioners. If they want to implement a sticky partitioner, they can override the onNewBatch(String topic, Cluster cluster, int prevPartition) method; if they do not want to use the feature, behavior will be the same as before.
  • Existing users of the default partitioner sending non-keyed records with no partition set should see the same or lower latency and CPU usage.

Test Results

Performance testing for this partitioner was done through the Trogdor ProduceBench test. Modifications were made to allow non-keyed, non-partitioned values, as well as the ability to prevent batches from being flushed by the throttle. (Besides waiting on linger.ms and filling the batch, a batch can also be sent through a flush.) Tests were done with and without flushing.

The tests were done using Castle on AWS m3.xlarge instances with SSD. Latency is measured as time to producer ack.

Tests run with these specs unless otherwise specified:

  • EC2 Instance: m3.xlarge
  • Disk Type: SSD
  • Duration of Test: 12 minutes
  • Number of Brokers: 3
  • Number of Producers: 3
  • Replication Factor: 3
  • Active Topics: 4
  • Inactive Topics: 1
  • Linger.ms: 0
  • Acks: All
  • keyGenerator: {"type": "null"}
  • manualPartition: (not set)
  • useConfiguredPartitioner: True
  • No Flushing on Throttle (skipFlush): True

For more details, there is an example spec here.

...

Aside from latency, the sticky partitioner also showed decreased CPU utilization compared to the default code. In the cases observed, the nodes using the sticky partitioner often saw 5-15% lower CPU utilization (for example, from 9-17% down to 5-12.5%, or from 30-40% down to 15-25%) than the nodes using the default code.

...

More comparisons of CPU utilization can be found here.

Rejected Alternatives

Configurable sticky partitioner:

...