Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Not only the sticky assignor preserves more assignments, it also results in a more balanced assignment.

 

Sticky Partition Assignment Algorithm

The inputs to the sticky partition assignment algorithm are

  • partitionsPerTopic: topics mapped to number of their partitions
  • subscriptions: mapping of consumer to subscribed topics
  • currentAssignment: preserved assignment of topic partitions to consumers calculated during the previous rebalance

The sticky partition assignment algorithm works by defining and maintaining a number of data structures

  • potential consumers of each topic partition (partition2AllPotentialConsumers)
  • topic partitions each consumer can consume from (consumer2AllPotentialPartitions)
  • the current consumer of each topic partition (currentPartitionConsumer)
  • sorted list of topic partitions (in ascending order) by how many consumers can consumer from them (sortedAllPartitions)
  • partitions that don't have a consumer (either because they are just created, or because their consumer no longer exists) and need new assignment (unassignedPartitions)
  • sorted list of consumers (in ascending order) based on how many partitions are currently assigned to them (sortedCurrentSubscriptions)
  • re-assignable partitions, or those with 2 or more potential consumers (reassignablePartitions)
  • partitions that cannot be assigned to another consumer (fixedPartitions)
  • current assignment of (fixed) consumers whose assignment cannot change (fixedAssignments)

 

The algorithm includes these main steps

  1. Update currentAssignment by removing consumers or partitions that no longer exist.
  2. Distribute unassignedPartitions among existing consumers: For each unassigned partition start from the consumer with fewest number of assignments and if the consumer can consume from this partition assign the partition to the consumer and update the necessary data structures. After this distribution all partitions that can be assigned to some consumer are, in fact, assigned. There maybe some that have no subscriber and remain unassigned (which is fine).
  3. Find consumers whose topic partition assignment cannot change and remove them from currentAssignment (to limit the scope of the problem).
  4. From partitions that can potentially be reassigned (sortedAllPartitions) start from one with fewest possible consumers and if it can go to another consumer to improve balance move it to that other consumer. Continue until all partitions are gone through this reassignment or we reach a balance (defined below).
  5. If reassignments occurred in previous step and we don't reach a better overall balance (for example, we get an equally balanced assignment) we revert the assignment in favor of stickiness since we could not improve fairness.
  6. Re-add fixed assignments to currentAssignment.

 

We call a consumer partition assignment balanced when

  • The minimum and maximum number of partitions assigned to any consumer differs by at most one; or
  • No topic partition can be moved from its current consumer to another of its potential consumers and improve the overall balance score at the same time.
    The overall balance score of a topic partition assignment is defined as the sum of assigned partitions size difference of all consumer pairs. A perfectly balanced assignment (with all consumers getting the same number of partitions) has a balance score of 0. Lower balance score indicates a more balanced assignment.

 

Note that due to the complex nature of the problem and the heuristics applied in the algorithm above we expect to find an optimum (and not necessarily the best) sticky partition assignment.

 

Alternatives

The sticky partition assignment algorithm described above, as mentioned earlier in this KIP, favors fairness over stickiness (we may call it fair yet sticky or stickiest fair partition assignment). Therefore, some partitions may change their consumer towards a fair assignment. This alternative is supposedly more complex to implement due to complex nature of calculating the most balanced assignment.

An alternative is to implement it so that stickiness take precedence and all previously assigned partitions are preserved; while the new consumers or partitions that need to be assigned do so towards the fairest possible assignment. We may call this alternative sticky yet fair or fairest sticky partition assignment. This one would be more efficient as it exhibits less complexity and variance. After all, the previous valid assignments remain as they are and only a few new assignments have to be calculated.

 

Compatibility, Deprecation, and Migration Plan

...