...

KIP-73 added quotas for replication, but it doesn't separate normal replication traffic from reassignment traffic. A user is able to specify the partitions and the throttle rate, but the throttle is applied to all non-ISR replication traffic. This can be undesirable because during a reassignment it also applies to non-reassignment replication: a replica that falls out of ISR gets throttled as well, which further slows it down in catching up. Also, if leadership changes during a reassignment, the throttles have to be changed manually. KIP-455 will make brokers aware of pending reassignments, so we will be able to separate these two kinds of replication. Moreover, we won't have to manually specify a list of replicas to throttle, because the broker will be able to figure out at runtime which partitions need to be throttled, based on the LeaderAndIsr request.
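To illustrate the separation that broker-side reassignment awareness makes possible, here is a minimal sketch of how a leader might decide which quota applies to a follower's fetches. It assumes that the partition state carried by the LeaderAndIsr request exposes the replica set, the ISR and the replicas being added by a pending reassignment; the `PartitionState` class, its field names, the `Throttle` enum and `throttle_for_follower` are all hypothetical and are not Kafka's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Throttle(Enum):
    NONE = "none"                  # follower is in ISR: not throttled
    REASSIGNMENT = "reassignment"  # follower is catching up because of a reassignment
    REPLICATION = "replication"    # follower fell out of ISR for another reason;
                                   # only the regular KIP-73 quota (if any) applies


@dataclass
class PartitionState:
    """Hypothetical per-partition view a broker could build from LeaderAndIsr."""
    replicas: list[int]
    isr: list[int]
    adding_replicas: list[int] = field(default_factory=list)


def throttle_for_follower(state: PartitionState, follower_id: int) -> Throttle:
    """Pick the quota that should be charged for fetches from `follower_id`.

    Assumption: an out-of-ISR follower that is listed as an adding replica is
    reassignment traffic; any other out-of-ISR follower is ordinary replication
    and is not charged to the reassignment quota.
    """
    if follower_id in state.isr:
        return Throttle.NONE
    if follower_id in state.adding_replicas:
        return Throttle.REASSIGNMENT
    return Throttle.REPLICATION
```

For a partition moving from brokers [1, 2] to [2, 3], broker 3 would be classified as reassignment traffic, while broker 2 dropping out of ISR would only be subject to the ordinary replication quota, so the reassignment throttle no longer slows it down. Because the decision is derived from the partition state on every request, a leadership change needs no manual throttle update.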

...

Config name                              | Type | Default | Valid values | Importance | Dynamic update mode
leader.reassignment.throttled.rate.max   | Long | -1      | [-1,...]     | medium     | per-broker
follower.reassignment.throttled.rate.max | Long | -1      | [-1,...]     | medium     | per-broker
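As a sketch of how the values in this table could be interpreted, the snippet below checks a setting against the documented range [-1,...] and maps it to an optional limit. It assumes, following the usual Kafka convention, that the default -1 means the quota is disabled and that the rate is in bytes per second; neither the unit nor the hypothetical helper `effective_rate_limit` comes from this proposal.

```python
from __future__ import annotations

UNLIMITED = -1  # assumed meaning of the default: no reassignment throttle


def effective_rate_limit(config_name: str, value: int) -> int | None:
    """Return the throttle (assumed bytes/sec), or None when it is disabled.

    Mirrors the 'Valid values' column: anything below -1 is rejected.
    """
    if value < UNLIMITED:
        raise ValueError(f"{config_name} must be >= -1, got {value}")
    if value == UNLIMITED:
        return None
    return value
```

Since both configs are updatable per broker, each broker would run this check against its own value, so the leader-side and follower-side limits can differ between brokers.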

...

The only change that needs to be mentioned is the tooling change: the behavior of the --throttle option will change. If the old behavior is needed for some reason, it can be reproduced by calling kafka-configs.sh manually before and after the reassignment with the intended parameters.
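A minimal sketch of that manual workaround, assuming the old behavior is approximated with the existing KIP-73 configs: leader.replication.throttled.rate and follower.replication.throttled.rate on each broker, plus leader.replication.throttled.replicas and follower.replication.throttled.replicas on each topic, set before the reassignment and deleted afterwards. The helper names are hypothetical, and the `*` wildcard throttles every replica of a topic, which is coarser than the per-replica lists the tool used to compute.

```python
from __future__ import annotations

import shlex

# Existing KIP-73 configs; the new reassignment-specific configs are not used here.
BROKER_RATE_CONFIGS = ["leader.replication.throttled.rate", "follower.replication.throttled.rate"]
TOPIC_REPLICA_CONFIGS = ["leader.replication.throttled.replicas", "follower.replication.throttled.replicas"]


def _alter(bootstrap: str, entity_type: str, entity_name: str, *action: str) -> list[str]:
    """Build one kafka-configs.sh invocation as an argv list."""
    return ["kafka-configs.sh", "--bootstrap-server", bootstrap,
            "--entity-type", entity_type, "--entity-name", entity_name,
            "--alter", *action]


def before_reassignment(bootstrap: str, brokers: list[int], topics: list[str], rate: int) -> list[list[str]]:
    """Set the replication throttle on every broker and on all replicas of each topic."""
    broker_cfg = ",".join(f"{name}={rate}" for name in BROKER_RATE_CONFIGS)
    # '*' throttles all replicas of the topic (a simplification, see above).
    topic_cfg = ",".join(f"{name}=*" for name in TOPIC_REPLICA_CONFIGS)
    return ([_alter(bootstrap, "brokers", str(b), "--add-config", broker_cfg) for b in brokers]
            + [_alter(bootstrap, "topics", t, "--add-config", topic_cfg) for t in topics])


def after_reassignment(bootstrap: str, brokers: list[int], topics: list[str]) -> list[list[str]]:
    """Remove the throttles once the reassignment has completed."""
    return ([_alter(bootstrap, "brokers", str(b), "--delete-config", ",".join(BROKER_RATE_CONFIGS))
             for b in brokers]
            + [_alter(bootstrap, "topics", t, "--delete-config", ",".join(TOPIC_REPLICA_CONFIGS))
               for t in topics])


if __name__ == "__main__":
    for argv in before_reassignment("localhost:9092", [1, 2, 3], ["orders"], 10_000_000):
        print(shlex.join(argv))  # shlex.join quotes the '*' so it is safe to paste into a shell
```

Building argv lists rather than strings keeps the `*` wildcard away from shell globbing when the commands are executed with a subprocess call.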

Rejected Alternatives

Reassignment throttling could be considered a subset of replication throttling. This means that the full throttle value is given by replication.throttled.rate, while reassignment.throttled.rate tells how much of that can be used for reassignment. For example, if replication.throttled.rate is set to 20 and reassignment.throttled.rate to 5, then 5 can be used for reassignment and 15 for other replication. While this has the conceptual advantage of handling reassignment as a special case of replication, it is operationally harder to manage. If quotas are not used normally, but only as a safety net during reassignment, then having to reason about replication.throttled.rate and reassignment.throttled.rate together is more complicated. If replication.throttled.rate is configured during normal operation, then setting or increasing reassignment.throttled.rate would also require increasing replication.throttled.rate by the same amount to keep the bandwidth available for non-reassignment replication unchanged. This problem doesn't exist when the quotas are additive (meaning that 20 for replication and 5 for reassignment add up to a total of 25).
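The arithmetic behind this comparison can be sketched as follows, using the unitless numbers from the paragraph above. The function names are hypothetical and only model how the two quotas split bandwidth; they say nothing about how brokers enforce them.

```python
from __future__ import annotations


def subset_split(replication_rate: float, reassignment_rate: float) -> tuple[float, float]:
    """Rejected model: the reassignment quota is carved out of the replication quota.

    Returns (bandwidth for reassignment, bandwidth left for other replication).
    """
    if not 0 <= reassignment_rate <= replication_rate:
        raise ValueError("reassignment rate must lie within the replication rate")
    return reassignment_rate, replication_rate - reassignment_rate


def additive_split(replication_rate: float, reassignment_rate: float) -> tuple[float, float]:
    """Chosen model: the quotas are independent, so the total is their sum."""
    return reassignment_rate, replication_rate


def subset_replication_rate_for(other_replication: float, reassignment_rate: float) -> float:
    """Replication rate the subset model needs to keep `other_replication` unchanged."""
    return other_replication + reassignment_rate
```

With the subset model, `subset_split(20, 5)` gives `(5, 15)`, but raising the reassignment quota to 10 without touching the replication quota shrinks other replication to 10; keeping it at 15 requires `subset_replication_rate_for(15, 10)`, that is 25. With the additive model, `additive_split(20, 10)` still leaves 20 for other replication, so only one config changes.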