...

  • Avoid bouncing a broker in order to make it lose its leadership: it would be good to have a way to specify which brokers should be excluded from serving traffic/leadership (without changing the replica assignment ordering through reassignments, even though that's quick), and then run preferred leader election. Bouncing a broker causes temporary under-replicated partitions (URP), and sometimes other issues. Also, bouncing a broker (e.g. broker_id 1) makes it temporarily lose all its leadership, but if another broker (e.g. broker_id 2) fails or gets bounced, some of its leaderships will likely fail over to broker_id 1 for partitions with 3 replicas. If broker_id 1 is in the blacklist, then in such a scenario, even with broker_id 2 offline, the 3rd broker can take leadership.

The current workaround for the above is to change the topic/partition's replica assignment to move broker_id 1 from the first position to the last position and run preferred leader election, e.g. (1, 2, 3) => (2, 3, 1). This changes the replica assignment, and we need to keep track of the original one and restore it if things change (e.g. the controller fails over to another broker, or the swapped empty broker catches up). That’s a rather tedious task.

The following are the requirements this KIP is trying to accomplish:

  • The ability to add brokers to and remove brokers from the preferred leader deprioritized list/blacklist, e.g. through a new ZK path/node or a new dynamic config.
  • The logic that determines the priority/order in which brokers are preferred as leader should be modified. A broker in the preferred leader blacklist should be moved to the end (lowest priority) when determining leadership.
  • The blacklist can be at the broker level. However, there might be use cases where a specific topic should blacklist particular brokers, which would require a topic-level config. For the use cases of this KIP, a broker-level blacklist seems to suffice. A topic-level preferred leader blacklist might be future enhancement work.
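
The reordering requirement above can be sketched as follows. This is a minimal illustration, not the KIP's actual implementation: the class name `PreferredLeaderOrder` and the method `deprioritize` are hypothetical, and the sketch assumes the blacklist is simply a set of broker IDs. Blacklisted brokers keep their relative order but move behind all other replicas, and the stored replica assignment itself is not modified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class PreferredLeaderOrder {

    /**
     * Returns a copy of the assigned replicas in which blacklisted brokers
     * come last. Relative order inside each group is preserved.
     * (Hypothetical helper, for illustration only.)
     */
    public static List<Integer> deprioritize(List<Integer> assignedReplicas,
                                             Set<Integer> blacklist) {
        List<Integer> preferred = new ArrayList<>();
        List<Integer> deprioritized = new ArrayList<>();
        for (Integer brokerId : assignedReplicas) {
            if (blacklist.contains(brokerId)) {
                deprioritized.add(brokerId);
            } else {
                preferred.add(brokerId);
            }
        }
        preferred.addAll(deprioritized);
        return preferred;
    }

    public static void main(String[] args) {
        // Broker 1 is blacklisted, so (1, 2, 3) is treated as (2, 3, 1).
        System.out.println(deprioritize(List.of(1, 2, 3), Set.of(1))); // prints [2, 3, 1]
    }
}
```

With broker 1 blacklisted, the effective order matches the manual (1, 2, 3) => (2, 3, 1) workaround, but nothing has to be tracked and restored afterwards.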


Public Interfaces

Introduce a preferred_leader_blacklist dynamic config, which is empty by default. It accepts a list of broker IDs separated by commas. E.g. below, broker IDs 1, 10, and 65 are put into the blacklist.

...

In the current design, changing the dynamic config should not trigger any leadership changes automatically.
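
As a rough sketch of how the comma-separated value of preferred_leader_blacklist described above could be turned into a set of broker IDs: the class `BlacklistConfig` and the method `parse` are hypothetical names used only for illustration, and the handling of whitespace and of an empty value is an assumption, not something the KIP specifies.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class BlacklistConfig {

    /**
     * Parses a value such as "1,10,65" into a set of broker IDs.
     * An empty or null value (the default) yields an empty set.
     * A malformed entry makes Integer.parseInt throw NumberFormatException.
     * (Hypothetical helper, for illustration only.)
     */
    public static Set<Integer> parse(String value) {
        Set<Integer> brokerIds = new LinkedHashSet<>();
        if (value == null || value.trim().isEmpty()) {
            return brokerIds;
        }
        for (String token : value.split(",")) {
            brokerIds.add(Integer.parseInt(token.trim()));
        }
        return brokerIds;
    }

    public static void main(String[] args) {
        System.out.println(parse("1,10,65")); // prints [1, 10, 65]
        System.out.println(parse(""));        // prints []
    }
}
```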

Proposed Changes


The preferred leader blacklist should only be used for leadership determination when either of the two events below is triggered:

...

  • This feature will give Kafka system administrators/on-call engineers the ability to quickly address issues with bad hardware that should not serve leadership traffic. Most of the use cases for the blacklist are temporary, though some cases, such as heterogeneous hardware, might persist for a longer time.
  • The behavior of preferred leader election and of the new leader election logic for failed brokers will change. In all cases, the brokers in the preferred leader blacklist are not excluded from leadership election per se. They are just moved to the lowest priority when determining leadership for a topic/partition.
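
The "lowest priority, but not excluded" behavior described above can be illustrated with a simplified sketch. It is not Kafka's controller code: the class `LeaderChoice`, the method `electLeader`, and the single `liveInSync` set (standing in for replicas that are both alive and in sync) are assumptions made for the example.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class LeaderChoice {

    /**
     * Picks the first eligible replica that is not blacklisted. If every
     * eligible replica is blacklisted, falls back to the first blacklisted
     * one, so a blacklisted broker can still become leader as a last resort.
     * (Hypothetical helper, for illustration only.)
     */
    public static Optional<Integer> electLeader(List<Integer> assignedReplicas,
                                                Set<Integer> liveInSync,
                                                Set<Integer> blacklist) {
        Optional<Integer> fallback = Optional.empty();
        for (Integer brokerId : assignedReplicas) {
            if (!liveInSync.contains(brokerId)) {
                continue; // offline or out of sync: not eligible at all
            }
            if (!blacklist.contains(brokerId)) {
                return Optional.of(brokerId);
            }
            if (!fallback.isPresent()) {
                fallback = Optional.of(brokerId);
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        List<Integer> replicas = List.of(1, 2, 3);
        Set<Integer> blacklist = Set.of(1);
        // Broker 2 is offline: broker 3 wins instead of blacklisted broker 1.
        System.out.println(electLeader(replicas, Set.of(1, 3), blacklist)); // prints Optional[3]
        // Only the blacklisted broker is left: it still takes leadership.
        System.out.println(electLeader(replicas, Set.of(1), blacklist));    // prints Optional[1]
    }
}
```

The first call mirrors the scenario with broker_id 1 blacklisted and broker_id 2 offline, where the 3rd broker takes leadership.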

Rejected Alternatives

  • The current workaround is to change the topic/partition's replica assignment to move broker_id 1 from the first position to the last position and run preferred leader election, e.g. (1, 2, 3) => (2, 3, 1). This changes the replica assignment, and we need to keep track of the original one and restore it if things change (e.g. the controller fails over to another broker, or the swapped empty broker catches up). That’s a rather tedious task.
  • Rejected the design of putting the preferred leader blacklist in a per-broker ZooKeeper node, e.g. /preferred_leader_blacklist/<broker_id>. This would introduce a new RPC request/response to manipulate these ZK nodes. Also, in the future there might be a need to make this blacklist a topic-level config.