...
A change to the ELR does not require a leader epoch bump. In most cases, the ELR is updated together with the ISR. The only case where the ELR changes alone is when an ELR broker registers after an unclean shutdown; in this case, there is no need to bump the leader epoch.
When the min.insync.replicas config is updated, if the new min ISR is less than or equal to the current ISR size, the ELR will be removed.
A new metric, Electable leaders, will be added. It reports the combined size of the ISR and the ELR (ISR + ELR).
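The min ISR update rule and the new metric can be illustrated with a minimal sketch. It assumes a simplified in-memory partition state; `PartitionState`, `on_min_isr_update`, and `electable_leaders_count` are hypothetical names for illustration, not the controller's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionState:
    """Hypothetical, simplified view of one partition's state in the controller."""
    isr: list[int]
    elr: list[int] = field(default_factory=list)
    leader_epoch: int = 0


def on_min_isr_update(state: PartitionState, new_min_isr: int) -> None:
    """Apply a min.insync.replicas update to one partition.

    If the new min ISR is <= the current ISR size, the ELR is removed.
    Only the ELR changes here, so the leader epoch is left as-is.
    """
    if new_min_isr <= len(state.isr):
        state.elr = []


def electable_leaders_count(state: PartitionState) -> int:
    """Value of the proposed Electable leaders metric: ISR size plus ELR size."""
    return len(state.isr) + len(state.elr)
```

For example, a partition with ISR [1, 2] and ELR [3] reports 3 electable leaders; lowering min ISR to 2 then clears the ELR while the leader epoch stays the same, since only the ELR changed.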
AlterPartitionReassignments: the leader updates the ISR implicitly with AlterPartition requests. The controller will make sure that, upon completion, the ELR only contains replicas in the final replica set. Additionally, to improve the durability of the reassignment:
- With the current behavior, all the replicas being added must be in the ISR when the reassignment completes. This can leave only 1 replica in the ISR. The ELR may not help here either, because the removed ISR replicas cannot stay in the ELR once the reassignment completes. So we propose that the reassignment can only be completed if the ISR size is greater than or equal to min ISR.
- This min ISR requirement is also enforced when the reassignment is canceled.
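A sketch of the proposed completion check follows. It assumes the min ISR comparison is made against the ISR that remains after the removed replicas are dropped, which the text above does not pin down; `try_complete_reassignment` and its parameters are hypothetical.

```python
from typing import Optional


def try_complete_reassignment(
    isr: list[int],
    elr: list[int],
    adding: list[int],
    target: list[int],
    min_isr: int,
) -> Optional[tuple[list[int], list[int]]]:
    """Return the (ISR, ELR) after completion, or None if completion must wait."""
    target_set = set(target)
    # Existing rule: every replica being added must already be in the ISR.
    if not set(adding) <= set(isr):
        return None
    # Assumption: the min ISR check applies to the ISR left after the
    # removed replicas are dropped.
    new_isr = [r for r in isr if r in target_set]
    # Proposed rule: never complete with fewer than min ISR replicas in the ISR.
    if len(new_isr) < min_isr:
        return None
    # The ELR may only keep replicas that are in the final replica set.
    new_elr = [r for r in elr if r in target_set]
    return new_isr, new_elr
```

Cancellation could reuse the same check by passing the original replica set as `target` and an empty `adding` list; that mapping is also an assumption.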
Add a new admin API, DescribeTopicPartitionsRequest, for showing the topic details. The ELR info will not be embedded in the Metadata API, because the ELR is not a detail that needs to be exposed to user clients.
- More public-facing details will be discussed in the DescribeTopicPartitionsRequest section.
We also record the last-known ELR members.
In other words, when an ELR member has an unclean shutdown, it will be removed from the ELR and added to the LastKnownELR. The LastKnownELR will be cleared when the ISR size reaches min ISR.
LastKnownELR is stored in the metadata log.
LastKnownELR will also be used for Unclean Recovery (see the Unclean Recovery section).
- The last known leader will be tracked.
- This can be used if Unclean Recovery is not enabled. More details will be discussed in the Delivery Plan.
- The controller will record the last ISR member (the leader) when it is fenced.
- It will be cleared when a new leader is elected.
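The bookkeeping for LastKnownELR and the last known leader can be sketched as a small state holder. The class and method names are hypothetical, and the sketch deliberately leaves out how the fenced broker itself moves between the ISR and the ELR.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PartitionLeadership:
    """Hypothetical per-partition state; field names are illustrative only."""
    isr: list[int]
    elr: list[int] = field(default_factory=list)
    last_known_elr: list[int] = field(default_factory=list)
    leader: Optional[int] = None
    last_known_leader: Optional[int] = None

    def on_unclean_shutdown(self, broker: int) -> None:
        # An ELR member that shut down uncleanly leaves the ELR
        # and is remembered in LastKnownELR.
        if broker in self.elr:
            self.elr.remove(broker)
            if broker not in self.last_known_elr:
                self.last_known_elr.append(broker)

    def on_isr_change(self, new_isr: list[int], min_isr: int) -> None:
        self.isr = list(new_isr)
        # LastKnownELR is cleared once the ISR reaches min ISR.
        if len(self.isr) >= min_isr:
            self.last_known_elr.clear()

    def on_fenced(self, broker: int) -> None:
        # Record the last ISR member (the leader) when it is fenced.
        # ISR/ELR bookkeeping for the fenced broker is omitted here.
        if self.isr == [broker]:
            self.last_known_leader = broker
            self.leader = None

    def on_leader_elected(self, broker: int) -> None:
        # Electing a new leader clears the last known leader.
        self.leader = broker
        self.last_known_leader = None
```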
...
...
DescribeTopicPartitionsRequest (Coming with ELR)
It should be issued by admin clients. For more admin-client-related details, please refer to the Admin API/Client changes section.
ACL: Describe Topic
Limit: 1000 partitions max per response.
The caller can list the topics of interest, or leave the field empty to request all topics. The caller can query the partitions starting from the first partition id.
Pagination.
This is newly introduced behavior. The caller can specify the maximum number of partitions to be included in the response. If the server finds more than 1000 partitions to include, only the first 1000 (in alphabetical order) will be returned with info in the response.
If
...
there are more partitions than the limit, these partitions and their topics will not be sent back. In this case, the Cursor field will be populated. The caller can include this cursor in the next request.
Note:
- There is also a server-side config, max.request.partition.size.limit, that controls the maximum number of partitions to return.
- There is no consistency guarantee between requests.
- It is an admin-client-facing API, so topic IDs are not supported.
Note: the request can contain a mix of partition-specific topics and range-query topics.
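A minimal server-side sketch of the cursor-based pagination, under stated assumptions: the effective limit is the smaller of the caller's limit and max.request.partition.size.limit, topics are ordered alphabetically and partitions by id, and the cursor names the first (topic, partition) not yet returned. `describe_page` and the cursor shape are hypothetical, and the sketch does not model requests that mix partition-specific and range-query topics.

```python
from typing import Optional

# Hypothetical cursor: (topic name, first partition id of the next page).
Cursor = tuple[str, int]


def describe_page(
    partitions_by_topic: dict[str, list[int]],
    requested_topics: list[str],
    request_limit: int,
    server_limit: int,
    cursor: Optional[Cursor] = None,
) -> tuple[list[tuple[str, int]], Optional[Cursor]]:
    """Return one page of (topic, partition) pairs and the next cursor.

    An empty requested_topics list means "all topics". Topics are visited
    in alphabetical order and partitions in ascending id order.
    """
    # The effective limit is the smaller of the caller's and the server's.
    limit = min(request_limit, server_limit)
    topics = sorted(requested_topics or partitions_by_topic)
    page: list[tuple[str, int]] = []
    for topic in topics:
        # Skip topics that sort before the cursor position.
        if cursor is not None and topic < cursor[0]:
            continue
        for pid in sorted(partitions_by_topic.get(topic, [])):
            if cursor is not None and topic == cursor[0] and pid < cursor[1]:
                continue
            if len(page) == limit:
                # More partitions remain: stop and hand back a cursor.
                return page, (topic, pid)
            page.append((topic, pid))
    return page, None
```

Because the cursor records a position rather than a snapshot, partitions created or deleted between calls can shift the pages, which matches the note that there is no consistency guarantee between requests.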
DescribeTopicsResponse
DescribeTopicResponse
CleanShutdownFile (Coming with ELR)
...
...
Config changes
The following new configs are introduced for ELR:
- eligible.leader.replicas.enabled: controls whether the controller records the ELR-related metadata and whether the ISR can be empty. The default value is false; it will be changed to true in the future.
- max.request.partition.size.limit: the maximum number of partitions to return in an API response.
The following new configs are introduced for Unclean Recovery:
...
The admin client will start to use the DescribeTopicPartitionsRequest to describe topics.
- The client will split a large request into smaller pieces and send them one after another if the number of requested topics reaches the limit.
- The client will retry querying the topics if it receives the new retriable error REQUEST_LIMIT_REACHED, or a response with the Cursor field populated.
- The output of the topic describe will be updated with the ELR-related fields.
- TopicPartitionInfo will be updated to include the ELR-related fields.
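On the client side, splitting a large request and following the cursor could look like the loop below. It assumes requests are split by topic count using a hypothetical `max_topics_per_request`, and that `send` is a hypothetical transport returning one page plus the next cursor (or `None` when the server has nothing more to return).

```python
from typing import Callable, Optional

Cursor = tuple[str, int]
# Hypothetical transport: send(topics, cursor) -> (page, next_cursor).
SendFn = Callable[[list[str], Optional[Cursor]], tuple[list, Optional[Cursor]]]


def describe_all(send: SendFn, topics: list[str], max_topics_per_request: int) -> list:
    """Describe all partitions of `topics`, following cursors until done."""
    # Split a large topic list into request-sized chunks; an empty list
    # stays a single "all topics" request.
    chunks = [
        topics[i:i + max_topics_per_request]
        for i in range(0, len(topics), max_topics_per_request)
    ] or [[]]
    results = []
    for chunk in chunks:
        cursor: Optional[Cursor] = None
        while True:
            page, cursor = send(chunk, cursor)
            results.extend(page)
            # A populated cursor means the server truncated the response.
            if cursor is None:
                break
    return results
```

Sending the chunks one after another, rather than in parallel, keeps the load on the controller bounded at the cost of latency for very large topic lists.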
...
Setting min.insync.replicas larger than the replication factor will no longer take effect. For existing configs, the effective min.insync.replicas will be min(min.insync.replicas, replication factor).
Cluster admins should set min.insync.replicas to 1 if they want replication to continue when the leader is the only replica in the ISR.
- Note that this new requirement is not guarded by any feature flag or metadata version.
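The capping rule is simple arithmetic; a tiny sketch with hypothetical helper names makes it concrete. With a replication factor of 3, a configured min.insync.replicas of 5 behaves as 3, and only an effective value of 1 lets the leader alone satisfy it.

```python
def effective_min_isr(min_insync_replicas: int, replication_factor: int) -> int:
    """A min.insync.replicas larger than the replication factor is capped at it."""
    return min(min_insync_replicas, replication_factor)


def leader_alone_satisfies_min_isr(min_insync_replicas: int, replication_factor: int) -> bool:
    """True if an ISR containing only the leader meets the effective min ISR."""
    return effective_min_isr(min_insync_replicas, replication_factor) <= 1
```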
ELR
ELR will be guarded by a new metadata version and the eligible.leader.replicas.enabled config, so it is not enabled during the rolling upgrade.
After the controller picks up the new MV and eligible.leader.replicas.enabled is true, it will populate the ELR as empty when loading the partition states if the PartitionChangeRecord uses an old version. On the next partition update, the controller will record the current ELR. Note that the leader can only enable an empty ISR after the new metadata version is in effect.
MV downgrade: once the MV is downgraded, all the ELR-related fields will be removed on the next partition change, and the controller will ignore the ELR fields.
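A sketch of how the controller might decide which ELR to use when loading partition state. It assumes a boolean flag stands in for the PartitionChangeRecord version check; `LoadedPartition`, `elr_after_load`, and `empty_isr_allowed` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class LoadedPartition:
    """Hypothetical partition state rebuilt from a PartitionChangeRecord."""
    has_elr_fields: bool  # False for records written with an old version
    isr: list[int]
    elr: list[int] = field(default_factory=list)


def elr_enabled(mv_supports_elr: bool, config_enabled: bool) -> bool:
    # ELR needs both the new metadata version and
    # eligible.leader.replicas.enabled=true.
    return mv_supports_elr and config_enabled


def elr_after_load(p: LoadedPartition, mv_supports_elr: bool, config_enabled: bool) -> list[int]:
    """ELR the controller works with after loading a partition."""
    if not elr_enabled(mv_supports_elr, config_enabled):
        # E.g. after an MV downgrade: the ELR fields are ignored.
        return []
    if not p.has_elr_fields:
        # Old record version: start from an empty ELR; the next partition
        # update records the current ELR.
        return []
    return list(p.elr)


def empty_isr_allowed(mv_supports_elr: bool, config_enabled: bool) -> bool:
    # The ISR may only become empty once ELR is enabled.
    return elr_enabled(mv_supports_elr, config_enabled)
```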
...