Status
Current state: "Under Discussion"
Discussion thread: here
JIRA: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Now that Apache Kafka 4.0 is on the horizon, it is a good time to do some cleanups. In particular, I would like to spell out the configuration keys that will be going away in Apache Kafka 4.0. I have also added some minor changes to AuthorizerServerInfo, and two new metrics for managing ZK migration.
These changes are targeted at AK 3.7.
Public Interfaces
Configuration Key Removals
Configuration Key | Deprecated | Removed | Reason |
---|---|---|---|
message.format.version | Kafka 3.0 | Kafka 4.0 | KRaft clusters have always used RecordVersion.V2. This will continue to be true in Kafka 4.0. So there is no need for this configuration any more. If we decide to migrate to a new on-disk format some day, we'll probably use a new mechanism to do so, not a static configuration key. However, no such migration is planned currently. |
inter.broker.protocol.version | Kafka 3.7 | Kafka 4.0 | In KRaft mode, inter.broker.protocol.version is ignored except for when formatting directories. In Kafka 4.0, it will be removed entirely to avoid confusion. The desired metadata version when running the format tool can be selected via the command line. |
leader.imbalance.per.broker.percentage | Kafka 3.7 | Kafka 4.0 | KRaft mode has never implemented leader.imbalance.per.broker.percentage. Instead, if leader balancing is turned on, we try to use the preferred replica for all partitions. Therefore, this configuration is not needed. |
controlled.shutdown.max.retries | Kafka 3.7 | Kafka 4.0 | This is not used in KRaft since the controlled shutdown mechanism relies on heartbeat responses, not RPCs sent from the active controller. |
controlled.shutdown.retry.backoff.ms | Kafka 3.7 | Kafka 4.0 | This is not used in KRaft since the controlled shutdown mechanism relies on heartbeat responses, not RPCs sent from the active controller. |
password.encoder.secret | Kafka 3.7 | Kafka 4.0 | This relates to how secrets are stored in ZK, which is not relevant with KRaft |
password.encoder.old.secret | Kafka 3.7 | Kafka 4.0 | This relates to how secrets are stored in ZK, which is not relevant with KRaft |
password.encoder.keyfactory.algorithm | Kafka 3.7 | Kafka 4.0 | This relates to how secrets are stored in ZK, which is not relevant with KRaft |
password.encoder.cipher.algorithm | Kafka 3.7 | Kafka 4.0 | This relates to how secrets are stored in ZK, which is not relevant with KRaft |
password.encoder.key.length | Kafka 3.7 | Kafka 4.0 | This relates to how secrets are stored in ZK, which is not relevant with KRaft |
password.encoder.iterations | Kafka 3.7 | Kafka 4.0 | This relates to how secrets are stored in ZK, which is not relevant with KRaft |
zookeeper.connect | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration which is not relevant with KRaft |
zookeeper.session.timeout.ms | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration which is not relevant with KRaft |
zookeeper.connection.timeout.ms | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration which is not relevant with KRaft |
zookeeper.set.acl | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration which is not relevant with KRaft |
zookeeper.max.in.flight.requests | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration which is not relevant with KRaft |
zookeeper.ssl.client.enable | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration which is not relevant with KRaft |
zookeeper.clientCnxnSocket | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration which is not relevant with KRaft |
zookeeper.ssl.keystore.location | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration which is not relevant with KRaft |
zookeeper.ssl.keystore.password | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration which is not relevant with KRaft |
zookeeper.ssl.keystore.type | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration which is not relevant with KRaft |
zookeeper.ssl.truststore.location | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration which is not relevant with KRaft |
zookeeper.ssl.truststore.password | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration which is not relevant with KRaft |
zookeeper.ssl.truststore.type | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration which is not relevant with KRaft |
zookeeper.ssl.protocol | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration which is not relevant with KRaft |
zookeeper.ssl.enabled.protocols | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration which is not relevant with KRaft |
zookeeper.ssl.cipher.suites | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration which is not relevant with KRaft |
zookeeper.ssl.endpoint.identification.algorithm | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration which is not relevant with KRaft |
zookeeper.ssl.crl.enable | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration which is not relevant with KRaft |
zookeeper.ssl.ocsp.enable | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration which is not relevant with KRaft |
broker.id.generation.enable | Kafka 3.7 | Kafka 4.0 | Related to automatic broker ID generation, which KRaft does not support. (See KIP-631) |
reserved.broker.max.id | Kafka 3.7 | Kafka 4.0 | Related to automatic broker ID generation, which KRaft does not support. (See KIP-631) |
control.plane.listener.name | Kafka 3.7 | Kafka 4.0 | We no longer need to maintain a separate listener for messages from the controller, since the controller does not send messages out any more (it receives them). (See KIP-631) |
zookeeper.metadata.migration.enable | Kafka 4.0 | Kafka 4.0 | This configuration is used to migrate from ZK mode to KRaft. Since ZK mode is no longer supported in AK 4.0, this configuration will no longer be needed. Note that this configuration breaks the usual pattern of being deprecated prior to being removed. This is necessary because we certainly don't intend to deprecate migration in 3.7, but neither do we want to support it once ZK mode is gone. So this is a special case. |
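As an illustration of how an operator might audit an existing configuration against the table above, here is a minimal sketch. It is not part of this proposal or of Kafka: the class name RemovedConfigScanner and its method are hypothetical. It also assumes prefix matching for the password.encoder.* and zookeeper.* groups, a simplification that could flag keys with those prefixes that the table does not list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Kafka: flags configuration keys from the removal table.
final class RemovedConfigScanner {
    // Individual keys listed in the table.
    private static final Set<String> EXACT_KEYS = Set.of(
        "message.format.version",
        "inter.broker.protocol.version",
        "leader.imbalance.per.broker.percentage",
        "controlled.shutdown.max.retries",
        "controlled.shutdown.retry.backoff.ms",
        "broker.id.generation.enable",
        "reserved.broker.max.id",
        "control.plane.listener.name");

    // Assumption: match the two key families by prefix instead of listing each key.
    private static final List<String> PREFIXES = List.of("password.encoder.", "zookeeper.");

    private RemovedConfigScanner() {}

    /** Returns the keys in props that are scheduled for removal, in sorted order. */
    static List<String> findRemovedKeys(Properties props) {
        TreeSet<String> found = new TreeSet<>();
        for (String key : props.stringPropertyNames()) {
            if (EXACT_KEYS.contains(key) || PREFIXES.stream().anyMatch(key::startsWith)) {
                found.add(key);
            }
        }
        return new ArrayList<>(found);
    }
}
```

Loading a server.properties file into a java.util.Properties object and passing it to findRemovedKeys would list the keys to clean up before moving to 4.0.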
Deprecation and Removal of Support for Dynamic Listener Addition
Previously, it was possible to dynamically add a new listener to a ZK-based broker which did not appear anywhere in the static configuration file. This feature greatly complicated the code, since it moved the listener map from being immutable to being always modifiable. It was also very poorly integrated with things like the Authorizer. Nearly all Authorizer implementations assumed that they knew the set of listeners when the broker or controller was starting up.
This feature was not really needed. The administrator could always add listeners to the cluster in a rolling fashion, by taking down each node, adding a new listener to its config map, and bringing it back up. Since the need to add entirely new listeners to the cluster comes up so rarely, this simple static process was adequate. Indeed, it was greatly preferred, since the dynamic process was extremely poorly tested.
For these reasons, KRaft mode never added support for dynamic listener addition or removal. Therefore, we should deprecate this in Kafka 3.7 and remove it in Kafka 4.0.
It's important to emphasize that we are not deprecating or removing the various dynamic listener configurations, like max.connections, num.network.threads, ssl.client.auth, etc. Those will continue to be usable. The only thing that is being deprecated and removed here is the ability to add entirely new listeners that were not described in the broker or controller configuration file.
Changes to AuthorizerServerInfo
AuthorizerServerInfo is an interface that is used to pass some information to Authorizer objects when starting them. Unfortunately, it has some ZK-specific assumptions. For example, it assumes that every node is a broker. Let's clean it up for Kafka 4.0 with the following changes to its methods:
Method | Change in 3.7 | Change in 4.0 | Reasoning |
---|---|---|---|
clusterResource | none | none | n/a |
brokerId | deprecate | remove | Since authorizers are used on controllers, "node ID" is more appropriate than "broker ID" |
nodeId | add with default implementation that delegates to brokerId() | keep | Since authorizers are used on controllers, "node ID" is more appropriate than "broker ID" |
endpoints | add JavaDoc clarifying that only listeners opened by the node will be included | none | The JavaDoc should be clear about this |
interBrokerEndpoint | deprecate | remove | This field doesn't make sense on controllers, because they don't expose an inter-broker endpoint. Authorizers that want to find the configured inter-broker endpoint on broker nodes can examine the configuration map to find that information if it is needed. |
earlyStartListeners | none | none | n/a |
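The 3.7 change to nodeId can be pictured with a simplified stand-in. This is a sketch only: AuthorizerServerInfoSketch and LegacyServerInfo are hypothetical names, and the other methods from the table are omitted. The point is that a default method delegating to brokerId() lets existing implementations compile and behave unchanged.

```java
// Simplified stand-in for illustration; only the two ID accessors are shown.
interface AuthorizerServerInfoSketch {
    /** Deprecated in 3.7 and removed in 4.0 under this proposal. */
    @Deprecated
    int brokerId();

    /** Added in 3.7; the default delegates to brokerId() for compatibility. */
    default int nodeId() {
        return brokerId();
    }
}

// A pre-3.7 style implementation that knows nothing about nodeId().
final class LegacyServerInfo implements AuthorizerServerInfoSketch {
    private final int id;

    LegacyServerInfo(int id) {
        this.id = id;
    }

    @Override
    public int brokerId() {
        return id;
    }
}
```

With this shape, an authorizer can switch to calling nodeId() in 3.7 and keep working when brokerId() disappears in 4.0.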
New Metrics
CurrentControllerId
Name | Context | Type | Mode | Description |
---|---|---|---|---|
kafka.server:type=MetadataLoader,name=CurrentControllerId | Broker and Controller | Integer | KRaft and ZK | Outputs the ID of the current controller, or -1 if none is known. |
The CurrentControllerId metric shows the ID of the controller, as seen by the node in question. If the current node doesn't think there is an active controller, the value of this metric will be -1.
Why create this metric, when ActiveControllerCount already exists? The answer is that in KRaft mode, ActiveControllerCount is only exposed on controller nodes, not on broker nodes. That makes it impossible to monitor what the brokers think the current active controller is.
ZkMigrationPhase
Name | Context | Type | Mode | Description |
---|---|---|---|---|
kafka.server:type=ZkMigration,name=ZkMigrationPhase | Broker and Controller | Integer | KRaft and ZK | Outputs the phase of the ZK migration. |
ZkMigrationPhase indicates the "phase" of the ZK migration.
Why create this metric, when ZkMigrationState already exists?
- One reason is that the 5 integer values of the ZkMigrationState metric are presented in a jumbled order because of compatibility concerns. There isn't a clear forward progression with ZkMigrationState.
- More importantly, we want a metric that can clearly show progression on a node-by-node basis. For example, if a ZK broker is taken down and replaced with a KRaft broker, it would be good to have a metric that shows that at a glance. ZkMigrationPhase will do that, since it tracks more than just the migration state in the metadata image.
In summary, "zk migration state" is still useful to see, but it is a low-level detail. "zk migration phase" will provide an overview of the migration process. The phases are as described below:
Value | Phase | Description |
---|---|---|
-1 | zk phase | The cluster is in ZK mode, and no migration is in progress. |
0 | pre-hybrid phase | Migration is about to start |
1 | hybrid phase | Some brokers are ZK, but the controller is KRaft |
2 | dual write phase | Both brokers and controllers are KRaft, but we are still writing to ZK |
3 | kraft phase | The cluster is in KRaft mode, and the migration is done (or never was needed in the first place) |
This is how the brokers determine what phase metric to expose:
Value | Phase | zookeeper.metadata.migration.enable | Controller mode | Broker Mode |
---|---|---|---|---|
-1 | zk phase | false | ZooKeeper | ZooKeeper |
0 | pre-hybrid phase | true | ZooKeeper | ZooKeeper |
1 | hybrid phase | true | KRaft | ZooKeeper |
2 | dual write phase | true | KRaft | KRaft |
3 | kraft phase | false | KRaft | KRaft |
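The broker-side table above can be read as a small decision function. The following is a minimal sketch rather than Kafka code: NodeMode, BrokerPhaseSketch, and phase are hypothetical names, and throwing on combinations that the table does not list is an assumption.

```java
// Hypothetical names throughout; this illustrates the table, it is not Kafka code.
enum NodeMode { ZOOKEEPER, KRAFT }

final class BrokerPhaseSketch {
    private BrokerPhaseSketch() {}

    /** Returns the ZkMigrationPhase value a broker would expose, per the table. */
    static int phase(boolean migrationEnabled, NodeMode controllerMode, NodeMode brokerMode) {
        boolean zkController = controllerMode == NodeMode.ZOOKEEPER;
        boolean zkBroker = brokerMode == NodeMode.ZOOKEEPER;
        if (zkController && zkBroker) {
            return migrationEnabled ? 0 : -1;   // pre-hybrid phase, or zk phase
        }
        if (!zkController && zkBroker && migrationEnabled) {
            return 1;                           // hybrid phase
        }
        if (!zkController && !zkBroker) {
            return migrationEnabled ? 2 : 3;    // dual write phase, or kraft phase
        }
        // Assumption: combinations absent from the table are treated as errors.
        throw new IllegalArgumentException("Combination not covered by the phase table");
    }
}
```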
This is how the KRaft controllers determine what phase metric to expose:
Value | Phase | zookeeper.metadata.migration.enable | ZkMigrationState | ZK brokers registered |
---|---|---|---|---|
0 | pre-hybrid phase | true | PRE_MIGRATION (2) | yes |
1 | hybrid phase | true | MIGRATION (1) | yes |
2 | dual write phase | true | MIGRATION (1) | no |
3 | kraft phase | false | POST_MIGRATION (3) or NONE (0) | no |
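The controller-side table can be sketched the same way. Again, this is an illustration under stated assumptions and not Kafka code: MigrationStateSketch is a hypothetical enum containing only the four states named in the table, ControllerPhaseSketch is a hypothetical class, and input combinations missing from the table are rejected.

```java
// Hypothetical stand-in holding only the states that appear in the table.
enum MigrationStateSketch { NONE, MIGRATION, PRE_MIGRATION, POST_MIGRATION }

final class ControllerPhaseSketch {
    private ControllerPhaseSketch() {}

    /** Returns the ZkMigrationPhase value a KRaft controller would expose, per the table. */
    static int phase(boolean migrationEnabled,
                     MigrationStateSketch state,
                     boolean zkBrokersRegistered) {
        if (migrationEnabled) {
            if (state == MigrationStateSketch.PRE_MIGRATION && zkBrokersRegistered) {
                return 0;                               // pre-hybrid phase
            }
            if (state == MigrationStateSketch.MIGRATION) {
                return zkBrokersRegistered ? 1 : 2;     // hybrid phase, or dual write phase
            }
        } else if (!zkBrokersRegistered
                && (state == MigrationStateSketch.POST_MIGRATION
                    || state == MigrationStateSketch.NONE)) {
            return 3;                                   // kraft phase
        }
        // Assumption: combinations absent from the table are treated as errors.
        throw new IllegalArgumentException("Combination not covered by the phase table");
    }
}
```

Note that the only input separating the hybrid phase from the dual write phase is whether any ZK brokers are still registered.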
Compatibility, Deprecation, and Migration Plan
The changes described above are in keeping with the plan to remove ZK mode in Apache Kafka 4.0, as described by KIP-833.
The new metrics are net-new additions that will not change the semantics of any existing metric. The new metrics will also continue to be exposed in Apache Kafka 4.0.
Test Plan
The new metrics will need unit and integration tests as per usual.
Rejected Alternatives
The CurrentControllerId metric could have been put in a different namespace than MetadataLoader. After all, in ZK mode we don't really use MetadataLoader. However, it is extremely convenient for the metric to have the same name in both ZK mode and KRaft mode, so putting it in MetadataLoader just made sense.