Status

Current state: "Under Discussion"

Discussion thread: here

JIRA: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Now that Apache Kafka 4.0 is on the horizon, it is a good time to do some cleanups. In particular, I would like to spell out the configuration keys that will be going away in Apache Kafka 4.0. I have also added some minor changes to AuthorizerServerInfo, and two new metrics for managing ZK migration.

These changes are targeted at AK 3.7.

Public Interfaces

Configuration Key Removals

| Configuration Key | Deprecated | Removed | Reason |
|---|---|---|---|
| message.format.version | Kafka 3.0 | Kafka 4.0 | KRaft clusters have always used RecordVersion.V2. This will continue to be true in Kafka 4.0, so there is no need for this configuration any more. If we decide to migrate to a new on-disk format some day, we'll probably use a new mechanism to do so, not a static configuration key. However, no such migration is planned currently. |
| inter.broker.protocol.version | Kafka 3.7 | Kafka 4.0 | In KRaft mode, inter.broker.protocol.version is ignored except when formatting directories. In Kafka 4.0, it will be removed entirely to avoid confusion. The desired metadata version when running the format tool can be selected via the command line. |
| leader.imbalance.per.broker.percentage | Kafka 3.7 | Kafka 4.0 | KRaft mode has never implemented leader.imbalance.per.broker.percentage. Instead, if leader balancing is turned on, we try to use the preferred replica for all partitions. Therefore, this configuration is not needed. |
| controlled.shutdown.max.retries | Kafka 3.7 | Kafka 4.0 | This is not used in KRaft since the controlled shutdown mechanism relies on heartbeat responses, not RPCs sent from the active controller. |
| controlled.shutdown.retry.backoff.ms | Kafka 3.7 | Kafka 4.0 | This is not used in KRaft since the controlled shutdown mechanism relies on heartbeat responses, not RPCs sent from the active controller. |
| password.encoder.secret | Kafka 3.7 | Kafka 4.0 | This relates to how secrets are stored in ZK, which is not relevant with KRaft. |
| password.encoder.old.secret | Kafka 3.7 | Kafka 4.0 | This relates to how secrets are stored in ZK, which is not relevant with KRaft. |
| password.encoder.keyfactory.algorithm | Kafka 3.7 | Kafka 4.0 | This relates to how secrets are stored in ZK, which is not relevant with KRaft. |
| password.encoder.cipher.algorithm | Kafka 3.7 | Kafka 4.0 | This relates to how secrets are stored in ZK, which is not relevant with KRaft. |
| password.encoder.key.length | Kafka 3.7 | Kafka 4.0 | This relates to how secrets are stored in ZK, which is not relevant with KRaft. |
| password.encoder.iterations | Kafka 3.7 | Kafka 4.0 | This relates to how secrets are stored in ZK, which is not relevant with KRaft. |
| zookeeper.connect | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration, which is not relevant with KRaft. |
| zookeeper.session.timeout.ms | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration, which is not relevant with KRaft. |
| zookeeper.connection.timeout.ms | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration, which is not relevant with KRaft. |
| zookeeper.set.acl | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration, which is not relevant with KRaft. |
| zookeeper.max.in.flight.requests | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration, which is not relevant with KRaft. |
| zookeeper.ssl.client.enable | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration, which is not relevant with KRaft. |
| zookeeper.clientCnxnSocket | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration, which is not relevant with KRaft. |
| zookeeper.ssl.keystore.location | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration, which is not relevant with KRaft. |
| zookeeper.ssl.keystore.password | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration, which is not relevant with KRaft. |
| zookeeper.ssl.keystore.type | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration, which is not relevant with KRaft. |
| zookeeper.ssl.truststore.location | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration, which is not relevant with KRaft. |
| zookeeper.ssl.truststore.password | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration, which is not relevant with KRaft. |
| zookeeper.ssl.truststore.type | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration, which is not relevant with KRaft. |
| zookeeper.ssl.protocol | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration, which is not relevant with KRaft. |
| zookeeper.ssl.enabled.protocols | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration, which is not relevant with KRaft. |
| zookeeper.ssl.cipher.suites | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration, which is not relevant with KRaft. |
| zookeeper.ssl.endpoint.identification.algorithm | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration, which is not relevant with KRaft. |
| zookeeper.ssl.crl.enable | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration, which is not relevant with KRaft. |
| zookeeper.ssl.ocsp.enable | Kafka 3.7 | Kafka 4.0 | This is a ZK connection configuration, which is not relevant with KRaft. |
| broker.id.generation.enable | Kafka 3.7 | Kafka 4.0 | Related to automatic broker ID generation, which KRaft does not support. (See KIP-631) |
| reserved.broker.max.id | Kafka 3.7 | Kafka 4.0 | Related to automatic broker ID generation, which KRaft does not support. (See KIP-631) |
| control.plane.listener.name | Kafka 3.7 | Kafka 4.0 | We no longer need to maintain a separate listener for messages from the controller, since the controller does not send messages out any more (it receives them). (See KIP-631) |
| zookeeper.metadata.migration.enable | Kafka 4.0 | Kafka 4.0 | This configuration is used to migrate from ZK mode to KRaft. Since ZK mode is no longer supported in AK 4.0, this configuration will no longer be needed. Note that this configuration breaks the usual pattern of being deprecated prior to being removed. This is necessary because we certainly don't intend to deprecate migration in 3.7, but neither do we want to support it once ZK mode is gone. So this is a special case. |
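As an illustration of how an operator might audit a broker configuration against the table above, here is a minimal Java sketch. The class and method names (RemovedConfigCheck, findRemovedKeys) are hypothetical and not part of Kafka. As a simplification, it matches the "zookeeper." and "password.encoder." families by prefix, so it may also flag keys under those prefixes that the table does not list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Kafka: flags keys slated for removal in 4.0.
final class RemovedConfigCheck {
    // Keys from the table above that do not share a common prefix.
    private static final Set<String> EXACT_KEYS = Set.of(
        "message.format.version",
        "inter.broker.protocol.version",
        "leader.imbalance.per.broker.percentage",
        "controlled.shutdown.max.retries",
        "controlled.shutdown.retry.backoff.ms",
        "broker.id.generation.enable",
        "reserved.broker.max.id",
        "control.plane.listener.name");

    // Assumption: prefix matching stands in for listing every zookeeper.* and
    // password.encoder.* key individually, so it can over-report.
    private static final List<String> PREFIXES = List.of("zookeeper.", "password.encoder.");

    // Returns the affected keys found in props, in sorted order.
    static List<String> findRemovedKeys(Properties props) {
        List<String> found = new ArrayList<>();
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            boolean matchesPrefix = PREFIXES.stream().anyMatch(key::startsWith);
            if (EXACT_KEYS.contains(key) || matchesPrefix) {
                found.add(key);
            }
        }
        return found;
    }
}
```

A check like this could be run against each node's properties file before upgrading, so that leftover ZK-era keys are caught ahead of time.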

Deprecation and Removal of Support for Dynamic Listener Addition

Previously, it was possible to dynamically add a new listener to a ZK-based broker which did not appear anywhere in the static configuration file. This feature greatly complicated the code, since it moved the listener map from being immutable to being always modifiable. It was also very poorly integrated with things like the Authorizer. Nearly all Authorizer implementations assumed that they knew the set of listeners when the broker or controller was starting up.

This feature was not really needed. The administrator could always add listeners to the cluster in a rolling fashion, by taking down each node, adding a new listener to its config map, and bringing it back up.  Since the need to add entirely new listeners to the cluster comes up so rarely, this simple static process was adequate. Indeed, it was greatly preferred, since the dynamic process was extremely poorly tested.

For these reasons, KRaft mode never added support for dynamic listener addition or removal. Therefore, we should deprecate this in Kafka 3.7 and remove it in Kafka 4.0.

It's important to emphasize that we are not deprecating or removing the various dynamic listener configurations, like max.connections, num.network.threads, ssl.client.auth, etc. Those will continue to be usable. The only thing that is being deprecated and removed here is the ability to add entirely new listeners that were not described in the broker or controller configuration file.

Changes to AuthorizerServerInfo

AuthorizerServerInfo is a class that is used to pass some information to Authorizer objects when starting them. Unfortunately, it has some ZK-specific assumptions. For example, it assumes that everything is a broker. Let's clean it up for Kafka 4.0 with the following changes to its fields:

| Method | Change in 3.7 | Change in 4.0 | Reasoning |
|---|---|---|---|
| clusterResource | none | none | n/a |
| brokerId | deprecate | remove | Since authorizers are used on controllers, "node ID" is more appropriate than "broker ID" |
| nodeId | add with default implementation that delegates to brokerId() | keep | Since authorizers are used on controllers, "node ID" is more appropriate than "broker ID" |
| endpoints | add JavaDoc clarifying that only listeners opened by the node will be included | none | The JavaDoc should be clear about this |
| interBrokerEndpoint | deprecate | remove | This field doesn't make sense on controllers, because they don't expose an inter-broker endpoint. Authorizers that want to find the configured inter-broker endpoint on broker nodes can examine the configuration map to find that information if it is needed. |
| earlyStartListeners | none | none | n/a |
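To make the brokerId/nodeId transition concrete, here is a minimal Java sketch of the 3.7 shape described above. ServerInfoSketch and LegacyServerInfo are hypothetical stand-ins, not the real AuthorizerServerInfo, and only the two ID methods are modeled.

```java
// Hypothetical stand-in for the interface discussed above; only the ID methods are shown.
interface ServerInfoSketch {
    /** Deprecated in 3.7 and removed in 4.0, per the table above. */
    @Deprecated
    int brokerId();

    /** Added in 3.7. The default delegates to brokerId(), so existing implementations keep working. */
    default int nodeId() {
        return brokerId();
    }
}

// An implementation written before nodeId() existed: it only knows about brokerId().
final class LegacyServerInfo implements ServerInfoSketch {
    private final int id;

    LegacyServerInfo(int id) {
        this.id = id;
    }

    @Override
    public int brokerId() {
        return id;
    }
}
```

With this shape, an authorizer can switch to calling nodeId() in 3.7 without waiting for every implementation to be updated.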

New Metrics

CurrentControllerId

| Name | Context | Type | Mode | Description |
|---|---|---|---|---|
| kafka.server:type=MetadataLoader,name=CurrentControllerId | Broker and Controller | Integer | KRaft and ZK | Outputs the ID of the current controller, or -1 if none is known. |

The CurrentControllerId metric shows the ID of the controller, as seen by the node in question. If the current node doesn't think there is an active controller, the value of this metric will be -1.

Why create this metric, when ActiveControllerCount already exists? The answer is that in KRaft mode, ActiveControllerCount is only exposed on controller nodes, not on broker nodes. That makes it impossible to monitor what the brokers think the current active controller is.
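One way a monitoring tool might use this metric is to compare the value reported by every node. The Java sketch below assumes the per-node values have already been collected into a map from node ID to reported CurrentControllerId; the class and method names are hypothetical, and how the values are scraped is out of scope.

```java
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical monitoring helper, not part of Kafka.
final class ControllerAgreement {
    // Value the metric reports when a node knows of no active controller.
    static final int NO_CONTROLLER = -1;

    /**
     * Returns the controller ID that every node agrees on. Returns empty if the map
     * is empty, if any node reports -1, or if two nodes report different IDs.
     */
    static OptionalInt agreedController(Map<Integer, Integer> reportedByNode) {
        OptionalInt agreed = OptionalInt.empty();
        for (int reported : reportedByNode.values()) {
            if (reported == NO_CONTROLLER) {
                return OptionalInt.empty();
            }
            if (agreed.isPresent() && agreed.getAsInt() != reported) {
                return OptionalInt.empty();
            }
            agreed = OptionalInt.of(reported);
        }
        return agreed;
    }
}
```

An empty result would be a reasonable trigger for an alert, since it means at least one node has no controller or the nodes disagree.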

ZkMigrationPhase

| Name | Context | Type | Mode | Description |
|---|---|---|---|---|
| kafka.server:type=ZkMigration,name=ZkMigrationPhase | Broker and Controller | Integer | KRaft and ZK | Outputs the phase of the ZK migration. |

ZkMigrationPhase indicates the "phase" of the ZK migration.

Why create this metric, when ZkMigrationState already exists?

  • One reason is that the 5 integer values of the ZkMigrationState metric are presented in a jumbled order because of compatibility concerns. There isn't a clear forward progression with ZkMigrationState.
  • More importantly, we want a metric that can clearly show progression on a node-by-node basis. For example, if a zk broker is taken down and replaced with a kraft broker, it would be good to have a metric that showed that at a glance. ZkMigrationPhase will do that, since it follows more than just the migration state in the metadata image.

In summary, "zk migration state" is still useful to see, but it is a low-level detail. "zk migration phase" will provide an overview of the migration process. The phases are as described below:

| Value | Phase | Description |
|---|---|---|
| -1 | zk phase | The cluster is in ZK mode, and no migration is in progress. |
| 0 | pre-hybrid phase | Migration is about to start. |
| 1 | hybrid phase | Some brokers are ZK, but the controller is KRaft. |
| 2 | dual write phase | Both brokers and controllers are KRaft, but we are still writing to ZK. |
| 3 | kraft phase | The cluster is in KRaft mode, and the migration is done (or never was needed in the first place). |

This is how the brokers determine what phase metric to expose:

| Value | Phase | zookeeper.metadata.migration.enable | Controller mode | Broker mode |
|---|---|---|---|---|
| -1 | zk phase | false | ZooKeeper | ZooKeeper |
| 0 | pre-hybrid phase | true | ZooKeeper | ZooKeeper |
| 1 | hybrid phase | true | KRaft | ZooKeeper |
| 2 | dual write phase | true | KRaft | KRaft |
| 3 | kraft phase | false | KRaft | KRaft |
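The broker-side table above can be read as a small decision function. The following Java sketch encodes exactly the five listed rows; the names (BrokerPhaseSketch, Mode, brokerPhase) are hypothetical, and combinations the table does not list return an empty result rather than a guessed phase.

```java
import java.util.OptionalInt;

// Hypothetical encoding of the broker-side table; not the actual Kafka implementation.
final class BrokerPhaseSketch {
    enum Mode { ZOOKEEPER, KRAFT }

    /** Maps the three inputs from the table to a phase value, or empty if the row is not listed. */
    static OptionalInt brokerPhase(boolean migrationEnabled, Mode controllerMode, Mode brokerMode) {
        boolean zkController = controllerMode == Mode.ZOOKEEPER;
        boolean zkBroker = brokerMode == Mode.ZOOKEEPER;
        if (zkController && zkBroker) {
            // zk phase (-1) or pre-hybrid phase (0)
            return OptionalInt.of(migrationEnabled ? 0 : -1);
        }
        if (!zkController && zkBroker) {
            // hybrid phase (1) is only listed with migration enabled
            return migrationEnabled ? OptionalInt.of(1) : OptionalInt.empty();
        }
        if (!zkController) {
            // Both KRaft: dual write phase (2) or kraft phase (3)
            return OptionalInt.of(migrationEnabled ? 2 : 3);
        }
        // ZK controller with a KRaft broker does not appear in the table.
        return OptionalInt.empty();
    }
}
```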

This is how the KRaft controllers determine what phase metric to expose:

| Value | Phase | zookeeper.metadata.migration.enable | ZkMigrationState | ZK brokers registered |
|---|---|---|---|---|
| 0 | pre-hybrid phase | true | PRE_MIGRATION (2) | yes |
| 1 | hybrid phase | true | MIGRATION (1) | yes |
| 2 | dual write phase | true | MIGRATION (1) | no |
| 3 | kraft phase | false | POST_MIGRATION (3) or NONE (0) | no |
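The controller-side table can be sketched the same way. In the Java below, the enum lists only the four ZkMigrationState values named in the table, and all class and method names are hypothetical. Input combinations that the table does not list return an empty result, since this document does not say what should be reported for them.

```java
import java.util.OptionalInt;

// Hypothetical encoding of the controller-side table; not the actual Kafka implementation.
final class ControllerPhaseSketch {
    // Only the states that appear in the table above.
    enum MigrationState { NONE, MIGRATION, PRE_MIGRATION, POST_MIGRATION }

    /** Maps the three inputs from the table to a phase value, or empty if the row is not listed. */
    static OptionalInt controllerPhase(boolean migrationEnabled,
                                       MigrationState state,
                                       boolean zkBrokersRegistered) {
        if (migrationEnabled) {
            if (state == MigrationState.PRE_MIGRATION && zkBrokersRegistered) {
                return OptionalInt.of(0); // pre-hybrid phase
            }
            if (state == MigrationState.MIGRATION) {
                // hybrid phase while ZK brokers remain, dual write phase once they are gone
                return OptionalInt.of(zkBrokersRegistered ? 1 : 2);
            }
            return OptionalInt.empty();
        }
        boolean finishedOrNeverNeeded =
            state == MigrationState.POST_MIGRATION || state == MigrationState.NONE;
        if (finishedOrNeverNeeded && !zkBrokersRegistered) {
            return OptionalInt.of(3); // kraft phase
        }
        return OptionalInt.empty();
    }
}
```

Note that rows 1 and 2 differ only in whether ZK brokers are still registered, which is what lets the phase reflect node-by-node progress rather than just the migration state.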

Compatibility, Deprecation, and Migration Plan

The changes described above are in keeping with the plan to remove ZK mode in Apache Kafka 4.0, as described by KIP-833.

The new metrics are net-new additions that will not change the semantics of any existing metric. The new metrics will also continue to be exposed in Apache Kafka 4.0.

Test Plan

The new metrics will need unit and integration tests as per usual.

Rejected Alternatives

The CurrentControllerId metric could have been put in a different namespace than MetadataLoader. After all, in ZK mode we don't really use MetadataLoader. However, it is extremely convenient for the metric to have the same name in both ZK mode and KRaft mode, so putting it in MetadataLoader just made sense.
