Status
Current state: "Under Discussion"
Discussion thread: here [Change the link from the KIP proposal email archive to your own email thread]
JIRA: here [Change the link from KAFKA-1 to your own ticket]
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Today, SCRAM authentication can be used for Kafka brokers when the cluster uses ZooKeeper (ZK) for its quorum servers. Bootstrapping is possible because the inter-broker SCRAM credentials can be written directly to ZooKeeper before the Kafka brokers are started. See Configuring SCRAM for details. We wish to implement a similar mechanism for storing the Kafka broker SCRAM authentication credentials when the cluster uses KRaft for its quorum servers, so that these credentials are in place before the Kafka cluster starts for the first time.
Public Interfaces
This proposal changes a command-line tool and its arguments: the kafka-storage tool gains a new --add-metadata option, described under Proposed Changes. No other public interfaces are added, removed, or changed.
Proposed Changes
We will update the kafka-storage tool to take SCRAM credentials and store them when we format each node.

The kafka-storage tool is used to initialize the storage space for each broker and controller. One of the files it creates is bootstrap.checkpoint, which contains a set of ApiMessageAndVersion records needed to bootstrap the cluster. I am proposing we add an interface to the kafka-storage tool so that the needed SCRAM credential updates, in the form of UserScramCredentialsRecord (an ApiMessageAndVersion record), can be added when the storage is formatted. With this, when the cluster starts for the first time, each broker will have a copy of the server-side SCRAM credentials it needs to communicate with the other brokers.

I propose we add an option, --add-metadata, to the kafka-storage tool that will write ApiMessageAndVersion records directly into the __cluster_metadata topic. The option can be supplied multiple times on the format command so that multiple SCRAM records can be added to bootstrap the cluster. Below is the updated kafka-storage command.
./bin/kafka-storage.sh format -h
usage: kafka-storage format [-h] --config CONFIG --cluster-id CLUSTER_ID
                            [--add-metadata METADATA]
                            [--release-version RELEASE_VERSION]
                            [--ignore-formatted]

optional arguments:
  -h, --help            show this help message and exit
  --config CONFIG, -c CONFIG
                        The Kafka configuration file to use.
  --cluster-id CLUSTER_ID, -t CLUSTER_ID
                        The cluster ID to use.
  --add-metadata METADATA, -A METADATA
                        Some METADATA Message to add to the __cluster_metadata
                        log for this node e.g.
                        'UserScramCredentialsRecord={"Name":"alice","Mechanism":1,"Salt":"MWx2NHBkbnc0ZndxN25vdGN4bTB5eTFrN3E=","SaltedPassword":"mT0yyUUxnlJaC99HXgRTSYlbuqa4FSGtJCJfTMvjYCE=","Iterations":8192}'
  --release-version RELEASE_VERSION, -r RELEASE_VERSION
                        A KRaft release version to use for the initial
                        metadata version.
  --ignore-formatted, -g
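For illustration only (the configuration path is a placeholder and random-uuid is the existing subcommand for generating a cluster ID), formatting a node with a bootstrap SCRAM credential for alice under this proposal might look like:

```shell
# Hypothetical invocation of the proposed option; the config path is an
# example only. The same --add-metadata argument must be passed when
# formatting every node so the resulting records match exactly.
./bin/kafka-storage.sh format \
  --config config/kraft/server.properties \
  --cluster-id "$(./bin/kafka-storage.sh random-uuid)" \
  --add-metadata 'UserScramCredentialsRecord={"Name":"alice","Mechanism":1,"Salt":"MWx2NHBkbnc0ZndxN25vdGN4bTB5eTFrN3E=","SaltedPassword":"mT0yyUUxnlJaC99HXgRTSYlbuqa4FSGtJCJfTMvjYCE=","Iterations":8192}'
```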
I propose that the METADATA argument contain the type of the ApiMessageAndVersion record to be added, followed by a JSON object of name/value pairs to populate the record. Initially I will only add support for UserScramCredentialsRecord, which requires a Name (implicitly of type user), a Mechanism (1 for SCRAM-SHA-256 or 2 for SCRAM-SHA-512), a Salt, a Password or SaltedPassword, and an Iterations count. Unlike with the kafka-configs command used to talk to ZK, the salt and iteration count are not optional. We want the record data to match exactly on each node, and normally, when no salt is specified, a random one is chosen; since each node is initialized individually, that would produce records with different salts, which we don't want.
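Since the salt must be identical on every node, it is convenient to precompute the whole --add-metadata argument once and reuse it when formatting each node. A minimal sketch in Python (the helper name and the 24-byte salt length are my own choices, not part of the proposal; the SaltedPassword derivation follows the SCRAM Hi function from RFC 5802, which is PBKDF2 with HMAC over the chosen hash):

```python
import base64
import hashlib
import json
import secrets

def scram_record_arg(name, password, mechanism=1, iterations=8192, salt=None):
    """Build one --add-metadata argument string for a SCRAM user (sketch)."""
    # Mechanism 1 = SCRAM-SHA-256, mechanism 2 = SCRAM-SHA-512, per this KIP.
    algo = {1: "sha256", 2: "sha512"}[mechanism]
    if salt is None:
        # Pick one salt up front and reuse it on EVERY node, so the
        # formatted records match exactly across the cluster.
        salt = secrets.token_bytes(24)
    # RFC 5802: SaltedPassword := Hi(password, salt, i) == PBKDF2-HMAC.
    salted = hashlib.pbkdf2_hmac(algo, password.encode("utf-8"), salt, iterations)
    record = {
        "Name": name,
        "Mechanism": mechanism,
        "Salt": base64.b64encode(salt).decode("ascii"),
        "SaltedPassword": base64.b64encode(salted).decode("ascii"),
        "Iterations": iterations,
    }
    return "UserScramCredentialsRecord=" + json.dumps(record, separators=(",", ":"))

print(scram_record_arg("alice", "alice-secret", salt=b"fixed-salt-for-demo"))
```

The fixed salt in the example is what makes the output deterministic; with it, running the helper on any machine yields byte-identical record data for every node.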
Compatibility, Deprecation, and Migration Plan
These changes are additive only, so there is no backwards-compatibility issue.
Test Plan
These changes affect only the bootstrap of a cluster and have no impact on existing clusters.
Rejected Alternatives