
...

The following is a proposal for securing Apache Kafka. It is based on the following goals:

  1. support authentication of client (i.e. consumer & producer) connections to brokers
  2. support authorization of the assorted operations that can take place over those connections
  3. support encrypting those connections
  4. support security principals representing interactive users, user groups, and long-running services
  5. security should be optional; installations that don't want the above features shouldn't have to pay for them
  6. preserve backward compatibility; in particular, extant third-party clients should still work

Features In Scope

  • Authentication
  • Unix-like users and permissions, with some kind of group or ACL notion
  • Encryption over the wire
  • No backward incompatible changes
  • It should be easy to enforce the use of security at a given site

Kerberos is the most commonly requested authentication mechanism for human usage; SSL is more common for applications that access Kafka. We would like a given broker to be capable of supporting both of these, as well as unauthenticated connections, at the same time (configurable, of course).
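
As a sketch of what that per-broker configurability might look like, the following is purely illustrative; the "listeners" key and the protocol names are assumptions for this example, not part of the proposal:

    // Hypothetical sketch only: the "listeners" key and the protocol names
    // below are illustrative assumptions, not settled design.
    import java.util.Properties;

    public class BrokerListenerSketch {
        public static Properties brokerProps() {
            Properties props = new Properties();
            // One port per mechanism, all served simultaneously by one broker;
            // an installation could omit entries to disable a mechanism.
            props.put("listeners",
                      "PLAINTEXT://0.0.0.0:9092,SSL://0.0.0.0:9093,SASL_PLAINTEXT://0.0.0.0:9094");
            return props;
        }
    }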

We don't think security can be an all-or-nothing proposition; it has to be controllable on a per-topic basis with some kind of granular authorization. In the absence of this, you will end up with one Kafka cluster per application, which defeats the purpose of a central message brokering cluster. Hence you need permissions and a manageable way to assign them in a large organization.

We think all this can probably be done in a backwards compatible manner and without significant performance degradation for non-secure users.

We plan to only augment the new producer & consumer implementations, and not the 0.8 implementations. This will hopefully drive adoption of the new implementations, as well as mitigate risk.

Features Out Of Scope (For Now)

  • Encryption of data at rest
  • Encryption/security of configuration files
  • Per-column encryption/security
  • Non-repudiation
  • Zookeeper operations & any add-on metrics
  • Provisioning of security credentials

The above items are important features that people want, but for now they can be implemented at a level above Kafka (e.g., by encrypting individual fields in the message).

Details on these items are given below.

Authentication

...

  • SSL for access from applications (must)
  • Kerberos for access on behalf of people (must)
  • Hadoop delegation tokens to enable MapReduce, Samza, or other frameworks running in the Hadoop environment to access Kafka
  • LDAP username/password (nice-to-have)

We will use SASL for Kerberos and LDAP.
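
For the Kerberos case, the standard Java SASL API can drive the handshake. A minimal client-side sketch follows; the service name "kafka" is an assumption, and in practice this must run inside a JAAS login context holding a Kerberos ticket:

    // Client-side sketch using the standard Java SASL API to start a
    // Kerberos (GSSAPI) handshake.
    import javax.security.sasl.Sasl;
    import javax.security.sasl.SaslClient;
    import javax.security.sasl.SaslException;

    public class KerberosClientSketch {
        public static SaslClient create(String brokerHost) throws SaslException {
            return Sasl.createSaslClient(
                    new String[] {"GSSAPI"}, // Kerberos v5 via GSS-API
                    null,                    // authorization id defaults to the authenticated id
                    "kafka",                 // service protocol name (assumed)
                    brokerHost,              // fully qualified broker hostname
                    null,                    // no mechanism properties
                    null);                   // GSSAPI needs no callback handler here
        }
    }

The client would then loop, sending the bytes produced by evaluateChallenge() to the broker and feeding the broker's responses back in, until isComplete() returns true.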

...

  • (configurable) support for not authenticating connections from the loopback interface (for trouble-shooting purposes)


Open Question: How do we want to set up connections

...

?

LinkedIn security team (Arvind M & Michael H) suggest allocating one port to insecure connections only. This would make it simple to enforce security by configuration, and would also prevent downgrade attacks (wherein an attacker could intercept the first message to the server requesting authentication on a port supporting both plaintext & something else, and modify the message to request plaintext).

This document originally stated "We will use SASL for Kerberos and LDAP", but AFAICT there is no SASL mechanism covering LDAP (and the Java SASL library doesn't support it, at any rate).

Neither the non-blocking Java SSL library nor the Java SASL library actually transmits authentication data; that's up to the client & server applications. I can see two ways of dealing with this:

  1. have brokers listen on one port per authentication protocol
  2. add a new AuthRequest/AuthResponse API to our protocol: this could select the authentication mechanism (first packet) on ports which support more than one, and will contain only a simple byte array containing the authentication bytes SASL/SSL/whatever needs.

Option one avoids the need to introduce new requests & responses and makes it easy to configure firewalls to permit some protocols and not others. However, it carries the costs of bypassing the normal API layer (so no logging, no JMX monitoring, etc.) and of inconveniencing clients by introducing a non-conformant request/response mechanism.
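
To make option two more concrete, here is a minimal sketch of what an AuthRequest might carry; the name, fields, and byte layout are illustrative assumptions, not settled protocol:

    // Hypothetical AuthRequest for option two.
    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    public class AuthRequestSketch {
        private final String mechanism; // e.g. "GSSAPI"; selects the mechanism on shared ports
        private final byte[] authBytes; // opaque handshake bytes for SASL/SSL/whatever

        public AuthRequestSketch(String mechanism, byte[] authBytes) {
            this.mechanism = mechanism;
            this.authBytes = authBytes;
        }

        // Encodes as: int16 mechanism length, mechanism, int32 payload length, payload.
        public ByteBuffer encode() {
            byte[] mech = mechanism.getBytes(StandardCharsets.UTF_8);
            ByteBuffer buf = ByteBuffer.allocate(2 + mech.length + 4 + authBytes.length);
            buf.putShort((short) mech.length);
            buf.put(mech);
            buf.putInt(authBytes.length);
            buf.put(authBytes);
            buf.flip();
            return buf;
        }
    }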

Generally you would expect authentication to happen at connection time, but I don't think there is really any reason we need to require this. I think instead we can allow it at any time during a session or even multiple times during a session (if the client wishes to change their user a la su). However this is fundamentally attached to the connection, so if the client reconnects they will lose their authentication and need to re-authenticate.

All connections that have not yet been authenticated will be assigned a fake user ("nobody" or "josephk" or something). Note: admins should be able to disable fake users; auditors hate those.


Regardless of the mechanism by which you connect and authenticate, the mechanism by which we check your permissions should be the same. The side effect of a successful connection via SSL with a client cert, or of a successful auth request, will be that we store the user information along with that connection. The user will be passed along with the request object to KafkaApis on each subsequent request.

Implementation notes

We want three separate ports: SSL, SASL, and plaintext. Admins should be able to disable any of these authentication protocols in configuration. Presumably this would need to be maintained in the cluster metadata so clients can choose to connect to the appropriate port.

This feature requires some cooperation between the socket server and the API layer. The API layer will handle the authenticate request, but the username will be associated with the connection. One approach to implementing this would be to add the concept of a Session object that is maintained with the connection and contains the username. The session would be stored in the context for the socket in the socket server and destroyed as part of socket close. The session would be passed down to the API layer with each request, and we would have something like session.authenticatedAs() to get the username to use for authorization purposes. We will also record in the session information about the security level of the connection (does it use encryption? integrity checks?) for use in authorization.
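
As a sketch, such a Session might look like the following; authenticatedAs() follows the text above, and everything else is an illustrative assumption:

    // Per-connection Session sketch: created at authentication time,
    // destroyed on socket close, passed down to the API layer per request.
    public class Session {
        private final String principal;         // username fixed at authentication time
        private final boolean encrypted;        // does the connection use encryption?
        private final boolean integrityChecked; // does it use integrity checks?

        public Session(String principal, boolean encrypted, boolean integrityChecked) {
            this.principal = principal;
            this.encrypted = encrypted;
            this.integrityChecked = integrityChecked;
        }

        // Consulted by the API layer when making authorization decisions.
        public String authenticatedAs()     { return principal; }
        public boolean isEncrypted()        { return encrypted; }
        public boolean hasIntegrityChecks() { return integrityChecked; }
    }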

All future authorization checks will simply consult this session information.

Open Question: Do we want to support all of the 0.8 producer/consumer APIs, or just the new producer/consumer?

 

Authorization

The plan is to support Unix-like permissions at a per-topic level.
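
As a sketch of what "Unix-like" could mean here, consider the following; the operation set and the owner/group/other layout are assumptions, not settled design:

    // Illustrative sketch of Unix-like per-topic permissions.
    import java.util.EnumSet;
    import java.util.Set;

    public class TopicPermissionsSketch {
        public enum Operation { READ, WRITE, CREATE, DELETE }

        private final String owner;
        private final String group;
        private final EnumSet<Operation> ownerOps;
        private final EnumSet<Operation> groupOps;
        private final EnumSet<Operation> otherOps;

        public TopicPermissionsSketch(String owner, String group,
                                      EnumSet<Operation> ownerOps,
                                      EnumSet<Operation> groupOps,
                                      EnumSet<Operation> otherOps) {
            this.owner = owner;
            this.group = group;
            this.ownerOps = ownerOps;
            this.groupOps = groupOps;
            this.otherOps = otherOps;
        }

        // Owner, then group, then other, exactly as in Unix file permissions.
        public boolean allowed(String user, Set<String> userGroups, Operation op) {
            if (user.equals(owner))         return ownerOps.contains(op);
            if (userGroups.contains(group)) return groupOps.contains(op);
            return otherOps.contains(op);
        }
    }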

...

We will try to provide something simple out of the box.

 

Encryption

For performance reasons, we propose making encryption optional. When using Kerberos (via SASL & GSS-API), there are explicit parameters through which clients can signal their interest in encryption (similarly for SSL).
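
On the SASL side, the explicit parameter in question is the standard Java quality-of-protection property; a minimal sketch of how a client could signal its interest in encryption:

    // "auth-conf" requests encryption during SASL negotiation.
    import java.util.HashMap;
    import java.util.Map;
    import javax.security.sasl.Sasl;

    public class QopSketch {
        public static Map<String, String> qopProps(boolean wantEncryption) {
            Map<String, String> props = new HashMap<>();
            // "auth"      = authentication only
            // "auth-int"  = authentication plus integrity checks
            // "auth-conf" = authentication, integrity, and confidentiality (encryption)
            props.put(Sasl.QOP, wantEncryption ? "auth-conf" : "auth");
            return props; // pass to Sasl.createSaslClient(...)
        }
    }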

Sequencing

Here is a proposed sequence of work:

...