
Note: Please check the security documentation for the features that Kafka supports today. This page is no longer maintained, but it is kept for historical reasons.

 

Table of Contents

Overview

...

  1. support authentication of client (i.e. consumer & producer) connections to brokers
  2. support authorization of the assorted operations that can take place over those connections
  3. support encrypting those connections
  4. support security principals representing interactive users, user groups, and long-running services
  5. security should be optional; installations that don't want the above features shouldn't have to pay for them
  6. preserve backward compatibility; in particular, extant third-party clients should still work

Current implementation efforts are tracked in KAFKA-1682.

Features In Scope

  • Authentication via SSL & Kerberos through SASL
  • Auditing
  • Authorization through Unix-like users, permissions and ACLs
  • Encryption over the wire (optional)
  • It should be easy to enforce the use of security at a given site

Kerberos is the most commonly requested authentication mechanism for human usage. SSL is more common for applications that access Kafka. We would like a given broker to be capable of supporting both of these, as well as unauthenticated connections, at the same time (configurable, of course).

Security has to be controllable on a per-topic basis with some kind of granular authorization. In the absence of this you will end up with one Kafka cluster per application which defeats the purpose of a central message brokering cluster. Hence you need permissions and a manageable way to assign these in a large organization.

We think all this can probably be done in a backwards compatible manner and without significant performance degradation for non-secure users.

We plan to only augment the new producer & consumer implementations, and not the 0.8 implementations. This will hopefully drive adoption of the new implementations, as well as mitigate risk.

Details on these items are given below.


Features Out Of Scope (For Now)
  • Encryption/security of data at rest (can be addressed for now by encrypting individual fields in the message & filesystem security features)
  • Encryption/security of configuration files (can be addressed by filesystem security features)
  • Per-column encryption/security
  • Non-repudiation
  • Zookeeper operations & any add-on metrics
  • Provisioning of security credentials

...

Details on these items are given below.

Authentication

We need to support several methods of authentication:

  • SSL for access from applications (must)
  • Kerberos for access on behalf of people (must)
  • Hadoop delegation tokens to enable MapReduce, Samza, or other frameworks running in the Hadoop environment to access Kafka (nice-to-have)
  • LDAP username/password (nice-to-have)
  • (configurable) support for not authenticating connections from the loopback interface (for trouble-shooting purposes)
  • All connections that have not yet been authenticated will be assigned a fake user ("nobody" or "josephk" or something). (Note: admins should be able to disable fake users; auditors hate those.)
Open Question: How do we want to setup connections?

Kerberos is the most commonly requested authentication mechanism for human usage. SSL is more common for applications that access Kafka. We would like a given broker to be capable of supporting both of these, as well as unauthenticated connections, at the same time (configurable, of course). We envision adding additional SASL mechanisms in the future (e.g. Hadoop delegation tokens).

The LinkedIn security team (Arvind M & Michael H) suggest allocating one port to

...

on each broker for incoming SSL connections, one for all authentication mechanisms over SASL, and optionally a third for open (unauthenticated) incoming connections.

 

A port dedicated to SSL connections obviates the need for any Kafka-specific protocol signalling that authentication is beginning or negotiating an authentication mechanism (since this is all implicit in the fact that the client is connecting on that port). Clients simply begin the session by sending the standard SSL CLIENT-HELLO message. This has two advantages:

  1. the SSL handshake provides message integrity
  2. clients can use standard SSL libraries to establish the connection
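
For illustration, here is a minimal sketch of what a client-side connection on the dedicated SSL port could look like using only the standard Java SSL library; the host name, port number, and the use of the JVM-default truststore settings are placeholder assumptions, not part of this proposal.

```java
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class SslClientSketch {
    public static void main(String[] args) throws Exception {
        // The JVM-default SSLSocketFactory picks up truststore/keystore settings
        // from the standard javax.net.ssl.* system properties.
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();

        // "broker.example.com" and 9093 are placeholders for a broker's SSL endpoint.
        try (SSLSocket socket = (SSLSocket) factory.createSocket("broker.example.com", 9093)) {
            // startHandshake() sends the standard CLIENT-HELLO; no Kafka-specific
            // signalling is needed because the port itself implies SSL.
            socket.startHandshake();
            // From here the client would speak the normal Kafka wire protocol over
            // socket.getInputStream() / socket.getOutputStream().
        }
    }
}
```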

A dedicated SASL port will, however, require a new Kafka request/response pair, since the negotiation of the particular mechanism is application-specific. This opens up the possibility of downgrade attacks (wherein an attacker could intercept the first message to the server requesting

...

one authentication mechanism, and modify the message to request

...

another, weaker mechanism). We can protect against that by designing the protocol to request a single authentication mechanism on the part of the client (that way, an attempted downgrade attack will result in handshake failure downstream).

Through this protocol, we could support unauthenticated connections on the SASL port as well.

A quick sketch:

  1. Client connects on the SASL port
  2. Server accepts, registers for reads on the new connection
  3. Client sends a (newly-defined) Authentication Request message containing an int indicating the desired mechanism, along with an optional initial SASL response packet
  4. Server can reject the request if it's not configured to use the requested mechanism, but if it does, it responds with the SASL challenge data
  5. Client replies with SASL response data
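
A rough client-side sketch of steps 3-5, assuming Kerberos (GSSAPI) over the standard javax.security.sasl API. The mechanism numbering, the length-prefixed framing, and the "kafka" service name are illustrative assumptions, not a settled wire format; the Kerberos login (JAAS/Subject.doAs) surrounding this exchange is omitted.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.Socket;
import java.util.Collections;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;

public class SaslPortSketch {

    // Hypothetical mechanism code for the newly-defined Authentication Request;
    // the actual numbering would be fixed by the protocol definition.
    static final int MECHANISM_GSSAPI = 0;

    public static void authenticate(Socket socket) throws Exception {
        DataOutputStream out = new DataOutputStream(socket.getOutputStream());
        DataInputStream in = new DataInputStream(socket.getInputStream());

        // Standard Java SASL client for Kerberos; "kafka" is an assumed service name.
        SaslClient client = Sasl.createSaslClient(
                new String[] {"GSSAPI"}, null, "kafka",
                socket.getInetAddress().getHostName(),
                Collections.<String, Object>emptyMap(), null);

        // Step 3: Authentication Request = mechanism id + optional initial SASL response.
        byte[] initial = client.hasInitialResponse()
                ? client.evaluateChallenge(new byte[0]) : new byte[0];
        out.writeInt(MECHANISM_GSSAPI);
        out.writeInt(initial.length);
        out.write(initial);
        out.flush();

        // Steps 4-5: exchange length-prefixed challenge/response frames until complete.
        while (!client.isComplete()) {
            byte[] challenge = new byte[in.readInt()];
            in.readFully(challenge);
            byte[] response = client.evaluateChallenge(challenge);
            if (response != null) {
                out.writeInt(response.length);
                out.write(response);
                out.flush();
            }
        }
    }
}
```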

N.B. This document originally stated "We will use SASL for Kerberos and LDAP.", but AFAICT there is no SASL mechanism covering LDAP (and the Java SASL library doesn't support it, at any rate).

Neither the non-blocking Java SSL library nor the Java SASL library actually transmits authentication data; that's up to the client & server applications. I can see two ways of dealing with this:

  1. have brokers listen on one port per authentication protocol
  2. add a new AuthRequest/AuthResponse API to our protocol: this could select the authentication mechanism (first packet) on ports which support more than one, and would contain only a simple byte array carrying the authentication bytes that SASL/SSL/whatever needs.

Option one avoids the need to introduce new requests & responses and makes it easy to configure firewalls to permit some protocols & not others. However, it carries the costs of bypassing the normal API layer (so no logging, no jmx monitoring, &c) and of inconveniencing clients by introducing a non-conformant request/response mechanism.

Generally you would expect authentication to happen at connection time, but I don't think there is really any reason we need to require this. I think instead we can allow it at any time during a session or even multiple times during a session (if the client wishes to change their user a la su). However this is fundamentally attached to the connection, so if the client reconnects they will lose their authentication and need to re-authenticate.

Regardless of the mechanism by which you connect and authenticate, the mechanism by which we check your permissions should be the same. The side effect of a successful connection via SSL with a client cert or a successful auth request will be that we store the user information along with that connection. The user will be passed along with the request object to KafkaApis on each subsequent request.

Administrators should be able to disable any authentication protocol in configuration. Presumably this would need to be maintained in the cluster metadata so clients can choose to connect to the appropriate port.

...

All future checks for authorization will just check this session information.
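
A minimal sketch of the per-connection session state this implies is shown below. The class and method names are hypothetical (chosen to match the session.subject() and session.peerIpAddress() calls used later on this page), and the exact Subject type is not settled here.

```java
import java.net.InetAddress;
import javax.security.auth.Subject;

// Hypothetical per-connection session populated by a successful authentication.
// KafkaApis would receive it with every request, and authorization checks would
// read it rather than re-deriving the caller's identity.
public final class Session {
    private final Subject subject;          // authenticated identity ("nobody" principal if unauthenticated)
    private final InetAddress peerIpAddress;

    public Session(Subject subject, InetAddress peerIpAddress) {
        this.subject = subject;
        this.peerIpAddress = peerIpAddress;
    }

    public Subject subject() { return subject; }
    public InetAddress peerIpAddress() { return peerIpAddress; }
}
```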

Authorization

N.B. This is still under discussion; I've tried to pull together the current consensus here.

Regardless of the mechanism by which you connect and authenticate, the mechanism by which we check your permissions should be the same. The side effect of a successful connection via SSL with a client certificate or a successful authentication request by some other means will be that we store the user information along with that connection. The user will be passed along with the request object to KafkaApis on each subsequent request.


Security has to be controllable on a per-topic basis with some kind of granular authorization. In the absence of this you will end up with one Kafka cluster per application which defeats the purpose of a central message brokering cluster. Hence you need permissions and a manageable way to assign these in a large organization.

The plan will be to support unix-like permissions on a per-topic level.

...

PermissionManager.isPermitted(Subject subject, InetAddress ip, Permissions permission, String resource)

...

PermissionManager.isPermitted(session.subject(), session.peerIpAddress(), Permissions.WRITE, topicName)

...

The subject is basically the "user name" or identity of the person trying to take some action. This will be established via whatever authentication mechanism is in use. The action is basically a list of things you may be permitted to do (e.g. read, write, etc.).
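
An illustrative sketch of that action set as a Java enum follows; only WRITE, CREATE and DESCRIBE are named on this page (plus "read, write, etc." above), so the full set should be treated as an open question.

```java
// Hypothetical enumeration of permission actions; the exact set is not fixed by this page.
public enum Permissions {
    READ,      // fetch messages from a topic
    WRITE,     // produce messages to a topic
    CREATE,    // create a topic (primarily meaningful at the default level)
    DESCRIBE   // read topic metadata (primarily meaningful at the default level)
}
```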

The IP address of the originating connection is passed as it may be useful in certain authorization situations (e.g. whitelisting, or being more generous when the request originates on the loopback address).

The PermissionManager will both check whether access was permitted and also log the attempt for audit purposes.

...

Permissions are not hierarchical since topics are not hierarchical. So a user will have a default value for these (a kind of umask) as well as a potential override on a per-topic basis. Note that the CREATE and DESCRIBE permissions primarily make sense at the default level.
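
Building on the enum sketch above, here is a minimal in-memory sketch of a pluggable PermissionManager with a per-user default permission set (the umask-like value) and per-topic overrides. The class name, storage model, and the way a Subject is mapped to a user name are illustrative assumptions only; a real implementation would also write the audit record mentioned below.

```java
import java.net.InetAddress;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import javax.security.auth.Subject;

// Hypothetical in-memory PermissionManager: per-user defaults plus per-topic overrides.
public class SimplePermissionManager {

    private final Map<String, Set<Permissions>> defaultPermissions = new ConcurrentHashMap<>();
    // key: user + "/" + topic
    private final Map<String, Set<Permissions>> topicOverrides = new ConcurrentHashMap<>();

    public boolean isPermitted(Subject subject, InetAddress ip, Permissions permission, String resource) {
        String user = userNameOf(subject);
        // Per-topic override wins; otherwise fall back to the user's default ("umask") set.
        Set<Permissions> granted = topicOverrides.getOrDefault(user + "/" + resource,
                defaultPermissions.getOrDefault(user, EnumSet.noneOf(Permissions.class)));
        boolean allowed = granted.contains(permission);
        // A real implementation would also log this attempt for audit purposes (see Auditing).
        return allowed;
    }

    private String userNameOf(Subject subject) {
        // Placeholder mapping from the authenticated Subject to the principal name (a String).
        return subject.getPrincipals().isEmpty()
                ? "nobody"
                : subject.getPrincipals().iterator().next().getName();
    }
}
```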

Implementing the PermissionManager

The above just gives a high-level API for checking if a particular user is allowed to do a particular thing. How permissions are stored and how users are grouped together will need to be pluggable.

...

We will try to provide something simple out of the box.

Administrators may disable authorization in configuration (giving an "audit-only" mode).

Deriving a Principal Name from Authentication Credentials

If we are to make the authorization library independent of the authentication mechanism, then we need to map each mechanism's credentials to the principal abstraction to be used in the authorization API. LinkedIn security proposes the following:

The principal is just a user name (i.e. a String).

When the client authenticates using SSL, the user name will be the first element in the Subject Alternative Name field of the client certificate.

When the client authenticates using Kerberos, the user name will be the fully-qualified Kerberos principal name. Admins can modify this through configuration using the standard Kerberos auth_to_local mechanism (cf. here).

When the client does not authenticate, the user name will be "nobody".
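
A sketch of this mapping using standard Java APIs is below. The class and method names are hypothetical; only the three mapping rules themselves come from this page, and any shortening of the Kerberos name is left to the administrator's auth_to_local configuration.

```java
import java.security.cert.X509Certificate;
import java.util.Collection;
import java.util.List;
import javax.security.auth.kerberos.KerberosPrincipal;

// Hypothetical credential-to-principal mapping: the principal is just a String user name.
public class PrincipalMapper {

    // SSL: take the first entry of the client certificate's Subject Alternative Name extension.
    public static String fromSslCertificate(X509Certificate clientCert) throws Exception {
        Collection<List<?>> sans = clientCert.getSubjectAlternativeNames();
        if (sans != null) {
            for (List<?> san : sans) {
                return String.valueOf(san.get(1)); // element 0 is the name type, element 1 the value
            }
        }
        return "nobody";
    }

    // Kerberos: use the fully-qualified principal name; auth_to_local rules may shorten it.
    public static String fromKerberos(KerberosPrincipal principal) {
        return principal.getName();               // e.g. "kafka/broker1@EXAMPLE.COM"
    }

    // Unauthenticated connections.
    public static String anonymous() {
        return "nobody";
    }
}
```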

Auditing

All authentication operations will be logged to file by the Kafka code (i.e. this will not be pluggable).  The implementation should use a dedicated logger so as to 1) segregate security logging & 2) support keeping the audit log in a separate (presumably secured) location.
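
A dedicated, separately-configurable logger is enough to achieve both goals, since operators can route it to a secured destination independently of the main broker log. In the sketch below the logger name "kafka.security.audit" is a placeholder, not a decided convention.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical dedicated audit logger; the name is illustrative and would be
// routed to a separate (presumably secured) appender in the logging configuration.
public class SecurityAuditLog {
    private static final Logger AUDIT = LoggerFactory.getLogger("kafka.security.audit");

    public static void logAuthentication(String principal, String mechanism, boolean success) {
        AUDIT.info("authentication principal={} mechanism={} success={}", principal, mechanism, success);
    }
}
```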

Encryption

For performance reasons, we propose making encryption optional. When using Kerberos (via SASL & GSS-API), there are explicit parameters through which clients can signal their interest in encryption (similarly for SSL).
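
For Kerberos over SASL/GSS-API, the standard quality-of-protection property is the signalling mechanism referred to above: "auth" requests authentication only, "auth-int" adds integrity, and "auth-conf" adds encryption. A minimal client-side sketch, with the service name and host as placeholders:

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;

// Sketch of a client opting in to encryption with Kerberos over SASL/GSS-API.
public class SaslEncryptionOptIn {
    public static SaslClient encryptedClient(String brokerHost) throws Exception {
        Map<String, Object> props = new HashMap<>();
        props.put(Sasl.QOP, "auth-conf");   // request confidentiality; "auth" would skip encryption
        return Sasl.createSaslClient(new String[] {"GSSAPI"}, null, "kafka",
                brokerHost, props, null);
    }
}
```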

 


Sequencing

Here is a proposed sequence of work

...