The following is a proposal for securing Kafka. We don't know this area that well, so this document is as much notes and exploration of the space as a spec.

Requirements

  • Need to support both Kerberos and TLS (SSL)
  • Need to support unix-like users and permissions
  • Need some kind of group or ACL notion
  • No backwards-incompatible release

Kerberos is the most common mechanism for human usage; SSL is more common for applications.

We don't think security can be an all-or-nothing proposition: it has to be controllable on a per-topic basis with some kind of granular authorization. In the absence of this you will end up with one Kafka cluster per application, which defeats the purpose of a central message brokering cluster. Hence you need permissions and a manageable way to assign them in a large organization (groups or ACLs).

We think all of this can probably be done in a backwards-compatible manner and without significant performance degradation for non-secure users.

Assumptions

  • On-disk encryption is not needed
  • We can secure Zookeeper
  • Not trying to run Kafka on the open internet

We assume on-disk encryption is not needed as even many databases do not provide this.

We will need to secure ZooKeeper if we use it for storing any permissions information.

Making Kafka safe to run on the open internet would require defending against attacks on availability (DDoS) among other things, and this hasn't really been thought through. Our assumption is that the goal is to protect your data against unauthorized access, not to defend against availability attacks such as a client creating too many connections.

Authentication

We are tentatively planning to use SASL for Kerberos.

SASL is only a framework; it does not itself transmit the bytes required for authentication. To handle this we will need to add a new AuthRequest/AuthResponse API to our protocol. This will contain only an opaque byte array carrying the SASL token.
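
As a rough illustration of the client side of that exchange (a sketch only: AuthRequest/AuthResponse and the sendAuthRequest() helper are hypothetical names, not a committed wire format; the javax.security.sasl classes are standard Java):

import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;

public class SaslHandshakeSketch {
    // Drive the SASL exchange over the hypothetical AuthRequest/AuthResponse API.
    void authenticate(String brokerHost) throws SaslException {
        SaslClient client = Sasl.createSaslClient(
            new String[] {"GSSAPI"},  // GSSAPI is the SASL mechanism for Kerberos
            null, "kafka", brokerHost, null, null);
        byte[] token = client.hasInitialResponse()
            ? client.evaluateChallenge(new byte[0])
            : new byte[0];
        while (!client.isComplete()) {
            // AuthRequest carries the opaque token; AuthResponse returns the server's challenge.
            byte[] challenge = sendAuthRequest(token);
            token = client.evaluateChallenge(challenge);
        }
    }

    // Placeholder: serialize an AuthRequest, read back the AuthResponse bytes.
    byte[] sendAuthRequest(byte[] token) {
        throw new UnsupportedOperationException("sketch only");
    }
}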

Generally you would expect authentication to happen at connection time, but I don't think there is really any reason to require this. I think instead we can allow it at any time during a session, or even multiple times during a session (if the client wishes to change their user a la su). However, authentication is fundamentally attached to the connection, so if the client reconnects they will lose it and need to re-authenticate.

All connections that have not yet been authenticated will be assigned a fake user ("nobody" or "josephk" or something).

For TLS we would need a separate TLS port. Presumably this would need to be maintained in the cluster metadata so clients can choose to connect to the appropriate port.

Regardless of the mechanism by which you connect and authenticate, the mechanism by which we check your permissions should be the same.

Implementing Authentication Request

This feature requires some co-operation between the socket server and the API layer. The API layer will handle the authentication request, but the resulting username needs to be associated with the connection. One approach to implementing this would be to add the concept of a Session object that is maintained with the connection and contains the username. The session would be stored in the context for the socket in the socket server and destroyed as part of socket close. The session would be passed down to the API layer with each request, and we would have something like session.authenticatedAs() to get the username to use for authorization purposes.
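
A minimal sketch of what such a Session object might look like (names are hypothetical):

public class Session {
    // Connections start out as the fake user until an AuthRequest succeeds.
    private volatile String user = "nobody";

    // Called by the API layer once the authentication exchange for this connection completes.
    public void authenticate(String user) {
        this.user = user;
    }

    // Consulted by the authorization checks in KafkaApis.
    public String authenticatedAs() {
        return user;
    }
}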

Integrity and Encryption

SASL supports authentication alone, authentication + integrity protection (signing), and authentication + integrity + encryption. I think TLS has a similar set of options.

Integrity protection and encryption require actually transforming the bytes being transmitted on a per-client basis. These will not work with the sendfile optimization, so we would have to disable that optimization for requests with these more stringent security requirements.
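
In SASL these three levels are negotiated via the standard quality-of-protection property; a sketch of how a connection could request them:

import java.util.HashMap;
import java.util.Map;
import javax.security.sasl.Sasl;

public class QopSketch {
    // Preference order: encryption, then signing only, then authentication only.
    // Anything above plain "auth" forces payloads through wrap()/unwrap(),
    // which is what rules out the sendfile optimization.
    static Map<String, Object> saslProps() {
        Map<String, Object> props = new HashMap<>();
        props.put(Sasl.QOP, "auth-conf,auth-int,auth");
        return props;
    }
}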

Authorization

The plan will be to support unix-like permissions on a per-topic level.

Authorization will be done in the "business logic" layer in Kafka (aka KafkaApis). The API can be something like

PermissionManager.isPermitted(Subject subject, Permissions permission, String resource)

For example, when handling a produce request you would likely check something like the following:

PermissionManager.isPermitted(session.subject(), Permissions.WRITE, topicName)

This check will obviously have to be quite quick, as it will be done on every request, so the necessary metadata will need to be cached.

The subject is basically the "user name" or identity of the principal trying to take some action; this will be established via whatever authentication mechanism is in use. The permission is basically one of a set of actions you may be permitted to take (e.g. read, write, etc.).

The resource will generally be based on the topic name but there could be other resources we want to secure so we can just treat it as an arbitrary string.
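
Putting those pieces together, a sketch of the types involved (hypothetical shapes, not a settled API; Subject and Permissions are as described above and below):

public interface Subject {
    String user();  // the authenticated identity, e.g. from session.authenticatedAs()
}

public interface PermissionManager {
    // Called on every request, so implementations should answer from an
    // in-memory cache of permission metadata rather than hitting ZooKeeper.
    boolean isPermitted(Subject subject, Permissions permission, String resource);
}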

I could imagine the following permissions:

READ - Permission to fetch data from the topic
WRITE - Permission to publish data to the topic
DELETE - Permission to delete the topic
CREATE - Permission to create the topic
CONFIGURE - Permission to change the configuration for the topic
DESCRIBE - Permission to fetch metadata on the topic
REPLICATE - Permission to participate as a replica (i.e. issue a fetch request with a non-negative node id). This is different from READ in that it has implications for when a write request is committed.
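
These could map directly onto an enum (a sketch mirroring the list above):

public enum Permissions {
    READ,       // fetch data from the topic
    WRITE,      // publish data to the topic
    DELETE,     // delete the topic
    CREATE,     // create the topic
    CONFIGURE,  // change the configuration for the topic
    DESCRIBE,   // fetch metadata on the topic
    REPLICATE   // fetch as a replica, i.e. with a non-negative node id
}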

Permissions are not hierarchical since topics are not hierarchical. So a user will have a default value for these (a kind of umask) as well as a potential override on a per-topic basis. Note that the CREATE and DESCRIBE permissions primarily make sense at the default level.

We will maintain permissions for each topic in a manner similar to the handling of configs. We will have a zookeeper directory
/permissions/defaults
which contains the default permissions as well as
/permissions/topics
which will have per-topic permission settings.
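
For example, resolving a user's effective permissions could look like the following sketch, with the per-topic entry overriding the default where present (the cache layout and names are hypothetical; a ZooKeeper watcher would keep the maps current):

import java.util.Collections;
import java.util.Map;
import java.util.Set;

public class PermissionCache {
    // Mirrors /permissions/defaults: user -> default (umask-style) permissions.
    private Map<String, Set<Permissions>> defaults;
    // Mirrors /permissions/topics: topic -> user -> per-topic overrides.
    private Map<String, Map<String, Set<Permissions>>> topicOverrides;

    Set<Permissions> effective(String user, String topic) {
        Map<String, Set<Permissions>> overrides = topicOverrides.get(topic);
        if (overrides != null && overrides.containsKey(user))
            return overrides.get(user);  // per-topic setting wins
        return defaults.getOrDefault(user, Collections.emptySet());
    }
}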

Note that the PermissionManager API deals with whatever notion of groups or ACLs internally. So if via some group mechanism we have assigned the READ permission to an entire group, we still do the check at the user level, and internally this API needs to resolve the permission at the group level.
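
In other words, an implementation might resolve group membership inside the check, something like this sketch (the helper methods are placeholders; the group lookup could come from LDAP, ZooKeeper, or elsewhere):

import java.util.Collections;
import java.util.Set;

public class GroupResolvingPermissionManager implements PermissionManager {
    @Override
    public boolean isPermitted(Subject subject, Permissions permission, String resource) {
        // Direct user-level grant.
        if (userPermissions(subject.user(), resource).contains(permission))
            return true;
        // Otherwise fall back to any group the user belongs to.
        for (String group : groupsOf(subject.user()))
            if (groupPermissions(group, resource).contains(permission))
                return true;
        return false;
    }

    // Placeholder lookups, all hypothetical.
    Set<Permissions> userPermissions(String user, String resource) { return Collections.emptySet(); }
    Set<Permissions> groupPermissions(String group, String resource) { return Collections.emptySet(); }
    Set<String> groupsOf(String user) { return Collections.emptySet(); }
}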

Open Questions

  • On-the-wire encryption: do we need to do this? If so we will have to disable the sendfile optimization when encryption is used.
  • Groups vs ACLs: need to understand pros and cons.
  • Can we do everything over a single port?