The following is a proposal for securing Kafka. We don't know this area that well, so this document is as much notes and exploration of the space as a spec.
Requirements
- Need to support both Kerberos and TLS (SSL)
- Need to support unix-like users and permissions
- Need some kind of group or ACL notion
- No backwards-incompatible release
Assumptions
- On-disk encryption is not needed
- We can secure Zookeeper
- Not trying to run Kafka on the open internet
Authentication
We are tentatively planning to use SASL.
SASL defines the challenge/response exchange used for authentication but does not itself transmit those bytes; the application protocol has to carry them. To handle this we will need to add a new AuthRequest/AuthResponse API to our protocol. Each message will contain only an opaque byte array holding the SASL token for that step of the exchange.
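As a rough illustration of how small this API could be, the sketch below frames an opaque SASL token as a length-prefixed byte array. The class name AuthFrame and the 4-byte big-endian length prefix are assumptions made for this example, not a decided wire format. On the broker, the decoded bytes would presumably be handed to something like javax.security.sasl.SaslServer.evaluateResponse, and the returned challenge sent back in the AuthResponse.

```java
import java.nio.ByteBuffer;

// Hypothetical body shared by AuthRequest and AuthResponse:
// a 4-byte big-endian length followed by that many opaque SASL bytes.
final class AuthFrame {
    private final byte[] token;

    AuthFrame(byte[] token) {
        this.token = token.clone();
    }

    byte[] token() {
        return token.clone();
    }

    // Serialize as [length][token bytes]; the buffer is returned ready to read.
    ByteBuffer encode() {
        ByteBuffer buf = ByteBuffer.allocate(4 + token.length);
        buf.putInt(token.length);
        buf.put(token);
        buf.flip();
        return buf;
    }

    // Parse a frame, rejecting lengths that do not fit the remaining bytes.
    static AuthFrame decode(ByteBuffer buf) {
        int length = buf.getInt();
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalArgumentException("invalid token length: " + length);
        }
        byte[] token = new byte[length];
        buf.get(token);
        return new AuthFrame(token);
    }
}
```

The protocol layer never interprets the token; only the SASL mechanism on each side does, which is what keeps the new API to a single byte array.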
Generally you would expect authentication to happen at connection time, but I don't think there is really any reason to require this. I think instead we can allow it at any time during a session, or even multiple times during a session (if the client wishes to change its user a la su). However, the authenticated identity is fundamentally attached to the connection, so if the client reconnects it will lose its authentication and need to re-authenticate.
All connections that have not yet been authenticated will be assigned a fake user ("nobody" or "josephk" or something).
Implementing Authentication
This feature requires some co-operation between the socket server and the api layer. The API layer will handle the authenticate request, but the username will be associated with the connection. One approach to implementing this would be to add the concept of a Session object that is maintained with the connection and contains the username. The session would be stored in the context for the socket in socket server and destroyed as part of socket close. The session would be passed down to the API layer with each request and we would have something like session.authenticatedAs() to get the username to use for authorization purposes.
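A minimal sketch of the Session idea follows. Only session.authenticatedAs() comes from the text above; SessionRegistry, the connection id strings, and the choice of "nobody" as the placeholder user are assumptions made for illustration.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Per-connection state; starts out as the placeholder user until authenticated.
final class Session {
    static final String UNAUTHENTICATED_USER = "nobody";

    private volatile String user = UNAUTHENTICATED_USER;

    // Username the API layer should use for authorization checks.
    String authenticatedAs() {
        return user;
    }

    // May be called more than once, e.g. when a client switches user mid-session.
    void authenticate(String newUser) {
        this.user = Objects.requireNonNull(newUser, "newUser");
    }
}

// Hypothetical stand-in for the socket server's per-connection context.
final class SessionRegistry {
    private final Map<String, Session> sessions = new ConcurrentHashMap<>();

    // Called when a connection is accepted.
    Session open(String connectionId) {
        Session session = new Session();
        sessions.put(connectionId, session);
        return session;
    }

    // Returns null if the connection is unknown or already closed.
    Session get(String connectionId) {
        return sessions.get(connectionId);
    }

    // Called as part of socket close; the identity does not survive a reconnect.
    void close(String connectionId) {
        sessions.remove(connectionId);
    }
}
```

Because the session is dropped on close, a reconnecting client starts again as the placeholder user, matching the re-authentication behaviour described earlier.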
Integrity and Encryption
SASL supports authentication alone, authentication + integrity protection (signing), and authentication + integrity & encryption.
Integrity protection and encryption require actually wrapping and unwrapping the bytes being transmitted. These will not work with the sendfile optimization and we would have to disable this optimization for requests with this more stringent security requirement. Presumably this would apply to the contents of sends but not to the size boundaries we use to delimit them (i.e. for encrypted messages the size would be the size post encryption and the size itself would not be encrypted).
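The framing described above could look roughly like the sketch below: the payload is wrapped first, and the cleartext size prefix then counts the wrapped bytes. SecureFraming and the 4-byte prefix are assumptions for this example. The wrap and unwrap functions are placeholders for whatever the negotiated SASL layer provides (for instance SaslServer.wrap and SaslServer.unwrap in javax.security.sasl).

```java
import java.nio.ByteBuffer;
import java.util.function.UnaryOperator;

// Frames a payload as [cleartext size][wrapped bytes].
final class SecureFraming {
    private SecureFraming() {}

    static ByteBuffer frame(byte[] payload, UnaryOperator<byte[]> wrap) {
        byte[] wrapped = wrap.apply(payload);
        ByteBuffer buf = ByteBuffer.allocate(4 + wrapped.length);
        buf.putInt(wrapped.length); // size after wrapping; the size itself is not wrapped
        buf.put(wrapped);
        buf.flip();
        return buf;
    }

    static byte[] unframe(ByteBuffer buf, UnaryOperator<byte[]> unwrap) {
        int size = buf.getInt();
        if (size < 0 || size > buf.remaining()) {
            throw new IllegalArgumentException("invalid frame size: " + size);
        }
        byte[] wrapped = new byte[size];
        buf.get(wrapped);
        return unwrap.apply(wrapped);
    }
}
```

Since every payload byte has to pass through wrap in user space, the broker cannot hand the file region directly to the socket, which is why sendfile is lost on this path.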
Authorization
The plan will be to support unix-like permissions on a per-topic level.
An important question is whether to support traditional groups or more flexible ACLs. I don't understand this yet.
We will have the following permissions:
READ - Permission to fetch data from the topic
WRITE - Permission to publish data to the topic
DELETE - Permission to delete the topic
CREATE - Permission to create the topic
CONFIGURE - Permission to change the configuration for the topic
DESCRIBE - Permission to fetch metadata on the topic
REPLICATE - Permission to participate as a replica (i.e. issue a fetch request with a non-negative node id). This is different from READ in that it has implications for when a write request is committed.
Permissions are not hierarchical since topics are not hierarchical. So a user will have a default value for these (a kind of umask) as well as a potential override on a per-topic basis. Note that the CREATE and DESCRIBE permissions primarily make sense at the default level.
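To make the default-plus-override idea concrete, here is a small sketch of how a user's effective permissions for a topic might be resolved. The class UserPermissions is hypothetical, and it assumes that a per-topic override replaces the defaults entirely for that topic rather than being merged with them; the proposal does not settle that point.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

enum Permission { READ, WRITE, DELETE, CREATE, CONFIGURE, DESCRIBE, REPLICATE }

// One user's permissions: a default set plus optional per-topic overrides.
final class UserPermissions {
    private final Set<Permission> defaults = EnumSet.noneOf(Permission.class);
    private final Map<String, Set<Permission>> topicOverrides = new HashMap<>();

    UserPermissions(Set<Permission> defaults) {
        this.defaults.addAll(defaults);
    }

    // Assumption: an override replaces the defaults for that topic (no merging).
    void overrideTopic(String topic, Set<Permission> permissions) {
        Set<Permission> copy = EnumSet.noneOf(Permission.class);
        copy.addAll(permissions);
        topicOverrides.put(topic, copy);
    }

    // The override if one exists, otherwise the user's defaults.
    Set<Permission> effective(String topic) {
        Set<Permission> override = topicOverrides.get(topic);
        return Collections.unmodifiableSet(override != null ? override : defaults);
    }
}
```

For example, a user whose defaults are READ and DESCRIBE can be locked out of a single sensitive topic by giving that topic an empty override, without touching any other topic.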
We will maintain permissions for each topic in a manner similar to the handling of configs. We will have a Zookeeper directory
/permissions/defaults
which contains the default permissions as well as
/permissions/topics
which will have per-topic permission settings.
For each secured action on the server, the server will do a permissions check like
permissionsManager.ensurePermitted(request.user, Permissions.Read, "mytopicname")
Obviously this check will have to be fast since it will be done at least once on most requests, so the necessary permissions will need to be cached locally.
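One possible shape for the cached check is sketched below. Only the ensurePermitted call itself is taken from the text; PermissionDeniedException, the loader function (a stand-in for reading the Zookeeper permission nodes), and the invalidate hook are assumptions made for this example.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

enum Permission { READ, WRITE, DELETE, CREATE, CONFIGURE, DESCRIBE, REPLICATE }

final class PermissionDeniedException extends RuntimeException {
    PermissionDeniedException(String message) {
        super(message);
    }
}

final class PermissionsManager {
    // Hypothetical loader: (user, topic) -> granted permissions, e.g. read from Zookeeper.
    private final BiFunction<String, String, Set<Permission>> loader;
    // Local cache keyed by (user, topic) so the hot path avoids any remote lookup.
    private final Map<List<String>, Set<Permission>> cache = new ConcurrentHashMap<>();

    PermissionsManager(BiFunction<String, String, Set<Permission>> loader) {
        this.loader = loader;
    }

    // Throws if the user lacks the permission; the first lookup per key hits the loader.
    void ensurePermitted(String user, Permission permission, String topic) {
        Set<Permission> granted = cache.computeIfAbsent(List.of(user, topic), key -> {
            Set<Permission> loaded = loader.apply(user, topic);
            return loaded != null ? loaded : EnumSet.noneOf(Permission.class);
        });
        if (!granted.contains(permission)) {
            throw new PermissionDeniedException(
                user + " lacks " + permission + " on topic " + topic);
        }
    }

    // Would be called when the stored permissions change, so the next check reloads.
    void invalidate(String user, String topic) {
        cache.remove(List.of(user, topic));
    }
}
```

A real implementation would need some way to learn about permission changes (for example a watch on the permission nodes) in order to call invalidate; that part is left out here.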
Open Questions
- On-the-wire encryption: do we need to do this? If so we will have to disable the sendfile optimization when encryption is used.
- Groups vs ACLs: need to understand pros and cons.
- Can we do everything over a single port?
Notes