
...

The following is a proposal for securing Apache Kafka. We don't know this area that well, so this document is as much notes and an exploration of the space as a spec. The Kafka security principles should be based on the confidentiality, integrity, and availability of the data that Kafka is responsible for. Availability has already been addressed as of the 0.8.0 release with replication, and those features continue through the current stable release, so this document focuses on the confidentiality and integrity of the data. A primary driver for these changes is the varied regulatory and compliance requirements that exist. Open source systems have a unique ability to provide best-of-breed solutions in this area because the requirements, development, testing, and production use come from organizations in different verticals with different business goals. We welcome continued contribution in all areas of this work from the community.

Requirements

  • Need to support both Kerberos and TLS (SSL)
  • Need to support unix-like users and permissions
  • Need some kind of group or ACL notion
  • No backwards-incompatible release
  • Encryption at rest supporting compliance regulations
  • Non-repudiation and long term non-repudiation for data integrity

...

We think all this can probably be done in a backwards compatible manner and without significant performance degradation for non-secure users.

...

  • On-disk encryption is not needed 
  • We can secure ZooKeeper
  • Not trying to run Kafka on the open internet

We assume on-disk encryption is not needed as even many databases do not provide this.

We will need to secure ZooKeeper if we use it to store any permissions information.

Making Kafka safe to run on the open internet requires defending against attacks on availability (e.g. DDoS) and other things, and this hasn't really been thought through. Our assumption is that the goal is to protect the data against attack, but not to defend against availability attacks such as a client creating too many connections.

Authentication

We are tentatively planning to use SASL for Kerberos.

...

This feature requires some cooperation between the socket server and the API layer. The API layer will handle the authenticate request, but the username will be associated with the connection. One approach to implementing this would be to add the concept of a Session object that is maintained with the connection and contains the username. The session would be stored in the socket's context in the socket server and destroyed as part of socket close. The session would be passed down to the API layer with each request, and we would have something like session.authenticatedAs() to get the username to use for authorization purposes.
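A minimal sketch of that idea follows. The class and method names (Session, authenticatedAs) are illustrative, not an existing API:

```java
// Illustrative sketch only: a Session object held by the socket server for
// the lifetime of a connection, carrying the principal established during
// authentication. All names here are hypothetical.
public class Session {
    private final String principal; // username established by the SASL/TLS handshake

    public Session(String principal) {
        this.principal = principal;
    }

    // The API layer calls this on each request to get the username
    // to use for authorization checks.
    public String authenticatedAs() {
        return principal;
    }
}
```

The socket server would create the Session when authentication completes, keep it in the per-socket context, pass it down with each request, and discard it on socket close.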

...

SASL supports authentication alone, authentication plus integrity protection (signing), and authentication plus integrity protection and encryption. I think TLS has a similar set of options.

Integrity protection and encryption require actually translating the bytes being transmitted on a per client basis. These will not work with the sendfile optimization and we would have to disable this optimization for requests with this more stringent security requirement.
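A toy illustration of why these two conflict: with integrity protection enabled, every outgoing buffer must pass through a per-connection transform in user space before hitting the socket, so the zero-copy sendfile path (file to socket, bypassing user space) cannot be used. Here a CRC32 trailer stands in for a real SASL wrap() call; this is not actual SASL:

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Toy stand-in for a per-connection integrity transform. Because each
// payload must be rewritten (here: a checksum trailer appended) before
// transmission, bytes cannot be sent directly from the page cache.
public class IntegrityWrapper {
    public static byte[] wrap(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        ByteBuffer out = ByteBuffer.allocate(payload.length + 8);
        out.put(payload).putLong(crc.getValue());
        return out.array();
    }
}
```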

Authorization

The plan will be to support unix-like permissions on a per-topic level.

...

Note that the PermissionManager API deals with whatever notion of groups or ACLs internally. So if, via some group mechanism, we have assigned the READ permission to an entire group, we still do the check at the user level, and internally this API needs to resolve the permission at the group level.
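A minimal sketch of what that internal resolution could look like. PermissionManager and all its method names here are hypothetical, not a proposed interface:

```java
import java.util.*;

// Hypothetical sketch: per-topic unix-like permissions where grants may name
// users or groups, but callers always check at the user level and group
// membership is resolved internally.
public class PermissionManager {
    public enum Permission { READ, WRITE }

    private final Map<String, Set<String>> groupMembers = new HashMap<>(); // group -> users
    // topic -> permission -> principals (users or groups)
    private final Map<String, Map<Permission, Set<String>>> topicAcls = new HashMap<>();

    public void addGroupMember(String group, String user) {
        groupMembers.computeIfAbsent(group, g -> new HashSet<>()).add(user);
    }

    public void grant(String topic, Permission perm, String principal) {
        topicAcls.computeIfAbsent(topic, t -> new EnumMap<>(Permission.class))
                 .computeIfAbsent(perm, p -> new HashSet<>()).add(principal);
    }

    // Called with a username; group-level grants are resolved here, inside the API.
    public boolean hasPermission(String user, Permission perm, String topic) {
        Set<String> principals = topicAcls
            .getOrDefault(topic, Collections.emptyMap())
            .getOrDefault(perm, Collections.emptySet());
        if (principals.contains(user)) return true;
        for (String p : principals) {
            if (groupMembers.getOrDefault(p, Collections.emptySet()).contains(user)) return true;
        }
        return false;
    }
}
```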

Encryption

This is very important and something that can be facilitated within the wire protocol. It requires an additional map data structure for the "encrypted [data encryption key]". With this map (either in your object or in the wire protocol) you can store the dynamically generated symmetric key (for each message) and then encrypt the data using that dynamically generated key. You then encrypt the data encryption key with the public key of each party expected to be able to decrypt the message, and store each result in the map keyed by the public key it was encrypted with (so a map of [publicKey] = encryptedDataEncryptionKey). Other patterns can be implemented, but this is a pretty standard digital enveloping [0] pattern with only one field added, and other patterns should be able to use that field for their implementations too.
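A hedged sketch of that enveloping pattern, keyed by a recipient id rather than the raw public key bytes for readability. The class and field names are illustrative, and a real implementation would choose authenticated cipher modes and paddings deliberately rather than relying on JCA defaults as this sketch does:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;
import java.security.*;
import java.util.*;

// Hypothetical digital-enveloping sketch: a fresh AES data encryption key
// (DEK) per message, wrapped under each recipient's RSA public key.
public class Envelope {
    public final byte[] ciphertext;
    // recipient id -> DEK encrypted under that recipient's public key
    public final Map<String, byte[]> encryptedDataKeys = new HashMap<>();

    public Envelope(byte[] plaintext, Map<String, PublicKey> recipients)
            throws GeneralSecurityException {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey dek = kg.generateKey();            // per-message symmetric key
        Cipher aes = Cipher.getInstance("AES");
        aes.init(Cipher.ENCRYPT_MODE, dek);
        this.ciphertext = aes.doFinal(plaintext);    // data encrypted once with the DEK
        Cipher rsa = Cipher.getInstance("RSA");
        for (Map.Entry<String, PublicKey> r : recipients.entrySet()) {
            rsa.init(Cipher.ENCRYPT_MODE, r.getValue());
            encryptedDataKeys.put(r.getKey(), rsa.doFinal(dek.getEncoded()));
        }
    }

    // A recipient unwraps their copy of the DEK, then decrypts the payload.
    public byte[] decrypt(String recipientId, PrivateKey priv)
            throws GeneralSecurityException {
        Cipher rsa = Cipher.getInstance("RSA");
        rsa.init(Cipher.DECRYPT_MODE, priv);
        SecretKey dek = new SecretKeySpec(rsa.doFinal(encryptedDataKeys.get(recipientId)), "AES");
        Cipher aes = Cipher.getInstance("AES");
        aes.init(Cipher.DECRYPT_MODE, dek);
        return aes.doFinal(ciphertext);
    }
}
```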

Non-repudiation and long term non-repudiation


Non-repudiation is proving that data hasn't changed and that its signer cannot later deny having produced it. This is often (if not always) done with X.509 public certificates (chained to a certificate authority).

Long-term non-repudiation addresses what happens when the certificate authority's certificates expire (or are revoked) and everything ever signed with that certificate's public key becomes "no longer provable as ever being authentic". That is where RFC 3126 [1] and RFC 3161 [2] come in (or WORM drives [hardware], etc.).

For either (or both) of these, it is an operation of the encryptor to sign/hash the data (with or without a third-party trusted timestamp of the signing event), encrypt that with their own private key, and distribute the results (before and after encrypting, if required) along with their public key. This structure is a bit more complex but feasible: it is a map of digital signature formats to the chain of digital signature attestations. The map's key is the method (e.g. CRC32, PKCS7 [3], XmlDigSig [4]), and its value is a list of maps where each key is the "purpose" of the signature (what you are attesting to), with a sibling field for "the attester" as bytes (e.g. their PKCS12 [5] for the map of PKCS7 signatures).
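A rough sketch of that attestation structure. All names here are hypothetical; the text lists CRC32 as a method, but CRC32 is a checksum rather than a signature, so this sketch demonstrates the shape with a standard JCA signature algorithm:

```java
import java.security.*;
import java.util.*;

// Hypothetical sketch of the described structure: signature method -> chain
// of attestations, each recording the purpose of the signature, the
// signature bytes, and the attester as bytes.
public class Attestations {
    public static class Attestation {
        public final String purpose;   // what is being attested to
        public final byte[] signature; // the signature bytes themselves
        public final byte[] attester;  // e.g. an encoded public key or PKCS12 blob
        public Attestation(String purpose, byte[] signature, byte[] attester) {
            this.purpose = purpose;
            this.signature = signature;
            this.attester = attester;
        }
    }

    // method (e.g. "SHA256withRSA", "PKCS7", "XmlDigSig") -> chain of attestations
    public final Map<String, List<Attestation>> chain = new HashMap<>();

    // Sign the data under the given method/purpose and append to the chain.
    public void sign(String method, String purpose, byte[] data, KeyPair keys)
            throws GeneralSecurityException {
        Signature sig = Signature.getInstance(method);
        sig.initSign(keys.getPrivate());
        sig.update(data);
        chain.computeIfAbsent(method, m -> new ArrayList<>())
             .add(new Attestation(purpose, sig.sign(), keys.getPublic().getEncoded()));
    }
}
```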



Open Questions

  • On-the-wire encryption: do we need to do this? If so we will have to disable the sendfile optimization when encryption is used.
  • Groups vs ACLs: need to understand pros and cons.
  • Can we do everything over a single port?