...

Here are the main motivations driving this proposal:

  1. Kafka should try to avoid any unnecessary de-compression / re-compression in its pipeline: today the broker needs to de-/re-compress messages just to assign their offsets upon receiving compressed messages (KAFKA-527), and MirrorMaker always needs to decompress messages on the consumer side and then re-compress them on the producer side in case there are keyed messages (KAFKA-1001). This can lead to high CPU / memory usage, and also risks data loss when a message becomes too large after re-compression in the middle of the pipeline. Both of these de-/re-compression steps should be avoidable in most cases (see the first sketch after this list for the broker-side case).
     
  2. Kafka should support auditing while preserving its agnosticism to the message format: Joe Stein has proposed supporting general schema-based topics in Kafka, and at LinkedIn this idea has been applied for a while. We use Avro as our centralized schema management system, which controls the data format contract between clients, and in addition we have built our auditing system on some pre-defined fields of those schemas (KAFKA-260). However, this approach forces Kafka to be aware of our Avro message format for auditing purposes, beyond just piping bytes, which adds extra maintenance overhead (Todd has more explanation in the discussion thread).
     
  3. Kafka needs to support control messages (KAFKA-1639): we want to add "control messages" to Kafka, which are NOT real data but are used only by the broker / clients for core Kafka functionality such as transactional messaging, etc. (the second sketch below illustrates the idea).
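
To make the first motivation concrete, the following is a minimal, self-contained Java sketch of the broker-side cost. It is not Kafka's actual message format or broker code; the class name, batch framing, and choice of gzip are illustrative assumptions. It only shows why a batch that carries absolute offsets inside the compressed blob cannot be stamped without a full decompress / re-compress cycle:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// NOT Kafka's real wire format or broker code: a toy batch of
// (offset, length, payload) entries, gzip-compressed as one blob.
public class OffsetAssignmentSketch {

    // Producer side: offsets are unknown, so placeholders go inside the blob.
    static byte[] compressBatch(List<byte[]> payloads) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(buf))) {
            out.writeInt(payloads.size());
            for (byte[] p : payloads) {
                out.writeLong(-1L);   // placeholder: the log offset is assigned later
                out.writeInt(p.length);
                out.write(p);
            }
        }
        return buf.toByteArray();
    }

    // Broker side: stamping absolute offsets means decompressing every
    // message and re-compressing the whole batch (the KAFKA-527 cost).
    static byte[] assignOffsets(byte[] compressed, long baseOffset) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataInputStream in = new DataInputStream(
                     new GZIPInputStream(new ByteArrayInputStream(compressed)));
             DataOutputStream out = new DataOutputStream(new GZIPOutputStream(buf))) {
            int count = in.readInt();
            out.writeInt(count);
            for (int i = 0; i < count; i++) {
                in.readLong();                 // throw away the placeholder
                int len = in.readInt();
                byte[] payload = new byte[len];
                in.readFully(payload);
                out.writeLong(baseOffset + i); // the rewritten value lives inside
                out.writeInt(len);             // the compressed blob, so there is
                out.write(payload);            // no way to patch it in place
            }
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = compressBatch(List.of("m0".getBytes(), "m1".getBytes()));
        byte[] stored = assignOffsets(wire, 42L); // full decompress + re-compress
        System.out.printf("wire=%d bytes, stored=%d bytes%n", wire.length, stored.length);
    }
}

Under this framing, avoiding the cycle requires that whatever the broker must stamp live outside the compressed payload.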
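For the third motivation, here is one purely illustrative way (an assumption on our part, not the encoding proposed in this document or in KAFKA-1639) that a single flag bit in a per-message attributes byte could let brokers and clients recognize control messages without parsing application payloads:

// Illustrative assumption only: a hypothetical "control" flag bit in a
// per-message attributes byte. This is NOT the representation proposed
// in this document or in KAFKA-1639.
public class ControlFlagSketch {
    private static final int CONTROL_FLAG = 1 << 5; // hypothetical bit position

    static byte markControl(byte attributes) {
        return (byte) (attributes | CONTROL_FLAG);
    }

    static boolean isControl(byte attributes) {
        return (attributes & CONTROL_FLAG) != 0;
    }

    public static void main(String[] args) {
        byte data = 0;                        // ordinary application message
        byte marker = markControl((byte) 0);  // e.g. a transaction commit marker
        System.out.println(isControl(data));   // false -> deliver to the application
        System.out.println(isControl(marker)); // true  -> handled by broker / client internals
    }
}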

 

We have also been discussing some other Kafka problems:

...