...
This page is meant as a template for writing a KIP. To create a KIP choose Tools->Copy on this page and modify with your content and replace the heading with the next KIP number and a description of your issue. Replace anything in italics with your own description.
Status
Current state: "Under Discussiondiscarded"
Discussion thread: here
JIRA: here
Note: discarding this KIP since the benefits of the space savings outweigh the cost incurred for message conversion. And there are also planned changes for version v.3, like:
3. v3 of the record format should make it cheaper to make changes in the
future (perhaps it could support tagged fields or similar)
4. We'd want to fix other known issues at the same time (eg log append time
should always be available, there may be others)
5. We should consider whether we would want to introduce a user header that
is at the batch level vs record level for efficiency reasons
[Additional proposal] Changing the batch format proposal
Problem: Currently, we perform validation of data schema (such as verifying monotonically increasing offsets etc.). To do that, we have to read the headers of each message in a batch. For a compressed batch, this means that we have to decompress the entire batch including the payload (key/value) even though we don't really require any information out of them.
Solution: If we change the ordering of messages in a batch to prefix with headers of all messages first and followed by the payload of all messages, it will lead to an optimization where we have to "partially" decompress the batch (i.e. the prefix).
Motivation
Logs in kafka consists of batches, and one batch consists of many messages. So if the size of each message header can be reduced, it'll improve the network traffic and storage size (and money, of course) a lot, even it's a small improvement. Some companies now handle trillions of messages per day using Kafka, so, if we can reduce just 1 byte per message, the save is really considerable.
...