Status
Current state: Under Discussion
Discussion thread: <TBD>
JIRA: here
Released: <TBD>
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Kafka's initial LZ4 compression implementation is not interoperable. It does not follow the standard LZ4 framing specification (see https://cyan4973.github.io/lz4/lz4_Frame_format.html). This makes it difficult for third-party clients to support LZ4 compression using off-the-shelf libraries. This KIP proposes to fix Kafka's LZ4 handling so that it conforms to the LZ4F specification and enables clients to interoperate on LZ4-compressed messages.
Specifically, KAFKA-1493 attempted to implement the interoperable LZ4F framing specification. However, a bug causes the frame header checksum to be calculated incorrectly. Fixing this single byte (referred to as HC) is the goal of this KIP.
Public Interfaces
Changes to public interfaces:
org.apache.kafka.common.record.KafkaLZ4BlockInputStream
- add checkHC boolean
org.apache.kafka.common.record.KafkaLZ4BlockOutputStream
- add useBrokenHC boolean
- remove isPresetDictionarySet() method (this was unused and has been deprecated in the latest LZ4F spec)
Proposed Changes
Old 0.8/0.9 clients (current behavior):
- produce messages w/ broken checksum
- consume messages w/ incorrect checksum only
New 0.10 clients (proposed behavior):
- produce all messages w/ correct LZ4F checksum
- do not verify the HC when consuming (it is optional in the spec, and Kafka already has a message-level checksum)
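To make the HC byte concrete, here is a minimal sketch of how it can be computed. It rests on two assumptions that are not stated in this KIP: that, per the LZ4 frame format, HC is the second byte of xxHash32 (seed 0) over the frame descriptor only, and that the legacy "broken" value came from also hashing the 4 magic bytes. The class and method names are hypothetical, and the xxHash32 routine is a plain reimplementation for illustration, not Kafka's code.

```java
import java.nio.charset.StandardCharsets;

final class Lz4HeaderChecksumSketch {
    // xxHash32 prime constants
    private static final int P1 = 0x9E3779B1, P2 = 0x85EBCA77, P3 = 0xC2B2AE3D,
                             P4 = 0x27D4EB2F, P5 = 0x165667B1;
    // LZ4 frame magic number 0x184D2204, stored little-endian
    private static final byte[] MAGIC = {0x04, 0x22, 0x4D, 0x18};

    private static int readLE32(byte[] b, int i) {
        return (b[i] & 0xFF) | ((b[i + 1] & 0xFF) << 8)
             | ((b[i + 2] & 0xFF) << 16) | ((b[i + 3] & 0xFF) << 24);
    }

    private static int round(int acc, int lane) {
        return Integer.rotateLeft(acc + lane * P2, 13) * P1;
    }

    static int xxHash32(byte[] buf, int off, int len, int seed) {
        int i = off;
        int end = off + len;
        int h;
        if (len >= 16) {
            // Four parallel accumulators consume 16-byte stripes.
            int v1 = seed + P1 + P2, v2 = seed + P2, v3 = seed, v4 = seed - P1;
            for (; i <= end - 16; i += 16) {
                v1 = round(v1, readLE32(buf, i));
                v2 = round(v2, readLE32(buf, i + 4));
                v3 = round(v3, readLE32(buf, i + 8));
                v4 = round(v4, readLE32(buf, i + 12));
            }
            h = Integer.rotateLeft(v1, 1) + Integer.rotateLeft(v2, 7)
              + Integer.rotateLeft(v3, 12) + Integer.rotateLeft(v4, 18);
        } else {
            h = seed + P5;
        }
        h += len;
        // Remaining 4-byte words, then remaining single bytes.
        for (; i <= end - 4; i += 4) {
            h = Integer.rotateLeft(h + readLE32(buf, i) * P3, 17) * P4;
        }
        for (; i < end; i++) {
            h = Integer.rotateLeft(h + (buf[i] & 0xFF) * P5, 11) * P1;
        }
        // Final avalanche.
        h ^= h >>> 15; h *= P2;
        h ^= h >>> 13; h *= P3;
        h ^= h >>> 16;
        return h;
    }

    /** HC as assumed from the LZ4 frame format: hash the descriptor bytes only. */
    static int correctHc(byte[] descriptor) {
        return (xxHash32(descriptor, 0, descriptor.length, 0) >>> 8) & 0xFF;
    }

    /** Assumed legacy behavior: the magic bytes are hashed along with the descriptor. */
    static int legacyHc(byte[] descriptor) {
        byte[] withMagic = new byte[MAGIC.length + descriptor.length];
        System.arraycopy(MAGIC, 0, withMagic, 0, MAGIC.length);
        System.arraycopy(descriptor, 0, withMagic, MAGIC.length, descriptor.length);
        return (xxHash32(withMagic, 0, withMagic.length, 0) >>> 8) & 0xFF;
    }

    public static void main(String[] args) {
        // Illustrative descriptor: FLG = 0x60 (version 01, independent blocks), BD = 0x40 (64 KB blocks).
        byte[] descriptor = {0x60, 0x40};
        System.out.printf("correct HC = 0x%02X%n", correctHc(descriptor));
        System.out.printf("legacy HC  = 0x%02X%n", legacyHc(descriptor));
        byte[] abc = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("xxHash32(\"abc\") = 0x%08X%n", xxHash32(abc, 0, abc.length, 0));
    }
}
```

Under these assumptions the two variants differ only in the byte range that is hashed, which is why a consumer that skips HC verification can read frames from both old and new producers.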
Proposed behavior for 0.10 broker:
- Return v0 (KIP-31) messages w/ old "broken" checksum in FetchResponse
- Return v1 (KIP-31) messages w/ correct checksum in FetchResponse
- do not validate checksum for v0 produced messages
- require all v1 produced messages to have correct checksum, otherwise throw error
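The broker rules above can be sketched as a small decision table. This is only an illustration of the proposed policy, not broker code: the class, enum, and method names are hypothetical, and the message format version is modeled as a plain int (0 or 1).

```java
final class BrokerChecksumPolicySketch {
    enum HeaderChecksum { CORRECT, BROKEN }

    private static void requireKnownVersion(int messageVersion) {
        if (messageVersion != 0 && messageVersion != 1) {
            throw new IllegalArgumentException("unknown message version: " + messageVersion);
        }
    }

    /** Produce path: v0 is not validated; v1 must carry the correct checksum. */
    static boolean acceptProduced(int messageVersion, HeaderChecksum hc) {
        requireKnownVersion(messageVersion);
        if (messageVersion == 0) {
            return true; // no HC validation for v0 messages
        }
        return hc == HeaderChecksum.CORRECT; // a broken HC on v1 is an error
    }

    /** Fetch path: v0 is returned with the old "broken" checksum, v1 with the correct one. */
    static HeaderChecksum checksumForFetch(int messageVersion) {
        requireKnownVersion(messageVersion);
        return messageVersion == 0 ? HeaderChecksum.BROKEN : HeaderChecksum.CORRECT;
    }

    public static void main(String[] args) {
        for (int version = 0; version <= 1; version++) {
            for (HeaderChecksum hc : HeaderChecksum.values()) {
                System.out.println("v" + version + " produced with " + hc + " HC accepted: "
                        + acceptProduced(version, hc));
            }
            System.out.println("v" + version + " fetched with " + checksumForFetch(version) + " HC");
        }
    }
}
```

Keying both paths on the message version means old clients never see a checksum they cannot read, while new clients only ever produce and receive spec-conformant frames.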
Compatibility, Deprecation, and Migration Plan
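- Compatibility with old clients is maintained by switching checksum behavior on the message format version: v0 messages keep the old "broken" checksum, and v1 messages use the correct one.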
- The one use case that may need special attention is trunk users who use LZ4-compressed messages AND have already upgraded to v1 messages. Are there any such users? For these users, upgrading brokers would cause their existing v1 producers to fail immediately, because the broker will reject v1 messages with a broken checksum. Upgrading the producers fixes this and allows production to continue. If any such users exist, it might be worth providing a one-time patch for trunk that disables the broker error on v1 messages with a broken HC. Alternatively, we could make this configurable, but I think we should be skeptical of that option, as it will likely create more confusion for the vast majority of users who are not in this edge case.
- Note that because the broker would no longer validate the HC on v0 messages, (non-Java) clients will be able to produce LZ4 messages in v0 format with a correct checksum. The broker will alter the checksum so that it appears "broken" for compatibility with older clients.
Rejected Alternatives
Alternative #1: Create a new compression type, "LZ4F". Rejected because this is really just a bugfix, not a new compression type. The number of compression types is limited by the number of bits available in the message attribute byte. We currently use 2 bits to cover the 4 compression types (None, Gzip, Snappy, LZ4). Adding a second type for a "fixed" LZ4 would require pulling a 3rd bit from the attribute byte. Further, explaining the difference between the LZ4 and LZ4F compression types to users is likely to be difficult.
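The bit-budget argument can be illustrated with a short sketch. The mask and codec ids below simply follow the two-bit description and the order listed above (None, Gzip, Snappy, LZ4); they are assumptions for illustration rather than values taken from Kafka's source, and the class name is hypothetical.

```java
final class CompressionAttributeSketch {
    // Two low bits of the attribute byte, as described above (illustrative).
    static final int CODEC_MASK_2_BITS = 0x03;
    static final int NONE = 0, GZIP = 1, SNAPPY = 2, LZ4 = 3;

    /** Extracts the codec id from an attribute byte using the two-bit mask. */
    static int codecId(int attributes) {
        return attributes & CODEC_MASK_2_BITS;
    }

    /** Number of bits needed to give each of codecCount codecs a distinct id. */
    static int bitsNeeded(int codecCount) {
        if (codecCount < 1) {
            throw new IllegalArgumentException("codecCount must be positive");
        }
        return 32 - Integer.numberOfLeadingZeros(codecCount - 1);
    }

    public static void main(String[] args) {
        System.out.println("bits for 4 codecs: " + bitsNeeded(4));
        System.out.println("bits for 5 codecs: " + bitsNeeded(5));
        // A hypothetical fifth id (4) does not fit in two bits: it aliases NONE.
        int hypotheticalLz4f = 4;
        System.out.println("id 4 under the 2-bit mask reads as: " + codecId(hypotheticalLz4f));
    }
}
```

A fifth codec id would either collide with an existing one under the two-bit mask or force the mask to grow by one bit, which is the cost this alternative was rejected for.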