Status

Current state: Accepted

Discussion thread: here

JIRA: here

GitHub PR: 1212

Released: 0.10.0.0

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

Changes to public interfaces:

org.apache.kafka.common.record.KafkaLZ4BlockInputStream

  • add checkHC boolean

org.apache.kafka.common.record.KafkaLZ4BlockOutputStream

...

None. Interface changes are for classes not currently marked as public in javadoc.

...

Proposed Changes

Old 0.8.2/0.9 clients (current behavior):

  • produce messages w/ broken HC checksum only
  • consume messages w/ incorrect HC checksum only

...

New 0.10 clients (proposed behavior):

  • produce and consume v1 messages w/ correct LZ4F checksum
  • do not verify HC checksum when consuming (it is optional in the spec, and Kafka already has a message-level checksum)

New 0.10 broker (proposed behavior):

  • do not validate checksum for v0 produced messages
  • Return v0 (KIP-31) messages w/ old "broken" checksum in FetchResponse using
    • v0 on-disk stored with incorrect checksum, returned directly
    • use KIP-31 conversion to "break" checksum when stored as v1 on-disk with correct checksum
  • Return v1 (KIP-31) messages w/ correct checksum in FetchResponse, using
    • v1 on-disk stored with correct checksum
    • use KIP-31 conversion to "fix" checksum when stored as v0 on-disk with incorrect checksum
  • require all v1 produced messages to have a correct checksum; otherwise return error code 2 (InvalidMessage) in ProduceResponse
  • disable checksum validation for v0 produced messages
  • all LZ4 errors return code 2 in ProduceResponse (Invalid/Corrupt Message)
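
The broker rules above can be summarized as a small decision sketch. This is only an illustration of the listed behavior, not broker code: the function names, the string labels, and the lz4_error parameter are hypothetical, and 0 is assumed to be Kafka's "no error" code.

```python
NONE = 0             # assumed "no error" code
CORRUPT_MESSAGE = 2  # error code 2 named in the list above

def produce_error_code(message_version, checksum_correct, lz4_error=False):
    """Error code a ProduceResponse would carry for one LZ4 message (hypothetical helper)."""
    if lz4_error:
        # All LZ4 errors map to code 2.
        return CORRUPT_MESSAGE
    if message_version == 0:
        # v0: the header checksum is not validated at all.
        return NONE
    # v1: a correct header checksum is required.
    return NONE if checksum_correct else CORRUPT_MESSAGE

def fetch_action(stored_version, requested_version):
    """What happens to the header checksum when serving a fetch (hypothetical helper)."""
    if stored_version == requested_version:
        return "return directly"
    # KIP-31 conversion re-encodes the message, so the checksum is rewritten.
    return "break checksum" if requested_version == 0 else "fix checksum"
```

For example, fetch_action(1, 0) yields "break checksum", matching the v1-on-disk, v0-requested case in the list.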

KafkaLZ4* code:

  • fix checksum calculation for both compress and decompress
  • add option to compression class to allow writing incorrect checksum for compatibility
  • add option to decompression class to allow ignoring incorrect checksum for compatibility
  • add naive support for the optional header flags (ContentSize, ContentChecksum), i.e. do not reject messages that set them, to enable interoperability with off-the-shelf lz4 libraries that may set them. The only flag left unsupported is dependent-block compression (LZ4 Stream API), which our implementation does not currently support.
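
As a rough illustration of the flag handling described in the last item, the sketch below reads the FLG byte of an LZ4 frame descriptor. The bit positions follow the LZ4 frame format specification; the function names and the returned dictionary are hypothetical and do not mirror the KafkaLZ4* classes. The remaining low bits of FLG are not interpreted here.

```python
# FLG bit layout per the LZ4 frame format specification.
FLG_VERSION_SHIFT = 6
FLG_BLOCK_INDEPENDENCE = 1 << 5
FLG_BLOCK_CHECKSUM = 1 << 4
FLG_CONTENT_SIZE = 1 << 3
FLG_CONTENT_CHECKSUM = 1 << 2

def parse_flg(flg):
    """Decode the FLG byte, rejecting only what the text calls unsupported."""
    version = (flg >> FLG_VERSION_SHIFT) & 0x3
    if version != 1:
        raise ValueError("unsupported LZ4 frame version: %d" % version)
    if not flg & FLG_BLOCK_INDEPENDENCE:
        # Dependent-block compression is the one flag left unsupported.
        raise ValueError("dependent-block compression is not supported")
    return {
        "block_checksum": bool(flg & FLG_BLOCK_CHECKSUM),
        "content_size": bool(flg & FLG_CONTENT_SIZE),
        "content_checksum": bool(flg & FLG_CONTENT_CHECKSUM),
    }

def descriptor_length(flg):
    """Bytes covered by the descriptor: FLG + BD, plus 8 bytes if ContentSize is set."""
    return 2 + (8 if parse_flg(flg)["content_size"] else 0)
```

A reader built this way accepts a header with FLG 0x6C (ContentSize and ContentChecksum both set) and skips the extra 8-byte size field, while FLG 0x40 (block-independence bit clear) is rejected.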

...

Compatibility, Deprecation, and Migration Plan

  • Compatibility with old clients is maintained by switching the LZ4 framing checksum behavior along with the v0 / v1 message format. This allows old clients to continue producing and consuming LZ4 messages in the old format, and enables new clients to produce and consume messages in the new format. It also leverages the KIP-31 v1<->v0 "conversion" process to re-encode LZ4 messages as required to either fix (v0->v1) or break (v1->v0) the checksum.
  • The one use case that may need special attention is trunk users who use LZ4-compressed messages AND have already upgraded to v1 messages. Are there any such users? For these users, upgrading brokers would cause their existing v1 producers to fail immediately, because the broker will reject v1 messages with a broken checksum. Upgrading the producers fixes this and allows production to continue. If any such users exist, it might be worth providing a one-time patch for trunk that disables the broker error on v1 messages with a broken HC. Alternatively, we could make this configurable, but I think we should be skeptical of this, as it would likely create more confusion for the vast majority of users who are not in this edge case. There have been no reports of this usage on the Kafka mailing lists during the KIP discussion.
  • Note that because the broker will no longer validate HC checksums on v0 messages, it will be possible for (non-Java) clients to produce LZ4 messages in v0 format using a correct checksum. During KIP-31 conversion, the broker will alter the checksum so that it appears "broken" for compatibility with older clients. Clients wishing to consume v0-format LZ4 messages must either ignore the HC checksum locally or implement the broken HC checksum logic (which also applies the checksum to the 4-byte magic header).
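
For client authors, the difference between the correct and the "broken" HC can be sketched as follows. Per the LZ4 frame format, HC is the second byte of XXH32 (seed 0) over the frame descriptor; the broken variant described above also hashes the 4-byte magic. The pure-Python xxh32 is included only to keep the sketch self-contained (a real client would normally use a library implementation), and header_checksum / header_checksum_ok are hypothetical names.

```python
import struct

_P1, _P2, _P3, _P4, _P5 = 0x9E3779B1, 0x85EBCA77, 0xC2B2AE3D, 0x27D4EB2F, 0x165667B1
_MASK = 0xFFFFFFFF
LZ4_MAGIC = bytes.fromhex("04224d18")  # 0x184D2204, little-endian on the wire

def _rotl(x, r):
    return ((x << r) | (x >> (32 - r))) & _MASK

def xxh32(data, seed=0):
    """Reference-style XXH32, written for clarity rather than speed."""
    n, i = len(data), 0
    if n >= 16:
        acc = [(seed + _P1 + _P2) & _MASK, (seed + _P2) & _MASK,
               seed & _MASK, (seed - _P1) & _MASK]
        while i + 16 <= n:
            for lane, word in enumerate(struct.unpack_from("<4I", data, i)):
                acc[lane] = (_rotl((acc[lane] + word * _P2) & _MASK, 13) * _P1) & _MASK
            i += 16
        h = (_rotl(acc[0], 1) + _rotl(acc[1], 7)
             + _rotl(acc[2], 12) + _rotl(acc[3], 18)) & _MASK
    else:
        h = (seed + _P5) & _MASK
    h = (h + n) & _MASK
    while i + 4 <= n:
        (word,) = struct.unpack_from("<I", data, i)
        h = (_rotl((h + word * _P3) & _MASK, 17) * _P4) & _MASK
        i += 4
    while i < n:
        h = (_rotl((h + data[i] * _P5) & _MASK, 11) * _P1) & _MASK
        i += 1
    h ^= h >> 15
    h = (h * _P2) & _MASK
    h ^= h >> 13
    h = (h * _P3) & _MASK
    h ^= h >> 16
    return h

def header_checksum(descriptor, include_magic=False):
    """HC byte; include_magic=True reproduces the old 'broken' coverage."""
    covered = LZ4_MAGIC + descriptor if include_magic else descriptor
    return (xxh32(covered) >> 8) & 0xFF

def header_checksum_ok(descriptor, hc, ignore=False):
    """Consumer-side check for v0 messages: ignore HC, or accept either variant."""
    if ignore:
        return True
    return hc in (header_checksum(descriptor), header_checksum(descriptor, True))
```

With the descriptor bytes 0x64 0x40 (FLG, BD), header_checksum returns 0xA7 in the correct mode; passing include_magic=True gives the value an old 0.8.2/0.9 client would expect instead.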

...