Status

Current state: Under Discussion

...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Kafka enforces a strict size limit on messages, and this limit also applies to compressed messages.

...

This is especially problematic for services such as MirrorMaker, whose producer is shared by many different topics.

Public Interfaces

We want to introduce a new configuration, enable.compression.ratio.estimation, that allows users to opt out of compression ratio estimation and instead use the uncompressed size directly for batching.
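A minimal sketch of how a MirrorMaker-style producer might opt out, assuming the proposed key is spelled enable.compression.ratio.estimation and that estimation stays enabled unless the user sets it to false (the default is an assumption inferred from "opt out"). The class and method names are hypothetical; compression.type and batch.size are existing producer settings, while the new key is only proposed here and is not assumed to exist in any Kafka release.

```java
import java.util.Properties;

// Hypothetical helper; only the property keys come from Kafka/this KIP.
final class ProducerConfigSketch {
    // Configuration proposed by this KIP (assumed spelling).
    static final String ENABLE_RATIO_ESTIMATION = "enable.compression.ratio.estimation";

    static Properties mirrorMakerProducerProps() {
        Properties props = new Properties();
        props.put("compression.type", "gzip");
        props.put("batch.size", "16384");
        // Opt out: batch on uncompressed size instead of an estimated ratio.
        props.put(ENABLE_RATIO_ESTIMATION, "false");
        return props;
    }

    // Assumed default: estimation remains enabled when the key is absent.
    static boolean ratioEstimationEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(ENABLE_RATIO_ESTIMATION, "true"));
    }
}
```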

Proposed Changes

We want to introduce a new configuration, enable.compression.ratio.estimation, that allows users to opt out of compression ratio estimation and instead use the uncompressed size directly for batching.

...

This approach guarantees that the compressed message size will be less than the maximum message size. As long as the batch size is set to a reasonable value, the compression ratio is unlikely to suffer.
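To make the difference between the two batching modes concrete, here is a hypothetical sketch of the "does the next record fit?" check. BatchSizer, projectedSize and hasRoomFor are illustrative names, not Kafka's producer internals; the sketch assumes the batch limit is no larger than the maximum message size and that compression does not expand the data, which is what makes the uncompressed size an upper bound.

```java
// Hypothetical sketch of the batching check; names are illustrative only.
final class BatchSizer {
    private final long batchSizeLimit;       // assumed <= the max message size
    private final boolean estimationEnabled; // value of the proposed config
    private final double estimatedRatio;     // assumed average compressed/uncompressed ratio

    BatchSizer(long batchSizeLimit, boolean estimationEnabled, double estimatedRatio) {
        this.batchSizeLimit = batchSizeLimit;
        this.estimationEnabled = estimationEnabled;
        this.estimatedRatio = estimatedRatio;
    }

    // Size the producer assumes the batch will have once compressed.
    long projectedSize(long uncompressedBytes) {
        if (!estimationEnabled) {
            // Opt-out path: count uncompressed bytes, an upper bound on the
            // compressed size as long as compression does not expand the data.
            return uncompressedBytes;
        }
        // Estimation path: scale by the average ratio, which can be too
        // optimistic for a topic that compresses worse than average.
        return (long) Math.ceil(uncompressedBytes * estimatedRatio);
    }

    // Whether one more record of recordBytes still fits in the current batch.
    boolean hasRoomFor(long currentUncompressedBytes, long recordBytes) {
        return projectedSize(currentUncompressedBytes + recordBytes) <= batchSizeLimit;
    }
}
```

For example, with a limit of 1,000,000 bytes and an estimated ratio of 0.5, a batch holding 1,500,000 uncompressed bytes still accepts a 100,000-byte record under estimation (projected 800,000 bytes). If that topic actually compresses at 0.9, the real batch is about 1,440,000 bytes and exceeds the limit. With estimation disabled, the same record is rejected because 1,600,000 uncompressed bytes exceed the limit.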

Compatibility, Deprecation, and Migration Plan

This KIP only introduces a new configuration. The change is fully backward compatible.

Rejected Alternatives

Decompress the batch which encounters a RecordTooLargeException, split it into two and send it again

This approach has several caveats:

  1. It introduces more overhead on the producer side. The producer has to decompress the batch, regroup the messages and resend them. If the producer instead keeps the original uncompressed messages to avoid decompression, the memory overhead will be huge.
  2. The split batches are not guaranteed to be smaller than the maximum size limit. There may be multiple retries before the messages get through, or they may ultimately fail.
  3. In scenarios such as MirrorMaker, compression ratios differ across topics, and some topics may have a compression ratio very different from the average. This can cause many splits and resends, which introduces a lot of overhead.
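The retry cost described in caveats 2 and 3 can be illustrated with a simplified, hypothetical model of split-and-resend: a batch whose compressed size exceeds the limit is split in half and each half is retried. The class, the method and the fixed per-topic compression ratio are assumptions made for illustration; real compression ratios vary per batch, and real decompression and regrouping are not modelled.

```java
import java.util.List;

// Hypothetical model of the rejected split-and-resend approach.
final class SplitAndResend {
    // Returns how many send attempts are needed until every piece fits,
    // or throws if a single record can never fit.
    static int sendWithSplitting(List<Integer> recordSizes, double actualRatio, long maxMessageSize) {
        long uncompressed = 0;
        for (int size : recordSizes) {
            uncompressed += size;
        }
        long compressed = (long) Math.ceil(uncompressed * actualRatio);
        int attempts = 1; // the attempt that was just made
        if (compressed <= maxMessageSize) {
            return attempts;
        }
        if (recordSizes.size() == 1) {
            throw new IllegalStateException("single record exceeds the max message size");
        }
        // Regroup step, modelled as splitting the record list into two halves.
        int mid = recordSizes.size() / 2;
        attempts += sendWithSplitting(recordSizes.subList(0, mid), actualRatio, maxMessageSize);
        attempts += sendWithSplitting(recordSizes.subList(mid, recordSizes.size()), actualRatio, maxMessageSize);
        return attempts;
    }
}
```

With eight 250,000-byte records, a 600,000-byte limit and data that barely compresses (ratio 1.0), one split is not enough: the halves still fail, and seven send attempts are needed in total.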

Keep per topic compression ratio

This approach may solve the problem caused by differing compression ratios across topics. The downside is that it does not handle variation in the compression ratio within a single topic.
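A hypothetical sketch of what keeping a per-topic compression ratio could look like. The exponentially weighted moving average used here is an assumed update rule chosen for illustration, not something this KIP specifies, and the class name is made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-topic estimator; the EWMA update rule is an assumption.
final class PerTopicRatioEstimator {
    private final Map<String, Double> ratios = new HashMap<>();
    private final double initialRatio; // used for topics with no observations yet
    private final double weight;       // how strongly a new observation moves the estimate

    PerTopicRatioEstimator(double initialRatio, double weight) {
        this.initialRatio = initialRatio;
        this.weight = weight;
    }

    double estimate(String topic) {
        return ratios.getOrDefault(topic, initialRatio);
    }

    // Blend the ratio observed for one completed batch into the topic's estimate.
    void observe(String topic, long uncompressedBytes, long compressedBytes) {
        double observed = (double) compressedBytes / uncompressedBytes;
        double current = estimate(topic);
        ratios.put(topic, current + weight * (observed - current));
    }
}
```

Separate estimates keep a highly compressible topic from skewing the estimate of a poorly compressible one. However, a single topic still has only one number, so a batch that compresses much worse than that topic's recent history is still underestimated, which is the within-topic gap described above.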

...