
Status

Current state: "Under Discussion"

Discussion thread: here

JIRA: KAFKA-7632

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

This proposal suggests adding a compression level option to the producer configuration.

Compression is fundamentally a trade-off between CPU (running time) and I/O (compressed size). Since the best balance is use-case dependent, most compression algorithms provide a way to control the compression level, along with a reasonable default level that performs well in general.

However, Kafka does not provide a way to configure the compression level. Although it shows good performance with the default compression level, there are use cases it does not fit. For example, zstd supports a wide range of compression levels to cover the Pareto frontier, meaning it decompresses faster than any other algorithm at a similar or better compression ratio. In other words, not allowing users to adjust the compression level abandons much of zstd's potential. In fact, Kafka's Go client (sarama) already supports a compression level feature for gzip and zstd.

Public Interfaces

This feature introduces a new option, 'compression.level', to the producer configuration. Whether a compression level can be configured, the range of valid levels, and the default level are all up to the compression codec. That is:

Compression Codec | Level availability | Minimum Level           | Maximum Level                 | Default Level
gzip              | Yes                | Deflater.BEST_SPEED (1) | Deflater.BEST_COMPRESSION (9) | Deflater.DEFAULT_COMPRESSION (-1)
snappy            | No                 | -                       | -                             | -
lz4               | Yes                | 4                       | 7                             | 4
zstd              | Yes                | varies with version     | varies with version           | 3

For example,

  • Level 3 is allowed for both gzip and zstd but not for lz4, since lz4 allows only 4 ~ 7.
  • Level 10 is allowed for zstd but not for gzip, since gzip allows only 1 ~ 9.
  • Configuring a compression level for snappy will throw an error, since snappy does not support compression levels.
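Putting the table and examples above together, here is a minimal sketch of how a producer might set the proposed option. It assumes the property name 'compression.level' exactly as proposed in this KIP; the exact validation behavior and any ProducerConfig constant name are up to the final implementation.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class CompressionLevelExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            // Existing option: choose the compression codec.
            props.put("compression.type", "zstd");
            // Proposed option: choose the codec-specific compression level.
            // The valid range depends on the codec (see the table above).
            props.put("compression.level", "10");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("my-topic", "key", "value"));
            }
        }
    }

Configuring an out-of-range level, or any level for snappy, would be rejected as a configuration error, as described in the examples above.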

Proposed Changes

The given record batches will be compressed with the specified level; if no level is specified, they will be compressed with the codec's default level.
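For illustration, here is a rough sketch of how the configured level could be plumbed into the underlying compression streams, assuming the JDK's java.util.zip classes and the zstd-jni ZstdOutputStream constructor that takes a level. This is not the actual Kafka CompressionType code; the real wrapping logic is decided during implementation.

    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.zip.GZIPOutputStream;

    import com.github.luben.zstd.ZstdOutputStream;

    // Hypothetical helper: wraps a batch output stream with the configured codec and level.
    public final class CompressionStreams {

        // gzip: GZIPOutputStream hides its Deflater, so a small subclass is needed to set the level.
        public static OutputStream gzip(OutputStream out, int level) throws IOException {
            return new GZIPOutputStream(out) {
                { def.setLevel(level); } // 1 (BEST_SPEED) .. 9 (BEST_COMPRESSION), -1 for default
            };
        }

        // zstd: zstd-jni exposes the level directly on the stream constructor.
        public static OutputStream zstd(OutputStream out, int level) throws IOException {
            return new ZstdOutputStream(out, level);
        }
    }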

Compatibility, Deprecation, and Migration Plan

Since Kafka currently compresses with the default level, there is no backward compatibility problem.

Rejected Alternatives

Can we support compression level for zstd only?

Not practical. Since supporting compression levels requires modifying the CompressionType enum class, leaving out some codecs does not reduce the required work. Moreover, this would be unfair to users who are running old topics compressed with gzip or lz4.

Can we support a universal 'default compression level' value for the producer config?

Impossible. Currently, most compression codecs allow adjusting the compression level with an int value, and that seems unlikely to change. However, not all of these codecs support a value that denotes 'default compression level', and the value assigned to the default level differs between codecs. For example, gzip uses -1 for the default level, but zstd used 0; and since the latest release of zstd allows negative compression levels, the meaning of level 0 is also changing.

For these reasons, we can't provide a universal int value to denote the default compression level.
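To make the mismatch concrete, here is a small sketch using only the JDK's Deflater constants (the zstd behavior is described above rather than queried from a library):

    import java.util.zip.Deflater;

    public class DefaultLevelSentinels {
        public static void main(String[] args) {
            // gzip (java.util.zip) marks 'use the default level' with -1.
            System.out.println(Deflater.DEFAULT_COMPRESSION); // -1
            System.out.println(Deflater.BEST_SPEED);           // 1
            System.out.println(Deflater.BEST_COMPRESSION);     // 9

            // zstd historically used 0 for 'use the default level', but recent releases
            // also accept negative levels, so 0 is no longer an unambiguous sentinel.
            // Hence no single int can serve as a cross-codec 'default level' value.
        }
    }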

Can we use an external dictionary feature?

This feature requires an option to specify the dictionary for the supporting codecs, e.g., snappy, lz4, and zstd. It is obviously beyond the scope of this KIP.
