


Status

Current state: "Under Discussion"

Discussion thread: here

JIRA: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Message format v2 was introduced in Apache Kafka 0.11.0 (released in June 2017) via KIP-98 and has been the default since. It includes a number of enhancements (partition leader epoch, sequence ids, producer ids, record headers) required for correctness (KIP-101, KIP-279, KIP-320), stronger semantics (idempotent producers, transactional clients) and other features (KIP-82: Add Record Headers, KIP-392: Allow consumers to fetch from closest replica).

Four years later, it's time to sunset message formats v0 and v1 to establish a new baseline in terms of supported client/broker behavior and to improve maintainability & supportability of Kafka. This also aligns with KIP-679, which will enable the idempotent producer by default in Apache Kafka 3.0 (and requires message format v2). We propose the deprecation of message formats v0 and v1 in Apache Kafka 3.0 and their removal in Apache Kafka 4.0.

Public Interfaces

Apache Kafka 3.0

  1. log.message.format.version and message.format.version are deprecated. A warning is issued if the configured value is lower than 0.11.0, and the value is always assumed to be 3.0 regardless of what is configured (see below for the implications).
  2. A new config, log.record.version.force.upgrade, is introduced with two possible values: null (the default) and 2. If the value is set to 2, the broker will ensure all segments contain records in format v2 once log recovery has completed during start-up. This can be extended to support newer message format versions if they are introduced.

Apache Kafka 4.0

  1. log.message.format.version and message.format.version are removed. They no longer serve any purpose, and the fact that their allowed values are Kafka versions rather than message format versions has been a source of confusion. If new message format versions are introduced, any config that selects them should take actual message format versions as values.
  2. Produce requests won't support up-conversion from message formats v0 and v1.
  3. Fetch requests won't support down-conversion to message formats v0 and v1.
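A minimal sketch of the 4.0 request checks in items 2 and 3, assuming the broker can see each record batch's magic byte (the field that encodes the message format version: 0, 1 or 2) and the fetch request version. The error class and function names are hypothetical, and failing the whole produce request on the first old batch is a simplification chosen for the sketch, not behavior this KIP specifies. The fetch threshold of v4 comes from the requirement that fetch v4 or higher is needed for message format v2.

```python
from dataclasses import dataclass
from typing import List

MAGIC_V2 = 2                   # magic byte of message format v2 record batches
MIN_FETCH_VERSION_FOR_V2 = 4   # fetch v4+ is required to receive message format v2


class UnsupportedFormatError(Exception):
    """Hypothetical error type used only in this sketch."""


@dataclass
class RecordBatch:
    magic: int
    payload: bytes


def check_produce(batches: List[RecordBatch]) -> List[RecordBatch]:
    """4.0: no up-conversion, so v0/v1 batches are rejected instead of converted."""
    for batch in batches:
        if batch.magic < MAGIC_V2:
            raise UnsupportedFormatError(
                f"message format v{batch.magic} is not supported for produce")
    return batches


def check_fetch(fetch_version: int) -> None:
    """4.0: no down-conversion, so clients that cannot read v2 are rejected."""
    if fetch_version < MIN_FETCH_VERSION_FOR_V2:
        raise UnsupportedFormatError(
            f"fetch v{fetch_version} cannot receive message format v2")
```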

Proposed Changes

Apache Kafka 3.0

Since message.format.version is always assumed to be 3.0, message format v2 will always be used when:

  1. Persisting produce records on disk
  2. Writing new segments as part of log compaction
  3. Persisting group data (consumer offsets and group metadata)

Followers will write the records they receive from leaders without conversion (as they currently do). Since leaders will use v2 for new requests, replicating old data is the only case where v0 or v1 records may be written to disk after the upgrade to Apache Kafka 3.0.
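The write-path rules above can be condensed into one helper. It assumes each append is tagged with where it came from; the WriteSource enum and magic_to_write function are illustrative names, not broker internals. Leader-side writes always produce v2, while a follower keeps whatever format the leader sent, which is how pre-upgrade v0/v1 data can still be written to disk through replication.

```python
from enum import Enum

MAGIC_V2 = 2  # message format v2


class WriteSource(Enum):
    PRODUCE = "produce"          # leader appending records from a produce request
    COMPACTION = "compaction"    # log cleaner writing new segments
    GROUP_DATA = "group_data"    # consumer offsets and group metadata
    REPLICATION = "replication"  # follower appending records fetched from the leader


def magic_to_write(source: WriteSource, incoming_magic: int) -> int:
    """Return the message format a 3.0 broker would write for an append (sketch)."""
    if source is WriteSource.REPLICATION:
        # Followers write the leader's batches without conversion,
        # so old v0/v1 data is replicated as-is.
        return incoming_magic
    # Produce, compaction and group data are always written as v2
    # (v0/v1 produce batches are up-converted first).
    return MAGIC_V2
```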

Produce and fetch requests with message formats v0 and v1 will still be supported via up-conversion and down-conversion, respectively. Up-conversion and (especially) down-conversion have a measurable performance impact due to increased CPU and memory usage, but the vast majority of Kafka clients have supported v2 for some time (even Spark, a notable late adopter, has supported v2 since Spark 2.4, which was released in October 2018).
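The two conversion paths described above reduce to simple comparisons on the magic byte (0, 1 or 2, matching the message format version). In this sketch, client_max_magic is a hypothetical parameter standing for the newest format the fetching client can read; how a broker derives it from the request version is left out. The checks only decide whether conversion is needed; the conversion itself, which is where the CPU and memory cost comes from, is not modeled.

```python
MAGIC_V2 = 2  # message format v2; v0 and v1 use magic 0 and 1


def produce_needs_up_conversion(batch_magic: int) -> bool:
    """3.0 persists only v2, so v0/v1 produce batches must be up-converted."""
    return batch_magic < MAGIC_V2


def fetch_needs_down_conversion(stored_magic: int, client_max_magic: int) -> bool:
    """Down-convert when the stored data is newer than the client can read."""
    return stored_magic > client_max_magic
```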

For Kafka clusters where topics were configured with message format v0 or v1 at some point, we need a mechanism to ensure records are converted to v2 before the upgrade to Apache Kafka 4.0. We propose the log.record.version.force.upgrade config for this purpose. The conversion would happen during start-up, using an approach similar to log recovery after an unclean shutdown.
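A simplified model of what log.record.version.force.upgrade=2 might do at start-up, assuming a log is a list of segments and each segment a list of batches carrying a magic byte. The Batch and Segment types and force_upgrade_segments are hypothetical. A real implementation would also have to build proper v2 record batches from the old messages, rebuild indexes and replace segment files safely, in a pass similar to recovery after an unclean shutdown; here each old batch is simply re-tagged as v2 with its offsets and payloads preserved.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAGIC_V2 = 2  # message format v2


@dataclass
class Batch:
    magic: int
    base_offset: int
    records: List[bytes]


@dataclass
class Segment:
    base_offset: int
    batches: List[Batch] = field(default_factory=list)


def force_upgrade_segments(segments: List[Segment], force_upgrade: Optional[int]) -> int:
    """After recovery, rewrite every pre-v2 batch as v2; return how many were rewritten."""
    if force_upgrade != MAGIC_V2:
        return 0  # default (null): leave on-disk data untouched
    rewritten = 0
    for segment in segments:
        upgraded = []
        for batch in segment.batches:
            if batch.magic < MAGIC_V2:
                # Same offsets and record payloads, new (v2) batch format.
                batch = Batch(MAGIC_V2, batch.base_offset, list(batch.records))
                rewritten += 1
            upgraded.append(batch)
        # A real broker would write a new segment file and swap it in;
        # this sketch just replaces the in-memory list.
        segment.batches = upgraded
    return rewritten
```

Running it a second time rewrites nothing, since every batch is already v2.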

Apache Kafka 4.0

As described above, we will remove all support for message formats v0 and v1 in Apache Kafka 4.0.

Compatibility, Deprecation, and Migration Plan

Apache Kafka 3.0

As described above, produce requests from producers without message format v2 support will require up-conversion, while fetch requests from consumers without message format v2 support will require down-conversion. To avoid the performance impact, we recommend upgrading clients to newer versions (any client released in the last two years should be fine, although some may need to be configured so that they do not use very old protocol versions).

Apache Kafka 4.0

Clients without support for message format v2 will not be supported. In the rare cases where such clients are still in use, they will have to be upgraded. Fetch request v4 or higher is required for message format v2, so fetch v3 and older will no longer be supported by the broker or the Java consumer.

Rejected Alternatives

  1. Maintain support for message formats 0 and 1. Message format 2 is required for correctness (KIP-101) and key features like idempotence and transactions (KIP-98).
  2. Keep read-only support for message formats 0 and 1 when it comes to on-disk data to avoid forced conversion to message format 2 in Apache Kafka 3.x. Even though this is appealing, it would mean keeping all the code for handling the older message formats for a long time. The benefit doesn't seem worth the cost.