Status

Current state: "Under Discussion"

Discussion thread: here [Change the link from the KIP proposal email archive to your own email thread]

JIRA: here [Change the link from KAFKA-1 to your own ticket]

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Message format v2 was introduced in Apache Kafka 0.11.0 (released in June 2017) via KIP-98 and has been the default since. It includes a number of enhancements (partition leader epoch, sequence ids, producer ids, record headers) required for correctness (KIP-101, KIP-279, KIP-320), stronger semantics (idempotent producers, transactional clients) and other features (KIP-82: Add Record Headers, KIP-392: Allow consumers to fetch from closest replica).

Four years later, it's time to sunset message formats v0 and v1 to establish a new baseline in terms of supported behavior, to improve maintainability and supportability. This also aligns with KIP-679, which will enable the idempotent producer by default in Apache Kafka 3.0 (and requires message format v2). We propose the deprecation of message formats v0 and v1 in Apache Kafka 3.0 and their removal in Apache Kafka 4.0.

Public Interfaces

Apache Kafka 3.0

  1. log.message.format.version & message.format.version are deprecated. A warning is issued if the configured value is lower than 0.11.0, and the value is always assumed to be 3.0 regardless of what is configured (see below for the implications).
  2. New config log.record.version.force.upgrade is introduced with two possible values: null (default) and 2. If the value is set to 2, the broker will ensure all segments have records with format v2 after log recovery has completed during start-up. This can be extended to support newer message format versions if they are introduced.
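The two config changes above can be sketched as follows. This is a minimal illustration of the proposed semantics, not broker code: the function names, the version-string parsing and the warning text are assumptions made for the example.

```python
import warnings

# Per the proposal, the message format version is always assumed to be 3.0.
ASSUMED_MESSAGE_FORMAT_VERSION = "3.0"


def parse_kafka_version(value):
    """Turn a string such as "0.10.2" or "0.11.0-IV2" into a comparable tuple.

    Hypothetical helper: the suffix after "-" is ignored and the result is
    padded to three components so that "0.11" compares equal to "0.11.0".
    """
    parts = [int(part) for part in value.split("-")[0].split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def effective_message_format_version(configured):
    """Return the version the broker would use for (log.)message.format.version."""
    if configured is not None and parse_kafka_version(configured) < (0, 11, 0):
        # The configured value is deprecated and ignored; only a warning is issued.
        warnings.warn(
            "message format version %s is lower than 0.11.0 and is ignored; "
            "assuming %s" % (configured, ASSUMED_MESSAGE_FORMAT_VERSION)
        )
    return ASSUMED_MESSAGE_FORMAT_VERSION


def parse_force_upgrade(value):
    """Validate log.record.version.force.upgrade: null (default) or 2."""
    if value is None:
        return None
    if str(value).strip() == "2":
        return 2
    raise ValueError("log.record.version.force.upgrade must be null or 2")
```

For example, `effective_message_format_version("0.10.2")` emits a warning and still returns `"3.0"`, while `parse_force_upgrade("2")` returns `2`.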

Apache Kafka 4.0

  1. log.message.format.version & message.format.version are removed. They won't serve any purpose and the fact that the allowable values are Kafka versions instead of message format versions has been a source of confusion. If we introduce new message format versions, they should use actual message format versions.
  2. Produce requests won't support up-conversion from message formats v0 and v1.
  3. Fetch requests won't support down-conversion to message formats v0 and v1.

Proposed Changes

Apache Kafka 3.0

Since message.format.version is always assumed to be 3.0, message format v2 will always be used when:

  1. Persisting produce records on disk
  2. Writing new segments as part of log compaction
  3. Persisting group data (consumer offsets and group metadata)

Followers will write the records they receive from leaders without conversion (as they currently do). Since leaders will use v2 for new requests, replicating old data is the only case where v0 or v1 records may be written to disk after the upgrade to Apache Kafka 3.0.
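The leader/follower distinction can be expressed as a small decision function. This is a sketch under stated assumptions: `format_written_to_disk` and its parameters are hypothetical names, and message formats are represented by their version number (0, 1 or 2).

```python
MESSAGE_FORMAT_V2 = 2
KNOWN_FORMATS = (0, 1, 2)


def format_written_to_disk(is_leader, incoming_format):
    """Return the message format version that ends up on disk in Apache Kafka 3.0.

    is_leader: True when the broker handles a produce request as partition leader,
               False when it appends records fetched from the leader.
    incoming_format: format of the records as received (0, 1 or 2).
    """
    if incoming_format not in KNOWN_FORMATS:
        raise ValueError("unknown message format: %r" % (incoming_format,))
    if is_leader:
        # Leaders always persist v2, up-converting v0/v1 produce records if needed.
        return MESSAGE_FORMAT_V2
    # Followers append what they receive without conversion, so replicating
    # old segments can still put v0 or v1 records on disk.
    return incoming_format
```

With this sketch, `format_written_to_disk(True, 0)` yields 2, while `format_written_to_disk(False, 1)` yields 1, which is the replication case described above.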

Produce and fetch requests with v0 and v1 message formats will still be supported via up-conversion and down-conversion. Up-conversion and (especially) down-conversion have a measurable performance impact due to the increased CPU and memory usage, but the vast majority of Kafka clients have supported v2 for some time (even Spark, a notable late adopter, has supported v2 since Spark 2.4, which was released in October 2018).

In order to remove saf The log.record.version.force.upgrade config 

Apache Kafka 4.0

  1. Remove support for produce/consume with older message formats in Apache Kafka 4.0. This would align well if we make 0.11 (or newer) the minimum supported client version, but it can also be done independently.

  2. Maintain support for reading all message format versions when it comes to persisted data for the foreseeable future. We don't have a good mechanism to ensure data with the old formats doesn't exist on disk.

  3. Note that Produce v3 and higher require message format v2. Fetch v4 or higher is required for message format v2.
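The request-version thresholds above determine how a broker would treat a given request in 3.0 versus 4.0. The following is a hedged sketch of that rule: `classify_request`, the `Handling` enum and the `broker_major_version` parameter are hypothetical names, and how a 4.0 broker would actually signal an unsupported request is not specified by this proposal.

```python
from enum import Enum

# Thresholds taken from the text: Produce v3+ requires message format v2,
# and Fetch v4+ is required to receive message format v2.
MIN_VERSION_FOR_V2 = {"produce": 3, "fetch": 4}


class Handling(Enum):
    NO_CONVERSION = "request already uses message format v2"
    CONVERTED = "up-conversion (produce) or down-conversion (fetch)"
    UNSUPPORTED = "message formats v0/v1 no longer supported"


def classify_request(api, api_version, broker_major_version):
    """Classify a produce or fetch request by the conversion it would need."""
    if api not in MIN_VERSION_FOR_V2:
        raise ValueError("expected 'produce' or 'fetch', got %r" % (api,))
    if api_version >= MIN_VERSION_FOR_V2[api]:
        return Handling.NO_CONVERSION
    if broker_major_version >= 4:
        # Apache Kafka 4.0 removes up-conversion and down-conversion.
        return Handling.UNSUPPORTED
    # Apache Kafka 3.0 still converts, at a CPU and memory cost.
    return Handling.CONVERTED
```

For instance, a Fetch v3 request is classified as `Handling.CONVERTED` against a 3.x broker and `Handling.UNSUPPORTED` against a 4.x broker, while Produce v3 needs no conversion on either.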

Compatibility, Deprecation, and Migration Plan

Apache Kafka 3.0

As described above, produce requests from producers with no v2 message format support will require up-conversion, while fetch requests from consumers with no v2 message format support will require down-conversion. To avoid the negative performance impact, we recommend upgrading to newer client versions (anything released in the last 2 years should be fine, although some clients may require configuration not to use ancient protocol versions).

Apache Kafka 4.0

TBD

Rejected Alternatives

  1. Maintain support for message formats 0 and 1. Message format 2 is required for correctness (KIP-101) and key features like idempotence and transactions (KIP-98).
  2. Keep read-only support for message formats 0 and 1 when it comes to on-disk data to avoid forced conversion to message format 2 in Apache Kafka 3.x. Even though this is appealing, it would mean keeping all the code for handling the older message formats for a long time. The benefit doesn't seem worth the cost.