Status
Current state: "Under Discussion"
Discussion thread: here
JIRA: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Message format v2 was introduced in Apache Kafka 0.11.0 (released in June 2017) via KIP-98 and has been the default since. It includes a number of enhancements (partition leader epoch, sequence ids, producer ids, record headers) required for correctness (KIP-101, KIP-279, KIP-320), stronger semantics (idempotent producers, transactional clients) and other features (KIP-82 - Add Record Headers, KIP-392: Allow consumers to fetch from closest replica).
Four years later, it's time to sunset message formats v0 and v1 to establish a new baseline in terms of supported broker behavior and to simplify the codebase (with all the benefits that brings). This also aligns with KIP-679, which will enable the idempotent producer by default in Apache Kafka 3.0 (and requires message format v2). We propose the deprecation of message formats v0 and v1 in Apache Kafka 3.0 and their removal in Apache Kafka 4.0.
Public Interfaces
Apache Kafka 3.0
- log.message.format.version & message.format.version are deprecated, a warning is issued if the value is lower than 0.11.0 and it is always assumed to be 3.0 (see below for the implications).
- New config log.record.version.force.upgrade is introduced with two possible values: null (default) and 2. If the value is set to 2, the broker will ensure all segments have records with format v2 after log recovery has completed during start-up. This can be extended to support newer message format versions if they are introduced.
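The two configuration changes above can be sketched as follows. This is an illustrative Python model, not broker code: the function names, the warning text, and the idea of describing each segment by its lowest record format version are all assumptions made for the example.

```python
import warnings

MIN_V2_VERSION = (0, 11, 0)  # first Kafka version whose message format is v2
ASSUMED_VERSION = "3.0"      # value always assumed in Apache Kafka 3.0


def parse_version(value):
    """Parse a version such as '0.10.2' or '0.11.0-IV2' into a 3-tuple of ints."""
    numeric = value.split("-")[0]  # drop a suffix such as '-IV2'
    parts = [int(p) for p in numeric.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def effective_message_format_version(configured):
    """Warn if the configured value predates 0.11.0, then return the assumed value."""
    if configured is not None and parse_version(configured) < MIN_V2_VERSION:
        # Hypothetical warning text; the real broker message may differ.
        warnings.warn(
            f"message format version {configured} is lower than 0.11.0 and is ignored",
            DeprecationWarning,
        )
    return ASSUMED_VERSION


def segments_to_upgrade(segment_formats, force_upgrade=None):
    """Return indices of segments below the forced format version.

    segment_formats holds the lowest record format version found in each
    segment (an assumption of this sketch); force_upgrade is None or 2.
    """
    if force_upgrade is None:  # default: no forced conversion
        return []
    return [i for i, fmt in enumerate(segment_formats) if fmt < force_upgrade]
```

For example, `segments_to_upgrade([0, 2, 1], force_upgrade=2)` selects the first and third segments, while the default `None` selects nothing.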
Apache Kafka 4.0
- log.message.format.version & message.format.version are removed. They won't serve any purpose and the fact that the allowable values are Kafka versions instead of message format versions has been a source of confusion. If we introduce new message format versions, they should use actual message format versions.
- Produce requests won't support up-conversion from message formats v0 and v1.
- Fetch requests won't support down-conversion to message formats v0 and v1.
Proposed Changes
Apache Kafka 3.0
Since message.format.version is always assumed to be 3.0, message format version v2 will always be used when:
- Persisting produce records on disk
- Writing new segments as part of log compaction
- Persisting group data (consumer offsets and group metadata)
Followers will write the records they receive from leaders without conversion (as they currently do). Since leaders will use v2 for new requests, replicating old data is the only case where v0 or v1 records may be written to disk after the upgrade to Apache Kafka 3.0.
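The write rules above can be summarized in a short sketch. It is a simplified Python model for illustration only; the function name and the string role labels are assumptions and do not correspond to broker code.

```python
V2 = 2  # message format version introduced in Apache Kafka 0.11.0


def format_written_to_disk(role, incoming_format):
    """Return the message format version persisted for an incoming batch.

    role is 'leader' or 'follower' (labels assumed for this sketch);
    incoming_format is the format version (0, 1 or 2) of the received records.
    """
    if role == "leader":
        # Leaders always persist produce records as v2, up-converting if needed.
        return V2
    if role == "follower":
        # Followers write what they receive from the leader without conversion,
        # so old v0 or v1 data being replicated stays in its original format.
        return incoming_format
    raise ValueError(f"unknown role: {role}")
```

Under this model, the only way to get a result below 2 is a follower replicating data that was already stored as v0 or v1.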
Produce and fetch requests with v0 and v1 message formats will be supported via up-conversion and down-conversion. Up-conversion and (especially) down-conversion have a measurable performance impact due to increased CPU and memory usage, but the vast majority of clients have supported v2 for some time (even Spark, a notable late adopter, has supported it since Spark 2.4, which was released in October 2018).
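As a rough illustration of when each conversion applies, the following Python sketch models the decision. The function names and return values are hypothetical; the actual broker decides this from the request's protocol version rather than from an explicit format argument.

```python
V2 = 2  # format always used for newly persisted produce records in 3.0


def produce_conversion(client_format):
    """Up-conversion is needed when a producer sends v0 or v1 records."""
    return "up-conversion" if client_format < V2 else None


def fetch_conversion(stored_format, client_max_format):
    """Down-conversion is needed when stored records are newer than the
    highest format the consumer understands."""
    return "down-conversion" if stored_format > client_max_format else None
```

For instance, `fetch_conversion(2, 1)` returns `"down-conversion"`, the costly case the text calls out, while a client that understands v2 never triggers either path.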
The log.record.version.force.upgrade config: TBD
Apache Kafka 4.0
TBD
Compatibility, Deprecation, and Migration Plan
Apache Kafka 3.0
As described above, produce requests from producers with no v2 message format support will require up-conversion, while fetch requests from consumers with no v2 message format support will require down-conversion. To avoid the negative performance impact, we recommend upgrading clients to newer versions (anything released in the last 2 years should be fine, although some clients may require configuration not to use ancient protocol versions).
Apache Kafka 4.0
TBD
Rejected Alternatives
- Maintain support for message formats 0 and 1. Message format 2 is required for correctness (KIP-101) and key features like idempotence and transactions (KIP-98).
- Keep read-only support for message formats 0 and 1 when it comes to on-disk data to avoid forced conversion to message format 2 in Apache Kafka 3.x. Even though this is appealing, it would mean keeping all the code for handling the older message formats for a long time. The benefit doesn't seem worth the cost.