...

  1. log.message.format.version and message.format.version are deprecated; a warning is issued if the configured value is lower than 0.11.0, and the value is always assumed to be 3.0 if inter.broker.protocol.version is 3.0 or higher (see below for the implications, and the sketch after this list for an illustration).
  2. A new config, log.record.version.force.upgrade, is introduced with two possible values: null (default) and 2. If the value is set to 2, the broker will ensure all segments contain records with format v2 after log recovery has completed during start-up. This can be extended to support newer message format versions if they are introduced.
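
The following is a minimal sketch of the rule in point 1, not the broker's actual implementation: versions are reduced to the record batch magic byte (0 = v0, 1 = v1, 2 = v2) and inter.broker.protocol.version to its major number, purely for illustration.

    public class MessageFormatDefaults {

        // Illustration only: the configured message format is represented by its
        // magic byte and the inter.broker.protocol.version (IBP) by its major number.
        static byte effectiveMagic(int ibpMajor, byte configuredMagic) {
            if (configuredMagic < 2) {
                // Corresponds to the deprecation warning issued for values below 0.11.0.
                System.err.println("Warning: message.format.version values below 0.11.0 are deprecated");
            }
            // Once the IBP is 3.0 or higher, the configured value is ignored and
            // message format v2 (magic 2) is always used for new writes.
            return ibpMajor >= 3 ? (byte) 2 : configuredMagic;
        }

        public static void main(String[] args) {
            System.out.println(effectiveMagic(3, (byte) 1)); // 2: v2 is assumed on IBP 3.0+
            System.out.println(effectiveMagic(2, (byte) 1)); // 1: pre-3.0 behavior is unchanged
        }
    }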

...

Proposed Changes

Apache Kafka 3.0

Once a broker is upgraded to 3.0 and the inter.broker.protocol.version is updated to 3.0, message.format.version is assumed to be 3.0 and we always write records with message format v2. This includes the following scenarios:

  1. Persisting produce records on disk
  2. Persisting group data (consumer offsets and group metadata)
  3. Followers writing replicated data to disk
  4. Writing new segments as part of log compaction

1 and 2 are straightforward since that's the current behavior when the message format is explicitly set to 3.0, but 3 and 4 introduce new behavior. There are two main goals: ensure correctness and opportunistically convert to the new format. We will discuss the subtle correctness considerations later in the document.

Produce and fetch requests with v0 and v1 message formats would be supported via up-conversion and down-conversion. Up-conversion and (especially) down-conversion have measurable performance impact due to increased CPU and memory usage, but the vast majority of Kafka clients have supported v2 for some time (even Spark, a notable late adopter, has supported v2 since Spark 2.4, which was released in October 2018).
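
As a rough illustration of where that cost comes from (a simplification, not the broker's actual request handling; the class and method names are made up for this sketch), the conversion decisions reduce to comparing record format magic values:

    public class FormatConversionDecision {

        // Produce path: on IBP 3.0+, v0/v1 batches from old producers are
        // up-converted so that only magic v2 is appended to the log.
        static byte magicToAppend(byte producedMagic) {
            return (byte) Math.max(producedMagic, 2);
        }

        // Fetch path: if the data on disk uses a newer format than the consumer's
        // fetch version implies it can read, the broker down-converts, rewriting
        // (and possibly recompressing) batches in memory, hence the CPU and memory cost.
        static boolean needsDownConversion(byte onDiskMagic, byte maxMagicSupportedByConsumer) {
            return onDiskMagic > maxMagicSupportedByConsumer;
        }
    }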

For Kafka clusters where topics were configured with message format v0 or v1 at some point, we need a mechanism to ensure records are converted to the new version before the upgrade to Apache Kafka 4.0. We propose the introduction of the log.record.version.force.upgrade config for this purpose. The conversion would happen during start-up using a similar approach as log recovery after unclean shutdown.
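
A rough sketch of what such a start-up pass could look like follows; LogSegment and its methods are simplified stand-ins for this document, not Kafka's internal classes.

    import java.util.List;

    class ForceUpgradePass {

        // Simplified stand-in for a log segment; not Kafka's internal API.
        interface LogSegment {
            boolean containsBatchesBelowMagic(byte magic);
            LogSegment rewrittenWithMagic(byte magic); // up-convert and replace on disk
        }

        // Runs after log recovery during start-up when log.record.version.force.upgrade=2.
        // A null value (the default) means old segments are left untouched.
        static void maybeForceUpgrade(List<LogSegment> segments, Integer forceUpgradeVersion) {
            if (forceUpgradeVersion == null)
                return;
            byte targetMagic = forceUpgradeVersion.byteValue(); // only 2 is allowed initially
            for (int i = 0; i < segments.size(); i++) {
                LogSegment segment = segments.get(i);
                if (segment.containsBatchesBelowMagic(targetMagic)) {
                    // Up-conversion preserves offsets; see the correctness notes below
                    // for how uncompressed v0/v1 records are batched.
                    segments.set(i, segment.rewrittenWithMagic(targetMagic));
                }
            }
        }
    }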

There are subtle correctness considerations when converting records that had been previously written with an old format:

  1. Replication is aligned by batches, but up-conversion will happen independently on each broker for cases 3 and 4. For compressed records, v0 and v1 message formats support batching, so batch boundaries are preserved naturally. Uncompressed records, however, are not batched in v0 and v1; to ensure alignment, we will convert each of them to a single-record batch in v2 (see the sketch after this list). It's worth noting that message format v2 is slightly less efficient than v0 and v1 when single-record batches are used, but this is an acceptable cost for correctness and it only affects older records. Over time, these records will either be deleted via retention or replaced by new versions in compacted topics.
  2. We need to make sure that once we write a record batch with a new format, we never write a record batch with an older format. Is this really true?
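
A sketch of consideration 1, again with simplified stand-in types rather than the broker's record classes, shows why the resulting batch boundaries are deterministic across brokers:

    import java.util.ArrayList;
    import java.util.List;

    class UpConvertUncompressed {

        // Simplified stand-ins for a legacy (v0/v1) record and a v2 record batch.
        record LegacyRecord(long offset, byte[] key, byte[] value) {}
        record RecordBatchV2(long baseOffset, long lastOffset, List<LegacyRecord> records) {}

        // Each uncompressed v0/v1 record becomes its own single-record v2 batch, so
        // every broker converting independently produces identical batch boundaries.
        // The extra v2 batch header per record is the space overhead mentioned above.
        static List<RecordBatchV2> upConvert(List<LegacyRecord> uncompressedLegacyRecords) {
            List<RecordBatchV2> batches = new ArrayList<>();
            for (LegacyRecord r : uncompressedLegacyRecords) {
                batches.add(new RecordBatchV2(r.offset(), r.offset(), List.of(r)));
            }
            return batches;
        }
    }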

Apache Kafka 4.0

We will remove all support for message formats v0 and v1 in Apache Kafka 4.0.

...