Status

Current state: Accepted (vote)

Discussion thread: https://lists.apache.org/thread.html/67fcfe37169bdbabdbecc30686ccba0f5f27e193c468a1fe5d0062ed@%3Cdev.kafka.apache.org%3E

Old Discussion thread: https://lists.apache.org/thread.html/f44317eb6cd34f91966654c80509d4a457dbbccdd02b86645782be67@%3Cdev.kafka.apache.org%3E

JIRA: https://issues.apache.org/jira/browse/KAFKA-7061

PULL REQUEST: https://github.com/apache/kafka/pull/75288103

Motivation

Log compaction currently works from the broker's point of view: records are compacted by offset, and offsets reflect the order in which the broker received the records. For a given key, only the record with the highest offset is kept after compaction, which lets Kafka reconstruct the current state of the events as a "most recent snapshot". The problem arises when insertion order is not guaranteed, because compaction can then keep the wrong state. This is easy to reproduce with a multi-threaded producer, with multiple producers, or when sending events asynchronously. The following is an example:

...

Update the producer to send the header value in every record.
Roll out the producer to all clusters first.
Once all producers are confirmed to be sending the header value, update the topic config on the broker side with the header strategy.
Note:
- When migrating an existing topic, the already compacted log remains as it is (i.e. offset based); only new logs are compacted using the new strategy once the topic config has been updated with that strategy.
- If the topic strategy ever needs to be rolled back to the default offset strategy, first update the topic config on the broker side to the offset strategy; only then can the producer stop generating the header value.
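
The difference between the offset strategy and the header strategy referred to in the notes above can be illustrated with a small simulation. This is only a sketch, not broker code: the `Record` class, the `compact` function, and the header key `version` are hypothetical names chosen for illustration, and breaking ties on equal header values by offset is an assumption. The sketch also assumes every record carries the header, which is what the rollout order above is meant to guarantee.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Record:
    """Hypothetical stand-in for a log record (not a Kafka client class)."""
    offset: int
    key: str
    value: str
    headers: Dict[str, int] = field(default_factory=dict)


def _rank(rec: Record, header_key: Optional[str]) -> Tuple[int, int]:
    # Offset strategy: only the offset matters.
    if header_key is None:
        return (0, rec.offset)
    # Header strategy: the header value decides; the offset only breaks ties
    # (assumption). Raises KeyError if a record was produced without the header.
    return (rec.headers[header_key], rec.offset)


def compact(records: Iterable[Record], header_key: Optional[str] = None) -> List[Record]:
    """Keep one record per key, chosen by the selected strategy."""
    kept: Dict[str, Record] = {}
    for rec in records:
        current = kept.get(rec.key)
        if current is None or _rank(rec, header_key) >= _rank(current, header_key):
            kept[rec.key] = rec
    return sorted(kept.values(), key=lambda r: r.offset)


# The newer update (version 2) reached the broker before the older one (version 1).
log = [
    Record(offset=0, key="k1", value="new", headers={"version": 2}),
    Record(offset=1, key="k1", value="old", headers={"version": 1}),
]

by_offset = compact(log)                        # keeps the stale record
by_header = compact(log, header_key="version")  # keeps the latest version
```

With the offset strategy the stale record at offset 1 survives; with the header strategy the record carrying the higher `version` survives regardless of arrival order.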

Recommendations

In scenarios such as a low produce rate, a topic partition can remain ineligible for compaction for an unbounded duration, in which case "delete.retention.ms" takes effect and removes the tombstone record, if one exists. For that case we recommend that Kafka users set "segment.ms" and "max.compaction.lag.ms" to values smaller than "delete.retention.ms" (compaction does not happen on the active segment).
Because this KIP introduces a configurable compaction strategy, the consumer should be aware of it and follow the same compaction strategy as the broker, to avoid inconsistency about which records to keep.
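
The timing relationship in the first recommendation can be checked mechanically before a topic config is applied. The following is a minimal sketch: the function name `settings_not_below_delete_retention` is hypothetical, the millisecond values are illustrative only, and the topic config is assumed to be available as a plain dictionary.

```python
from typing import Dict, List


def settings_not_below_delete_retention(topic_config: Dict[str, int]) -> List[str]:
    """Return the settings that are not smaller than delete.retention.ms."""
    delete_retention_ms = int(topic_config["delete.retention.ms"])
    return [
        name
        for name in ("segment.ms", "max.compaction.lag.ms")
        if int(topic_config[name]) >= delete_retention_ms
    ]


# Illustrative values: one hour for both settings, one day for delete retention.
recommended = {
    "segment.ms": 3_600_000,
    "max.compaction.lag.ms": 3_600_000,
    "delete.retention.ms": 86_400_000,
}

# Illustrative values: a seven-day segment roll exceeds the one-day delete retention.
risky = {
    "segment.ms": 604_800_000,
    "max.compaction.lag.ms": 3_600_000,
    "delete.retention.ms": 86_400_000,
}

ok_result = settings_not_below_delete_retention(recommended)
risky_result = settings_not_below_delete_retention(risky)
```

An empty result means the config follows the recommendation; otherwise the returned names are the settings to lower.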

Rejected Alternatives

(This section remains the same as in the previous proposal.)

...
