
Status

Current state: Discussion

Discussion thread: https://lists.apache.org/thread.html/79aa6e50d7c737ddf83455dd8063692a535a1afa558620fe1a1496d3@<dev.kafka.apache.org>

JIRA: https://issues.apache.org/jira/browse/KAFKA-7061

Pull request: https://github.com/apache/kafka/pull/5149 (WIP)

Motivation

Current Kafka log compaction is based on a server-side view: records are compacted purely by offset, and for a given key only the record with the highest offset is retained after compaction. Note that, for a given topic partition, records are appended to the log in the order the broker receives them. This default strategy is insufficient in many scenarios. When multiple producers write to the same topic partition, the producer-side record order cannot be guaranteed on the server side, because message transmission over the network is determined by many factors outside Kafka's control, and which message is retained after compaction is therefore effectively random. The following is a motivating example:


Producer 1 sends a message <K1, V1> to topic A partition p1, and Producer 2 then sends <K1, V2> to the same partition. On the producer side, these two messages have a clear order: <K1, V1> before <K1, V2>. On the server side, however, the messages may arrive in either order, so after compaction either <K1, V1> or <K1, V2> may be retained; if <K1, V1> survives, the newer state is lost.

In order to use Kafka as the message broker within an Event Sourcing architecture, it is essential that Kafka can reconstruct the current state of the events in a "most recent snapshot" approach.

This is where log compaction becomes an integral part of the workflow, as only the latest state is of interest. At the moment, Kafka accomplishes this by treating the insertion order (i.e. the highest offset) as a representation of the latest state.

The issue occurs when the insertion order is not guaranteed, which causes log compaction to keep the wrong state. This can easily be reproduced when using multi-threaded (or simply multiple) producers, or when sending events asynchronously.
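As a minimal sketch of this scenario, the snippet below has two independent producers write the same key to the same partition-bound topic; the bootstrap address, topic name, and key are illustrative assumptions, and since send() is asynchronous, the broker-side arrival order is not guaranteed:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ReorderingExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            // Two independent producers write the same key to the same topic.
            try (KafkaProducer<String, String> p1 = new KafkaProducer<>(props);
                 KafkaProducer<String, String> p2 = new KafkaProducer<>(props)) {
                p1.send(new ProducerRecord<>("topicA", "K1", "V1")); // intended older state
                p2.send(new ProducerRecord<>("topicA", "K1", "V2")); // intended newer state
                // send() is asynchronous: batching, retries and network scheduling
                // determine arrival order, so the broker may append <K1, V2> before
                // <K1, V1>, and compaction would then keep the stale V1.
            }
        }
    }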

Public Interfaces

There are no changes to the public interfaces. 

Proposed Changes

  • Enhance log compaction to support more than just offset comparison, so that insertion order does not always dictate which records to keep (in effect, allowing for a form of optimistic concurrency control, OCC);
  • The current behavior should remain as the default in order to minimize impact on already existing clients and avoid any migration efforts;
  • New Configurations:
    • "log.cleaner.compaction.strategy"
      • The active compaction strategy to use;
      • Accepts values "offset", "timestamp" and "header", allowing for further strategies to be added in the future as needed;
    • "log.cleaner.compaction.strategy.header"
      • Configuration sub-set to use when the strategy is set to "header";
  • Compaction Strategies:
    • "offset"
      • The previous behavior is active, compacting the logs purely based on offset;
      • Also used when the configuration is either empty or not present, making this the default strategy;
    • "timestamp"
      • The record timestamp will be used to determine which record to keep, in a 'keep-highest' approach;
      • When both records being compared contain an equal timestamp, then the record with the highest offset will be kept;
      • This also requires caching the timestamp during compaction, in addition to the base offset, so the per-record cache entry grows from 8 bytes to 16 bytes when using this strategy.
    • "header"
      • Searches the record for a header key that matches the configured value on "compaction.strategy.header";
      • If the "compaction.strategy.header" configuration is not set (or is blank), then the compaction strategy will fallback to "offset";
      • If a header key that matches the configuration exists, then the header value (which must be of type "long") will be used to determine which record to keep, in a 'keep-highest' approach;
      • If both records being compared do not have a matching header key, then the record with the highest offset will be kept;
      • If both records being compared contain an equal header value, then the record with the highest offset will be kept;
      • If only one of the records being compared has a matching header, then this record is kept, as the other record is considered to be anomalous;
      • This likewise requires caching the header value during compaction, in addition to the base offset, so the per-record cache entry grows from 8 bytes to 16 bytes when using this strategy (see the decision sketch after this list).
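To make the "header" strategy concrete from the client side, here is a minimal producer sketch. It assumes a broker configured with log.cleaner.compaction.strategy=header and log.cleaner.compaction.strategy.header=version; the header name "version", the topic, and the 8-byte big-endian encoding of the long value are illustrative assumptions, as a wire encoding is not prescribed here.

    import java.nio.ByteBuffer;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class VersionedProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                ProducerRecord<String, String> record =
                    new ProducerRecord<>("topicA", "K1", "V2");
                // Attach the compaction header: a long version number encoded as
                // 8 big-endian bytes (the encoding is an assumption of this sketch).
                record.headers().add("version",
                    ByteBuffer.allocate(Long.BYTES).putLong(42L).array());
                producer.send(record);
            }
        }
    }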
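The keep-or-discard decision described by the rules above can be sketched as follows; the method name and signature are hypothetical, and this is an illustration of the listed rules rather than the actual cleaner implementation.

    /**
     * Returns true if the newly scanned record should replace the record
     * currently cached for the same key. cachedValue/newValue hold the cached
     * timestamp or header value; null means no matching header was found
     * (only possible under the "header" strategy).
     */
    static boolean shouldReplace(long cachedOffset, Long cachedValue,
                                 long newOffset, Long newValue) {
        if (cachedValue != null && newValue == null)
            return false;                   // only the cached record has the header: keep it
        if (cachedValue == null && newValue != null)
            return true;                    // only the new record has the header: keep it
        if (cachedValue != null && !cachedValue.equals(newValue))
            return newValue > cachedValue;  // 'keep-highest' on timestamp/header value
        return newOffset > cachedOffset;    // tie (or "offset" strategy): highest offset wins
    }

Under the "offset" strategy both value arguments would simply be null, reducing the decision to the existing highest-offset rule.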

Compatibility, Deprecation, and Migration Plan

Since the current offset-based behavior remains the default, there are no compatibility issues and no migration is required.

Rejected Alternatives

  • Stream the data out of Kafka and perform Event Sourcing there
    • This would mean creating an in-house solution, which makes Kafka irrelevant in the design, so it is best left as a last resort in case no Kafka-side solution is found
  • Guarantee insertion order on the producer
    • Not viable as keeping this logic synchronized greatly reduces the event throughput
  • Check the version before sending the event to Kafka
    • Similar to the previous point, though it adds extra complexity, as race conditions may arise between checking the version and sending the event
  • Caching the record version as a byte array and performing the comparisons between records using a lexicographic byte-array comparator
    • This adds greater flexibility on the client side, but allowing a variable byte-array size raises concerns about the cache's memory usage
  • Always search the headers for a key matching whatever is configured, so if a header "timestamp" exists then it could be used by the compaction mechanism
    • This introduces backwards-compatibility issues: headers are currently unrestricted and have no effect on compaction, so existing headers could unintentionally start influencing which records are kept.
    • Even setting the previous point aside, this may cause API confusion: for example, a topic may be designed with "offset" compaction, making it unclear whether the producer should provide an "offset" header or whether the internal offset is meant to be used.
  • Provide the configuration for the individual topics
    • None of the configurations for log compaction are available at topic level, so adding it there is not a part of this KIP