...

Current Kafka log compaction is based on a server-side view: records are compacted purely by offset. For a given key, only the record with the highest offset is kept after compaction. Note that within a topic partition, records are appended to the log in the order in which the broker receives them. This default strategy is insufficient in many scenarios. When multiple producers produce to the same topic partition, the producer-side record order cannot be guaranteed on the server side, because message transmission over the network is determined by many factors outside Kafka's control. As a result, which message survives compaction is effectively random. The following is a motivating example:

...

Producer 1 sends a message <K1, V1> to topic A partition p1, and Producer 2 then sends a message <K1, V2> to the same partition. On the producer side, there is a clear order for these two messages: <K1, V1> before <K1, V2>. But on the server side, this order can be reversed: <K1, V1> could end up with the higher offset because it is received later than <K1, V2>. When compaction happens, <K1, V1> is kept, and clearly this is not what is intended.
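The problem above can be sketched with a minimal simulation of offset-based compaction (the class and record names below are illustrative, not Kafka's internal API): for each key, the record with the highest offset wins, regardless of producer-side intent.

```java
import java.util.HashMap;
import java.util.Map;

public class OffsetCompaction {
    // Simplified stand-in for a log record; not Kafka's actual type.
    record LogRecord(String key, String value, long offset) {}

    // Offset-based compaction: keep the highest-offset record per key.
    static Map<String, LogRecord> compact(LogRecord... log) {
        Map<String, LogRecord> kept = new HashMap<>();
        for (LogRecord r : log) {
            LogRecord prev = kept.get(r.key());
            if (prev == null || r.offset() > prev.offset()) {
                kept.put(r.key(), r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // <K1, V2> reached the broker first, so it received the lower offset.
        Map<String, LogRecord> kept = compact(
            new LogRecord("K1", "V2", 0),
            new LogRecord("K1", "V1", 1));
        // Offset-based compaction keeps V1, not the intended V2.
        System.out.println(kept.get("K1").value()); // prints V1
    }
}
```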


To resolve the above issue, we propose adding support for two compaction strategies: default and header sequence. The default strategy is kept as is. The header sequence strategy supports compaction based on the producer-side message sequence, carried in a record header. The proposed configuration is per topic, meaning a user can enable a different compaction strategy for a subset of compacted topics. While this proposal only supports two compaction strategies, it leaves open the option of adding more compaction strategies in the future.
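Under the proposed header sequence strategy, compaction compares a producer-supplied sequence value instead of the offset. A minimal sketch under the same illustrative (non-Kafka) names as before, with the assumed `sequence` field standing in for the header value:

```java
import java.util.HashMap;
import java.util.Map;

public class HeaderSequenceCompaction {
    // Simplified record: `sequence` stands in for the producer-stamped header.
    record LogRecord(String key, String value, long offset, long sequence) {}

    // Header-sequence compaction: keep the highest-sequence record per key,
    // ignoring the broker-assigned offset.
    static Map<String, LogRecord> compact(LogRecord... log) {
        Map<String, LogRecord> kept = new HashMap<>();
        for (LogRecord r : log) {
            LogRecord prev = kept.get(r.key());
            if (prev == null || r.sequence() > prev.sequence()) {
                kept.put(r.key(), r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // <K1, V1> (sequence 1) arrived late and got the higher offset,
        // but the sequence header identifies <K1, V2> (sequence 2) as newer.
        Map<String, LogRecord> kept = compact(
            new LogRecord("K1", "V2", 0, 2),
            new LogRecord("K1", "V1", 1, 1));
        System.out.println(kept.get("K1").value()); // prints V2
    }
}
```

With the sequence header as the tie-breaker, the producer's intended order survives compaction even when network timing reorders arrivals at the broker.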


Acknowledgement: we thank the previous author of this KIP proposal, Luís Cabral; many of his changes are kept in this proposal.


In order to use Kafka as the message broker within an Event Sourcing architecture, it is essential that Kafka be able to reconstruct the current state of the events in a "most recent snapshot" approach.

...