
...

The diagram below presents a brief overview of how compaction works in Kafka today. The cleaner maintains an offset known as the "first dirty offset." On a given round of cleaning, the cleaner will scan forward starting from the dirty offset and build a table of the key-value pairs until either the first uncleanable offset is reached or the table grows too large. The end offset of this scanning becomes the next dirty offset once this round of cleaning completes. After building this table, the cleaner scans from the beginning of the log and builds a new log which consists of all the entries which have not been superseded by an entry in the table. Once the new log is ready, it is atomically swapped with the current log.
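The two passes described above can be sketched roughly as follows. This is a simplified in-memory model, not Kafka's actual cleaner: the names `Record`, `clean_round`, and `max_table_size` are hypothetical, the table maps each key to the latest offset seen for it rather than storing values, and the sample log is invented for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class Record(NamedTuple):
    offset: int
    key: str
    value: str


def clean_round(
    log: List[Record],
    first_dirty_offset: int,
    first_uncleanable_offset: int,
    max_table_size: int,
) -> Tuple[List[Record], int]:
    """Run one simplified cleaning round; return (new_log, next_dirty_offset)."""
    # Pass 1: scan forward from the dirty offset and record the latest
    # offset seen for each key, stopping at the first uncleanable offset
    # or when the table cannot take another key.
    latest: Dict[str, int] = {}
    next_dirty = first_dirty_offset
    for rec in log:  # assumes the log is ordered by offset
        if rec.offset < first_dirty_offset:
            continue
        if rec.offset >= first_uncleanable_offset:
            break
        if rec.key not in latest and len(latest) >= max_table_size:
            break
        latest[rec.key] = rec.offset
        next_dirty = rec.offset + 1

    # Pass 2: rebuild the log from the beginning, dropping any scanned
    # entry whose key has a later offset in the table. Entries at or
    # beyond the new dirty offset are copied through untouched.
    new_log = [
        rec
        for rec in log
        if rec.offset >= next_dirty or latest.get(rec.key, -1) <= rec.offset
    ]
    return new_log, next_dirty


# Hypothetical log: k1 and k2 are overwritten inside the dirty section.
log = [
    Record(0, "k1", "a"),
    Record(1, "k2", "b"),
    Record(2, "k3", "c"),
    Record(3, "k1", "d"),
    Record(4, "k2", "e"),
    Record(5, "k4", "f"),
]
new_log, next_dirty = clean_round(
    log, first_dirty_offset=3, first_uncleanable_offset=6, max_table_size=10
)
print([r.offset for r in new_log], next_dirty)  # [2, 3, 4, 5] 6
```

In this sketch the "atomic swap" is simply the caller replacing `log` with `new_log`; a real implementation works on segment files.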

As an example, consider the following log. The first dirty offset is 9. Suppose that the cleaner is able to scan to the end of the log when building the table of retained entries.

[Image: the example log before cleaning]

Following cleaning, the first dirty offset is advanced to offset 9. We are able to remove the entries at offsets 0, 1, 4, and 5.

[Image: the example log after cleaning]

In order to address the "consistent versioning" problem mentioned above, an observer needs to be able to tell when it has reached an offset such that the materialized snapshot at that offset is guaranteed to be consistent among all replicas of the log. The challenge is that an observer which is fetching the log from the leader does not know which portion of the log has already been cleaned. For example, using the diagram above, if we attempt to materialize the state after only reading up to offset 6, then our snapshot will not contain keys k1 and k2 even though they would have been present at offset 6 if the entries at offsets 0 and 1 had not been cleaned.
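The effect can be reproduced with a small model. The log below is hypothetical and is not the one in the diagram: it is constructed so that only the entries at offsets 0 and 1 are cleaned. The helpers `compact` and `materialize` are illustrative names, not Kafka APIs.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (offset, key, value)


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record for each key (a fully cleaned log)."""
    latest = {key: offset for offset, key, _ in log}
    return [rec for rec in log if latest[rec[1]] == rec[0]]


def materialize(log: List[Record], up_to_offset: int) -> Dict[str, str]:
    """Apply records in order, stopping after up_to_offset (inclusive)."""
    state: Dict[str, str] = {}
    for offset, key, value in log:
        if offset > up_to_offset:
            break
        state[key] = value
    return state


# Hypothetical log in which k1 and k2 are rewritten at offsets 7 and 8.
original = [
    (0, "k1", "a"), (1, "k2", "b"), (2, "k3", "c"),
    (3, "k4", "d"), (4, "k5", "e"), (5, "k6", "f"),
    (6, "k7", "g"), (7, "k1", "h"), (8, "k2", "i"),
]
cleaned = compact(original)  # drops offsets 0 and 1

# Keys visible at offset 6 in the original log but not in the cleaned log.
missing = sorted(set(materialize(original, 6)) - set(materialize(cleaned, 6)))
print(missing)  # ['k1', 'k2']

# Once the whole log has been read, both views agree again.
print(materialize(original, 8) == materialize(cleaned, 8))  # True
```

A reader of the cleaned log sees a state at offset 6 that never existed in the original log, which is why it cannot treat that point as a consistent version.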

Our solution to this problem is simple. We require the leader to indicate its current dirty offset in each FetchQuorumRecords response. A follower/observer knows that its current snapshot represents a consistent version if and only if its local log end offset, after appending the records from the response, is greater than or equal to the dirty offset received from the leader.
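A minimal sketch of this check on the follower/observer side is shown below. The `FetchResponse` and `Replica` classes and the `dirty_offset` field name are assumptions made for illustration; the text above does not specify the response schema. The sketch also assumes the log end offset is the offset following the last appended record.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Record = Tuple[int, str, str]  # (offset, key, value)


@dataclass
class FetchResponse:
    """Hypothetical stand-in for a FetchQuorumRecords response."""
    records: List[Record]
    dirty_offset: int  # leader's current first dirty offset (assumed field name)


@dataclass
class Replica:
    log: List[Record] = field(default_factory=list)

    @property
    def log_end_offset(self) -> int:
        # Offset that the next appended record would receive.
        return self.log[-1][0] + 1 if self.log else 0

    def handle_fetch(self, response: FetchResponse) -> bool:
        """Append the fetched records, then report whether the local
        snapshot now represents a consistent version."""
        self.log.extend(response.records)
        return self.log_end_offset >= response.dirty_offset


replica = Replica()

# The leader's log was cleaned up to offset 9; offsets 0 and 1 are gone.
first = replica.handle_fetch(FetchResponse(
    records=[(2, "k3", "c"), (3, "k4", "d"), (4, "k5", "e"),
             (5, "k6", "f"), (6, "k7", "g")],
    dirty_offset=9,
))
second = replica.handle_fetch(FetchResponse(
    records=[(7, "k1", "h"), (8, "k2", "i")],
    dirty_offset=9,
))
print(first, second)  # False True
```

After the first fetch the local log end offset is 7, which is below the leader's dirty offset, so the replica must keep fetching before exposing a snapshot. After the second fetch the log end offset reaches 9 and the check passes.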

...