This proposal includes new configurations for controlling compaction. The log cleaner can be configured to retain a minimum amount of the uncompacted head of the log. This is enabled by setting the compaction lag:

    log.cleaner.min.compaction.lag.ms

which sets the minimum message age in milliseconds. This has a similar per-topic configuration:

    min.compaction.lag.ms
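
As a rough illustration of how the broker-level and per-topic settings might combine, the sketch below resolves an effective lag from two plain `java.util.Properties` objects. The class and method names are hypothetical and this is not Kafka's configuration code. It assumes the usual convention that a topic-level value overrides the broker-level one, with zero as the fallback when neither is set.

```java
import java.util.Properties;

/** Hypothetical helper for illustration only; not part of Kafka. */
final class CompactionLagConfigSketch {
    static final String BROKER_KEY = "log.cleaner.min.compaction.lag.ms";
    static final String TOPIC_KEY = "min.compaction.lag.ms";

    /**
     * Returns the lag in milliseconds that would apply to a topic:
     * the topic override if present, else the broker value, else 0.
     */
    static long effectiveLagMs(Properties brokerConfig, Properties topicConfig) {
        String topicValue = topicConfig.getProperty(TOPIC_KEY);
        if (topicValue != null) {
            return Long.parseLong(topicValue.trim());
        }
        // Assumption: an unset broker value behaves like the default of zero.
        return Long.parseLong(brokerConfig.getProperty(BROKER_KEY, "0").trim());
    }
}
```

For example, a broker value of 3600000 with a topic override of 60000 would resolve to 60000 for that topic.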

...

This lag configuration defaults to zero so that if not set, all log segments are eligible for compaction except for the last segment (i.e. the one currently being written). The active segment will not be compacted even if the compaction lag constraint is satisfied. This leaves the current behavior unchanged. If the compaction lag is greater than zero, then segments of the log containing any messages that do not satisfy the lag constraint will not be compacted. In particular, this allows for use cases like: "any consumer that is no more than 1 hour behind will get every message."
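
The eligibility rule described here can be sketched roughly as follows. `Segment` is a hypothetical, simplified stand-in for a log segment, not Kafka's own class, and the sketch assumes each segment knows the timestamp of its newest message. It also assumes that eligibility stops at the first protected segment, so that the uncompacted portion stays a contiguous run at the head of the log.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names and structure are assumptions. */
final class CompactionLagSketch {

    /** Hypothetical, simplified stand-in for a log segment. */
    static final class Segment {
        final long baseOffset;
        final long largestTimestampMs; // timestamp of the newest message in the segment
        final boolean active;          // true for the segment currently being written

        Segment(long baseOffset, long largestTimestampMs, boolean active) {
            this.baseOffset = baseOffset;
            this.largestTimestampMs = largestTimestampMs;
            this.active = active;
        }
    }

    /**
     * Returns the leading run of segments (oldest first) that may be compacted.
     * A segment is protected if it is active, or if its newest message is
     * younger than minCompactionLagMs. A lag of zero disables the age check.
     */
    static List<Segment> compactable(List<Segment> segments, long minCompactionLagMs, long nowMs) {
        List<Segment> eligible = new ArrayList<>();
        for (Segment segment : segments) {
            if (segment.active) {
                break; // the active segment is never compacted
            }
            long newestMessageAgeMs = nowMs - segment.largestTimestampMs;
            if (minCompactionLagMs > 0 && newestMessageAgeMs < minCompactionLagMs) {
                break; // at least one message does not yet satisfy the lag
            }
            eligible.add(segment);
        }
        return eligible;
    }
}
```

With a lag of 3600000 (1 hour), only segments whose newest message is at least an hour old would be returned, which matches the "no more than 1 hour behind" use case.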

Proposed Changes

Introduce an additional configuration to topics that guarantees a minimum portion of the head of the log will remain uncompacted. That is, offer a guarantee that a consumer that does not lag too far behind will get every update to a compacted topic. This can be used to set constraints on the minimum _distance_ from the topic head that will remain uncompacted, where distance is defined in terms of time since insertion (i.e. message age).

The basic behavior of the compaction ratio to trigger and prioritize compaction order will not be altered. However, the ratio's definition will be expanded to become the ratio of "compactable" to compactable plus compacted message sizes, where compactable includes log segments that are neither the active segment nor prohibited from being compacted because they contain messages that do not satisfy the new lag constraint.
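
The expanded ratio can be written out as a small calculation. This is a minimal sketch with hypothetical names, not the log cleaner's actual code. It assumes sizes are measured in bytes and that the active segment and any lag-protected segments have already been excluded from both inputs.

```java
/** Illustrative sketch only; not Kafka's log cleaner implementation. */
final class CleanableRatioSketch {

    /**
     * Ratio of compactable bytes to (compactable + already compacted) bytes.
     * Bytes in the active segment or in lag-protected segments are assumed
     * to be excluded by the caller, so they do not affect the result.
     */
    static double cleanableRatio(long compactedBytes, long compactableBytes) {
        long total = compactedBytes + compactableBytes;
        if (total == 0) {
            return 0.0; // nothing to compare; treat as not needing compaction
        }
        return (double) compactableBytes / total;
    }
}
```

For instance, 300 bytes already compacted and 100 bytes compactable give a ratio of 0.25, no matter how many bytes are currently protected by the lag.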

The time lag guarantee can be satisfied by preventing compaction of any segment containing a message or messages within the time lag. KIP-33 (Add a time based log index) provides a mechanism for this to be accurately computed for a log segment.

Compatibility, Deprecation, and Migration Plan

...

The database replication use case could be satisfied using a combination of "snapshot" and "journal" topics for each table. The journal topics could use regular time-based deletion. There would need to be some external process periodically creating new snapshots from the most recent snapshots and the journals.

While it would be straightforward to introduce other types of "lag" (e.g. aggregate message size or message count), there were not sufficient motivating use cases to justify their inclusion at this time.