...

  • By default, "max.compaction.lag.ms" is set to 0, which disables this time-based log compaction policy. There are no compatibility issues, and no migration is required.

Performance impact

  • Kafka already collects compaction metrics (CleanerStats), including how many bytes are read and written during each compaction run and how long it takes to compact a log partition. These metrics can be used to measure the performance impact of adopting this KIP. For example, if most log partitions are already compacted at least once a day without time-based compaction, setting "max.compaction.lag.ms" to more than one day should have little impact on the resources spent on compaction.
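To make the example above concrete, this Java sketch estimates which partitions a given lag setting could push into extra compaction runs, and computes cleaner read throughput from the kind of numbers CleanerStats reports. The class `CompactionImpactEstimate`, its methods, and the input map of last-cleaned timestamps are hypothetical and not part of Kafka's API. It assumes that a partition cleaned within the lag window cannot hold a dirty record older than the lag; this is a conservative approximation (it may flag idle partitions), not the cleaner's exact rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helpers for estimating the cost of enabling max.compaction.lag.ms.
 * Inputs mirror the kind of numbers CleanerStats reports (bytes read, elapsed time),
 * but these names are NOT Kafka's actual API.
 */
final class CompactionImpactEstimate {

    /** Bytes read per second for one compaction run. */
    static double readThroughputBytesPerSec(long bytesRead, long elapsedMs) {
        if (elapsedMs <= 0) {
            throw new IllegalArgumentException("elapsedMs must be positive");
        }
        return bytesRead * 1000.0 / elapsedMs;
    }

    /**
     * Returns (sorted) partitions whose last compaction is older than maxCompactionLagMs.
     * Assumption: only such partitions could hold dirty records older than the lag,
     * so only they could be forced into an extra run. A lag of 0 disables the policy.
     */
    static List<String> partitionsAtRisk(Map<String, Long> lastCleanedMs,
                                         long nowMs, long maxCompactionLagMs) {
        List<String> atRisk = new ArrayList<>();
        if (maxCompactionLagMs <= 0) {
            return atRisk; // time-based compaction disabled
        }
        for (Map.Entry<String, Long> e : new TreeMap<>(lastCleanedMs).entrySet()) {
            if (nowMs - e.getValue() > maxCompactionLagMs) {
                atRisk.add(e.getKey());
            }
        }
        return atRisk;
    }

    public static void main(String[] args) {
        long day = 86_400_000L;
        long now = 10 * day;
        Map<String, Long> lastCleaned = new HashMap<>();
        lastCleaned.put("orders-0", now - 2 * 3_600_000L); // cleaned 2 hours ago
        lastCleaned.put("orders-1", now - 3 * day);         // cleaned 3 days ago
        lastCleaned.put("users-0", now - day / 2);          // cleaned 12 hours ago
        System.out.println("at risk with a 2-day lag: " + partitionsAtRisk(lastCleaned, now, 2 * day));
        System.out.println("read bytes/s: " + readThroughputBytesPerSec(10_000_000L, 2_000L));
    }
}
```

With a two-day lag, only `orders-1` is flagged; if such a list stays short for a real cluster, enabling the policy should add little cleaner work.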

Rejected Alternatives

  • One way to force compaction on any cleanable log segment is to set "min.cleanable.dirty.ratio" to 0. However, compacting a log partition whenever a segment becomes cleanable (as controlled by "min.compaction.lag.ms") is very expensive. We still want a number of dirty log segments to accumulate before compaction is kicked off.

  • If compaction and time-based retention are both enabled on a topic, compaction might prevent records from being deleted on time. The reason is that when multiple segments are compacted into a single segment, the newly created segment has the same last-modified timestamp as the latest original segment, so the timestamps of all original segments except the last one are lost. As a result, time-based retention might not delete records when it should. We decided not to address this issue in this KIP because we have no obvious use case in which users must enable both time-based retention and log compaction; it can be left as future work. One possible solution is to inspect record timestamps during log compaction and delete expired records. This could be done in the compaction logic itself or by using AdminClient.deleteRecords(). However, this solution assumes that records carry timestamps. Further investigation is needed if on-time retention must be supported for log-compacted topics.
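The first rejected alternative can be illustrated with a simplified model of how a partition might be selected for cleaning. In this Java sketch, the class `CleanerEligibility` and its parameters are hypothetical and do not reproduce Kafka's actual cleaner-selection code. A log qualifies when its dirty ratio reaches "min.cleanable.dirty.ratio" or, with the time-based policy enabled, when its earliest dirty record is older than "max.compaction.lag.ms". With the ratio set to 0, any log with even one cleanable byte qualifies on every check, which is why that alternative was rejected. A lag of 0 is treated as disabled, matching the default described earlier.

```java
/**
 * Hypothetical, simplified model of selecting a log for compaction.
 * Not Kafka's real log-cleaner selection logic.
 */
final class CleanerEligibility {

    /**
     * @param cleanableBytes           dirty bytes in segments already past min.compaction.lag.ms
     * @param totalBytes               total size of the log
     * @param minCleanableDirtyRatio   value of min.cleanable.dirty.ratio
     * @param earliestDirtyTimestampMs timestamp of the oldest not-yet-compacted record
     * @param nowMs                    current time
     * @param maxCompactionLagMs       value of max.compaction.lag.ms (0 = disabled)
     */
    static boolean shouldCompact(long cleanableBytes, long totalBytes,
                                 double minCleanableDirtyRatio,
                                 long earliestDirtyTimestampMs, long nowMs,
                                 long maxCompactionLagMs) {
        if (cleanableBytes <= 0 || totalBytes <= 0) {
            return false; // nothing is cleanable yet
        }
        double dirtyRatio = (double) cleanableBytes / totalBytes;
        boolean ratioTriggered = dirtyRatio >= minCleanableDirtyRatio;
        boolean lagTriggered = maxCompactionLagMs > 0
                && nowMs - earliestDirtyTimestampMs > maxCompactionLagMs;
        return ratioTriggered || lagTriggered;
    }

    public static void main(String[] args) {
        long day = 86_400_000L;
        long now = 10 * day;
        // 1 KB cleanable out of 1 MB: a ratio of 0 selects the log immediately.
        System.out.println(shouldCompact(1_024, 1_048_576, 0.0, now, now, 0));
        // Same log with ratio 0.5 and no lag: it keeps waiting.
        System.out.println(shouldCompact(1_024, 1_048_576, 0.5, now - 3 * day, now, 0));
        // Same log with a 1-day lag: the 3-day-old dirty record forces compaction.
        System.out.println(shouldCompact(1_024, 1_048_576, 0.5, now - 3 * day, now, day));
    }
}
```

The design point is that the lag acts as an upper bound on how long a dirty record can wait, while the dirty ratio still lets segments accumulate in the common case.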
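The retention problem in the second item can be reproduced with a toy model. In this Java sketch, the `CompactionRetentionSketch` class, its `Rec` and `Segment` types, and the `dropExpiredByRecordTimestamp` flag are all hypothetical. Compaction keeps the latest record per key and gives the output segment the newest last-modified time of its inputs, as the item describes, so segment-level retention keeps an expired record alive. With the flag set, the sketch applies the possible fix mentioned in the item: dropping records whose own timestamp is past retention during compaction, which assumes every record carries a timestamp.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical, not Kafka code) of compaction vs. time-based retention. */
final class CompactionRetentionSketch {

    /** A record with a key and its own timestamp. */
    static final class Rec {
        final String key;
        final long timestampMs;
        Rec(String key, long timestampMs) { this.key = key; this.timestampMs = timestampMs; }
    }

    /** A segment whose age, for retention, is judged only by its last-modified time. */
    static final class Segment {
        final List<Rec> records;
        final long lastModifiedMs;
        Segment(List<Rec> records, long lastModifiedMs) {
            this.records = List.copyOf(records);
            this.lastModifiedMs = lastModifiedMs;
        }
    }

    /** Segment-level retention: the whole segment expires, or none of it does. */
    static boolean expiredBySegmentTime(Segment s, long nowMs, long retentionMs) {
        return nowMs - s.lastModifiedMs > retentionMs;
    }

    /**
     * Compacts segments (oldest first) into one, keeping the latest record per key.
     * The result inherits the newest last-modified time among its inputs.
     * If dropExpiredByRecordTimestamp is true, records past retention by their own
     * timestamp are discarded during compaction.
     */
    static Segment compact(List<Segment> segments, long nowMs, long retentionMs,
                           boolean dropExpiredByRecordTimestamp) {
        Map<String, Rec> latestByKey = new LinkedHashMap<>();
        long lastModified = Long.MIN_VALUE;
        for (Segment s : segments) {
            lastModified = Math.max(lastModified, s.lastModifiedMs);
            for (Rec r : s.records) {
                latestByKey.put(r.key, r); // later records overwrite earlier ones
            }
        }
        List<Rec> kept = new ArrayList<>();
        for (Rec r : latestByKey.values()) {
            if (dropExpiredByRecordTimestamp && nowMs - r.timestampMs > retentionMs) {
                continue; // expired according to the record's own timestamp
            }
            kept.add(r);
        }
        return new Segment(kept, lastModified);
    }

    public static void main(String[] args) {
        long day = 86_400_000L;
        long now = 30 * day;
        long retention = 7 * day;
        Segment old = new Segment(List.of(new Rec("a", now - 10 * day)), now - 10 * day);
        Segment fresh = new Segment(List.of(new Rec("b", now - day)), now - day);
        Segment merged = compact(List.of(old, fresh), now, retention, false);
        System.out.println("old segment expired: " + expiredBySegmentTime(old, now, retention));
        System.out.println("merged expired: " + expiredBySegmentTime(merged, now, retention)
                + ", records kept: " + merged.records.size());
    }
}
```

Here the 10-day-old record "a" would be deleted if its segment were left alone, but after compaction it survives inside a segment that looks only one day old; enabling the flag removes it.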

...