
...

  • Kafka already collects compaction metrics (CleanerStats), including how many bytes are read and written during each compaction run and how long it takes to compact a log partition. These metrics can be used to measure the performance impact of adopting this KIP. In addition, if a log partition already gets compacted once per day before this KIP, setting the maximum compaction lag to more than one day should have little impact on the amount of resources spent on compaction, since the existing log compaction configuration (e.g., min dirty ratio) will trigger compaction before "max.compaction.lag.ms" elapses.
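The argument above can be illustrated with a simplified model in which a partition becomes eligible for compaction when either its dirty ratio reaches "min.cleanable.dirty.ratio" or its oldest uncompacted record is older than "max.compaction.lag.ms". This is a minimal sketch under that assumption, not the broker's log cleaner code; `CompactionTrigger`, `reason`, and the `Reason` values are hypothetical names. It reports which condition fires first for a given partition state.

```java
import java.time.Duration;

/**
 * Hypothetical, simplified model of the two conditions that can make a compacted
 * partition eligible for cleaning. Not the broker's actual log cleaner code.
 */
final class CompactionTrigger {
    enum Reason { NONE, DIRTY_RATIO, MAX_COMPACTION_LAG }

    private CompactionTrigger() {}

    /**
     * @param dirtyBytes           bytes in segments that have not been compacted yet
     * @param totalBytes           total bytes in the partition (clean + dirty)
     * @param minDirtyRatio        stands in for min.cleanable.dirty.ratio
     * @param oldestDirtyTimestamp timestamp (ms) of the oldest uncompacted record
     * @param nowMs                current time (ms)
     * @param maxCompactionLagMs   stands in for max.compaction.lag.ms
     */
    static Reason reason(long dirtyBytes, long totalBytes, double minDirtyRatio,
                         long oldestDirtyTimestamp, long nowMs, long maxCompactionLagMs) {
        if (dirtyBytes <= 0 || totalBytes <= 0) {
            return Reason.NONE; // nothing to compact
        }
        double dirtyRatio = (double) dirtyBytes / totalBytes;
        if (dirtyRatio >= minDirtyRatio) {
            return Reason.DIRTY_RATIO; // the pre-existing trigger fires first
        }
        if (nowMs - oldestDirtyTimestamp > maxCompactionLagMs) {
            return Reason.MAX_COMPACTION_LAG; // only the lag bound forces compaction
        }
        return Reason.NONE;
    }

    public static void main(String[] args) {
        long day = Duration.ofDays(1).toMillis();
        // Dirty ratio reaches 0.5 after 20 hours, well within a 2-day lag bound.
        System.out.println(reason(1_000, 2_000, 0.5, 0, Duration.ofHours(20).toMillis(), 2 * day)); // DIRTY_RATIO
        // A slowly updated partition never reaches the ratio, so only the lag bound fires.
        System.out.println(reason(100, 2_000, 0.5, 0, 3 * day, 2 * day)); // MAX_COMPACTION_LAG
    }
}
```

In this model the lag bound only decides the outcome for partitions whose dirty ratio grows too slowly to reach the threshold within the bound, which is why partitions that are already compacted daily should see little extra work from a lag bound longer than a day.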

Rejected Alternatives

  • One way to force compaction on any cleanable log segment is to set "min.cleanable.dirty.ratio" to 0. However, compacting a log partition whenever a segment becomes cleanable (controlled by "min.compaction.lag.ms") is very expensive; we still want to accumulate a number of log segments before compaction is kicked off. In addition, to honor the max compaction lag requirement, we also need to force a roll of the active segment once the required lag has passed. So the existing configuration cannot guarantee a maximum compaction lag.

  • If compaction and time-based retention are both enabled on a topic, compaction might prevent records from being deleted on time. The reason is that when multiple segments are compacted into a single segment, the newly created segment has the same last-modified timestamp as the latest original segment, so the timestamps of all the other original segments are lost. As a result, records might not be deleted by time-based retention when they should be. We decided not to address this issue in this KIP because 1) in practice it is very unlikely that compaction will keep preventing retention from deleting a specific record indefinitely, and 2) we don't have obvious use cases where users must enable both time-based retention and log compaction. Addressing this issue is left as future work. One solution is to look at record timestamps during log compaction and delete expired records. This can be done in the compaction logic itself or by using AdminClient.deleteRecords(). However, this solution assumes that records carry timestamps. Further investigation is needed if we have to support on-time retention on log-compacted topics.
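The rejected alternative of setting "min.cleanable.dirty.ratio" to 0 falls short partly because the cleaner never compacts the active segment, so bounding the lag also requires rolling that segment once its oldest record is older than the bound. The sketch below shows that roll decision under a simplified model. `LagBoundedLog`, `maybeForceRoll`, and `closedSegmentCount` are hypothetical names, and aging a segment by its first record's timestamp is an assumption made for illustration, not a description of the broker code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified log used to show why the active segment must be rolled
 * to honor a maximum compaction lag. In this model the cleaner never touches the
 * active segment, so a record sitting there stays uncompacted no matter how low
 * min.cleanable.dirty.ratio is set.
 */
final class LagBoundedLog {
    private final long maxCompactionLagMs; // stands in for max.compaction.lag.ms
    private final List<Long> closedSegmentFirstTimestamps = new ArrayList<>();
    private Long activeFirstTimestamp;     // null while the active segment is empty

    LagBoundedLog(long maxCompactionLagMs) {
        this.maxCompactionLagMs = maxCompactionLagMs;
    }

    /** Appends a record; only the first record's timestamp matters for the roll decision. */
    void append(long timestampMs) {
        if (activeFirstTimestamp == null) {
            activeFirstTimestamp = timestampMs;
        }
    }

    /**
     * Rolls the active segment if its oldest record has waited longer than the lag bound,
     * which makes that record visible to the cleaner. Returns true if a roll happened.
     */
    boolean maybeForceRoll(long nowMs) {
        if (activeFirstTimestamp != null && nowMs - activeFirstTimestamp > maxCompactionLagMs) {
            closedSegmentFirstTimestamps.add(activeFirstTimestamp);
            activeFirstTimestamp = null; // the next append starts a new active segment
            return true;
        }
        return false;
    }

    /** Number of non-active segments, i.e. the only ones this model lets the cleaner compact. */
    int closedSegmentCount() {
        return closedSegmentFirstTimestamps.size();
    }

    public static void main(String[] args) {
        LagBoundedLog log = new LagBoundedLog(3_600_000L); // 1-hour lag bound
        log.append(0L);
        System.out.println(log.maybeForceRoll(1_800_000L)); // false: only 30 minutes old
        System.out.println(log.maybeForceRoll(7_200_000L)); // true: 2 hours exceeds 1 hour
        System.out.println(log.closedSegmentCount());       // 1
    }
}
```

Without the roll, a low-traffic partition whose active segment never reaches its size or time roll limits would keep its oldest record out of the cleaner's reach indefinitely, regardless of the dirty ratio setting.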
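The retention problem described in the second rejected alternative can be reproduced with a toy model. In this hypothetical illustration, which is not Kafka's implementation, segments carry a last-modified time and records carry their own timestamps; `RetentionSketch`, `Seg`, `Rec`, `compact`, `segmentExpired`, and `containsKey` are invented names. Merging two segments keeps only the newer last-modified time, so a record from the older segment outlives the retention period. Passing `dropExpired = true` models the suggested fix of checking record timestamps inside the compaction logic; the `AdminClient.deleteRecords()` route is not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of the interaction between compaction and time-based retention.
 * Segments carry a last-modified time; records carry their own timestamp.
 */
final class RetentionSketch {
    static final class Rec {
        final String key;
        final long timestampMs;
        Rec(String key, long timestampMs) { this.key = key; this.timestampMs = timestampMs; }
    }

    static final class Seg {
        final List<Rec> records;
        final long lastModifiedMs;
        Seg(List<Rec> records, long lastModifiedMs) { this.records = records; this.lastModifiedMs = lastModifiedMs; }
    }

    /** Segment-level retention check: looks only at the segment's last-modified time. */
    static boolean segmentExpired(Seg s, long nowMs, long retentionMs) {
        return nowMs - s.lastModifiedMs > retentionMs;
    }

    /**
     * Merges segments (assumed to be given in offset order), keeping the latest record per key.
     * The output inherits the newest input's last-modified time, so older inputs' times are lost.
     * When dropExpired is true, records whose own timestamp is past retention are dropped.
     */
    static Seg compact(List<Seg> inputs, long nowMs, long retentionMs, boolean dropExpired) {
        Map<String, Rec> latest = new LinkedHashMap<>();
        long lastModified = Long.MIN_VALUE;
        for (Seg s : inputs) {
            for (Rec r : s.records) {
                latest.put(r.key, r); // a later record overwrites an earlier one with the same key
            }
            lastModified = Math.max(lastModified, s.lastModifiedMs);
        }
        List<Rec> kept = new ArrayList<>();
        for (Rec r : latest.values()) {
            if (!dropExpired || nowMs - r.timestampMs <= retentionMs) {
                kept.add(r);
            }
        }
        return new Seg(kept, lastModified);
    }

    static boolean containsKey(Seg s, String key) {
        for (Rec r : s.records) {
            if (r.key.equals(key)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long day = 86_400_000L;
        long retentionMs = 7 * day;
        Seg segA = new Seg(List.of(new Rec("a", 0L)), 0L);
        Seg segB = new Seg(List.of(new Rec("b", 6 * day)), 6 * day);
        Seg merged = compact(List.of(segA, segB), 6 * day, retentionMs, false);
        // On day 8, segA alone would be deleted, but the merged segment still looks 2 days old.
        System.out.println(segmentExpired(segA, 8 * day, retentionMs));   // true
        System.out.println(segmentExpired(merged, 8 * day, retentionMs)); // false
        // Filtering by record timestamp during a later compaction drops "a".
        Seg filtered = compact(List.of(merged), 8 * day, retentionMs, true);
        System.out.println(containsKey(filtered, "a"));                   // false
    }
}
```

The sketch also makes the stated limitation concrete: the filter only works because every `Rec` has a timestamp, which is the assumption the text says would need further investigation.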

...