Status

Current state: Under Discussion

Discussion thread: here todo

JIRA: KAFKA-3224

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

One of Kafka's officially-described use cases is a distributed commit log (http://kafka.apache.org/documentation.html#uses_commitlog). In this case, for a distributed service that needed a commit log, there would be a topic with a single partition to guarantee log order. This service would use the commit log to re-sync failed nodes. Kafka is generally an excellent fit for such a system, but it does not expose an adequate mechanism for log cleanup in such a case. With a distributed commit log, data can only be deleted when the client application determines that it is no longer needed; this creates completely arbitrary ranges of time and size for messages, which the existing cleanup mechanisms can't handle smoothly.

An addition to the existing deletion policy, based on the absolute timestamp of a message, would fit this case well. The client application will periodically update the minimum timestamp of messages to retain, and Kafka will delete all messages earlier than that timestamp using the existing deletion mechanism, alongside the existing size-based and duration-based checks.

This is based on the work being done for KIP-32 (Add timestamps to Kafka message) and KIP-33 (Add a time based log index).

Public Interfaces

This KIP has the following public interface changes:

  • Expose a new topic configuration, log.retention.min.timestamp. The value will be a Unix time in milliseconds.
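
As a rough illustration of the value format, the sketch below builds a topic-level override whose value is a Unix time in milliseconds. It assumes the proposed configuration name is log.retention.min.timestamp. The class and method names are hypothetical and are not part of Kafka; how the override is submitted to the cluster is left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of Kafka.
final class MinTimestampConfigExample {
    // Configuration name as proposed in this KIP (assumed spelling).
    static final String MIN_TIMESTAMP_CONFIG = "log.retention.min.timestamp";

    // Builds a topic-level override asking the broker to retain only
    // messages whose timestamp is at or after oldestToKeep.
    static Properties retainFrom(Instant oldestToKeep) {
        Properties props = new Properties();
        // The value is a Unix time in milliseconds, serialized as a string.
        props.setProperty(MIN_TIMESTAMP_CONFIG, Long.toString(oldestToKeep.toEpochMilli()));
        return props;
    }

    public static void main(String[] args) {
        // Example: the application no longer needs anything older than one hour.
        Instant cutoff = Instant.now().minus(Duration.ofHours(1));
        System.out.println(retainFrom(cutoff));
    }
}
```

A client application would recompute the cutoff and resubmit the override each time it determines that older log entries are no longer needed.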

 

Proposed Changes

  • Add a new topic configuration, log.retention.min.timestamp.
    • The format of the value will be a Unix time in milliseconds.
  • Modify the log deletion mechanism (in LogManager.scala) so that, when this timestamp is set, it also deletes segments whose last timestamp is before the configured timestamp.

...

  • Timestamp-based deletion will work with both CreateTime and LogAppendTime timestamp types.
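
The segment-level check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual LogManager.scala change: SegmentInfo and MinTimestampRetention are hypothetical stand-ins, the way a segment's last timestamp is obtained is left open, and stopping at the first retained segment (so the remaining log stays contiguous) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical stand-in for a log segment; not Kafka's LogSegment class.
final class SegmentInfo {
    final long baseOffset;
    final long lastTimestampMs; // last timestamp in the segment, Unix time in ms

    SegmentInfo(long baseOffset, long lastTimestampMs) {
        this.baseOffset = baseOffset;
        this.lastTimestampMs = lastTimestampMs;
    }
}

// Hypothetical helper illustrating the proposed deletion check.
final class MinTimestampRetention {
    // A segment qualifies only if the minimum timestamp is set and the
    // segment's last timestamp is strictly before it.
    static boolean isDeletable(SegmentInfo segment, OptionalLong minTimestampMs) {
        return minTimestampMs.isPresent()
                && segment.lastTimestampMs < minTimestampMs.getAsLong();
    }

    // Walks segments oldest-first and collects deletable ones, stopping at
    // the first segment that must be retained (an assumption of this sketch).
    static List<SegmentInfo> deletableSegments(List<SegmentInfo> segmentsOldestFirst,
                                               OptionalLong minTimestampMs) {
        List<SegmentInfo> deletable = new ArrayList<>();
        for (SegmentInfo segment : segmentsOldestFirst) {
            if (!isDeletable(segment, minTimestampMs)) {
                break;
            }
            deletable.add(segment);
        }
        return deletable;
    }
}
```

When the configuration is unset (an empty OptionalLong here), nothing qualifies, so only the existing size-based and duration-based checks would apply.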

Compatibility, Deprecation, and Migration Plan

      ...