Status

Current state: Under Discussion

Discussion thread: here

JIRA: here

Pull Request: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Kafka Streams treats repartition topics differently from regular topics. Instead of setting arbitrary retention criteria and having the broker clean up old records, Kafka Streams sets infinite retention on repartition topics and explicitly deletes records once they have been committed to the next topic in their Topology. Currently, this is done every time the Task is committed, so an explicit "delete records" request is sent every commit.interval.ms milliseconds.

When commit.interval.ms is set very low, for example when processing.guarantee is set to exactly_once_v2, delete records requests are sent extremely frequently, which can reduce throughput and cause the brokers to log a high volume of messages.

Public Interfaces

New configuration options

| Name | Type | Importance | Default | Description |
| min.repartition.purge.interval.ms | Long | LOW | 30000 | The minimum frequency in milliseconds with which to delete fully consumed records from repartition topics. Purging will occur after at least this value since the last purge, but may be delayed until later in order to meet the processing guarantee. The default value is the same as the default for commit.interval.ms (30000). (Note, unlike commit.interval.ms, the default for this value remains unchanged when processing.guarantee is set to exactly_once_v2). |
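
As an illustration of how the proposed option might be set, here is a minimal sketch using plain string keys and java.util.Properties. The class name, application id, and broker address are hypothetical, and the 60000 ms value is only an example; the option name is the one proposed in this KIP and may differ in a released version.

```java
import java.util.Properties;

public class PurgeIntervalConfigExample {
    // Builds a Properties object of the kind passed to a Kafka Streams application.
    // Plain string keys are used so the sketch does not depend on any Kafka classes.
    public static Properties exactlyOnceWithSlowPurge() {
        Properties props = new Properties();
        props.put("application.id", "purge-interval-example"); // hypothetical application id
        props.put("bootstrap.servers", "localhost:9092");      // hypothetical broker address
        // exactly_once_v2 lowers the default commit.interval.ms to 100 ms.
        props.put("processing.guarantee", "exactly_once_v2");
        // Proposed option: purge repartition topics at most once per minute (example value).
        props.put("min.repartition.purge.interval.ms", "60000");
        return props;
    }
}
```

With these settings, commits would still happen at the exactly_once_v2 default interval, while delete records requests would be sent far less often.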

Proposed Changes

Adding a new configuration option, min.repartition.purge.interval.ms, that controls how often these explicit record deletions are sent will resolve the issue by letting users tune commit.interval.ms and min.repartition.purge.interval.ms separately.
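
The intended behaviour can be sketched as a small time gate that is consulted on each commit. This is not the Kafka Streams implementation: the class RepartitionPurgeGate and its method names are hypothetical, and the sketch assumes purging is only ever considered at commit time, as described in this KIP.

```java
public class RepartitionPurgeGate {
    private final long minPurgeIntervalMs; // value of min.repartition.purge.interval.ms
    private long lastPurgeMs;              // time of the most recent purge

    public RepartitionPurgeGate(long minPurgeIntervalMs, long startMs) {
        this.minPurgeIntervalMs = minPurgeIntervalMs;
        this.lastPurgeMs = startMs;
    }

    // Called after each commit. Returns true when at least minPurgeIntervalMs
    // has elapsed since the last purge, and records the new purge time.
    public boolean shouldPurgeAfterCommit(long nowMs) {
        if (nowMs - lastPurgeMs >= minPurgeIntervalMs) {
            lastPurgeMs = nowMs;
            return true;
        }
        return false;
    }

    // Simulates commits every commitIntervalMs up to durationMs and counts purges.
    public static int countPurges(long commitIntervalMs, long minPurgeIntervalMs, long durationMs) {
        RepartitionPurgeGate gate = new RepartitionPurgeGate(minPurgeIntervalMs, 0L);
        int purges = 0;
        for (long now = commitIntervalMs; now <= durationMs; now += commitIntervalMs) {
            if (gate.shouldPurgeAfterCommit(now)) {
                purges++;
            }
        }
        return purges;
    }
}
```

Because the gate is only checked on commits, a purge can happen later than the configured interval when commit.interval.ms is the larger of the two values, which matches the "may be delayed until later" wording in the option description.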

Compatibility, Deprecation, and Migration Plan

  • The interval between explicit delete requests for repartition records will no longer be coupled to commit.interval.ms. Default behaviour is unchanged. However:
    • When commit.interval.ms is explicitly modified by the user, old repartition records will no longer be deleted on every commit.
    • When processing.guarantee is set to exactly_once_v2, since the default commit.interval.ms is changed internally to 100 ms, old repartition records will no longer be deleted on every commit.
    • Users can regain this coupling by explicitly configuring both commit.interval.ms and min.repartition.purge.interval.ms to the same value.
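
For users who want the previous coupling back, a minimal sketch of setting both options to the same value follows. The class and method names are hypothetical, and plain string keys are used so the example does not depend on any Kafka classes.

```java
import java.util.Properties;

public class CoupledPurgeConfigExample {
    // Sets commit.interval.ms and the proposed min.repartition.purge.interval.ms
    // to the same value, so a purge is eligible on every commit.
    public static Properties coupled(long intervalMs) {
        Properties props = new Properties();
        String value = Long.toString(intervalMs);
        props.put("commit.interval.ms", value);
        props.put("min.repartition.purge.interval.ms", value);
        return props;
    }
}
```

For example, coupled(100L) would mirror the pre-KIP behaviour under exactly_once_v2, at the cost of the frequent delete records requests described in the Motivation section.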

Rejected Alternatives

Making the explicit deletion of records completely independent of commits, so that min.repartition.purge.interval.ms is strictly adhered to irrespective of the value of commit.interval.ms, was not explored. The increased complexity of the changes may introduce bugs, with little additional benefit.
