...

The following is a list of potential improvements to the log cleaner.

Simple Things

Integrate with system test framework

...

We have a good integration test for the log cleaner; it would be nice to hook it into the nightly test run.

Add a tool to measure the duplication in a log

It would be nice to have an operational tool to check the duplication within a log. This could be built as a simple consumer that takes a particular topic/partition, consumes that log sequentially, and estimates the duplication. Each key consumed would be checked against a bloom filter: if it is present, we would count a duplicate; otherwise, we would add it to the filter. Because a bloom filter can return false positives but never false negatives, this slightly overestimates duplication, but a large enough bloom filter (one sized for a low false-positive rate) could probably produce an accurate-enough estimate of the duplication rate.
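A minimal sketch of the estimation loop, in Python. It assumes the message keys have already been read from the topic/partition in log order (the consumer itself is omitted), and it uses a hand-rolled bloom filter sized with the standard formulas. `BloomFilter`, `add_if_absent`, and `estimate_duplication` are hypothetical names for illustration, not part of Kafka or any existing tool.

```python
import hashlib
import math
from typing import Iterable


class BloomFilter:
    """Minimal bit-array bloom filter (illustrative, not tuned for speed)."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n = max(1, expected_items)
        self.num_bits = max(8, math.ceil(-n * math.log(fp_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add_if_absent(self, key: bytes) -> bool:
        """Insert key; return True if it was (probably) already present."""
        present = True
        for pos in self._positions(key):
            byte, bit = divmod(pos, 8)
            mask = 1 << bit
            if not self.bits[byte] & mask:
                # At least one bit was unset, so the key is definitely new.
                present = False
                self.bits[byte] |= mask
        return present


def estimate_duplication(keys: Iterable[bytes], expected_items: int,
                         fp_rate: float = 0.01) -> float:
    """Fraction of messages whose key was (probably) seen earlier in the log.

    keys: message keys in log order, e.g. from a consumer reading one
    topic/partition sequentially (hypothetical; not shown here).
    """
    bloom = BloomFilter(expected_items, fp_rate)
    total = duplicates = 0
    for key in keys:
        total += 1
        if bloom.add_if_absent(key):
            duplicates += 1  # may include rare false positives
    return duplicates / total if total else 0.0


# Usage: three distinct keys, two repeats of b"a" -> 2 / 5 duplicates.
print(estimate_duplication([b"a", b"b", b"a", b"c", b"a"], expected_items=1000))
```

With these formulas, a 1% false-positive rate costs roughly 9.6 bits per expected distinct key, so sizing `expected_items` to the partition's key count keeps memory modest even for large logs.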

KAFKA-1336

Improve dedupe buffer efficiency

...