...

  1. Integrate with the system test framework. We have a good integration test; it would be nice to hook it into the nightly test run.
  2. Add Yammer metrics for cleaner rates.
  3. Handle messages that exceed the read/write buffer size.
  4. Check for null keys in KafkaApis.handleProduce when appending to a log with dedupe as the cleanup policy.

Add a tool to measure the duplication in a log

It would be nice to have an operational tool to check the duplication within a log. This could be built as a simple consumer that takes a particular topic/partition, consumes that log sequentially, and estimates the duplication. Each key consumed would be checked against a Bloom filter: if it is present we would count a duplicate, otherwise we would add it to the filter. A large enough Bloom filter could probably produce an accurate-enough estimate of the duplication rate.
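A minimal sketch of such a tool, with the log access stubbed out as a list of keys (a real tool would wrap a Kafka consumer for the given topic/partition; the class and parameter names here are illustrative, not an existing API):

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: estimate the duplication rate of a log by streaming
// its keys through a Bloom filter. A key that tests positive is counted as
// a (probable) duplicate; otherwise its bits are added to the filter.
public class DuplicationEstimator {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    public DuplicationEstimator(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Double hashing: derive numHashes bit positions from two base hashes.
    // Returns true if all bits were already set (key probably seen before).
    private boolean testAndSet(String key) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9e3779b9;
        boolean present = true;
        for (int i = 0; i < numHashes; i++) {
            int bit = Math.floorMod(h1 + i * h2, numBits);
            if (!bits.get(bit)) {
                present = false;
                bits.set(bit);
            }
        }
        return present;
    }

    // Fraction of consumed keys that were (probably) seen before.
    public double estimate(List<String> keys) {
        long duplicates = 0;
        for (String key : keys)
            if (testAndSet(key))
                duplicates++;
        return keys.isEmpty() ? 0.0 : (double) duplicates / keys.size();
    }

    public static void main(String[] args) {
        DuplicationEstimator est = new DuplicationEstimator(1 << 20, 4);
        double rate = est.estimate(List.of("a", "b", "a", "c", "b", "a"));
        System.out.println(rate); // 3 of 6 keys repeat -> 0.5
    }
}
```

Because the filter only over-counts (false positives, never false negatives), the estimate is an upper bound on the true duplication rate, and sizing the filter controls how tight that bound is.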

Improve dedupe buffer efficiency

Currently we use a fairly naive approach to the approximate deduplication. There are several things that could be improved.

  1. Currently we will only process up to N messages of dirty log in one cleaning, where N=buffer_size/24*collision_rate. This may actually be a little conservative: the dirty section of the log may itself have many duplicates, in which case it uses much less space in the dedupe buffer. We could check whether the key is already present in the dedupe buffer and only increment the entry count if it is not.
  2. We currently use neither chaining nor probing in the dedupe map check. We could actually get better use of our memory by implementing some kind of bounded probing (linear or otherwise) to look a little harder for an empty spot. This would give us better memory density at the cost of slightly more expensive misses when doing a lookup (since you have to probe).
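The two improvements above could be combined in a map along these lines (a hypothetical sketch, not Kafka's actual SkimpyOffsetMap; the slot layout, hash function, and names are illustrative):

```java
// Hypothetical sketch: an open-addressed key->offset map with bounded linear
// probing (point 2), which also counts only genuinely new keys so repeated
// dirty keys do not consume extra capacity (point 1).
public class ProbingOffsetMap {
    private final long[] hashes;   // 0 marks an empty slot
    private final long[] offsets;
    private final int maxProbes;
    private int entries = 0;

    public ProbingOffsetMap(int slots, int maxProbes) {
        this.hashes = new long[slots];
        this.offsets = new long[slots];
        this.maxProbes = maxProbes;
    }

    private long hash(byte[] key) {
        long h = 1125899906842597L; // simple polynomial fold for illustration
        for (byte b : key) h = 31 * h + b;
        return h == 0 ? 1 : h;      // reserve 0 for "empty"
    }

    // Returns true if stored or updated; false once bounded probing
    // exhausts its budget (the cleaning would stop growing at that point).
    public boolean put(byte[] key, long offset) {
        long h = hash(key);
        int slot = Math.floorMod(Long.hashCode(h), hashes.length);
        for (int i = 0; i < maxProbes; i++) {
            int s = (slot + i) % hashes.length;
            if (hashes[s] == 0) {   // empty: claim it, count a new key
                hashes[s] = h;
                offsets[s] = offset;
                entries++;
                return true;
            }
            if (hashes[s] == h) {   // same key: later offset overwrites
                offsets[s] = offset;
                return true;
            }
        }
        return false;
    }

    public long get(byte[] key) {
        long h = hash(key);
        int slot = Math.floorMod(Long.hashCode(h), hashes.length);
        for (int i = 0; i < maxProbes; i++) {
            int s = (slot + i) % hashes.length;
            if (hashes[s] == 0) return -1L;
            if (hashes[s] == h) return offsets[s];
        }
        return -1L;
    }

    public int size() { return entries; }
}
```

The trade-off is exactly as described: lookups for absent keys now cost up to maxProbes slot reads instead of one, in exchange for packing entries more densely before declaring the buffer full.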

Drive-aware scheduling and throttling

Currently we have a global throttle in place on I/O and we use only the ratio of dirty to clean bytes to choose the log to clean. This will work well for a single drive or multiple drives in a RAID configuration or if you have only one cleaner thread. However if you have a JBOD configuration with multiple data directories AND are using multiple cleaner threads this is not ideal.

The problem is that you could end up scheduling many concurrent cleanings for logs that all reside on the same drive. Setting the throttling conservatively enough to handle this could result in over-throttling when cleaning logs on different drives.

A simple fix would be to make the throttling per-drive. However this is not ideal either, since you would still schedule concurrent cleanings on a single drive, which might result in no more cleaning than having only a single thread (because they would all be throttled).

A more sophisticated scheduling approach would be aware of the per-disk throttle rate and choose logs appropriately.
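One simple form of such a scheduler would keep the dirty-ratio ordering but refuse to start a second cleaning on a drive that already has one in flight. A sketch, with LogInfo and the drive/dirty-ratio fields as illustrative stand-ins for the cleaner's internal state:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: pick the dirtiest log whose data directory (drive)
// has no cleaning in flight, so multiple cleaner threads naturally spread
// across drives instead of contending for one.
public class DriveAwareScheduler {
    public static class LogInfo {
        public final String name;
        public final String drive;
        public final double dirtyRatio;
        public LogInfo(String name, String drive, double dirtyRatio) {
            this.name = name;
            this.drive = drive;
            this.dirtyRatio = dirtyRatio;
        }
    }

    private final Set<String> busyDrives = new HashSet<>();

    // Highest dirty ratio on an idle drive, or null if every drive is busy.
    public synchronized LogInfo grabFilthiest(List<LogInfo> logs) {
        LogInfo best = null;
        for (LogInfo log : logs)
            if (!busyDrives.contains(log.drive)
                    && (best == null || log.dirtyRatio > best.dirtyRatio))
                best = log;
        if (best != null)
            busyDrives.add(best.drive); // at most one cleaning per drive
        return best;
    }

    public synchronized void doneCleaning(LogInfo log) {
        busyDrives.remove(log.drive);
    }
}
```

With per-drive throttle buckets layered on top, each thread then cleans at the full per-drive rate rather than splitting a global budget.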

Estimate Survivorship Ratio

...