...

  • `uncleanable-partitions-count` (Int) - Number of partitions that are currently uncleanable
  • `uncleanable-partitions` (String) - Comma-separated list of the partitions that are uncleanable. Example: "2,3,4"
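A minimal sketch of how the two metric values could be derived from a single set of marked partitions, assuming partitions are identified by the plain numbers shown in the example above. The function name `uncleanable_metrics` is hypothetical and is not part of Kafka.

```python
def uncleanable_metrics(uncleanable_partitions):
    """Build both proposed metric values from the set of marked partition ids.

    Hypothetical helper: partitions are assumed to be plain integer ids,
    matching the "2,3,4" example in the proposal.
    """
    ids = sorted(uncleanable_partitions)  # stable, readable ordering
    return {
        "uncleanable-partitions-count": len(ids),
        "uncleanable-partitions": ",".join(str(p) for p in ids),
    }
```

Deriving both values from one set keeps the count and the list from drifting apart.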

New configuration value:

  • `max.uncleanable.partitions` (better name?) - the maximum number of uncleanable partitions a single disk volume can have before it is marked as offline

Proposed Changes

Catch any unexpected exceptions in `CleanerThread#cleanOrSleep()`.
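The following is a rough sketch of what catching unexpected exceptions inside the cleaner loop could look like. It assumes that a failing partition is recorded in an "uncleanable" set so later passes can skip it. `CleanerThreadSketch` and the injected `clean_partition` callable are hypothetical stand-ins, not Kafka's actual `CleanerThread`.

```python
import logging
import time

log = logging.getLogger("log-cleaner")


class CleanerThreadSketch:
    """Hypothetical stand-in for CleanerThread; not Kafka's implementation."""

    def __init__(self, clean_partition, backoff_s=0.0):
        # clean_partition: hypothetical callable that compacts one partition and may raise.
        self.clean_partition = clean_partition
        self.backoff_s = backoff_s
        self.uncleanable = set()  # partitions marked after an unexpected failure

    def clean_or_sleep(self, partition):
        """Clean `partition`, or sleep when there is nothing to clean.

        Returns True only if a partition was cleaned successfully.
        """
        if partition is None:
            time.sleep(self.backoff_s)  # no work right now: back off
            return False
        try:
            self.clean_partition(partition)
            return True
        except Exception:
            # Swallow the unexpected error so the thread keeps running,
            # and remember the partition so it can be skipped later.
            log.exception("Unexpected error cleaning %s; marking it uncleanable", partition)
            self.uncleanable.add(partition)
            return False
```

The key design point is that one bad partition no longer terminates the only thread that compacts every other partition on the broker.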

...

When evaluating which logs to compact, skip the marked ones.
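A minimal sketch of the selection step, assuming the cleaner picks the partition with the highest dirty ratio above a minimum. The threshold is loosely modeled on the topic-level `min.cleanable.dirty.ratio` setting; the function name and input shapes are hypothetical.

```python
def grab_filthiest(dirty_ratios, uncleanable, min_cleanable_ratio=0.5):
    """Pick the dirtiest eligible partition, ignoring ones marked uncleanable.

    dirty_ratios: mapping of partition name -> fraction of the log that is dirty
    uncleanable: set of partition names previously marked as uncleanable
    """
    candidates = {
        tp: ratio
        for tp, ratio in dirty_ratios.items()
        if tp not in uncleanable and ratio >= min_cleanable_ratio
    }
    if not candidates:
        return None  # nothing eligible: the cleaner thread would sleep
    return max(candidates, key=candidates.get)
```

Filtering before ranking means a corrupt partition with a very high dirty ratio cannot keep winning the selection and failing on every pass.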

Introduce a new broker configuration value, `max.uncleanable.partitions`. When the number of marked partitions on a disk reaches this threshold, mark that disk as offline, since this most likely indicates a problem with the disk itself.
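One way to sketch the per-disk threshold check, assuming each marked partition can be mapped to the log directory (disk) that holds it. `dirs_to_mark_offline` and its parameters are hypothetical; the threshold argument stands in for the proposed `max.uncleanable.partitions` value.

```python
from collections import Counter


def dirs_to_mark_offline(uncleanable, partition_dir, max_uncleanable_partitions):
    """Return the log directories whose uncleanable-partition count reached the threshold.

    uncleanable: iterable of partition names marked as uncleanable
    partition_dir: mapping partition name -> log directory holding it
                   (assumed to contain every marked partition)
    max_uncleanable_partitions: value of the proposed max.uncleanable.partitions
    """
    per_dir = Counter(partition_dir[p] for p in uncleanable)
    return {d for d, n in per_dir.items() if n >= max_uncleanable_partitions}
```

Counting per directory rather than broker-wide keeps a failing disk from taking healthy disks offline with it.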

Needs Discussion

  • A metric that tracks the overall number of uncleanable bytes seems like it would be useful. I am not sure how easy that is to implement, and I wonder whether that functionality (fetching log segments and determining their size) could itself cause additional errors.
  • Should these log directories be marked as "offline log directories", thereby stopping replicas from fetching the affected partitions?
  • Should we mark a disk as offline after a certain number of `IOException`s are caught, since repeated `IOException`s suggest that something might be wrong with the disk?
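To make the uncleanable-bytes question more concrete, here is a rough sketch that approximates the metric as the total size of the `.log` segment files in each uncleanable partition's directory. It swallows filesystem errors so computing the metric cannot itself fail. The approximation and the function name `uncleanable_bytes` are assumptions, not a settled definition.

```python
import os


def uncleanable_bytes(partition_dirs):
    """Approximate uncleanable bytes as the summed size of .log segment files.

    partition_dirs: directories of partitions currently marked uncleanable.
    Any OSError (missing directory, segment deleted mid-scan, permission
    problem) is skipped rather than raised.
    """
    total = 0
    for pdir in partition_dirs:
        try:
            names = os.listdir(pdir)
        except OSError:
            continue  # unreadable or missing directory: skip it
        for name in names:
            if not name.endswith(".log"):
                continue  # ignore index and other non-segment files
            try:
                total += os.path.getsize(os.path.join(pdir, name))
            except OSError:
                continue  # file vanished or became unreadable after listdir
    return total
```

Because the partitions being measured are exactly the ones that already failed, defensive error handling here is what keeps the metric from introducing new failures.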

...