...

Then again, these improvements still require manual intervention or, at the very least, complex infrastructure code to automate the process.
It would be very useful if Kafka had a way to quarantine unexpected failures in certain logs so that they do not affect the cleaning of other logs. While this would not fix the underlying issue, it would significantly slow its impact and give users adequate time for detection and repair.

Public Interfaces

New metrics:

  • `uncleanable-partitions-count` (Int) - Count of partitions that are uncleanable per logDir
  • `uncleanable-bytes` (Long) - The current number of uncleanable bytes. This is the sum of uncleanable bytes for every uncleanable partition
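A minimal sketch of the bookkeeping that could back these two gauges. Only the metric names (`uncleanable-partitions-count`, `uncleanable-bytes`) come from this KIP; the `UncleanableTracker` class and its methods are hypothetical illustrations, not Kafka's actual implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: tracks uncleanable partitions and their sizes per log directory.
public class UncleanableTracker {
    // logDir -> (topicPartition -> uncleanable bytes)
    private final Map<String, Map<String, Long>> byLogDir = new ConcurrentHashMap<>();

    public void markUncleanable(String logDir, String topicPartition, long uncleanableBytes) {
        byLogDir.computeIfAbsent(logDir, d -> new ConcurrentHashMap<>())
                .put(topicPartition, uncleanableBytes);
    }

    // Would back the uncleanable-partitions-count gauge (Int) for one logDir.
    public int uncleanablePartitionsCount(String logDir) {
        return byLogDir.getOrDefault(logDir, Map.of()).size();
    }

    // Would back the uncleanable-bytes gauge (Long): the sum of uncleanable
    // bytes over every uncleanable partition in the logDir.
    public long uncleanableBytes(String logDir) {
        return byLogDir.getOrDefault(logDir, Map.of())
                       .values().stream().mapToLong(Long::longValue).sum();
    }
}
```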

New broker config value:

  • `log.cleaner.max.uncleanable.partitions.bytes` - the maximum amount of uncleanable bytes a single logDir can have before it is marked as offline. Default value is set to 10GB (value of 10000000000)
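For illustration, the proposed setting would be overridden in `server.properties` like any other broker config (the name and 10GB default are taken from this KIP revision):

```properties
# Mark a logDir offline once its partitions accumulate this many uncleanable bytes.
# Default: 10000000000 (10 GB)
log.cleaner.max.uncleanable.partitions.bytes=10000000000
```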

Proposed Changes

Catch any unexpected (non-IO) exceptions in `CleanerThread#cleanOrSleep()`.
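The control flow could look roughly like the sketch below. Class and method names mimic Kafka's Scala `CleanerThread` but are simplified Java stand-ins, not the real API; the point is that an unexpected exception quarantines only the offending partition instead of killing the cleaner thread.

```java
// Illustrative sketch of the proposed error handling, not Kafka's actual code.
public class CleanerThreadSketch {
    // Minimal stand-in for the cleaner's collaborators.
    public interface LogCleaner {
        String grabFilthiestLog();          // picks the next log to clean, or null
        void clean(String topicPartition);  // may throw unexpectedly
        void markUncleanable(String topicPartition);
    }

    private final LogCleaner cleaner;

    public CleanerThreadSketch(LogCleaner cleaner) {
        this.cleaner = cleaner;
    }

    // Analogous to CleanerThread#cleanOrSleep(): any unexpected exception
    // quarantines just that partition; cleaning of other logs continues.
    public void cleanOrSleep() {
        String tp = cleaner.grabFilthiestLog();
        if (tp == null) {
            return; // nothing to clean; the real thread would back off and sleep
        }
        try {
            cleaner.clean(tp);
        } catch (Exception e) {
            cleaner.markUncleanable(tp);
        }
    }
}
```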

...

When evaluating which logs to compact, skip the ones marked as uncleanable.
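That selection step amounts to a simple filter. The names below are illustrative, not Kafka's actual API:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch: when choosing compaction candidates, drop partitions already
// marked uncleanable so one bad log cannot stall the others.
public class CleanableFilter {
    public static List<String> cleanableCandidates(List<String> allLogs,
                                                   Set<String> uncleanable) {
        return allLogs.stream()
                .filter(tp -> !uncleanable.contains(tp))
                .collect(Collectors.toList());
    }
}
```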

Introduce a new cluster-level configurable value - `log.cleaner.max.uncleanable.partitions.bytes`. When the sum of uncleanable bytes for all marked partitions reaches this threshold, mark the disk they are on as offline (this most likely indicates a problem with the disk itself).
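The threshold check could be sketched as follows. The config name and the 10GB default come from this KIP; the class and method are hypothetical:

```java
import java.util.Map;

// Sketch of the proposed per-logDir threshold check, not Kafka's actual code.
public class UncleanableDirCheck {
    // Default for log.cleaner.max.uncleanable.partitions.bytes: 10 GB.
    public static final long DEFAULT_MAX_UNCLEANABLE_BYTES = 10_000_000_000L;

    // uncleanableBytesByPartition: uncleanable bytes for each marked partition
    // in one logDir. Returns true when that logDir should be marked offline.
    public static boolean shouldMarkOffline(Map<String, Long> uncleanableBytesByPartition,
                                            long maxUncleanableBytes) {
        long total = uncleanableBytesByPartition.values().stream()
                .mapToLong(Long::longValue).sum();
        return total >= maxUncleanableBytes;
    }
}
```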

...