...
So far, there is no way to know how far log recovery has progressed. All we can do is check the broker log and see that the broker is busy doing recovery. In this KIP, we're going to expose a RemainingLogsToRecovery metric for each log.dir and a RemainingSegmentsToRecovery metric for each recovery thread, so that the admin can monitor the progress of log recovery.
...
The RemainingLogsToRecovery metric will be added into "kafka.log" → LogManager for each log.dir.
The RemainingSegmentsToRecovery metric will be added into "kafka.log" → LogManager for each recovery thread.
Proposed Changes
The proposal is to add 2 metrics:
1. RemainingLogsToRecovery: shows the number of logs that remain to be recovered for each log.dir. The total number of logs to be recovered is summed in step (1.b) described in the "motivation" section. Each time a log completes recovery of all the segments under it, RemainingLogsToRecovery is decremented, so it ends at 0. When the broker is not in the log recovery state, the value is always 0.
2. RemainingSegmentsToRecovery: shows the number of segments that remain to be recovered in each recovery thread (i.e. in each replica log). The total number of segments to be recovered is calculated in step (1.b.ii) described in the "motivation" section. Each time a segment completes recovery, RemainingSegmentsToRecovery is decremented, so it ends at 0. When the broker is not in the log recovery state, the value is always 0.
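The bookkeeping described above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the broker's implementation: the class `RecoveryProgress` and all of its method names are hypothetical, and the counters live in plain maps instead of being registered as real metrics.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not part of Kafka): tracks the two proposed gauges.
class RecoveryProgress {
    // log.dir -> number of logs still to be recovered
    private final Map<String, AtomicInteger> logsLeft = new ConcurrentHashMap<>();
    // "log.dir#thread" -> number of segments still to be recovered by that thread
    private final Map<String, AtomicInteger> segmentsLeft = new ConcurrentHashMap<>();

    private static String threadKey(String dir, int thread) {
        return dir + "#" + thread;
    }

    // Called once per log.dir after the logs needing recovery have been counted.
    void startDir(String dir, int logsToRecover) {
        logsLeft.put(dir, new AtomicInteger(logsToRecover));
    }

    // Called when a recovery thread picks up a log with the given segment count.
    void startLog(String dir, int thread, int segmentsToRecover) {
        segmentsLeft
            .computeIfAbsent(threadKey(dir, thread), k -> new AtomicInteger())
            .set(segmentsToRecover);
    }

    // Called each time one segment finishes recovery.
    void segmentRecovered(String dir, int thread) {
        AtomicInteger counter = segmentsLeft.get(threadKey(dir, thread));
        if (counter != null) {
            counter.decrementAndGet();
        }
    }

    // Called when every segment of one log has been recovered.
    void logRecovered(String dir) {
        AtomicInteger counter = logsLeft.get(dir);
        if (counter != null) {
            counter.decrementAndGet();
        }
    }

    // Gauge values; both report 0 when nothing is being recovered.
    int remainingLogs(String dir) {
        AtomicInteger counter = logsLeft.get(dir);
        return counter == null ? 0 : counter.get();
    }

    int remainingSegments(String dir, int thread) {
        AtomicInteger counter = segmentsLeft.get(threadKey(dir, thread));
        return counter == null ? 0 : counter.get();
    }
}
```

In this sketch a directory or thread that was never registered reads as 0, which mirrors the rule that both metrics are 0 whenever the broker is not recovering logs.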
For example:
log.dirs=/tmp/log1,/tmp/log2
num.recovery.threads.per.data.dir=2
In JMX, we'll see:
- kafka.log
  - LogManager
    - RemainingLogsToRecover
      - /tmp/log1 => 5 ← there are 5 logs under /tmp/log1 that need to be recovered
      - /tmp/log2 => 0
    - RemainingSegmentsToRecover
      - /tmp/log1 ← 2 threads are doing log recovery for /tmp/log1
        - 0 => 1000 ← there are 1000 segments that need to be recovered by thread 0
        - 1 => 10
      - /tmp/log2
        - 0 => 0
        - 1 => 0
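A monitoring tool could pick these gauges out of JMX with an `ObjectName` pattern. The sketch below only builds and matches names; it does not connect to a broker. The `kafka.log:type=LogManager,name=...` layout follows the usual Kafka MBean convention and the metric names are taken from the tree above, but the tag keys `dir` and `thread`, like the class `RecoveryMetricNames` itself, are assumptions made for illustration, since this section does not name them.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper (not part of Kafka): builds JMX names for the two gauges.
class RecoveryMetricNames {
    // Assumed tag keys; the KIP text above does not specify them.
    static final String DIR_KEY = "dir";
    static final String THREAD_KEY = "thread";

    private static final String PREFIX = "kafka.log:type=LogManager,name=";

    // Wraps the checked exception so callers can stay simple.
    static ObjectName name(String spec) {
        try {
            return new ObjectName(spec);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("bad object name: " + spec, e);
        }
    }

    // Pattern matching every RemainingLogsToRecover gauge, whatever its tags.
    static ObjectName logsPattern() {
        return name(PREFIX + "RemainingLogsToRecover,*");
    }

    // Name of the per-directory gauge; the path is quoted to be safe.
    static ObjectName logsGauge(String dir) {
        return name(PREFIX + "RemainingLogsToRecover,"
            + DIR_KEY + "=" + ObjectName.quote(dir));
    }

    // Name of the per-thread gauge within one directory.
    static ObjectName segmentsGauge(String dir, int thread) {
        return name(PREFIX + "RemainingSegmentsToRecover,"
            + DIR_KEY + "=" + ObjectName.quote(dir) + ","
            + THREAD_KEY + "=" + thread);
    }

    // Recovers the plain directory path from a gauge name.
    static String dirOf(ObjectName gauge) {
        return ObjectName.unquote(gauge.getKeyProperty(DIR_KEY));
    }
}
```

With a real connection, the pattern from `logsPattern()` could be passed to `MBeanServerConnection.queryNames` to list one gauge per log.dir, and recovery would be finished once every returned gauge reads 0.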
Compatibility, Deprecation, and Migration Plan
...