...
So far, there is no way to track the progress of log recovery. All we can do is check the broker log and see that the broker is busy with recovery. In this KIP, we're going to expose a RemainingLogsToRecovery metric for each log.dir and a RemainingSegmentsToRecovery metric for each recovery thread, so that admins can monitor the progress of log recovery.
Public Interfaces
A RemainingLogsToRecovery metric will be added under "kafka.log" → LogManager for each log.dir.
A RemainingSegmentsToRecovery metric will be added under "kafka.log" → LogManager for each recovery thread.
Full Name | Type | Description |
---|---|---|
kafka.log:type=LogManager,name=remainingLogsToRecovery | 32-bit gauge | The number of logs remaining to be recovered for each log.dir |
kafka.log:type=LogManager,name=remainingSegmentsToRecovery,dir=([-._\/\w\d\s]+),threadNum=([0-9]+) note: The dir output format is the valid directory path string for each OS (and valid for Java). Since the rule differs for each OS, this is just a simple example format. | 32-bit gauge | The number of segments remaining to be recovered in each recovery thread (i.e. in each replica log). |
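As an illustration of how a monitoring tool might consume the names in the table, here is a minimal sketch using only the JDK. The class `RecoveryMetricNames` and its methods are hypothetical and not part of this KIP; the regular expression is copied from the table and, as noted there, is only a simple example format for `dir`. The sample path `/tmp/log1` is an assumption for the demo.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper, not part of Kafka: parses the example MBean name format.
final class RecoveryMetricNames {
    // Example format from the table; real dir strings depend on the OS.
    static final Pattern SEGMENTS_NAME = Pattern.compile(
        "kafka\\.log:type=LogManager,name=remainingSegmentsToRecovery,"
            + "dir=([-._\\/\\w\\d\\s]+),threadNum=([0-9]+)");

    // Returns {dir, threadNum}, or null if the name does not match the format.
    static String[] dirAndThread(String mbeanName) {
        Matcher m = SEGMENTS_NAME.matcher(mbeanName);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    // JMX property-list pattern that matches the gauge for every dir and thread.
    static ObjectName allSegmentGauges() {
        try {
            return new ObjectName(
                "kafka.log:type=LogManager,name=remainingSegmentsToRecovery,*");
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the given full MBean name is one of the per-thread segment gauges.
    static boolean isSegmentGauge(String mbeanName) {
        try {
            return allSegmentGauges().apply(new ObjectName(mbeanName));
        } catch (MalformedObjectNameException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "kafka.log:type=LogManager,name=remainingSegmentsToRecovery,"
            + "dir=/tmp/log1,threadNum=0";
        String[] parts = dirAndThread(name);
        System.out.println("dir=" + parts[0] + " thread=" + parts[1]);
        System.out.println("matches wildcard: " + isSegmentGauge(name));
    }
}
```

The wildcard `ObjectName` could be passed to `MBeanServerConnection.queryNames` to list the gauges on a broker, assuming the metrics are registered under the names shown in the table.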
...
1. RemainingLogsToRecovery: Shows the number of logs remaining to be recovered for each log.dir. The total number of logs to be recovered is summed in step (1.b) described in the "motivation" section. Each time a log completes recovery for all of its segments, RemainingLogsToRecovery is decremented, and in the end it reaches 0. When the broker is not in the log recovery state, the metric is removed.
2. RemainingSegmentsToRecovery: Shows the number of segments remaining to be recovered in each recovery thread (i.e. in each replica log). The total number of segments to be recovered is calculated in step (1.b.ii) described in the "motivation" section. Each time a segment completes recovery, RemainingSegmentsToRecovery is decremented, and in the end it reaches 0. When the broker is not in the log recovery state, the metric is removed.
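The lifecycle of the two gauges described above can be sketched as follows. This is a simplified illustration, not the proposed LogManager implementation: the class `RecoveryProgress` and all of its method names are hypothetical, and it models only the set, decrement, and remove behavior.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the two gauges; names are illustrative only.
final class RecoveryProgress {
    // Remaining logs, keyed by log.dir.
    private final Map<String, AtomicInteger> logs = new ConcurrentHashMap<>();
    // Remaining segments, keyed by "dir#threadNum".
    private final Map<String, AtomicInteger> segments = new ConcurrentHashMap<>();

    private static String key(String dir, int threadNum) {
        return dir + "#" + threadNum;
    }

    // Called once per log.dir with the total number of logs to recover.
    void startDir(String dir, int logsToRecover) {
        logs.put(dir, new AtomicInteger(logsToRecover));
    }

    // Called when a thread picks up a log: sets that thread's segment count.
    void startLog(String dir, int threadNum, int segmentsToRecover) {
        segments.put(key(dir, threadNum), new AtomicInteger(segmentsToRecover));
    }

    // Called after each recovered segment (startLog must have been called first).
    void segmentRecovered(String dir, int threadNum) {
        segments.get(key(dir, threadNum)).decrementAndGet();
    }

    // Called after all segments of one log are recovered.
    void logRecovered(String dir) {
        logs.get(dir).decrementAndGet();
    }

    // Called when recovery ends: the gauges are removed rather than left at 0.
    void recoveryFinished() {
        logs.clear();
        segments.clear();
    }

    // Returns null when the gauge does not exist (no recovery in progress).
    Integer remainingLogs(String dir) {
        AtomicInteger v = logs.get(dir);
        return v == null ? null : v.get();
    }

    Integer remainingSegments(String dir, int threadNum) {
        AtomicInteger v = segments.get(key(dir, threadNum));
        return v == null ? null : v.get();
    }

    public static void main(String[] args) {
        RecoveryProgress p = new RecoveryProgress();
        p.startDir("/tmp/log1", 2);
        p.startLog("/tmp/log1", 0, 3);
        p.segmentRecovered("/tmp/log1", 0);
        System.out.println("segments left: " + p.remainingSegments("/tmp/log1", 0));
        System.out.println("logs left: " + p.remainingLogs("/tmp/log1"));
    }
}
```

Returning `null` for a missing gauge mirrors the "metric is removed" behavior: a monitoring client would treat an absent value as "no recovery in progress" rather than as 0.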
For example:
configs:
- log.dirs=/tmp/log1,/tmp/log2
- num.recovery.threads.per.data.dir=2
...
- kafka.log
  - LogManager
    - RemainingLogsToRecover
      - /tmp/log1 => 5 ← there are 5 logs under /tmp/log1 that need to be recovered
      - /tmp/log2 => 0
    - RemainingSegmentsToRecover
      - /tmp/log1 ← 2 threads are doing log recovery for /tmp/log1
        - 0 => 1000 ← there are 1000 segments that need to be recovered by thread 0
        - 1 => 3
      - /tmp/log2
        - 0 => 0
        - 1 => 0
This shows that 5 logs (partitions) under the /tmp/log1 dir still need to be recovered. Two threads are doing the work: one thread has 1000 segments left to recover, and the other has 3 segments left to recover.
...
- kafka.log
  - LogManager
    - RemainingLogsToRecover
      - /tmp/log1 => 3 ← there are 3 logs under /tmp/log1 that need to be recovered
      - /tmp/log2 => 0
    - RemainingSegmentsToRecover
      - /tmp/log1 ← 2 threads are doing log recovery for /tmp/log1
        - 0 => 300 ← there are 300 segments that need to be recovered by thread 0
        - 1 => 5
      - /tmp/log2
        - 0 => 0
        - 1 => 0
After log recovery completes, the RemainingLogsToRecover and RemainingSegmentsToRecover metrics will be removed.
Compatibility, Deprecation, and Migration Plan
...