...

Currently, there is no way to track log recovery progress. All an admin can do is check the broker log and see that the broker is busy doing recovery. This KIP proposes exposing a RemainingLogsToRecovery metric for each log.dir and a RemainingSegmentsToRecovery metric for each recovery thread, so that admins can monitor the progress of log recovery.

Public Interfaces

A RemainingLogsToRecovery metric will be added under "kafka.log" → LogManager for each log.dir.

A RemainingSegmentsToRecovery metric will be added under "kafka.log" → LogManager for each recovery thread.


Full Name | Type | Description
kafka.log:type=LogManager,name=remainingLogsToRecovery | 32-bit gauge | The number of logs remaining to be recovered for each log.dir
kafka.log:type=LogManager,name=remainingSegmentsToRecovery,dir=([-._\/\w\d\s]+),threadNum=([0-9]+) | 32-bit gauge | The number of segments remaining to be recovered in each recovery thread (i.e. in each replica log)

Note: the dir value is a valid directory path string for the host OS (and valid for Java). Since the rules differ between operating systems, the pattern above is only a simple example format.
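As a rough illustration of how the dir and threadNum tags fit into the MBean name, the following sketch builds one such name and reads the tags back with the standard javax.management.ObjectName class. The class RecoveryMetricName and its helper method are hypothetical and are not part of the KIP. The sketch also assumes a Unix-style path; a path containing characters such as ':' would need quoting (for example with ObjectName.quote).

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class RecoveryMetricName {
    // Hypothetical helper: builds a per-thread MBean name in the shape listed in the table.
    public static ObjectName segmentsMetric(String dir, int threadNum) {
        try {
            return new ObjectName(
                "kafka.log:type=LogManager,name=remainingSegmentsToRecovery,dir=" + dir
                    + ",threadNum=" + threadNum);
        } catch (MalformedObjectNameException e) {
            // Raised when dir contains characters that are illegal in an unquoted value.
            throw new IllegalArgumentException("dir needs quoting: " + dir, e);
        }
    }

    public static void main(String[] args) {
        ObjectName name = segmentsMetric("/tmp/log1", 0);
        // The tags can be read back as key properties of the MBean name.
        System.out.println(name.getKeyProperty("dir"));
        System.out.println(name.getKeyProperty("threadNum"));
    }
}
```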

...

1. RemainingLogsToRecovery: Shows the number of logs remaining to be recovered for each log.dir. The total number of logs to be recovered is summed in step (1.b) described in the "motivation" section. Each time a log completes recovery of all of its segments, RemainingLogsToRecovery is decremented, so it eventually reaches 0. When the broker is not in the log recovery state, the metric will be removed.

2. RemainingSegmentsToRecovery: Shows the number of segments remaining to be recovered in each recovery thread (i.e. in each replica log). The total number of segments to be recovered is calculated in step (1.b.ii) described in the "motivation" section. Each time a segment completes recovery, RemainingSegmentsToRecovery is decremented, so it eventually reaches 0. When the broker is not in the log recovery state, the metric will be removed.
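The decrement behaviour of the two gauges can be sketched as follows. This is a minimal, single-threaded model under stated assumptions, not the LogManager implementation: the class RecoveryProgressSketch and all of its methods are hypothetical names introduced only for illustration, and a real broker would keep one segment counter per recovery thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RecoveryProgressSketch {
    // Models RemainingLogsToRecovery for one log.dir.
    private final AtomicInteger remainingLogs;
    // Models RemainingSegmentsToRecovery for one recovery thread.
    private final AtomicInteger remainingSegments = new AtomicInteger(0);

    public RecoveryProgressSketch(int totalLogs) {
        this.remainingLogs = new AtomicInteger(totalLogs);
    }

    // Called when the thread picks up a log: the segment gauge is set to that log's total.
    public void startLog(int segmentCount) { remainingSegments.set(segmentCount); }

    // Called after each segment is recovered.
    public void segmentRecovered() { remainingSegments.decrementAndGet(); }

    // Called once all segments of the current log are recovered.
    public void logRecovered() { remainingLogs.decrementAndGet(); }

    public int remainingLogs() { return remainingLogs.get(); }
    public int remainingSegments() { return remainingSegments.get(); }

    public static void main(String[] args) {
        int[] segmentsPerLog = {3, 2}; // two logs with 3 and 2 segments (made-up numbers)
        RecoveryProgressSketch progress = new RecoveryProgressSketch(segmentsPerLog.length);
        for (int segments : segmentsPerLog) {
            progress.startLog(segments);
            for (int i = 0; i < segments; i++) {
                progress.segmentRecovered();
            }
            progress.logRecovered();
            System.out.println("logs left: " + progress.remainingLogs()
                + ", segments left: " + progress.remainingSegments());
        }
    }
}
```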

For example:

configs:

  • log.dirs=/tmp/log1,/tmp/log2
  • num.recovery.threads.per.data.dir=2

...

  • kafka.log
    • LogManager
      • RemainingLogsToRecover 
        • /tmp/log1 => 5            ← 5 logs under /tmp/log1 still need to be recovered
        • /tmp/log2 => 0
      • RemainingSegmentsToRecover
        • /tmp/log1                       ← 2 threads are doing log recovery for /tmp/log1
          • 0 => 1000                 ← 1000 segments still need to be recovered by thread 0
          • 1 => 3
        • /tmp/log2
          • 0 => 0
          • 1 => 0

This shows that 5 logs (partitions) still need to be recovered under the /tmp/log1 dir. Two threads are doing the work: one thread has 1000 segments left to recover, and the other has 3 segments left to recover.

...

  • kafka.log
    • LogManager
      • RemainingLogsToRecover 
        • /tmp/log1 => 3            ← 3 logs under /tmp/log1 still need to be recovered
        • /tmp/log2 => 0
      • RemainingSegmentsToRecover
        • /tmp/log1                     ← 2 threads are doing log recovery for /tmp/log1
          • 0 => 300                  ← 300 segments still need to be recovered by thread 0
          • 1 => 5
        • /tmp/log2
          • 0 => 0
          • 1 => 0


After log recovery completes, the RemainingLogsToRecover and RemainingSegmentsToRecover metrics will be removed.
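Because the metrics are removed once recovery finishes, a monitoring client could treat the absence of these MBeans as "not recovering". The sketch below illustrates that idea with a JMX name pattern. The class RecoveryMetricsProbe is hypothetical, and the metric name is passed in as a parameter because JMX name matching is case-sensitive and the exact registered name should be checked against the broker. It queries the local platform MBean server only to stay self-contained; a real client would connect to the broker's JMX endpoint instead.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class RecoveryMetricsProbe {
    // Returns true while at least one MBean with the given metric name is registered
    // under kafka.log:type=LogManager, whatever its other tags (dir, threadNum) are.
    public static boolean metricsPresent(MBeanServer server, String metricName) {
        try {
            ObjectName pattern =
                new ObjectName("kafka.log:type=LogManager,name=" + metricName + ",*");
            return !server.queryNames(pattern, null).isEmpty();
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("invalid metric name: " + metricName, e);
        }
    }

    public static void main(String[] args) {
        // Local MBean server used purely for illustration; no broker runs in this JVM.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        boolean recovering = metricsPresent(server, "RemainingLogsToRecover");
        System.out.println(recovering
            ? "log recovery in progress"
            : "no recovery metrics registered");
    }
}
```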

Compatibility, Deprecation, and Migration Plan

...