So far, there is no way to track log recovery progress. All an administrator can do is check the broker log and see that the broker is busy recovering. This KIP exposes a RemainingLogsToRecovery metric for each log.dir and a RemainingSegmentsToRecovery metric for each recovery thread, so that administrators can monitor the progress of log recovery.

Public Interfaces

A RemainingLogsToRecovery metric will be added to "kafka.log" → LogManager for each log.dir.

A RemainingSegmentsToRecovery metric will be added to "kafka.log" → LogManager for each recovery thread.

| Full Name | Type | Description |
|---|---|---|
| kafka.log:type=LogManager,name=remainingLogsToRecovery | 32-bit gauge | The number of logs remaining to be recovered for each log.dir |
| kafka.log:type=LogManager,name=remainingSegmentsToRecovery,dir=([-._\/\w\d\s]+),threadNum=([0-9]+) | 32-bit gauge | The number of segments remaining to be recovered in each recovery thread (i.e. in each replica log) |

Note: the dir output format is a valid directory path string for the OS (and valid for Java). Since the rules differ between operating systems, the pattern shown here is just a simple example format.
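To illustrate how a monitoring tool might split the per-thread metric name into its dir and threadNum tags, here is a minimal sketch that reuses the example pattern from the table. The class name SegmentMetricName and its parse method are hypothetical and not part of this KIP, and the dir character class is only the simplified example format, so real paths on some operating systems may not match it.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Kafka.
final class SegmentMetricName {
    // Pattern copied from the table; the dir character class is just an example format.
    private static final Pattern FULL_NAME = Pattern.compile(
        "kafka\\.log:type=LogManager,name=remainingSegmentsToRecovery,"
            + "dir=([-._\\/\\w\\d\\s]+),threadNum=([0-9]+)");

    final String dir;
    final int threadNum;

    private SegmentMetricName(String dir, int threadNum) {
        this.dir = dir;
        this.threadNum = threadNum;
    }

    /** Returns the parsed dir and thread number, or empty if the name does not match. */
    static Optional<SegmentMetricName> parse(String fullName) {
        Matcher m = FULL_NAME.matcher(fullName);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new SegmentMetricName(m.group(1), Integer.parseInt(m.group(2))));
    }
}
```

A name that does not carry both tags, such as the per-dir remainingLogsToRecovery name, yields an empty result rather than an exception.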

...

1. RemainingLogsToRecovery: Shows the number of logs remaining to be recovered for each log.dir. The total number of logs to be recovered is summed in step (1.b) described in the "motivation" section. Each time a log completes recovery of all of its segments, RemainingLogsToRecovery is decremented, until it reaches 0. When the broker is not in the log recovery state, the metric will be removed.

2. RemainingSegmentsToRecovery: Shows the number of segments remaining to be recovered in each recovery thread (i.e. in each replica log). The total number of segments to be recovered is calculated in step (1.b.ii) described in the "motivation" section. Each time a segment completes recovery, RemainingSegmentsToRecovery is decremented, until it reaches 0. When the broker is not in the log recovery state, the metric will be removed.
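The lifecycle of the two metrics described above can be sketched as a pair of counters: one per log.dir and one per recovery thread. This is only an illustration under assumed names (RecoveryProgress and all of its methods are hypothetical), not the LogManager implementation proposed by this KIP.

```java
import java.util.Map;
import java.util.OptionalInt;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical bookkeeping class for illustration; not Kafka's LogManager code.
final class RecoveryProgress {
    // dir -> logs remaining; no entry when the broker is not recovering
    private final Map<String, AtomicInteger> logsByDir = new ConcurrentHashMap<>();
    // "dir#threadNum" -> segments remaining in the log that thread is recovering
    private final Map<String, AtomicInteger> segmentsByThread = new ConcurrentHashMap<>();

    private static String key(String dir, int threadNum) {
        return dir + "#" + threadNum;
    }

    // Called once per log.dir with the total number of logs to recover.
    void startDir(String dir, int logsToRecover) {
        logsByDir.put(dir, new AtomicInteger(logsToRecover));
    }

    // Called when a thread picks up a log, with that log's segment count.
    void startLog(String dir, int threadNum, int segmentsToRecover) {
        segmentsByThread.put(key(dir, threadNum), new AtomicInteger(segmentsToRecover));
    }

    // Decrements the per-thread gauge after one segment is recovered.
    void segmentRecovered(String dir, int threadNum) {
        segmentsByThread.get(key(dir, threadNum)).decrementAndGet();
    }

    // Decrements the per-dir gauge after all segments of one log are recovered.
    void logRecovered(String dir) {
        logsByDir.get(dir).decrementAndGet();
    }

    // Removes every gauge once log recovery is over.
    void recoveryCompleted() {
        logsByDir.clear();
        segmentsByThread.clear();
    }

    OptionalInt remainingLogs(String dir) {
        return read(logsByDir.get(dir));
    }

    OptionalInt remainingSegments(String dir, int threadNum) {
        return read(segmentsByThread.get(key(dir, threadNum)));
    }

    private static OptionalInt read(AtomicInteger gauge) {
        return gauge == null ? OptionalInt.empty() : OptionalInt.of(gauge.get());
    }
}
```

An empty OptionalInt stands in for a removed metric, which mirrors the behavior when the broker is not in the log recovery state.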

For example:

configs:

log.dirs=/tmp/log1,/tmp/log2
num.recovery.threads.per.data.dir=2

...

kafka.log
- LogManager
  - RemainingLogsToRecover
    - /tmp/log1 => 5 ← 5 logs under /tmp/log1 still need to be recovered
    - /tmp/log2 => 0
  - RemainingSegmentsToRecover
    - /tmp/log1 ← 2 threads are doing log recovery for /tmp/log1
      - 0 => 1000 ← 1000 segments still need to be recovered by thread 0
      - 1 => 3
    - /tmp/log2
      - 0 => 0
      - 1 => 0

This shows that 5 logs (partitions) still need to be recovered under the /tmp/log1 dir. Two threads are doing the work: one has 1000 segments left to recover, and the other has 3.

...

kafka.log
- LogManager
  - RemainingLogsToRecover
    - /tmp/log1 => 3 ← 3 logs under /tmp/log1 still need to be recovered
    - /tmp/log2 => 0
  - RemainingSegmentsToRecover
    - /tmp/log1 ← 2 threads are doing log recovery for /tmp/log1
      - 0 => 300 ← 300 segments still need to be recovered by thread 0
      - 1 => 5
    - /tmp/log2
      - 0 => 0
      - 1 => 0

After log recovery completes, the RemainingLogsToRecover and RemainingSegmentsToRecover metrics will be removed.
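Because the metrics are removed once recovery completes, a monitoring client can treat an empty query result as "no recovery in progress". The sketch below reads the per-thread gauges over JMX. It assumes the metric name and the dir and threadNum tags shown in the table above, and that each gauge is exposed through a "Value" attribute, as Kafka's JMX-reported gauges usually are. RecoveryMetricsReader is a hypothetical name, and connecting to a remote broker's JMX port is left out.

```java
import java.util.Map;
import java.util.TreeMap;
import javax.management.InstanceNotFoundException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical reader for illustration; not part of Kafka.
final class RecoveryMetricsReader {
    /** Returns "dir#threadNum" -> remaining segments; empty when no such metric is registered. */
    static Map<String, Number> remainingSegments(MBeanServerConnection conn) throws Exception {
        // Assumed name, taken from the table; the trailing * matches any dir/threadNum tags.
        ObjectName pattern = new ObjectName(
            "kafka.log:type=LogManager,name=remainingSegmentsToRecovery,*");
        Map<String, Number> result = new TreeMap<>();
        for (ObjectName name : conn.queryNames(pattern, null)) {
            try {
                // Assumes the gauge exposes its reading as the "Value" attribute.
                Number value = (Number) conn.getAttribute(name, "Value");
                String key = name.getKeyProperty("dir") + "#" + name.getKeyProperty("threadNum");
                result.put(key, value);
            } catch (InstanceNotFoundException e) {
                // The metric was removed between the query and the read: recovery just finished.
            }
        }
        return result;
    }
}
```

Polling this method periodically and comparing successive results gives a rough view of recovery progress per thread.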

Compatibility, Deprecation, and Migration Plan

...
