...

Currently, there is no way to know the progress of log recovery. All an admin can do is check the broker log and see that the broker is busy doing recovery. In this KIP, we're going to expose a RemainingLogsToRecover metric for each log.dir and a RemainingSegmentsToRecover metric for each recovery thread, so that admins have a way to monitor the progress of log recovery.

Public Interfaces


Full Name | Type | Description
kafka.log:type=LogManager,name=remainingLogsToRecover | 32-bit gauge | The number of logs remaining to be recovered in each log.dir
kafka.log:type=LogManager,name=remainingSegmentsToRecover,dir=([-._\/\w\d\s]+),threadNum=([0-9]+) | 32-bit gauge | The number of segments remaining to be recovered in the current log, per recovery thread (i.e. in each replica log)

Note: the dir value is any directory path that is valid for the OS (and for Java). Since the rules differ between operating systems, the pattern above is only a simple example format.
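An admin would typically poll these gauges over JMX. The sketch below sums every gauge that matches an MBean name pattern. It assumes, as with Kafka's other Yammer-based gauges, that each MBean exposes its current value in a `Value` attribute; `RecoveryGaugeReader`, `sumGauges`, and the `Gauge` stand-in are hypothetical, and the demo registers fake MBeans on the local platform MBean server rather than connecting to a real broker.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.StandardMBean;

// Hypothetical admin-side helper; not part of Kafka.
public class RecoveryGaugeReader {
    // Sums the "Value" attribute (assumed, as for other Kafka gauges) of every
    // MBean whose name matches the given JMX pattern.
    public static long sumGauges(MBeanServerConnection conn, String pattern) throws Exception {
        long total = 0;
        for (ObjectName name : conn.queryNames(new ObjectName(pattern), null)) {
            total += ((Number) conn.getAttribute(name, "Value")).longValue();
        }
        return total;
    }

    // Stand-in gauge so the demo runs without a broker.
    public interface GaugeMBean { int getValue(); }

    public static class Gauge implements GaugeMBean {
        private final int value;
        public Gauge(int value) { this.value = value; }
        public int getValue() { return value; }
    }

    public static void main(String[] args) throws Exception {
        // MBeanServer extends MBeanServerConnection, so it can be passed to sumGauges.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        String prefix = "kafka.log:type=LogManager,name=remainingSegmentsToRecover,dir=/tmp/kafka-logs";
        server.registerMBean(new StandardMBean(new Gauge(4), GaugeMBean.class),
                new ObjectName(prefix + ",threadNum=0"));
        server.registerMBean(new StandardMBean(new Gauge(7), GaugeMBean.class),
                new ObjectName(prefix + ",threadNum=1"));
        // Total segments still to recover across both recovery threads.
        System.out.println(sumGauges(server,
                "kafka.log:type=LogManager,name=remainingSegmentsToRecover,*")); // 11
    }
}
```

To poll a live broker instead, `sumGauges` could be given the connection returned by `JMXConnectorFactory.connect(url).getMBeanServerConnection()` for the broker's JMX service URL.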


Proposed Changes

This KIP proposes two metrics:

1.  RemainingLogsToRecover: the number of logs remaining to be recovered in each log.dir. The total number of logs to be recovered is counted in step (1.b) described in the "Motivation" section. Each time a log finishes recovering all of its segments, RemainingLogsToRecover is decremented, so it eventually reaches 0. When the broker is not in the log recovery state, the metric is removed.

2. RemainingSegmentsToRecover: the number of segments remaining to be recovered in each recovery thread (i.e. in each replica log). The total number of segments to be recovered is calculated in step (1.b.ii) described in the "Motivation" section. Each time a segment finishes recovery, RemainingSegmentsToRecover is decremented, so it eventually reaches 0. When the broker is not in the log recovery state, the metric is removed.
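The lifecycle of the two gauges described above (set once the total is known, decremented as work completes, removed when recovery ends) can be modelled with plain counters. This is a minimal sketch, assuming one counter per log.dir and one per (log.dir, recovery thread) pair; the class `RecoveryProgress` and all of its methods are hypothetical, this is not Kafka's `LogManager` code, and registering the counters as real gauges with Kafka's metrics library is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the two proposed gauges; not Kafka's LogManager code.
public class RecoveryProgress {
    // remainingLogsToRecover: one counter per log.dir
    private final Map<String, AtomicInteger> logCounters = new ConcurrentHashMap<>();
    // remainingSegmentsToRecover: one counter per (log.dir, recovery thread)
    private final Map<String, AtomicInteger> segmentCounters = new ConcurrentHashMap<>();

    private static String key(String dir, int threadNum) {
        return "dir=" + dir + ",threadNum=" + threadNum;
    }

    // The number of logs to recover in a dir is known up front (step 1.b).
    public void startDir(String dir, int logsToRecover) {
        logCounters.put(dir, new AtomicInteger(logsToRecover));
    }

    // A log's segment count is known only once a thread has loaded it (step 1.b.ii).
    public void startLog(String dir, int threadNum, int segmentsToRecover) {
        segmentCounters.put(key(dir, threadNum), new AtomicInteger(segmentsToRecover));
    }

    public void segmentRecovered(String dir, int threadNum) {
        segmentCounters.get(key(dir, threadNum)).decrementAndGet();
    }

    public void logRecovered(String dir) {
        logCounters.get(dir).decrementAndGet();
    }

    // Once the dir is no longer recovering, its gauges are removed, not left at 0.
    public void finishDir(String dir) {
        logCounters.remove(dir);
        segmentCounters.keySet().removeIf(k -> k.startsWith("dir=" + dir + ",threadNum="));
    }

    // Current gauge values; null means the gauge does not exist.
    public Integer remainingLogs(String dir) {
        AtomicInteger c = logCounters.get(dir);
        return c == null ? null : c.get();
    }

    public Integer remainingSegments(String dir, int threadNum) {
        AtomicInteger c = segmentCounters.get(key(dir, threadNum));
        return c == null ? null : c.get();
    }

    public static void main(String[] args) {
        RecoveryProgress p = new RecoveryProgress();
        String dir = "/var/kafka-logs";  // hypothetical log.dir
        p.startDir(dir, 2);
        p.startLog(dir, 0, 3);           // thread 0 loaded a log with 3 segments to recover
        p.segmentRecovered(dir, 0);
        System.out.println(p.remainingSegments(dir, 0)); // 2
        p.segmentRecovered(dir, 0);
        p.segmentRecovered(dir, 0);
        p.logRecovered(dir);
        System.out.println(p.remainingLogs(dir));        // 1
        p.finishDir(dir);
        System.out.println(p.remainingLogs(dir));        // null
    }
}
```

Returning `null` after `finishDir` mirrors the proposal that the metrics disappear outside the recovery state instead of reporting a stale 0.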

...

This does not conflict with the KIP, but finding the log recovery progress in the broker logs is not easy for admins. During the implementation, we'll also improve the log output to give clearer information about log recovery progress. Still, metrics are a better way for admins to monitor the log recovery progress.


2. Provide a RemainingBytesToRecover metric:

Currently, when the log manager starts up, it loads all logs (segments), and during log loading it recovers logs if necessary.
Log loading does use a thread pool, as you thought.

So here's the problem:
The segments in each log folder (partition) are loaded by one of the log recovery threads, and only once they are loaded can we know how many segments (or how many bytes) need to be recovered.

That means, if a broker has 10 partition logs and 2 log recovery threads (num.recovery.threads.per.data.dir=2), then before the threads load the segments of each log, we only know how many logs (partitions) the broker has (i.e. the RemainingLogsToRecover metric). We cannot know how many segments or bytes need to be recovered until a thread starts loading the segments of a given log (partition).
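The 10-logs/2-threads scenario above can be simulated to show when each number becomes known. This is a hypothetical sketch, not Kafka code: `RecoveryPoolDemo`, `run`, and `loadSegmentCount` are invented for illustration, the per-log segment counts are made up, and a plain `Executors.newFixedThreadPool` stands in for the recovery thread pool sized by num.recovery.threads.per.data.dir.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation, not Kafka code: numLogs partition logs in one
// log.dir, recovered by a fixed pool of numThreads threads.
public class RecoveryPoolDemo {
    // Stand-in for reading a log's segments from disk. The counts are made up;
    // the point is that a log's count is only known once a thread calls this.
    static int loadSegmentCount(int logIndex) {
        return logIndex % 3 + 1;
    }

    // Returns {total segments discovered, logs still remaining}.
    public static int[] run(int numLogs, int numThreads) {
        // Known before any thread starts: this is what RemainingLogsToRecover reports.
        AtomicInteger remainingLogs = new AtomicInteger(numLogs);
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < numLogs; i++) {
            final int log = i;
            futures.add(pool.submit(() -> {
                // Only here does the thread learn how many segments (or bytes) this log has.
                int segments = loadSegmentCount(log);
                for (int remaining = segments; remaining > 0; remaining--) {
                    // recover one segment (omitted); RemainingSegmentsToRecover would drop here
                }
                remainingLogs.decrementAndGet();
                return segments;
            }));
        }
        int totalSegments = 0;
        try {
            // The overall total is only available after every task has loaded its log.
            for (Future<Integer> f : futures) {
                totalSegments += f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return new int[] {totalSegments, remainingLogs.get()};
    }

    public static void main(String[] args) {
        int[] result = run(10, 2);
        System.out.println("segments discovered: " + result[0]); // 19
        System.out.println("logs remaining: " + result[1]);      // 0
    }
}
```

The log count (10) is available before the pool starts, while the segment total (19 here) can only be summed after every log has been loaded, which is why a total-bytes gauge cannot be populated up front.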

That said, the `RemainingBytesToRecover` metric is difficult to achieve in the way you expected. I think the current proposal, with `RemainingLogsToRecover` and `RemainingSegmentsToRecover`, already provides enough information about log recovery progress.

...