Status
Current state: Discarded (previously Under Discussion). Withdrawn because the solution we agreed on does not require an interface change.
Discussion thread:
JIRA:
Motivation
Currently the broker sanity checks the index files of all log segments in the log directory after it is started and before it becomes leader or follower for any partition. Sanity checking an index file can be slow because the broker needs to open the index file and read data into the page cache. Even if we apply a time-based retention policy, the number of inactive log segments will still increase as the usage (e.g. number of partitions and total bytes-in rate) of the Kafka cluster increases. As a result, the time to restart a broker increases in proportion to the number of segments in the log directory.
...
Ideally, broker startup time should not be proportional to the number of inactive segments. Most inactive segments will never be consumed, and even when one is consumed, its index files most likely will not require recovery after a clean broker shutdown. This KIP proposes a way to lazily load log segments in order to reduce the cluster rolling bounce time.
Public Interfaces
...
Add a broker config sanity.check.all.logs.enabled. This config will be true by default.
If it is set to true, all log segments are sanity checked on broker startup. Otherwise, only the active log segments are sanity checked on startup; inactive segments are sanity checked after the broker becomes leader for their partitions, either by background threads or by a request handler thread, if the segment has not yet been checked when that thread tries to access it.
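As a minimal sketch, the new property could be declared with the ConfigDef mechanism Kafka uses for broker configs. The config name and default below match the KIP; the class name and the importance level are assumptions for illustration only and do not reflect the actual KafkaConfig code.

    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.common.config.ConfigDef.Importance;
    import org.apache.kafka.common.config.ConfigDef.Type;

    public class SanityCheckConfig {
        // Hypothetical declaration of the proposed broker config.
        static final ConfigDef CONFIG_DEF = new ConfigDef()
            .define("sanity.check.all.logs.enabled",
                    Type.BOOLEAN,
                    true,                 // default keeps the current behavior
                    Importance.MEDIUM,    // assumed importance level
                    "If true, sanity check all log segments on broker startup; "
                    + "if false, only active segments are checked at startup and "
                    + "inactive segments are checked lazily.");
    }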
Proposed Change
This KIP proposes the following changes:
...
- If log corruption is found in any segment, all segments prior to that segment will be sanity checked as well before any other thread can access the log segments or index files of that partition (see the sketch after this list).
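The following is an illustrative sketch of the lazy check described above, not the actual LogManager code; the class and method names are hypothetical. A segment's index files are verified at most once, by whichever thread, background or request handler, reaches the segment first, and every other thread blocks until that check has completed.

    // Hypothetical wrapper used only to illustrate the proposed lazy sanity check.
    class LazyCheckedSegment {
        private volatile boolean sanityChecked = false;

        // Called by a background thread after the broker becomes leader for the
        // partition, or by a request handler thread that touches the segment first.
        void ensureSanityChecked() {
            if (!sanityChecked) {
                synchronized (this) {
                    if (!sanityChecked) {
                        // Open the offset/time index files and verify their entries.
                        // If corruption is detected, the proposal is to also sanity
                        // check all earlier segments of the partition before any
                        // other thread may access them (not shown here).
                        sanityCheckIndexFiles();
                        sanityChecked = true;
                    }
                }
            }
        }

        private void sanityCheckIndexFiles() {
            // Placeholder for the actual index sanity checks.
        }
    }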
Here are the caveats of delaying the sanity check that we should be aware of:
...
Solution: At LinkedIn we currently require clients to tolerate at least 120 seconds of unavailability (with 20 retries and a 10-second retry backoff), which can happen during leadership transfer. This should be sufficient for the sanity check if there is no log corruption. Log corruption after a clean broker shutdown is very rare. If there is log corruption in many log segments after a clean shutdown, most likely there is a hardware issue and it will likely affect the active segment as well. If there is log corruption in the active segment, we will sanity check all segments of this partition, so the broker degrades to the existing behavior, which should address the concern that could otherwise arise if we only sanity check after the broker becomes leader. So the probability of this becoming an issue should be very small.
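For reference, a minimal sketch of the client settings this refers to, using the standard producer config keys; the values are the LinkedIn numbers quoted above, not recommended defaults, and the class name is hypothetical.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class ClientRetrySettings {
        public static void main(String[] args) {
            // 20 retries with a 10 second backoff let the client ride out at least
            // 120 seconds of unavailability during leadership transfer.
            Properties props = new Properties();
            props.put(ProducerConfig.RETRIES_CONFIG, 20);
            props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 10_000);
            System.out.println(props);
        }
    }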
Evaluation
...
We ran the test in an experimental cluster of 15 brokers on 15 machines. There are 31.8k partitions with RF=3, evenly distributed across brokers. Each broker has roughly 3000 log segments. The total bytes-in rate is around 400 MBps.
Here are the evaluation results:
- On a given broker in the test cluster, LogManager startup time was reduced from 311 seconds to 15 seconds.
- When doing a rolling bounce of the test cluster, the rolling bounce time was reduced from 135 minutes to 55 minutes.
- When there is no log corruption, the maximum time to sanity check a partition across all partitions in the test cluster is 59 seconds. If all index and timeindex files of that partition are deleted, the time to recover the partition is 265 seconds.
...
This KIP has no compatibility concerns.
Rejected Alternatives
...
Future work
...