...

With the change to `zookeeper.session.timeout.ms`, we will make a similar adjustment to `replica.lag.time.max.ms`, which controls how aggressively the leader removes lagging replicas from the ISR. A longer timeout gives replicas more time to catch up before they are removed from the ISR. We keep this value larger than the session timeout to avoid a race with the controller when a broker fails. We want the controller to detect the failure first, because it is more efficient for it to remove the failed broker from all ISRs at once.
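The ordering argument above can be illustrated with a deliberately simplified model. The sketch assumes a broker stops at t=0 and that each mechanism fires exactly when its timeout elapses. Real detection timing is less precise, because the leader checks lag periodically and session expiry has its own jitter. The function name `first_to_react` is hypothetical and not part of Kafka.

```python
def first_to_react(session_timeout_ms: int, lag_time_max_ms: int) -> str:
    """Simplified model: which mechanism removes a failed broker from ISRs first?

    Hypothetical helper. Assumes the broker stops at t=0 and that:
      * the controller acts once the ZooKeeper session expires
        (after zookeeper.session.timeout.ms), removing the broker
        from all ISRs in one batched operation;
      * each partition leader independently shrinks its own ISR once the
        follower has been out of sync for replica.lag.time.max.ms.
    """
    if session_timeout_ms < lag_time_max_ms:
        return "controller"  # one efficient, cluster-wide removal
    if lag_time_max_ms < session_timeout_ms:
        return "leaders"     # many independent per-partition ISR shrinks
    return "race"            # equal timeouts: either side may win


# Example values chosen only for illustration:
print(first_to_react(session_timeout_ms=18000, lag_time_max_ms=30000))  # controller
print(first_to_react(session_timeout_ms=6000, lag_time_max_ms=4000))    # leaders
```

Keeping the lag time strictly larger than the session timeout is what makes the controller win in this model.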

A quick discussion on replica lag: Intuitively, it might seem reasonable to be more aggressive with ISR eviction, that is, to let the lag time be smaller than the session timeout. The sooner a replica is removed from the ISR, the sooner the partition may be able to accept writes again. However, there are two reasons why this is not so simple. First, when a replica is failing to keep up, it is often unclear whether the problem is on the leader or the follower. It might be that the leader is failing to work through a backlog of requests quickly enough; we have seen this many times. In that case, shrinking the ISR actually makes recovery more difficult, because we are removing a potential leader from the ISR. Second, if a follower genuinely cannot keep up, removing it from the ISR means the leader gives up its ability to exert back-pressure on clients through the advancement of the high watermark. If the condition persists, the lagging broker will fall further and further behind. For these reasons, we think it is smarter to be conservative about shrinking the ISR.
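The back-pressure point can be sketched with a simplified model of the high watermark. The model treats it as the smallest log end offset among replicas currently in the ISR. In this model, a producer using `acks=all` is only acknowledged once the high watermark passes its write. The replica names and offsets below are made up for illustration, and the helper `high_watermark` is hypothetical, not Kafka's implementation.

```python
def high_watermark(log_end_offsets: dict, isr: set) -> int:
    """Simplified model: the high watermark is the smallest log end
    offset (LEO) among the replicas currently in the ISR."""
    return min(log_end_offsets[replica] for replica in isr)


# Hypothetical partition state: follower-b is far behind the leader.
leos = {"leader": 1000, "follower-a": 998, "follower-b": 700}
isr_full = {"leader", "follower-a", "follower-b"}
isr_shrunk = isr_full - {"follower-b"}

# While follower-b is in the ISR, the high watermark is held at its LEO,
# so acks=all producers wait. This is the back-pressure on clients.
print(high_watermark(leos, isr_full))    # 700

# After eviction, the high watermark jumps ahead and producers proceed,
# but nothing now slows them down for follower-b's sake.
print(high_watermark(leos, isr_shrunk))  # 998
```

Once the follower is evicted, clients no longer wait on it, so a persistently slow follower can fall further behind.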

...

The new defaults are more conservative, so we think the impact will be low. Users who have set custom values will not be affected.

...