...

This feature can be enabled by setting the property “follower.fetch.pending.reads.insync.enable” to true. Its default value is false, to preserve backward compatibility.
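A minimal sketch of how a broker might read this flag, assuming the configuration is available as a plain string map. The property name comes from this proposal. The helper `pending_reads_insync_enabled` is hypothetical and is not Kafka's configuration code. When the property is absent, the helper falls back to false, so brokers that never set it keep their current behavior.

```python
# The property name comes from this proposal; the helper below is hypothetical.
PENDING_READS_INSYNC_ENABLE = "follower.fetch.pending.reads.insync.enable"


def pending_reads_insync_enabled(props: dict[str, str]) -> bool:
    """Read the flag, defaulting to False so existing behavior is unchanged."""
    raw = props.get(PENDING_READS_INSYNC_ENABLE, "false").strip().lower()
    # Reject anything that is not a boolean instead of silently guessing.
    if raw not in ("true", "false"):
        raise ValueError(f"invalid boolean for {PENDING_READS_INSYNC_ENABLE}: {raw!r}")
    return raw == "true"


print(pending_reads_insync_enabled({}))  # False: property absent
print(pending_reads_insync_enabled({PENDING_READS_INSYNC_ENABLE: "true"}))  # True
```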

The main disadvantage of this approach is that it reduces, but does not eliminate, offline partitions. They can still occur when requests are queued up and the fetch requests already being processed by the I/O threads take longer than usual. Subsequent fetch requests may then get stuck in the request queue and may not be served before the ISR Expiration Task considers their followers out of sync, which eventually causes offline partitions.
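The limitation can be illustrated with a simplified model of the ISR expiration check. This is a sketch under stated assumptions and not Kafka's actual implementation. Each follower carries a hypothetical `last_caught_up_ms` timestamp and a hypothetical `read_in_progress` flag, and the lag threshold stands in for `replica.lag.time.max.ms`. Under this model, a follower whose fetch is being read by an I/O thread stays in sync when the feature is enabled. A follower whose fetch is still waiting in the request queue gets no such protection.

```python
from dataclasses import dataclass


@dataclass
class FollowerState:
    last_caught_up_ms: int  # last time the follower was fully caught up (hypothetical field)
    read_in_progress: bool  # True while an I/O thread is serving this follower's fetch


def is_out_of_sync(
    state: FollowerState,
    now_ms: int,
    replica_lag_time_max_ms: int,
    pending_reads_insync_enable: bool,
) -> bool:
    """Simplified ISR expiration check (an illustration, not Kafka's code)."""
    # Solution 1: a follower whose fetch is currently being read is kept in sync.
    if pending_reads_insync_enable and state.read_in_progress:
        return False
    # Otherwise the usual lag-time rule applies.
    return now_ms - state.last_caught_up_ms > replica_lag_time_max_ms


LAG_MS = 30_000
now = 100_000
# Follower A: its fetch is being processed slowly by an I/O thread.
a = FollowerState(last_caught_up_ms=60_000, read_in_progress=True)
# Follower B: its fetch is still waiting in the request queue.
b = FollowerState(last_caught_up_ms=60_000, read_in_progress=False)

print(is_out_of_sync(a, now, LAG_MS, True))  # False: the pending read keeps A in sync
print(is_out_of_sync(b, now, LAG_MS, True))  # True: a queued request does not count
```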

Solution 2

This extends Solution 1. In addition, the leader relinquishes leadership if a fetch request takes longer than expected. This mitigates the case described above, where requests pile up in the request queue. A corresponding config for this timeout can be introduced with a default value.
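A minimal sketch of the leader-side check, assuming the leader records when each follower fetch was received. The proposal does not name the timeout config, so `fetch_timeout_ms` below is a hypothetical stand-in. `PendingFetch` and `should_relinquish_leadership` are also illustrative names only. The sketch measures time from receipt rather than from when an I/O thread picks up the request, so that time spent waiting in the request queue counts toward the timeout.

```python
from dataclasses import dataclass


@dataclass
class PendingFetch:
    follower_id: int
    received_ms: int  # when the leader received the request, before it was queued


def should_relinquish_leadership(
    pending: list[PendingFetch], now_ms: int, fetch_timeout_ms: int
) -> bool:
    """Return True if any follower fetch has been outstanding longer than the timeout.

    fetch_timeout_ms is a hypothetical stand-in for the config proposed above.
    """
    return any(now_ms - f.received_ms > fetch_timeout_ms for f in pending)


pending = [PendingFetch(follower_id=2, received_ms=95_000),
           PendingFetch(follower_id=3, received_ms=70_000)]
print(should_relinquish_leadership(pending, now_ms=100_000, fetch_timeout_ms=20_000))  # True
print(should_relinquish_leadership(pending, now_ms=100_000, fetch_timeout_ms=40_000))  # False
```

In a real broker, the follow-up step (handing leadership to another in-sync replica) would go through the controller; that part is out of scope for this sketch.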

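The main disadvantage of this approach is that ISR thrashing may occur when PreferredLeaderElection is enabled and the affected leader is the preferred leader. The slow leader relinquishes leadership, and preferred leader election later moves leadership back to it. The cycle can then repeat for as long as that broker remains slow.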
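The back-and-forth can be seen in a toy model of leadership over successive checks. The broker ids, the function `simulate_leaders`, and the assumption that the preferred broker stays slow in every round are all hypothetical. This only illustrates the interaction between Solution 2 and preferred leader election; it does not model Kafka's controller.

```python
def simulate_leaders(rounds: int, preferred: int, other: int, preferred_is_slow: bool) -> list[int]:
    """Toy model: record which (hypothetical) broker leads the partition each round."""
    leader = preferred
    history = []
    for _ in range(rounds):
        history.append(leader)
        if leader == preferred and preferred_is_slow:
            # Solution 2: the slow preferred leader gives up leadership.
            leader = other
        elif leader != preferred:
            # Preferred leader election moves leadership back to the preferred replica.
            leader = preferred
    return history


print(simulate_leaders(6, preferred=1, other=2, preferred_is_slow=True))   # [1, 2, 1, 2, 1, 2]
print(simulate_leaders(6, preferred=1, other=2, preferred_is_slow=False))  # [1, 1, 1, 1, 1, 1]
```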

Example:

<Topic, partition>: <events, 0>

...

This change is backward compatible with previous versions.

Rejected Alternatives

Whenever the partition leader throws errors or cannot process requests within a specific time (at least replica.lag.time.max.ms), it should relinquish its leadership and allow another in-sync replica to become the leader. However, this may cause ISR thrashing when there are intermittent issues in processing follower fetch requests.