Another idea considered was to fetch the new leader on the client using the usual Metadata RPC call once a produce or fetch request fails with NOT_LEADER_OR_FOLLOWER or FENCED_LEADER_EPOCH, and to save time on the Produce path by skipping the retry delay after a failed request and instead retrying immediately, as soon as a new leader is available for the partition on the client. This was rejected because a single metadata call can be slow and there can be metadata propagation delays, so an immediate retry on the Produce path won't always be fruitful. Consider the total time taken on the produce path when the leader changes:

  1. Total time for the alternative = Produce RPC (client to old leader) + time taken to refresh metadata to get the new leader + Produce RPC (client to new leader)
  2. Total time for the favored proposed changes = Produce RPC (client to old leader, response has new leader) + Produce RPC (client to new leader)

The alternative clearly has an extra component: the time taken to refresh metadata to get the new leader. This time has a lower bound of a single Metadata RPC call, but degrades to many such calls if metadata propagation through the cluster is slow. For this reason, the proposed change is preferred over the alternative.
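
The comparison above can be sketched as simple arithmetic. The Java snippet below is only an illustration, not part of the proposal: the class name, method names, and millisecond figures are hypothetical, and it assumes every Produce RPC costs the same and every Metadata RPC costs the same.

```java
public class LeaderChangeLatency {

    // Rejected alternative: failed produce to the old leader, one or more
    // Metadata RPCs until the new leader is known, then produce to the new leader.
    static long alternativeMs(long produceRpcMs, long metadataRpcMs, int metadataCalls) {
        if (metadataCalls < 1) {
            throw new IllegalArgumentException("at least one Metadata RPC is needed");
        }
        return produceRpcMs + metadataRpcMs * metadataCalls + produceRpcMs;
    }

    // Proposed change: the failed produce response already carries the new
    // leader, so only the second Produce RPC remains.
    static long proposedMs(long produceRpcMs) {
        return produceRpcMs + produceRpcMs;
    }

    public static void main(String[] args) {
        long produceRpcMs = 5;   // hypothetical cost of one Produce RPC
        long metadataRpcMs = 3;  // hypothetical cost of one Metadata RPC

        // More metadata calls model slower metadata propagation in the cluster.
        for (int calls = 1; calls <= 3; calls++) {
            System.out.printf("metadata calls=%d: alternative=%d ms, proposed=%d ms%n",
                    calls,
                    alternativeMs(produceRpcMs, metadataRpcMs, calls),
                    proposedMs(produceRpcMs));
        }
    }
}
```

Under these assumptions the proposed path stays at two Produce RPCs, while the alternative grows by one Metadata RPC for each extra refresh needed before the new leader becomes visible.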