...
Another idea considered was to fetch the new leader on the client using the usual Metadata RPC call once a produce or fetch request fails with NOT_LEADER_OR_FOLLOWER or FENCED_LEADER_EPOCH, and to save time on the client by avoiding the static retry delay (RETRY_BACKOFF_MS_CONFIG) on a failed request, retrying immediately as soon as a new leader for the partition is available on the client. Consider the total time taken on the produce path when the leader changes:
- Total time for the alternative = Produce RPC (client to old leader) + time taken to refresh metadata to get the new leader + Produce RPC (client to new leader)
- Total time for the favored proposed changes = Produce RPC (client to old leader; the ProduceResponse carries the new leader) + Produce RPC (client to new leader)
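
The comparison above can be made concrete with a small sketch. The function names and the latency values below are hypothetical and chosen only for illustration; they are not measurements. Under these assumptions, the difference between the two totals is exactly the time spent on the metadata refresh.

```python
def alternative_total_ms(produce_rpc_ms: float, metadata_refresh_ms: float) -> float:
    """Failed produce to the old leader, then a metadata refresh, then a produce to the new leader."""
    return produce_rpc_ms + metadata_refresh_ms + produce_rpc_ms


def proposed_total_ms(produce_rpc_ms: float) -> float:
    """Failed produce to the old leader (the response names the new leader), then a produce to the new leader."""
    return produce_rpc_ms + produce_rpc_ms


# Hypothetical latencies, assumed only for this example.
PRODUCE_RPC_MS = 5.0
METADATA_REFRESH_MS = 8.0

alternative = alternative_total_ms(PRODUCE_RPC_MS, METADATA_REFRESH_MS)
proposed = proposed_total_ms(PRODUCE_RPC_MS)

# The saving is the metadata refresh time, whatever the per-RPC latency is.
saved = alternative - proposed
print(f"alternative={alternative} ms, proposed={proposed} ms, saved={saved} ms")
```

For simplicity the sketch treats both Produce RPCs as having the same latency; in practice the old and new leaders may respond at different speeds, but the saving is still the metadata refresh term.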
...