...
Latency improvement of workloads run with acks=all

| Metric | Baseline latency (ms) | Optimized latency (ms) | Improvement |
|---|---|---|---|
| High Partitions | | | |
| p99 E2E | 188 | 184 | 2.1% |
| p99 Produce | 155.65 | 151.8 | 2.5% |
| Low Partitions | | | |
| p99 E2E | 393 | 374.5 | 4.7% |
| p99 Produce | 390.95 | 374.35 | 4.2% |
Latency improvement of workloads run with acks=1

| Metric | Baseline latency (ms) | Optimized latency (ms) | Improvement |
|---|---|---|---|
| High Partitions | | | |
| p99 E2E | 106.5 | 101 | 5.2% |
| p99 Produce | 84.7 | 83.3 | 1.7% |
| Low Partitions | | | |
| p99 E2E | 12.5 | 12.5 | 0% |
| p99 Produce | 3.25 | 2.95 | 9.2% |
Workload Details
All tests are run on 6 m5.xlarge Apache Kafka brokers, with KRaft as the metadata quorum running on 3 m5.xlarge instances. The clients are 6 m5.xlarge instances running the OpenMessagingBenchmark. Each test runs for 70 minutes, during which the brokers are restarted one by one with a 10-minute interval between restarts.
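For reference, the sketch below shows where the two producer knobs discussed in this section live in the Java client configuration: the acks setting that separates the two workload variants, and the static retry backoff revisited later. The broker address is a placeholder and everything else is assumed to stay at the OpenMessagingBenchmark driver defaults; this is an illustrative sketch, not the benchmark's actual config.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class WorkloadProducerConfig {
    // Assembles producer settings for one workload variant. Only the knobs
    // discussed in this post are shown; everything else is left at defaults.
    static Properties producerProps(String acks) {
        Properties props = new Properties();
        // Placeholder address; the benchmark used 6 m5.xlarge brokers.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092");
        // "all" for the acks=all runs, "1" for the acks=1 runs.
        props.put(ProducerConfig.ACKS_CONFIG, acks);
        // The static delay before retrying a failed request; 100 ms is the
        // client default. The alternative design discussed below aims to
        // skip this wait when the failure is caused by a leader change.
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "100");
        return props;
    }
}
```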
...
Another idea considered was to fetch the new leader on the client via the usual Metadata RPC once a produce or fetch request fails with NOT_LEADER_OR_FOLLOWER or FENCED_LEADER_EPOCH, and to save time on the client by skipping the static retry delay (RETRY_BACKOFF_MS_CONFIG) for that failed request, instead retrying immediately as soon as the new leader for the partition is available on the client; a schematic sketch of this eager-retry loop is given below. Consider the total time taken on the produce path when the leader changes:
...
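To make the eager-retry alternative concrete, here is a minimal schematic sketch. This is not Kafka's actual Sender/NetworkClient code: ProduceClient, MetadataCache, and their methods are hypothetical stand-ins for the client's request path and metadata cache.

```java
// Schematic sketch of the alternative design: on a leader-change error,
// refresh metadata and retry as soon as the new leader is known, instead
// of sleeping the static retry.backoff.ms first.
final class EagerLeaderRetrySketch {

    enum ErrorCode { NONE, NOT_LEADER_OR_FOLLOWER, FENCED_LEADER_EPOCH, OTHER }

    // Hypothetical stand-in for the client's cached cluster metadata.
    interface MetadataCache {
        String leaderFor(String topicPartition);
        // Issues a Metadata RPC and blocks until the partition's new leader
        // is known on the client.
        String awaitRefreshedLeader(String topicPartition) throws InterruptedException;
    }

    // Hypothetical stand-in for the client's produce request path.
    interface ProduceClient {
        ErrorCode sendToLeader(String leader, byte[] batch);
    }

    static void produceWithEagerRetry(ProduceClient client, MetadataCache metadata,
                                      String topicPartition, byte[] batch,
                                      long retryBackoffMs) throws InterruptedException {
        String leader = metadata.leaderFor(topicPartition);
        while (true) {
            ErrorCode error = client.sendToLeader(leader, batch);
            switch (error) {
                case NONE:
                    return; // produced successfully
                case NOT_LEADER_OR_FOLLOWER:
                case FENCED_LEADER_EPOCH:
                    // Leader moved: trigger a Metadata RPC and retry as soon
                    // as the new leader is known, with no static backoff.
                    leader = metadata.awaitRefreshedLeader(topicPartition);
                    break;
                default:
                    // Other retriable errors keep the usual static backoff.
                    Thread.sleep(retryBackoffMs);
                    break;
            }
        }
    }
}
```

The point of the sketch is the switch statement: only a leader-change error triggers the immediate metadata refresh and retry, while every other retriable error still pays the static retry.backoff.ms wait.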