This page summarizes my analysis of the optimal value of max.in.flight.requests.per.connection (abbreviated max.inflight below), as well as the performance impact of acks=all.
Test Setup
- 3 brokers on AWS, d2.xlarge instances: 3x2TB locally attached disks. 32GB RAM, 4 Xeon cores
- 1 client machine in same availability zone.
- Each performance run produced 10GB of data.
Tests run against kafka commit 6bd73026.
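As a rough sketch of the test matrix, the snippet below enumerates producer settings for each run. The acks and max.inflight values are the ones named on this page. The message sizes other than 64 bytes, and the reading of "10GB" as binary gigabytes, are assumptions made for illustration only.

```python
from itertools import product

# Values named on this page.
ACKS = ["1", "all"]
MAX_IN_FLIGHT = [1, 2, 3, 4]
BYTES_PER_RUN = 10 * 1024**3  # "10GB" per run; binary gigabytes are an assumption

# Assumption: only the 64-byte size is named on this page; the rest are placeholders.
MESSAGE_SIZES = [64, 1024, 16384]


def producer_configs():
    """Yield one (config, message_size, num_records) tuple per test run."""
    for acks, in_flight, size in product(ACKS, MAX_IN_FLIGHT, MESSAGE_SIZES):
        config = {
            "acks": acks,
            "max.in.flight.requests.per.connection": in_flight,
        }
        # Number of records needed to produce the full data volume at this size.
        yield config, size, BYTES_PER_RUN // size


runs = list(producer_configs())
print(len(runs))  # 2 acks settings x 4 in-flight values x 3 sizes = 24
```

Each config dictionary uses the standard producer property names, so it could be handed to whatever load generator drives the run.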
Goal
Summary of results
[Plot: p95 latency]
[Plot: throughput]
Observations
- Throughput and latency show big improvements from max.inflight=1 to max.inflight=2, but the performance plateaus thereafter.
- Slight throughput degradation between acks=1 and acks=all.
- There is a major 2x degradation in p95 latency between acks=1 and acks=all, except for 64-byte messages.
- The plots above are for 9 partitions. As the number of partitions increases, the differences between acks=1 and acks=all, and between max.inflight=1 and max.inflight=2, keep shrinking.
- This is not surprising: as the number of partitions increases, the payload of each ProduceRequest gets bigger, so the relative overhead of additional operations per request gets smaller.
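The amortization argument in the last observation can be sketched with a toy cost model: each request pays a fixed cost plus a cost proportional to the number of partitions it carries. The function name and both cost parameters below are hypothetical, chosen only to show the trend, and are not measurements from these runs.

```python
def overhead_fraction(partitions, fixed_cost_us=100.0, per_partition_cost_us=20.0):
    """Share of per-request time spent on fixed (size-independent) work.

    Both cost parameters are made-up illustrative values, not measurements.
    """
    payload_cost = partitions * per_partition_cost_us
    return fixed_cost_us / (fixed_cost_us + payload_cost)


# The fixed share shrinks as each request carries data for more partitions.
for partitions in (9, 36, 144):
    print(partitions, round(overhead_fraction(partitions), 3))
```

Under this model, any extra fixed work per request (such as waiting for replica acknowledgement) matters less as the payload grows, which is consistent with the shrinking differences noted above.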
More on acks=1 and acks=all
For the run above, the p50 latency for acks=1 and acks=all is counterintuitive: it is actually better for acks=all, and it is worse for max.inflight=4 than for max.inflight=3.
At this time, we have no explanation for the performance difference between acks=all and acks=1:
- Broker metrics for both runs are similar (NetworkProcessorAvgIdlePercent, RequestHandlerAvgIdlePercent, TotalProduceTime, etc.)
- GC logs are similar in terms of object allocations, the number of collections per second, and pause times.
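To make the p50 versus p95 distinction concrete, here is a minimal nearest-rank percentile sketch. The two latency samples are synthetic and invented for illustration; they are not data from these runs, and they do not explain the observed behavior. They only show that a run can have a better median and a worse tail at the same time.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100.

    Returns the smallest sample with at least p% of the samples at or below it.
    """
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # integer ceiling of p * n / 100
    return ordered[max(rank, 1) - 1]


# Synthetic latencies in ms, invented for illustration only.
run_a = [2] * 10 + [3] * 8 + [5, 6]    # steadier
run_b = [1] * 12 + [2] * 6 + [12, 15]  # faster median, heavier tail

for name, run in (("run_a", run_a), ("run_b", run_b)):
    print(name, "p50 =", percentile(run, 50), "p95 =", percentile(run, 95))
```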
Conclusion
From these tests, we can conclude the following:
- We should optimize the producer for max.inflight=2. The data suggests that there is really no benefit to any other value, especially when there is low latency between the client and the broker.
- We don't understand the behavior of acks=all and acks=1 across different workloads and across the entire latency spectrum. We should leave the default as is.
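One way to reason about the plateau at max.inflight=2 is a toy pipelining model. It assumes the broker works on one request per connection at a time, so extra in-flight requests only help until the network round trip is fully hidden. The function name and the round-trip and service times are hypothetical; this is a sketch of the intuition, not a model fitted to the results above.

```python
def requests_per_second(max_in_flight, rtt_ms, service_ms):
    """Toy pipelining model: up to max_in_flight requests overlap, but the
    broker handles one request per connection at a time."""
    pipelined = max_in_flight / (rtt_ms + service_ms)  # requests per ms
    broker_limit = 1.0 / service_ms                    # broker-side ceiling
    return 1000.0 * min(pipelined, broker_limit)


# Hypothetical low-latency link: 0.5 ms round trip, 1 ms broker service time.
for k in (1, 2, 3, 4):
    print(k, round(requests_per_second(k, rtt_ms=0.5, service_ms=1.0)))
```

In this model, whenever the round trip is no longer than the broker service time, two in-flight requests already reach the broker-side ceiling. A higher-latency link would need more in-flight requests to get there, which matches the low-latency caveat in the first conclusion.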