...
Tests run against Kafka commit 6bd73026.
Workload and graphs generated by scripts available here.
Goal
- Understand the performance curve for different values of max.in.flight.requests.per.connection.
We expect better throughput and latency for higher values of this variable. But when do the benefits tail off?
- If we want to support max.inflight > 1 when enabling idempotence, should we pick a single value and not allow further configuration? If so, what should this value be?
- Understand the effect of acks=all compared to acks=1. If it is slower, why? Can we make acks=all the default?
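To make the two knobs under study concrete, here is a minimal sketch of the configuration sweep implied by the goals above. The grid values (`ACKS_VALUES`, `MAX_IN_FLIGHT_VALUES`) and the helper name `producer_configs` are assumptions for illustration; the actual scripts and the exact values they sweep are not shown here. Only the config keys `acks` and `max.in.flight.requests.per.connection` are real producer settings.

```python
from itertools import product

# Hypothetical sweep values: the exact grid used in the tests is not listed here.
ACKS_VALUES = ["1", "all"]
MAX_IN_FLIGHT_VALUES = [1, 2, 3, 4, 5]


def producer_configs(acks_values=ACKS_VALUES, in_flight_values=MAX_IN_FLIGHT_VALUES):
    """Return one producer config dict per (acks, max.in.flight) combination."""
    return [
        {
            "acks": acks,
            "max.in.flight.requests.per.connection": in_flight,
        }
        for acks, in_flight in product(acks_values, in_flight_values)
    ]


if __name__ == "__main__":
    # Each dict could be handed to a producer client that accepts
    # dotted config keys; here we only print the grid.
    for config in producer_configs():
        print(config)
```

Each resulting dict corresponds to one run of the workload, so the latency and throughput curves can be compared along either axis.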
...
p95 Latency
[Plots omitted: one panel for acks=1, one for acks=all]
Throughput
[Plots omitted: one panel for acks=1, one for acks=all]
Observations
- Throughput and latency show big improvements from max.inflight=1 to max.inflight=2, but the performance plateaus thereafter.
- Slight throughput degradation between acks=1 and acks=all.
- There is a major 2x degradation in p95 latency between acks=1 and acks=all except for 64 byte messages.
- Plots above are for 9 partitions. If you keep increasing the number of partitions, the difference between acks=1 and acks=all and max.inflight=1 and max.inflight=2 becomes smaller and smaller.
- This is not surprising: as the number of partitions increases, the payload of each ProduceRequest grows, so the relative overhead of additional operations per request shrinks.
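The amortization argument in the last observation can be illustrated with a toy model. This is a sketch under stated assumptions, not a measurement: it assumes each request pays a fixed cost expressed in the same units as the payload, and that the payload grows linearly with the number of partitions. The function name `relative_overhead` and all numbers are hypothetical.

```python
def relative_overhead(partitions, bytes_per_partition, fixed_cost_bytes):
    """Fraction of the per-request cost that is fixed overhead rather than payload.

    Toy model (assumption): total cost = fixed cost + partitions * bytes_per_partition.
    """
    payload = partitions * bytes_per_partition
    return fixed_cost_bytes / (fixed_cost_bytes + payload)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the trend.
    for partitions in (1, 9, 99):
        share = relative_overhead(partitions, bytes_per_partition=1000, fixed_cost_bytes=1000)
        print(f"{partitions:>3} partitions: fixed overhead is {share:.0%} of the request cost")
```

Under these assumptions the fixed share falls as partitions are added, which is consistent with the gaps between acks=1 and acks=all, and between max.inflight=1 and max.inflight=2, narrowing at higher partition counts.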
...
For the run above, the p50 latency for acks=1 and acks=all is counterintuitive: it is actually better for acks=all, and it is worse for max.inflight=4 than for max.inflight=3.
[Plots omitted: one panel for acks=1, one for acks=all]
At this time, we have no explanation for the performance behavior of acks=all and acks=1:
...
- We should optimize the producer for max.inflight=2. The data suggests that other values offer no real benefit, which argues for deprecating this config, especially when latency between the client and the broker is low.
- We don't understand the behavior of acks=all and acks=1 across different workloads and across the entire latency spectrum. We should leave the default as is.
...