  • None, since we reuse the same throttling mechanism as ClientQuota, which clients already know how to handle.

Rejected Alternatives

  1. Limit the total active producer ID allocation number: This is the simplest solution. However, as stated in the motivation, the OOM is always caused by a rogue or misconfigured client, so this solution would punish good clients alongside the rogue one.
  2. Having a limit to the number of active producer IDs: The idea here is that, if we had a misconfigured client, we would expire the older entries. This solution puts the idempotency guarantees at risk. There is also a risk that we may end up expiring the PIDs of good clients, as there is no way to link a PID back to a specific client at this point.
  3. Allow clients to "close" the producer ID usage: This solution is better; however, it only improves the situation for new clients, leaving the broker exposed to OOM because of old producers. We may need to consider improving the producer client to include this, but not as part of the scope of this KIP.
  4. Throttle INIT_PRODUCER_ID requests: This solution might look simple; however, throttling INIT_PRODUCER_ID doesn't guarantee that the OOM wouldn't happen, because:
    1. INIT_PRODUCER_ID for an idempotent producer requests PIDs from a random controller every time, so a client being throttled on one controller doesn't guarantee it will not go through on the next controller, causing OOM at the leader later.
    2. The problem happens on the activation of the PID, when the producer produces, and not at initialisation. So it is more effective to throttle at produce time.
  5. Throttle PIDs based on IPs: Similar to solution #1, we would end up punishing good users, especially if the misbehaving producer is deployed on a K8S cluster that hosts other use cases.
  6. Use HashSet to track PIDs in the caching layer instead of BloomFilter: HashSet provides 100% correctness; however, the growth of the caching layer with HashSet would create a risk of OOM. This is not as bad as the original OOM, as the broker wouldn't rebuild this cache at start time. Nonetheless, controlling the memory of a HashSet-based cache would be a bit tricky and would need more configuration to keep it under control.
    On the other hand, BloomFilter is more efficient in terms of memory cost while providing reasonable correctness that is good enough for this use case. And if we want to improve the correctness, we can always lower the false positive rate of the bloom filter.
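
To make the memory trade-off in alternative 6 concrete, the following is a minimal sketch of a Bloom filter over producer IDs. It is not the implementation proposed by this KIP: the class name PidBloomFilterSketch, its methods, and the hash mixing are assumptions made for illustration only. It uses the standard sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, for n expected PIDs and target false positive rate p.

```java
import java.util.BitSet;

/** Hypothetical sketch of a Bloom filter over producer IDs; not Kafka code. */
final class PidBloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    PidBloomFilterSketch(int expectedPids, double falsePositiveRate) {
        double ln2 = Math.log(2);
        // Standard sizing: m = -n * ln(p) / (ln 2)^2 bits.
        this.numBits = Math.max(1,
            (int) Math.ceil(-expectedPids * Math.log(falsePositiveRate) / (ln2 * ln2)));
        // Standard hash count: k = (m / n) * ln 2.
        this.numHashes = Math.max(1,
            (int) Math.round((double) numBits / expectedPids * ln2));
        this.bits = new BitSet(numBits);
    }

    // 64-bit bit mixer (constants are those of the SplitMix64 finalizer).
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Double hashing: the i-th bit index is derived from two base hashes.
    private int index(long pid, int i) {
        long h1 = mix(pid);
        long h2 = mix(h1 ^ 0x9e3779b97f4a7c15L) | 1L;
        return (int) Math.floorMod(h1 + (long) i * h2, (long) numBits);
    }

    /** Records a PID as seen. */
    void add(long pid) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(pid, i));
        }
    }

    /** False means definitely not seen; true means seen or a false positive. */
    boolean mightContain(long pid) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(pid, i))) {
                return false;
            }
        }
        return true;
    }

    int sizeInBits() {
        return numBits;
    }
}
```

With this sizing, the filter's footprint is fixed up front by the expected number of PIDs and the chosen false positive rate, whereas a HashSet grows with every new PID it stores. Lowering the false positive rate simply increases the number of bits allocated per expected PID.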