...

Part of the root cause of the OOM problem is that we keep PID metadata in the broker even after the producer is "closed". This solution would provide an API for closing a producer (for example END_PRODUCER_ID), and the broker would remove the PID metadata on its side when it receives the request. On the client side, the producer would send the request when it is closing. This solution is better; however:

  • It only improves the situation with new clients, leaving the broker exposed to OOM because of old producers.
  • It doesn't address producers that enter a repeated restart cycle, as these producers will be crashing and will not call the producer.close method.
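
The two limitations above can be illustrated with a toy model. This is only a sketch, not Kafka broker code: the class `ToyBroker` and its method names are hypothetical, and `END_PRODUCER_ID` is the example request name used above, not an existing Kafka API.

```python
class ToyBroker:
    """Toy stand-in for the broker's per-producer state; not Kafka code."""

    def __init__(self):
        self.pid_metadata = {}  # PID -> metadata held in broker memory

    def handle_produce(self, pid):
        # The first produce request for a PID creates its metadata entry.
        entry = self.pid_metadata.setdefault(pid, {"last_sequence": -1})
        entry["last_sequence"] += 1

    def handle_end_producer_id(self, pid):
        # Hypothetical handler for the proposed END_PRODUCER_ID request.
        self.pid_metadata.pop(pid, None)


broker = ToyBroker()

# A new client that closes cleanly: its metadata is released.
broker.handle_produce(pid=1)
broker.handle_end_producer_id(pid=1)

# An old client (no END_PRODUCER_ID support) and a client that crashes
# before calling close: neither sends the request, so both entries stay.
broker.handle_produce(pid=2)
broker.handle_produce(pid=3)

remaining = sorted(broker.pid_metadata)
print(remaining)
```

In this model only the cleanly closed producer frees its entry; the other two PIDs keep occupying memory.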

...

    1. INIT_PRODUCER_ID requests from idempotent producers ask a random controller for a PID every time, so throttling a client on one controller doesn't guarantee that it will not get through on the next controller, causing OOM at the leader later.
    2. The problem happens when the PID is activated, that is, when the producer produces, and not at initialisation. This means Kafka wouldn't have an OOM problem if a producer was assigned a PID but crashed before producing anything.
    3. Throttling producers that crash between initialisation and producing could slow them down when they recover from, or fix, the problem that caused them to crash right after initialising the PID.
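
Point 2 can be made concrete with a small toy model. It is a sketch under assumptions taken from the list above, not Kafka code: `ToyCluster`, `init_producer_id`, `produce`, and `leader_pid_state` are hypothetical names, and the starting PID value is arbitrary.

```python
import itertools


class ToyCluster:
    """Toy model: PID allocation vs. PID state on the partition leader."""

    def __init__(self):
        self._next_pid = itertools.count(1000)  # arbitrary starting value
        self.leader_pid_state = {}  # PID -> number of produce requests seen

    def init_producer_id(self):
        # Only hands out an ID; the leader stores nothing at this point.
        return next(self._next_pid)

    def produce(self, pid):
        # The PID becomes "active" on the leader at its first produce.
        self.leader_pid_state[pid] = self.leader_pid_state.get(pid, 0) + 1


cluster = ToyCluster()

# Five producers obtain a PID and crash before producing anything.
crashed = [cluster.init_producer_id() for _ in range(5)]
state_after_init = len(cluster.leader_pid_state)

# One producer obtains a PID and actually produces.
active = cluster.init_producer_id()
cluster.produce(active)
state_after_produce = len(cluster.leader_pid_state)

print(state_after_init, state_after_produce)
```

In this model the five initialised-but-idle PIDs cost the leader nothing, so throttling at initialisation would act on the wrong step.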

...