Status
Current state: Voting (voting thread)
...
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
KIP-664: Provide tooling to detect and abort hanging transactions added tooling for visibility into the transactional and idempotent producers that the broker keeps track of. This KIP proposes adding a ProducerIdCount metric that enables easy monitoring of transactional and idempotent producer counts on the broker.
Producer ids are used by idempotent and transactional producers. The brokers keep a small amount of metadata (e.g. producer id, epoch, sequence number) in memory for every partition that an idempotent producer has produced to. This metadata is maintained on every replica and is recovered from logs and snapshots when brokers restart. KIP-98 - Exactly Once Delivery and Transactional Messaging has details on producer ids and the related protocols and data structures.
For idempotent producers, a new producer id is created whenever a KafkaProducer is created. A badly written application may frequently create new KafkaProducer objects. This is not optimal in general, but for idempotent producers specifically, doing so pollutes broker memory with producer ids and related metadata. Even though the metadata for each producer id is small, creating too many producer ids could run brokers out of memory.
The ProducerIdCount metric can be used to set up alerts so that this pattern can be proactively detected and action taken before too many producer ids run the broker out of memory.
Public Interfaces
We propose adding a new broker metric:
Name | Description |
---|---|
kafka.server:type=ReplicaManager,name=ProducerIdCount | The total number of active transactional / idempotent producers in the broker. |
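As an illustration of how the broker-level metric might be consumed for alerting, the following is a minimal sketch that reads the gauge over JMX using only the JDK's `javax.management` API. The class name `ProducerIdCountCheck`, the helper `shouldAlert`, and the threshold are hypothetical and not part of this KIP. The sketch also assumes the gauge is exposed under the attribute name `Value`, as is usual for Kafka's JMX-reported gauges. Obtaining the `MBeanServerConnection` (for example through a JMX connector to the broker) is left to the caller.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper, not part of the KIP: reads ProducerIdCount over JMX.
final class ProducerIdCountCheck {
    // MBean name as listed in the table above.
    static final String METRIC =
        "kafka.server:type=ReplicaManager,name=ProducerIdCount";

    // Assumption: the gauge value is exposed as the JMX attribute "Value".
    static long readProducerIdCount(MBeanServerConnection conn) throws Exception {
        Object value = conn.getAttribute(new ObjectName(METRIC), "Value");
        return ((Number) value).longValue();
    }

    // Alert only when the count is strictly above the operator-chosen threshold.
    static boolean shouldAlert(long count, long threshold) {
        return count > threshold;
    }
}
```

A monitoring job could call `readProducerIdCount` periodically and pass the result to `shouldAlert`. A suitable threshold depends on the broker's heap size and workload and would need to be chosen by the operator.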
Proposed Changes
Add the new metric to the ReplicaManager class.
Compatibility, Deprecation, and Migration Plan
- No migration plan is needed because the metric is new.
Rejected Alternatives
Have a partition-level metric as well - this doesn't seem to be needed, as we can use KIP-664: Provide tooling to detect and abort hanging transactions for detailed debugging once alerted on the total producer id count on the broker.
Name the metric ProducerCount - this may be misleading, as producers without producer ids are not counted.
...