...
ApiKey | Scope of error | Request:Errors Mapping |
---|---|---|
UpdateMetadata | request | 1:1 |
ControlledShutdown | request | 1:1 |
FindCoordinator | request | 1:1 |
JoinGroup | request | 1:1 |
Heartbeat | request | 1:1 |
LeaveGroup | request | 1:1 |
SyncGroup | request | 1:1 |
ListGroups | request | 1:1 |
SaslHandshake | request | 1:1 |
ApiVersions | request | 1:1 |
InitProducerId | request | 1:1 |
AddOffsetsToTxn | request | 1:1 |
EndTxn | request | 1:1 |
DescribeAcls | request | 1:1 |
Produce | partition | 1:n |
Fetch | partition | 1:n |
Offsets | partition | 1:n |
OffsetCommit | partition | 1:n |
OffsetFetch | partition | 1:n |
DeleteRecords | partition | 1:n |
OffsetForLeaderEpoch | partition | 1:n |
AddPartitionsToTxn | partition | 1:n |
WriteTxnMarkers | partition | 1:n |
TxnOffsetCommit | partition | 1:n |
LeaderAndIsr | partition + request | 1:n |
StopReplica | partition + request | 1:n |
Metadata | topic | 1:n |
CreateTopics | topic | 1:n |
DeleteTopics | topic | 1:n |
DescribeGroups | group | 1:n |
CreateAcls | acl | 1:n |
DeleteAcls | acl | 1:n |
DescribeConfigs | resource | 1:n |
AlterConfigs | resource | 1:n |
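
As a rough illustration of the mapping column, the sketch below tallies error codes for one response with a 1:1 mapping and one with a 1:n mapping. The flattened list of error codes and the `count_errors` helper are assumptions made for illustration, not part of any Kafka API, and whether `NONE` is counted is also an assumption here.

```python
from collections import Counter

def count_errors(error_codes):
    """Tally the error codes carried by a single response.

    error_codes: one entry per error field in the response
    (a hypothetical flattened view, not a Kafka API).
    """
    return Counter(error_codes)

# 1:1 mapping: a Heartbeat response carries a single request-level error code.
heartbeat_errors = count_errors(["NONE"])

# 1:n mapping: a Produce response carries one error code per partition,
# so a single request can contribute several error counts.
produce_errors = count_errors(["NONE", "NOT_LEADER_FOR_PARTITION", "NONE"])
```

With a 1:n mapping, the sum of error counts can exceed the number of requests, which matters when comparing error rates with request rates.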
Fetch down conversion rate and time
Down conversions are expensive since the whole response has to be read into memory for conversion. It will be useful to monitor the rate of down conversion and the time spent on conversions.
The fetch down conversion rate will be a meter in the same group as existing topic metrics such as TotalFetchRequestsPerSec.
MBean: kafka.server:type=BrokerTopicMetrics,name=FetchDownConversionsPerSec,topic=([-.\w]+)
It will also be useful to know the time taken for down conversions. Fetch down conversion time will be a histogram in the same group.
MBean: kafka.server:type=BrokerTopicMetrics,name=FetchDownConversionsTimeMs,topic=([-.\w]+)
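A monitoring script that receives MBean names as strings could pick out the two down conversion metrics and their topic with a pattern like the ones above. This is a minimal sketch: the function name `parse_down_conversion_mbean` is hypothetical, and it assumes the MBean name arrives exactly in the form shown, with no extra key properties.

```python
import re

# Same topic character class as in the MBean patterns above; the dot in
# "kafka.server" is escaped so it matches literally.
MBEAN_PATTERN = re.compile(
    r"kafka\.server:type=BrokerTopicMetrics,"
    r"name=(FetchDownConversionsPerSec|FetchDownConversionsTimeMs),"
    r"topic=([-.\w]+)"
)

def parse_down_conversion_mbean(name):
    """Return (metric_name, topic) for a down conversion MBean, else None."""
    match = MBEAN_PATTERN.fullmatch(name)
    if match is None:
        return None
    return match.group(1), match.group(2)
```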
Message batch size
Large messages can cause GC issues in the broker, especially if down conversions are required. Maximum message batch size can be configured per topic to control this, but that is the size after compression. Since the batches are decompressed to validate produce requests and for fetch down conversion, it will be useful to have topic metrics for produce message batch size.
Topic metrics will be added for produce message batch size before and after decompression. The two sizes will give an indication of compression ratio as well. These two metrics will be histograms.
MBean: kafka.server:type=BrokerTopicMetrics,name=ProduceBatchSize,topic=([-.\w]+)
MBean: kafka.server:type=BrokerTopicMetrics,name=ProduceUncompressedBatchSize,topic=([-.\w]+)
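The compression ratio mentioned above can be estimated from one value of each histogram, for example the two means. The helper below is only a sketch: `compression_ratio` is a hypothetical name, and a ratio of two histogram means is an approximation, not an exact per-batch ratio.

```python
def compression_ratio(batch_size_bytes, uncompressed_batch_size_bytes):
    """Estimate the compression ratio from the two batch size metrics.

    batch_size_bytes: a value from ProduceBatchSize (before decompression).
    uncompressed_batch_size_bytes: the matching value from
    ProduceUncompressedBatchSize (after decompression).
    Returns None when no batches have been recorded yet.
    """
    if batch_size_bytes <= 0:
        return None
    return uncompressed_batch_size_bytes / batch_size_bytes
```

For instance, a mean batch size of 2500 bytes against a mean uncompressed size of 10000 bytes suggests batches expand roughly four times when decompressed in the broker.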
Authentication success and failure rates
The rate of failed authentications is useful for identifying misconfigured or malicious connection attempts. Successful authentication rates may also be helpful for each listener.
These metrics will be Kafka metrics added to the same group as network metrics like connection-creation-rate.
MBean: kafka.server:type=socket-server-metrics,listener=<listenerName>,networkProcessor=<processorIndex>
New attributes:
- successful-authentication-rate
- failed-authentication-rate
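
One way a health check might combine the two attributes is to look at the fraction of authentication attempts that fail. This is a sketch under assumptions: the function name `failed_auth_fraction` is hypothetical, and both rates are assumed to have been read for the same listener and network processor over the same time window.

```python
def failed_auth_fraction(successful_rate, failed_rate):
    """Fraction of authentication attempts that failed.

    successful_rate: value of successful-authentication-rate.
    failed_rate: value of failed-authentication-rate.
    Returns None when there were no authentication attempts.
    """
    total = successful_rate + failed_rate
    if total <= 0:
        return None
    return failed_rate / total
```

A sustained high fraction on one listener would point to a misconfigured client or a malicious source, while a low fraction with a high absolute failure rate may simply reflect heavy connection churn.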
ZooKeeper latency
It will be good to monitor latency of ZooKeeper requests so that any issues with ZooKeeper communication can be detected early.
...