...
Status
Current state: Accepted
Discussion thread: here
JIRA: KAFKA-5746
...
ApiKey | Scope of error | Request:Errors Mapping |
---|---|---|
UpdateMetadata | request | 1:1 |
ControlledShutdown | request | 1:1 |
FindCoordinator | request | 1:1 |
JoinGroup | request | 1:1 |
Heartbeat | request | 1:1 |
LeaveGroup | request | 1:1 |
SyncGroup | request | 1:1 |
ListGroups | request | 1:1 |
SaslHandshake | request | 1:1 |
ApiVersions | request | 1:1 |
InitProducerId | request | 1:1 |
AddOffsetsToTxn | request | 1:1 |
EndTxn | request | 1:1 |
DescribeAcls | request | 1:1 |
Produce | partition | 1:n |
Fetch | partition | 1:n |
Offsets | partition | 1:n |
OffsetCommit | partition | 1:n |
OffsetFetch | partition | 1:n |
DeleteRecords | partition | 1:n |
OffsetForLeaderEpoch | partition | 1:n |
AddPartitionsToTxn | partition | 1:n |
WriteTxnMarkers | partition | 1:n |
TxnOffsetCommit | partition | 1:n |
LeaderAndIsr | partition + request | 1:n |
StopReplica | partition + request | 1:n |
Metadata | topic | 1:n |
CreateTopics | topic | 1:n |
DeleteTopics | topic | 1:n |
DescribeGroups | group | 1:n |
CreateAcls | acl | 1:n |
DeleteAcls | acl | 1:n |
DescribeConfigs | resource | 1:n |
AlterConfigs | resource | 1:n |
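One way to read the mapping column is as the number of error codes a single response can contribute to an error-count metric. The sketch below illustrates that reading only. The class `ErrorCountSketch`, its method names, and the partition keys are hypothetical and are not Kafka APIs; the error names are used purely as sample strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Kafka): tallies the error codes carried by one response.
class ErrorCountSketch {

    // 1:1 mapping: a request-scoped response carries exactly one error code.
    static Map<String, Integer> requestScoped(String error) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put(error, 1);
        return counts;
    }

    // 1:n mapping: one error code per partition, topic, group, acl or resource,
    // so a single response may add several counts, possibly for different errors.
    static Map<String, Integer> itemScoped(Map<String, String> errorByItem) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String error : errorByItem.values()) {
            counts.merge(error, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample data: three partitions of a made-up topic.
        Map<String, String> produceResponse = new LinkedHashMap<>();
        produceResponse.put("topicA-0", "NONE");
        produceResponse.put("topicA-1", "NOT_LEADER_FOR_PARTITION");
        produceResponse.put("topicA-2", "NONE");

        System.out.println(requestScoped("NONE"));
        System.out.println(itemScoped(produceResponse));
    }
}
```

Under this reading, a Heartbeat response always adds one count, while a Produce response covering three partitions adds three.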
...
Message conversion rate and time
Down conversions are expensive since the whole response has to be read into memory for conversion. It will be useful to monitor the rate of down conversion and the time spent on conversions.
Fetch and produce message conversion rates will be meters in the same group as existing topic metrics (TotalFetchRequestsPerSec etc.).
MBean: kafka.server:type=BrokerTopicMetrics,name=FetchMessageConversionsPerSec,topic=([-.\w]+)
MBean: kafka.server:type=BrokerTopicMetrics,name=ProduceMessageConversionsPerSec,topic=([-.\w]+)
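As a rough illustration of how a monitoring tool might address these per-topic MBeans, the sketch below builds a JMX `ObjectName` for one topic and a pattern that matches all topics. It uses only the standard `javax.management.ObjectName` class. The class `TopicMetricNameSketch` and its methods are hypothetical, and the topic names are made up.

```java
import java.util.regex.Pattern;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper (not part of Kafka): builds JMX names for the per-topic meter.
class TopicMetricNameSketch {
    private static final String PREFIX =
        "kafka.server:type=BrokerTopicMetrics,name=FetchMessageConversionsPerSec,topic=";

    // Same character class as the topic pattern shown in the MBean name above.
    private static final Pattern TOPIC = Pattern.compile("[-.\\w]+");

    static boolean isValidTopic(String topic) {
        return TOPIC.matcher(topic).matches();
    }

    // Concrete name for a single topic.
    static ObjectName forTopic(String topic) {
        if (!isValidTopic(topic)) {
            throw new IllegalArgumentException("Unexpected topic name: " + topic);
        }
        return parse(PREFIX + topic);
    }

    // Pattern name whose topic value is a wildcard, usable with MBeanServer.queryNames.
    static ObjectName allTopics() {
        return parse(PREFIX + "*");
    }

    private static ObjectName parse(String name) {
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        ObjectName orders = forTopic("orders.v1");
        System.out.println(orders.getKeyProperty("topic"));
        System.out.println(allTopics().apply(orders));
    }
}
```

Against a live broker, the pattern returned by `allTopics()` could be passed to `MBeanServerConnection.queryNames` to list every topic that currently reports the meter.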
It will also be useful to know the time taken for down conversions. The fetch down-conversion time metric will be a histogram alongside the other request time metrics. This time will also be included in request logs so that clients requiring expensive down conversions can be identified. Conversion time will also be added for produce requests.
MBean: kafka.network:type=RequestMetrics,name=MessageConversionsTimeMs,request={Produce|Fetch}
Request size and temporary memory size
...
MBean: kafka.network:type=RequestMetrics,name=RequestBytes,request=<apiKey>
MBean: kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=<apiKey>
Authentication success and failure rates
...
- successful-authentication-rate
- failed-authentication-rate
ZooKeeper status and latency
It will be good to monitor latency of ZooKeeper requests so that any issues with ZooKeeper communication can be detected early.
...
MBean: kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs
It will also be useful to see the current status of the broker's connection to ZooKeeper.
This will be a String Gauge in the existing group SessionExpireListener, which currently shows the rate of each state (e.g. DisconnectsPerSec).
MBean: kafka.server:type=SessionExpireListener,name=SessionState
State will be one of Disconnected|SyncConnected|AuthFailed|ConnectedReadOnly|SaslAuthenticated|Expired
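A string gauge of this kind could be backed by a holder that is updated on every session state change and read on demand. The sketch below is a minimal illustration under that assumption. The class `SessionStateGaugeSketch`, its enum, and the initial `Disconnected` value are hypothetical choices, not taken from the Kafka code base.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch (not Kafka code): holds the latest session state for a string gauge.
class SessionStateGaugeSketch {

    // The six states listed above.
    enum State { Disconnected, SyncConnected, AuthFailed, ConnectedReadOnly, SaslAuthenticated, Expired }

    // Assumption: the gauge reports Disconnected until the first state change arrives.
    private final AtomicReference<State> current = new AtomicReference<>(State.Disconnected);

    // Would be called from the session listener whenever the connection state changes.
    void onStateChange(State newState) {
        current.set(newState);
    }

    // Value reported by the gauge: the name of the most recent state.
    String value() {
        return current.get().name();
    }

    public static void main(String[] args) {
        SessionStateGaugeSketch gauge = new SessionStateGaugeSketch();
        gauge.onStateChange(State.SyncConnected);
        System.out.println(gauge.value());
    }
}
```

Unlike the existing per-state rate meters, a gauge like this shows only the latest state, so the two views complement each other.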
Client-side metrics
Client versions
...