...


Status

Current state: Accepted

Discussion thread: here

JIRA: KAFKA-5746

...

ApiKey                  Scope of error         Request:Errors mapping
UpdateMetadata          request                1:1
ControlledShutdown      request                1:1
FindCoordinator         request                1:1
JoinGroup               request                1:1
Heartbeat               request                1:1
LeaveGroup              request                1:1
SyncGroup               request                1:1
ListGroups              request                1:1
SaslHandshake           request                1:1
ApiVersions             request                1:1
InitProducerId          request                1:1
AddOffsetsToTxn         request                1:1
EndTxn                  request                1:1
DescribeAcls            request                1:1
Produce                 partition              1:n
Fetch                   partition              1:n
Offsets                 partition              1:n
OffsetCommit            partition              1:n
OffsetFetch             partition              1:n
DeleteRecords           partition              1:n
OffsetForLeaderEpoch    partition              1:n
AddPartitionsToTxn      partition              1:n
WriteTxnMarkers         partition              1:n
TxnOffsetCommit         partition              1:n
LeaderAndIsr            partition + request    1:n
StopReplica             partition + request    1:n
Metadata                topic                  1:n
CreateTopics            topic                  1:n
DeleteTopics            topic                  1:n
DescribeGroups          group                  1:n
CreateAcls              acl                    1:n
DeleteAcls              acl                    1:n
DescribeConfigs         resource               1:n
AlterConfigs            resource               1:n
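
To illustrate the mapping column: a 1:1 request carries a single error code per response, while a 1:n request carries one error code per partition, topic, group, ACL or resource. The sketch below tallies error codes from one response under the assumption that every error code in the response is counted once. The class and method names are hypothetical and are not part of Kafka.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not Kafka code: tallies the error codes of one response.
class ErrorCountSketch {

    // For a 1:1 request, pass a single-element list.
    // For a 1:n request, pass one error name per scoped item
    // (partition, topic, group, acl or resource) in the response.
    static Map<String, Integer> countErrors(List<String> errorsInResponse) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String error : errorsInResponse) {
            counts.merge(error, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // A Produce response covering three partitions (1:n mapping).
        List<String> produceErrors =
                Arrays.asList("NONE", "NOT_LEADER_FOR_PARTITION", "NONE");
        System.out.println(countErrors(produceErrors));

        // A Heartbeat response (1:1 mapping).
        System.out.println(countErrors(Arrays.asList("NONE")));
    }
}
```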

 

...

Message conversion rate and time

Down conversions are expensive since the whole response has to be read into memory for conversion. It will be useful to monitor the rate of down conversion and the time spent on conversions.

Fetch and produce message conversion rates will be meters in the same group as existing topic metrics such as TotalFetchRequestsPerSec.

MBean: kafka.server:type=BrokerTopicMetrics,name=FetchMessageConversionsPerSec,topic=([-.\w]+)

MBean: kafka.server:type=BrokerTopicMetrics,name=ProduceMessageConversionsPerSec,topic=([-.\w]+)
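
The `topic=([-.\w]+)` suffix in the MBean names above is regex notation for the topic name. A monitoring client using JMX would normally express the same thing as an `ObjectName` pattern with `topic=*`. The following is a minimal sketch of that matching; the class `ConversionMetricNames` and its methods are hypothetical, and only the standard `javax.management.ObjectName` API is used.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper, not Kafka code: matches per-topic conversion-rate MBeans.
class ConversionMetricNames {

    // JMX pattern equivalent to the per-topic fetch conversion meter name.
    static final String FETCH_PATTERN =
            "kafka.server:type=BrokerTopicMetrics,"
            + "name=FetchMessageConversionsPerSec,topic=*";

    // Returns true if the given concrete MBean name is a per-topic
    // FetchMessageConversionsPerSec meter.
    static boolean isFetchConversionMeter(String concreteName) {
        try {
            ObjectName pattern = new ObjectName(FETCH_PATTERN);
            return pattern.apply(new ObjectName(concreteName));
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid MBean name: " + concreteName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isFetchConversionMeter(
                "kafka.server:type=BrokerTopicMetrics,"
                + "name=FetchMessageConversionsPerSec,topic=my-topic"));
    }
}
```

The same pattern string could be passed to `MBeanServerConnection.queryNames` to list the meter for every topic on a broker.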

It will also be useful to know the time taken for down conversions. The fetch down conversion time metric will be a histogram alongside other request time metrics. This time will also be included in request logs so that clients requiring expensive down conversions can be identified. Conversion time will also be added for produce requests.

MBean: kafka.network:type=RequestMetrics,name=MessageConversionsTimeMs,request={Produce|Fetch}

Request size and temporary memory size

Large messages can cause GC issues in the broker, especially if down conversions are required. Maximum message batch size can be configured per topic to control this, but that is the size after compression. Since the batches are decompressed to validate produce requests and for fetch down conversion, it will be useful to have topic metrics for produce message batch size.

Request metrics will be added for request size as well as the temporary memory size for processing the request. These two metrics will be histograms. For produce requests, the two metrics will indicate the message batch size before and after decompression, which also gives an indication of the compression ratio.

These values will also be included in request logs so that clients requiring a lot of temporary memory space can be identified.

MBean: kafka.network:type=RequestMetrics,name=RequestBytes,request=<apiKey>

MBean: kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=<apiKey>
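
As a rough illustration of how the two sizes indicate a compression ratio for produce requests, the sketch below divides the temporary memory size by the request size. It assumes that the temporary memory of a produce request is dominated by the decompressed batches and that the ratio is defined as uncompressed size over compressed size. The class and method names are hypothetical.

```java
// Hypothetical helper, not Kafka code: estimates a compression ratio
// from values of the RequestBytes and TemporaryMemoryBytes metrics.
class CompressionRatioSketch {

    // requestBytes: size of the produce request as received (compressed).
    // temporaryMemoryBytes: memory used to process it (assumed to be the
    // decompressed batch size).
    static double compressionRatio(long requestBytes, long temporaryMemoryBytes) {
        if (requestBytes <= 0) {
            throw new IllegalArgumentException("requestBytes must be positive");
        }
        return (double) temporaryMemoryBytes / requestBytes;
    }

    public static void main(String[] args) {
        // A 1000-byte request that expands to 4000 bytes during validation.
        System.out.println(compressionRatio(1000L, 4000L));
    }
}
```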

Authentication success and failure rates

...

  1. successful-authentication-rate
  2. failed-authentication-rate

ZooKeeper status and latency

It will be good to monitor latency of ZooKeeper requests so that any issues with ZooKeeper communication can be detected early.

...

MBean: kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs

It will also be useful to see the current status of the broker's connection to ZooKeeper.

This will be a String Gauge in the existing group SessionExpireListener, which currently shows the rate of each state (e.g. DisconnectsPerSec).

MBean: kafka.server:type=SessionExpireListener,name=SessionState

State will be one of Disconnected|SyncConnected|AuthFailed|ConnectedReadOnly|SaslAuthenticated|Expired
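
A monitoring client could map the gauge string onto a small enum to decide when to alert. The sketch below is one hypothetical way to do that. The enum, `parse` and `isConnected` are not part of Kafka, and treating SyncConnected, ConnectedReadOnly and SaslAuthenticated as "connected" is an assumption made for illustration only.

```java
import java.util.Optional;

// Hypothetical client-side helper, not Kafka code: models the values
// that the SessionState gauge may report.
enum SessionState {
    Disconnected, SyncConnected, AuthFailed, ConnectedReadOnly, SaslAuthenticated, Expired;

    // Maps the gauge string to a state; empty if the value is unknown or null.
    static Optional<SessionState> parse(String gaugeValue) {
        for (SessionState state : values()) {
            if (state.name().equals(gaugeValue)) {
                return Optional.of(state);
            }
        }
        return Optional.empty();
    }

    // Assumption for this sketch: these three states mean the broker
    // currently has a usable ZooKeeper session.
    boolean isConnected() {
        return this == SyncConnected
                || this == ConnectedReadOnly
                || this == SaslAuthenticated;
    }
}
```

For example, `SessionState.parse("Expired")` yields a state for which `isConnected()` is false, which a dashboard could surface as an alert.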

Client-side metrics

Client versions

...