Kafka runs on the JVM, but the Kafka ecosystem has no JVM-based exporter. I wrote one on Spring Boot for my own work and am happy to share it.
Status
Current state: [One of "Under Discussion", "Accepted", "Rejected"]
Discussion thread: here [Change the link from the KIP proposal email archive to your own email thread]
JIRA: here [Change the link from KAFKA-1 to your own ticket]
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Describe the problems you are trying to solve.
Public Interfaces
Briefly list any new interfaces that will be introduced as part of this proposal or any existing interfaces that will be removed or changed. The purpose of this section is to concisely call out the public contract that will come along with this feature.
A public interface is any change to the following:
- Binary log format
- The network protocol and API behavior
- Any class in the public packages under clients
  - Configuration, especially client configuration
  - org/apache/kafka/common/serialization
  - org/apache/kafka/common
  - org/apache/kafka/common/errors
  - org/apache/kafka/clients/producer
  - org/apache/kafka/clients/consumer (eventually, once stable)
- Monitoring
- Command line tools and arguments
- Anything else that will likely break existing users in some way when they upgrade
Proposed Changes
Kafka is an excellent message queue and data pipeline that runs on the JVM, but it has no JVM-based exporter. For the future of the Kafka ecosystem,
Apache needs an official exporter like https://github.com/apache/rocketmq-exporter.
I wrote one for my own work and hope to contribute it to Apache. JMX exposes a large number of metrics, and the exporter configuration controls which of them are collected.
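As a rough illustration of what "metrics in JMX" means in practice, the sketch below reads one attribute from an MBean. To stay self-contained it queries the local JVM's platform MBean server; the assumption is that a real exporter would instead open a remote JMX connection to each broker. The class and method names are hypothetical and are not taken from the exporter itself.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch: read a single JMX attribute from the local JVM.
class JmxReadSketch {

    /** Reads the ThreadCount attribute of the java.lang:type=Threading MBean. */
    static int readThreadCount() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName threading = new ObjectName("java.lang:type=Threading");
            Object value = server.getAttribute(threading, "ThreadCount");
            return ((Number) value).intValue();
        } catch (JMException e) {
            // Covers malformed names, missing MBeans and missing attributes.
            throw new IllegalStateException("JMX read failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ThreadCount = " + readThreadCount());
    }
}
```

The same `getAttribute` call works against a remote `MBeanServerConnection`, which is why the MBean name patterns in the configuration can be written once and applied to every broker.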
Public Interfaces
How to configure the exporter
Common config
server:
  port: 5650
spring:
  application:
    name: kafka-exporter
  profiles:
    active: dev
  http:
    encoding:
      charset: UTF-8
      enabled: true
      force: true
logging:
  config: classpath:logback.xml
task:
  count: 8
  brokerTopicMetrics:
    cron: 1/15 * * * * ?
  lagMetrics:
    cron: 1/15 * * * * ?
  jvmMetrics:
    cron: 1/15 * * * * ?
  replicaMetrics:
    cron: 1/15 * * * 12 ?
  networkMetrics:
    cron: 1/15 * * * * ?
  logFlushMetrics:
    cron: 1/15 * * * * ?
  kafkaControllerMetrics:
    cron: 1/15 * * * 12 ?
  kafkaClusterMetrics:
    cron: 1/15 * * * 12 ?
kafka-exporter:
  kafka-versions.0.10.2.0: 1 ## different kafka versions using different api versions
  kafka-versions.0.10.1.1: 1 ## different kafka versions using different api versions
  kafka-versions.1.0.0: 1
  canSendToPaladin: true
  ## allowCollectMetrics and forbidCollectMetricNames for this yml's task config
  allowCollectMetrics.brokerTopicMetrics:
    - kafka.server:type=BrokerTopicMetrics,name=*
    - kafka.server:type=BrokerTopicMetrics,name=*,topic=*
  forbidCollectMetricNames.brokerTopicMetrics:
    - FetchMessageConversionsPerSec
  allowCollectMetrics.jvmMetrics:
    - java.lang:type=GarbageCollector,name=*
    - java.lang:type=Threading
  forbidCollectMetricNames.jvmMetrics:
    - Code Cache
  allowCollectMetrics.replicaMetrics:
    - kafka.server:type=ReplicaManager,name=*
  forbidCollectMetricNames.replicaMetrics:
    - aa
  allowCollectMetrics.networkMetrics:
    - kafka.network:type=RequestMetrics,name=*,request=*
    - kafka.network:type=RequestMetrics,name=*,request=*,version=* # for 2.0.0
    - kafka.network:type=SocketServer,name=*
    - kafka.network:type=RequestChannel,name=*
    - kafka.server:type=KafkaRequestHandlerPool,name=*
  forbidCollectMetricNames.networkMetrics:
    - MessageConversionsTimeMs # normally, use metric name
    - TemporaryMemoryBytes
    - MessageConversionsTimeMs
    - ThrottleTimeMs
    - TotalTimeMs
    - LocalTimeMs
    - RemoteTimeMs
    - RequestBytes
    - ResponseQueueTimeMs
    - ResponseSendTimeMs
  forbidCollectMetricNames.RequestMetrics:
    - AlterConfigs
    - AlterReplicaLogDirs
    - ApiVersions
    - ControlledShutdown
    - CreateAcls
    - CreateDelegationToken
    - DeleteAcls
    - DeleteRecords
    - DescribeAcls
    - DescribeConfigs
    - DescribeDelegationToken
    - DescribeLogDirs
    - EndTxn
    - ExpireDelegationToken
    - InitProducerId
    - OffsetForLeaderEpoch
    - RenewDelegationToken
    - SaslAuthenticate
    - SaslHandshake
    - StopReplica
    - TxnOffsetCommit
    - WriteTxnMarkers
    - AddOffsetsToTxn
  allowCollectMetrics.logFlushMetrics:
    - kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs
    - kafka.log:type=LogCleanerManager,name=*
  forbidCollectMetricNames.logFlushMetrics:
    - aa
  allowCollectMetrics.kafkaControllerMetrics:
    - kafka.controller:type=KafkaController,name=*
  forbidCollectMetricNames.kafkaControllerMetrics:
    - aa
  allowCollectMetrics.kafkaClusterMetrics:
    - kafka.cluster:type=Partition,name=*,topic=*,partition=*
  forbidCollectMetricNames.kafkaClusterMetrics:
    - aa
  jmx-excludes-metrics.brokerTopicMetrics:
    - aa
  jmx-excludes-attrs.BrokerTopicMetrics:
    - aa
  jmx-excludes-attrs-global:
    - EventType
    - RateUnit
    - LatencyUnit
    - 50thPercentile
    - 75thPercentile
    - 98thPercentile
    - LastGcInfo
    - MemoryPoolNames
    - ObjectName
    - Valid
    - Name
    - ThreadAllocatedMemoryEnabled
    - ThreadAllocatedMemorySupported
    - ThreadContentionMonitoringEnabled
    - AllThreadIds
    - ThreadCpuTimeSupported
    - ThreadCpuTimeEnabled
    - ThreadContentionMonitoringSupported
    - CurrentThreadCpuTimeSupported
    - ObjectMonitorUsageSupported
    - SynchronizerUsageSupported
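The `allowCollectMetrics` entries are JMX `ObjectName` patterns, and the `forbidCollectMetricNames` entries look like values of the `name` key. The exporter's actual filtering code is not shown here, so the following is only a sketch of one way such a filter could work, assuming an MBean is collected when it matches at least one allow pattern and its `name` is not on the forbid list. `MetricFilterSketch` and `shouldCollect` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical sketch of allow/forbid filtering on JMX ObjectNames.
class MetricFilterSketch {

    /** True if the MBean matches an allow pattern and its "name" key is not forbidden. */
    static boolean shouldCollect(String mbean, List<String> allowPatterns, List<String> forbiddenNames) {
        try {
            ObjectName candidate = new ObjectName(mbean);
            String metricName = candidate.getKeyProperty("name"); // null if there is no name key
            if (metricName != null && forbiddenNames.contains(metricName)) {
                return false;
            }
            for (String pattern : allowPatterns) {
                // ObjectName.apply tests a concrete name against a pattern.
                if (new ObjectName(pattern).apply(candidate)) {
                    return true;
                }
            }
            return false;
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid ObjectName", e);
        }
    }

    public static void main(String[] args) {
        List<String> allow = Arrays.asList(
                "kafka.server:type=BrokerTopicMetrics,name=*",
                "kafka.server:type=BrokerTopicMetrics,name=*,topic=*");
        List<String> forbid = Arrays.asList("FetchMessageConversionsPerSec");
        System.out.println(shouldCollect(
                "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec", allow, forbid));
        System.out.println(shouldCollect(
                "kafka.server:type=BrokerTopicMetrics,name=FetchMessageConversionsPerSec", allow, forbid));
    }
}
```

A pattern such as `name=*` only matches MBeans with exactly the listed keys, which is consistent with the config listing the broker-wide pattern and the per-topic pattern (`name=*,topic=*`) separately.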
Kafka clusters you want to monitor
kafka-exporter:
  zookeepers:
    - cluster-name: cluster-name-of-your-kafka-brokers ## cluster name
      zk-ip-and-port: 127.0.0.1:2181,127.0.0.2:2181 ## zookeeper addresses
      zk-kafka-path: /kafka ## zookeeper namespace
      excludes-topics.BrokerTopicMetrics:
        - aaa
        - bbb
        - beexiao(.*?)
      jmx-excludes-metrics.BrokerTopicMetrics:
        - aa
        - bb
      jmx-excludes-metrics.RequestMetrics:
        - AlterConfigs
        - AlterReplicaLogDirs
        - ApiVersions
        - ControlledShutdown
        - CreateAcls
        - CreateDelegationToken
        - DeleteAcls
        - DeleteRecords
        - DescribeAcls
        - DescribeConfigs
        - DescribeDelegationToken
        - DescribeLogDirs
        - EndTxn
        - ExpireDelegationToken
        - InitProducerId
        - OffsetForLeaderEpoch
        - RenewDelegationToken
        - SaslAuthenticate
        - SaslHandshake
        - StopReplica
        - TxnOffsetCommit
        - WriteTxnMarkers
        - AddOffsetsToTxn
      jmx-excludes-attrs.BrokerTopicMetrics:
        - EventType
        - RateUnit
      jmx-excludes-attrs.GarbageCollector:
        - LastGcInfo
        - MemoryPoolNames
        - ObjectName
        - Valid
        - Name
      jmx-excludes-attrs.ReplicaManager:
        - EventType
        - RateUnit
      jmx-excludes-attrs.RequestMetrics:
        - EventType
        - RateUnit
        - FifteenMinuteRate
        - FiveMinuteRate
        - 75thPercentile
        - 98thPercentile
      jmx-excludes-attrs.LogFlushRateAndTimeMs:
        - LatencyUnit
        - RateUnit
        - EventType
        - FifteenMinuteRate
        - 50thPercentile
        - 75thPercentile
        - 98thPercentile
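The `excludes-topics` list mixes literal names (`aaa`, `bbb`) with a regular expression (`beexiao(.*?)`). How the exporter evaluates these entries is not stated, so the sketch below assumes each entry is a Java regex that must match the whole topic name. `TopicExcludeSketch` and `isExcluded` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: treat every excludes-topics entry as a full-match regex.
class TopicExcludeSketch {

    /** True if the topic name fully matches any of the exclude patterns. */
    static boolean isExcluded(String topic, List<String> excludePatterns) {
        for (String pattern : excludePatterns) {
            if (Pattern.matches(pattern, topic)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> excludes = Arrays.asList("aaa", "bbb", "beexiao(.*?)");
        System.out.println(isExcluded("aaa", excludes));
        System.out.println(isExcluded("beexiao-orders", excludes));
        System.out.println(isExcluded("payments", excludes));
    }
}
```

Under this assumption a literal entry excludes only the topic with that exact name, while `beexiao(.*?)` excludes every topic whose name starts with `beexiao`.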
Metric names currently exported
kafka_BrokerTopicMetrics_BytesInPerSec_Count
kafka_BrokerTopicMetrics_BytesInPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_BytesInPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_BytesInPerSec_MeanRate
kafka_BrokerTopicMetrics_BytesInPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_BytesOutPerSec_Count
kafka_BrokerTopicMetrics_BytesOutPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_BytesOutPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_BytesOutPerSec_MeanRate
kafka_BrokerTopicMetrics_BytesOutPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_BytesRejectedPerSec_Count
kafka_BrokerTopicMetrics_BytesRejectedPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_BytesRejectedPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_BytesRejectedPerSec_MeanRate
kafka_BrokerTopicMetrics_BytesRejectedPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_FailedFetchRequestsPerSec_Count
kafka_BrokerTopicMetrics_FailedFetchRequestsPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_FailedFetchRequestsPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_FailedFetchRequestsPerSec_MeanRate
kafka_BrokerTopicMetrics_FailedFetchRequestsPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_FailedProduceRequestsPerSec_Count
kafka_BrokerTopicMetrics_FailedProduceRequestsPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_FailedProduceRequestsPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_FailedProduceRequestsPerSec_MeanRate
kafka_BrokerTopicMetrics_FailedProduceRequestsPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_MessagesInPerSec_Count
kafka_BrokerTopicMetrics_MessagesInPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_MessagesInPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_MessagesInPerSec_MeanRate
kafka_BrokerTopicMetrics_MessagesInPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_ProduceMessageConversionsPerSec_Count
kafka_BrokerTopicMetrics_ProduceMessageConversionsPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_ProduceMessageConversionsPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_ProduceMessageConversionsPerSec_MeanRate
kafka_BrokerTopicMetrics_ProduceMessageConversionsPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_ReplicationBytesInPerSec_Count
kafka_BrokerTopicMetrics_ReplicationBytesInPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_ReplicationBytesInPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_ReplicationBytesInPerSec_MeanRate
kafka_BrokerTopicMetrics_ReplicationBytesInPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_ReplicationBytesOutPerSec_Count
kafka_BrokerTopicMetrics_ReplicationBytesOutPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_ReplicationBytesOutPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_ReplicationBytesOutPerSec_MeanRate
kafka_BrokerTopicMetrics_ReplicationBytesOutPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_TotalFetchRequestsPerSec_Count
kafka_BrokerTopicMetrics_TotalFetchRequestsPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_TotalFetchRequestsPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_TotalFetchRequestsPerSec_MeanRate
kafka_BrokerTopicMetrics_TotalFetchRequestsPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_TotalProduceRequestsPerSec_Count
kafka_BrokerTopicMetrics_TotalProduceRequestsPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_TotalProduceRequestsPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_TotalProduceRequestsPerSec_MeanRate
kafka_BrokerTopicMetrics_TotalProduceRequestsPerSec_OneMinuteRate
kafka_GarbageCollector_G1_Old_Generation_CollectionCount
kafka_GarbageCollector_G1_Old_Generation_CollectionTime
kafka_GarbageCollector_G1_Young_Generation_CollectionCount
kafka_GarbageCollector_G1_Young_Generation_CollectionTime
kafka_KafkaController_ActiveControllerCount_Value
kafka_KafkaController_ControllerState_Value
kafka_KafkaController_GlobalPartitionCount_Value
kafka_KafkaController_GlobalTopicCount_Value
kafka_KafkaController_OfflinePartitionsCount_Value
kafka_KafkaController_PreferredReplicaImbalanceCount_Value
kafka_KafkaRequestHandlerPool_RequestHandlerAvgIdlePercent_Count
kafka_KafkaRequestHandlerPool_RequestHandlerAvgIdlePercent_FifteenMinuteRate
kafka_KafkaRequestHandlerPool_RequestHandlerAvgIdlePercent_FiveMinuteRate
kafka_KafkaRequestHandlerPool_RequestHandlerAvgIdlePercent_MeanRate
kafka_KafkaRequestHandlerPool_RequestHandlerAvgIdlePercent_OneMinuteRate
kafka_LogCleanerManager_max_dirty_percent_Value
kafka_LogCleanerManager_time_since_last_run_ms_Value
kafka_LogFlushStats_LogFlushRateAndTimeMs_95thPercentile
kafka_LogFlushStats_LogFlushRateAndTimeMs_999thPercentile
kafka_LogFlushStats_LogFlushRateAndTimeMs_99thPercentile
kafka_LogFlushStats_LogFlushRateAndTimeMs_Count
kafka_LogFlushStats_LogFlushRateAndTimeMs_FifteenMinuteRate
kafka_LogFlushStats_LogFlushRateAndTimeMs_FiveMinuteRate
kafka_LogFlushStats_LogFlushRateAndTimeMs_Max
kafka_LogFlushStats_LogFlushRateAndTimeMs_Mean
kafka_LogFlushStats_LogFlushRateAndTimeMs_MeanRate
kafka_LogFlushStats_LogFlushRateAndTimeMs_Min
kafka_LogFlushStats_LogFlushRateAndTimeMs_OneMinuteRate
kafka_LogFlushStats_LogFlushRateAndTimeMs_StdDev
kafka_Partition_InSyncReplicasCount_Value
kafka_Partition_LastStableOffsetLag_Value
kafka_Partition_ReplicasCount_Value
kafka_Partition_UnderMinIsr_Value
kafka_Partition_UnderReplicated_Value
kafka_ReplicaManager_FailedIsrUpdatesPerSec_Count
kafka_ReplicaManager_FailedIsrUpdatesPerSec_FifteenMinuteRate
kafka_ReplicaManager_FailedIsrUpdatesPerSec_FiveMinuteRate
kafka_ReplicaManager_FailedIsrUpdatesPerSec_MeanRate
kafka_ReplicaManager_FailedIsrUpdatesPerSec_OneMinuteRate
kafka_ReplicaManager_IsrExpandsPerSec_Count
kafka_ReplicaManager_IsrExpandsPerSec_FifteenMinuteRate
kafka_ReplicaManager_IsrExpandsPerSec_FiveMinuteRate
kafka_ReplicaManager_IsrExpandsPerSec_MeanRate
kafka_ReplicaManager_IsrExpandsPerSec_OneMinuteRate
kafka_ReplicaManager_IsrShrinksPerSec_Count
kafka_ReplicaManager_IsrShrinksPerSec_FifteenMinuteRate
kafka_ReplicaManager_IsrShrinksPerSec_FiveMinuteRate
kafka_ReplicaManager_IsrShrinksPerSec_MeanRate
kafka_ReplicaManager_IsrShrinksPerSec_OneMinuteRate
kafka_ReplicaManager_LeaderCount_Value
kafka_ReplicaManager_OfflineReplicaCount_Value
kafka_ReplicaManager_PartitionCount_Value
kafka_ReplicaManager_UnderMinIsrPartitionCount_Value
kafka_ReplicaManager_UnderReplicatedPartitions_Value
kafka_RequestChannel_RequestQueueSize_Value
kafka_RequestChannel_ResponseQueueSize_Value
kafka_RequestMetrics_RequestQueueTimeMs_95thPercentile
kafka_RequestMetrics_RequestQueueTimeMs_999thPercentile
kafka_RequestMetrics_RequestQueueTimeMs_99thPercentile
kafka_RequestMetrics_RequestQueueTimeMs_Count
kafka_RequestMetrics_RequestQueueTimeMs_Max
kafka_RequestMetrics_RequestQueueTimeMs_Mean
kafka_RequestMetrics_RequestQueueTimeMs_Min
kafka_RequestMetrics_RequestQueueTimeMs_StdDev
kafka_RequestMetrics_RequestsPerSec_Count
kafka_RequestMetrics_RequestsPerSec_FifteenMinuteRate
kafka_RequestMetrics_RequestsPerSec_FiveMinuteRate
kafka_RequestMetrics_RequestsPerSec_MeanRate
kafka_RequestMetrics_RequestsPerSec_OneMinuteRate
kafka_SocketServer_MemoryPoolAvailable_Value
kafka_SocketServer_MemoryPoolUsed_Value
kafka_SocketServer_NetworkProcessorAvgIdlePercent_Value
kafka_Threading_CurrentThreadCpuTime
kafka_Threading_CurrentThreadUserTime
kafka_Threading_DaemonThreadCount
kafka_Threading_PeakThreadCount
kafka_Threading_ThreadCount
kafka_Threading_TotalStartedThreadCount
kafka_consumer_lag
kafka_topic_partitions
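Most of the names above appear to follow the shape `kafka_<type>_<name>_<attribute>`, built from the MBean's `type` and `name` keys plus the attribute being read, with characters such as spaces and hyphens replaced by underscores. The exporter's real naming code is not shown, so the following is a sketch inferred from the list only. `MetricNameSketch` and `toMetricName` are hypothetical names.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical sketch: derive an exported metric name from an MBean and attribute.
class MetricNameSketch {

    /** Builds kafka_<type>[_<name>]_<attribute>, replacing non-alphanumerics with '_'. */
    static String toMetricName(String mbean, String attribute) {
        try {
            ObjectName objectName = new ObjectName(mbean);
            StringBuilder sb = new StringBuilder("kafka_");
            sb.append(objectName.getKeyProperty("type"));
            String name = objectName.getKeyProperty("name"); // absent for e.g. Threading
            if (name != null) {
                sb.append('_').append(name);
            }
            sb.append('_').append(attribute);
            return sb.toString().replaceAll("[^A-Za-z0-9_]", "_");
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid ObjectName", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toMetricName(
                "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec", "Count"));
        System.out.println(toMetricName(
                "java.lang:type=GarbageCollector,name=G1 Old Generation", "CollectionCount"));
        System.out.println(toMetricName("java.lang:type=Threading", "ThreadCount"));
    }
}
```

Names such as `kafka_consumer_lag` and `kafka_topic_partitions` do not fit this shape and are presumably produced by separate collectors rather than derived from JMX.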
Proposed Changes
Build a completely new Kafka exporter that runs on the JVM.
Describe the new thing you want to do in appropriate detail. This may be fairly extensive and have large subsections of its own. Or it may be a few sentences. Use judgement based on the scope of the change.
Compatibility, Deprecation, and Migration Plan
- What impact (if any) will there be on existing users? Answer: Users can monitor their Kafka clusters more easily with a Java-based Prometheus exporter that exposes whichever metrics they need.
- If we are changing behavior how will we phase out the older behavior? Answer: Prometheus is a very good monitoring system for middleware like Kafka, and your ops team may already use it.
- If we need special migration tools, describe them here. Answer: Prometheus servers and the Prometheus Alertmanager.
- When will we remove the existing behavior? Answer: Once all exporters run stably and all metrics can be viewed in a UI such as Grafana.
Rejected Alternatives
Let's do this!
If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.