Status
Current state: Draft
Discussion thread: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
As part of the Tiered Storage feature introduced in KIP-405, we added several metrics for reads from and writes to remote storage. The naming convention they follow is confusing to users.
For example, in regular Kafka, BytesIn means bytes written to the log and BytesOut means bytes read from the log. With tiered storage, the meanings are reversed:
- RemoteBytesIn means "Number of bytes read from remote storage per second"
- RemoteBytesOut means "Number of bytes written to remote storage per second"
We should rename the tiered storage related metrics to remove any ambiguity.
We should also add metrics for the RemoteIndexCache, which caches remote index files. The cache avoids re-fetching index files, such as the offset and time indexes, from remote storage on every fetch call. These metrics will improve the observability needed to debug issues.
Public Interfaces
The following metrics will be renamed:
Original Name | Description | New Name |
kafka.server:type=BrokerTopicMetrics, name=RemoteBytesInPerSec, topic=([-.\w]+) | The number of bytes read from remote storage per second. | kafka.server:type=BrokerTopicMetrics, name=RemoteFetchBytesPerSec, topic=([-.\w]+) |
kafka.server:type=BrokerTopicMetrics, name=RemoteReadRequestsPerSec, topic=([-.\w]+) | The number of remote storage read requests per second. | kafka.server:type=BrokerTopicMetrics, name=RemoteFetchRequestsPerSec, topic=([-.\w]+) |
kafka.server:type=BrokerTopicMetrics, name=RemoteReadErrorPerSec, topic=([-.\w]+) | The number of remote storage read errors per second. | kafka.server:type=BrokerTopicMetrics, name=RemoteFetchErrorsPerSec, topic=([-.\w]+) |
kafka.server:type=BrokerTopicMetrics, name=RemoteBytesOutPerSec, topic=([-.\w]+) | The number of bytes copied to remote storage per second. | kafka.server:type=BrokerTopicMetrics, name=RemoteCopyBytesPerSec, topic=([-.\w]+) |
kafka.server:type=BrokerTopicMetrics, name=RemoteWriteRequestsPerSec, topic=([-.\w]+) | The number of remote storage write requests per second. | kafka.server:type=BrokerTopicMetrics, name=RemoteCopyRequestsPerSec, topic=([-.\w]+) |
kafka.server:type=BrokerTopicMetrics, name=RemoteWriteErrorPerSec, topic=([-.\w]+) | The number of remote storage write errors per second. | kafka.server:type=BrokerTopicMetrics, name=RemoteCopyErrorsPerSec, topic=([-.\w]+) |
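The rename is a one-to-one mapping on the name property of each MBean, so it can be written down as a lookup table. The following is a minimal sketch of that mapping, which could be useful for updating dashboards or alerts built against pre-release builds. The class RemoteMetricRenames and the method migrate are hypothetical names used only for illustration; they are not part of Kafka.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Kafka: maps old tiered storage metric names to new ones.
final class RemoteMetricRenames {
    // Old name -> new name, copied from the rename table in this KIP.
    static final Map<String, String> RENAMES = new LinkedHashMap<>();
    static {
        RENAMES.put("RemoteBytesInPerSec", "RemoteFetchBytesPerSec");
        RENAMES.put("RemoteReadRequestsPerSec", "RemoteFetchRequestsPerSec");
        RENAMES.put("RemoteReadErrorPerSec", "RemoteFetchErrorsPerSec");
        RENAMES.put("RemoteBytesOutPerSec", "RemoteCopyBytesPerSec");
        RENAMES.put("RemoteWriteRequestsPerSec", "RemoteCopyRequestsPerSec");
        RENAMES.put("RemoteWriteErrorPerSec", "RemoteCopyErrorsPerSec");
    }

    // Rewrites the name=... property of an MBean name string if it is one of the
    // renamed metrics; any other string is returned unchanged.
    static String migrate(String mbeanName) {
        for (Map.Entry<String, String> e : RENAMES.entrySet()) {
            String oldProp = "name=" + e.getKey();
            int at = mbeanName.indexOf(oldProp);
            if (at < 0) {
                continue;
            }
            int end = at + oldProp.length();
            // Only match the whole property value, not a prefix of a longer name.
            if (end == mbeanName.length() || mbeanName.charAt(end) == ',') {
                return mbeanName.substring(0, at) + "name=" + e.getValue() + mbeanName.substring(end);
            }
        }
        return mbeanName;
    }
}
```

For instance, passing "kafka.server:type=BrokerTopicMetrics,name=RemoteBytesInPerSec,topic=foo" to migrate yields the same string with name=RemoteFetchBytesPerSec, while names that are not in the table pass through untouched.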
The following metrics will be added to expose cache stats for the Remote Index Cache.
Name | Description |
org.apache.kafka.storage.internals.log:type=RemoteIndexCache, name=HitCount | The number of cache hits for the remote index cache. |
org.apache.kafka.storage.internals.log:type=RemoteIndexCache, name=MissCount | The number of cache misses for the remote index cache. |
org.apache.kafka.storage.internals.log:type=RemoteIndexCache, name=EvictionCount | The number of entries evicted from the remote index cache. |
org.apache.kafka.storage.internals.log:type=RemoteIndexCache, name=LoadSuccessCount | The number of successful cache loads for the remote index cache. |
org.apache.kafka.storage.internals.log:type=RemoteIndexCache, name=LoadFailureCount | The number of failed cache loads for the remote index cache. |
org.apache.kafka.storage.internals.log:type=RemoteIndexCache, name=TotalLoadTime | Total load time (success and failure) for the remote index cache. |
org.apache.kafka.storage.internals.log:type=RemoteIndexCache, name=EvictionWeight | The sum of weights of entries evicted from the remote index cache. |
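The counters above are raw totals, so ratios such as the cache hit ratio have to be derived by whoever consumes them. The following is a minimal sketch of two such derived values, assuming the counter values have already been read (for example over JMX). The class RemoteIndexCacheRatios and its methods are hypothetical and not part of Kafka, and the values returned for an empty cache are a choice made for this sketch.

```java
// Hypothetical helper, not part of Kafka: derives ratios from the raw cache counters.
final class RemoteIndexCacheRatios {
    // Fraction of lookups served from the cache, computed from HitCount and MissCount.
    // Returns 1.0 when there have been no lookups yet (a choice made for this sketch).
    static double hitRatio(long hitCount, long missCount) {
        long requests = hitCount + missCount;
        return requests == 0 ? 1.0 : (double) hitCount / requests;
    }

    // Fraction of cache loads that failed, computed from LoadSuccessCount and LoadFailureCount.
    // Returns 0.0 when no loads have been attempted yet.
    static double loadFailureRatio(long loadSuccessCount, long loadFailureCount) {
        long loads = loadSuccessCount + loadFailureCount;
        return loads == 0 ? 0.0 : (double) loadFailureCount / loads;
    }
}
```

A persistently low hit ratio together with a high EvictionCount would suggest that index files are being evicted and re-fetched from remote storage, which is the situation the cache is meant to avoid.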
Compatibility, Deprecation, and Migration Plan
The metrics being renamed were added in the same release as this rename, so no released version exposes the old names. Hence, we do not need to maintain backward compatibility for the renamed metrics.
Test Plan
Unit tests will be added for all the metrics introduced in this KIP.
Rejected Alternatives
None