...
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Currently, the MessagesInPerSec metric in Kafka is a topic-level metric. It is emitted without partition information (see ReplicaManager). This metric provides vital information about topic throughput and is commonly used for topic quota and partition management. However, if a topic's traffic is not balanced, the overall throughput gives no insight into the imbalance. This is particularly problematic for keyed topics, where messages may be produced with hotspots in the key spectrum and overload only certain partitions. In this case:
...
Part 1: Verbosity Level Control
metrics.verbosity will be a new dynamic config introduced to control the verbosity (fan-out rate) of the metrics. Its value is a JSON-formatted string specifying the conditions that control the fan-out of the metrics. If the value high is set for the level key of the configured JSON (see below for example values), high fan-out tags (e.g. partition) will be added to the metrics selected by the names filter, for all the topics that meet the conditions in the filters section. In the low setting, these tags will be assigned an empty value. We elected to make this control central so that the implementation can be generalized in the future, either into a library or to allow other means of centralized control.
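The rule described above can be sketched roughly as follows. This is only an illustration, not the proposed implementation: the `Entry` class, the `partitionTag` helper, and the use of whole-string matching (`matches()`) are assumptions made for the example, and the names pattern is written as `Bytes.*` so that it works under that assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class VerbositySketch {
    // Hypothetical in-memory form of one metrics.verbosity entry (not a Kafka class).
    static final class Entry {
        final String level;
        final Pattern names;
        final List<Pattern> topics = new ArrayList<>();

        Entry(String level, String names, String... topics) {
            this.level = level;
            this.names = Pattern.compile(names);
            for (String t : topics) {
                this.topics.add(Pattern.compile(t));
            }
        }

        // Assumption: both the metric name and the topic must match as whole strings.
        boolean appliesTo(String metric, String topic) {
            if (!names.matcher(metric).matches()) {
                return false;
            }
            for (Pattern p : topics) {
                if (p.matcher(topic).matches()) {
                    return true;
                }
            }
            return false; // an empty filter list applies to nothing
        }
    }

    // Returns the partition number as the tag value when a "high" entry applies,
    // and an empty value otherwise (the tag is kept, but left blank).
    static String partitionTag(List<Entry> entries, String metric, String topic, int partition) {
        for (Entry e : entries) {
            if ("high".equals(e.level) && e.appliesTo(metric, topic)) {
                return Integer.toString(partition);
            }
        }
        return "";
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
            new Entry("high", "Bytes.*", "avro-topic-car", "avro-topic-bus", "avro-topic-plane"));
        System.out.println("[" + partitionTag(entries, "BytesInPerSec", "avro-topic-car", 3) + "]");    // [3]
        System.out.println("[" + partitionTag(entries, "MessagesInPerSec", "avro-topic-car", 3) + "]"); // []
        System.out.println("[" + partitionTag(entries, "BytesInPerSec", "json-topic", 3) + "]");        // []
    }
}
```

The sketch treats every level other than high as producing an empty tag; how the medium level should behave is not covered here.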
...
Type | String(regex expression) |
Default | "[]" |
Valid Values / Examples | A string representing a JSON array, details below. The above config means the |
Importance | medium |
Update Mode | Dynamic Read & Write |
...
Key Name | Default Value | Valid Values | Restrictions | Example |
---|---|---|---|---|
level | low | low, medium, high | JSON entries are validated in no particular order, i.e. they are order-independent. Entries can be duplicated, but must not conflict with each other. If there's a conflict between values configured with different metrics level settings, the one that appears first in the sequence NOTE: It's okay to have multiple entries of | "level": "high" |
names | .* | Pattern corresponding to the names of the metric series in scope | "names": "Bytes *" | |
filters | [] (empty JSON array, applies to nothing) |
or
|
|
or
|
All regex patterns mentioned above should be java.util.regex-compatible; see the documentation of Java's regex engine for more details.
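Because the patterns are java.util.regex expressions rather than shell globs, a pattern such as `avro-*` or `Bytes*` behaves differently depending on whether the whole string must match. Which mode the broker would use is not stated here, so the following is only a sketch comparing the two; the class and helper names are made up for the illustration.

```java
import java.util.regex.Pattern;

class RegexSemantics {
    // Whole-string match: the pattern must cover the entire input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Substring match: the pattern may match anywhere inside the input.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // As a regex, "avro-*" means "avro" followed by zero or more '-' characters.
        System.out.println(fullMatch("avro-*", "avro-topic-car"));    // false
        System.out.println(partialMatch("avro-*", "avro-topic-car")); // true
        System.out.println(partialMatch("avro-*", "my-avro"));        // true
        // ".*" is the regex spelling of "anything after the prefix".
        System.out.println(fullMatch("avro-.*", "avro-topic-car"));   // true
        System.out.println(fullMatch("Bytes*", "BytesInPerSec"));     // false
        System.out.println(fullMatch("Bytes.*", "BytesInPerSec"));    // true
    }
}
```

Under whole-string matching, prefix-style patterns need the `.*` form; under substring matching, an unanchored pattern can also select names that merely contain the prefix.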
Working with the Configuration via CLI
- Querying existing config
  - Command:
    bin/kafka-configs.sh --bootstrap-server localhost:9092 --describe --entity-type brokers
  - Example Result 1 (default, if not set previously):
    metrics.verbosity=[]
    Expected behavior: verbosity level low is applied to all metric series and no high fan-out tags are added.
  - Example Result 2:
    metrics.verbosity=[{"level": "low", "names": ".*"}]
    Expected behavior: verbosity level low is applied to all metric series and no high fan-out tags are added.
  - Example Result 3:
    metrics.verbosity=[{
      "level": "high",
      "names": "Bytes*",
      "filters": [{"topics": ["avro-topic-car", "avro-topic-bus", "avro-topic-plane"]}]
    }]
    Expected behavior: BytesInPerSec, BytesOutPerSec, and BytesRejectedPerSec will include partition-level values for the 3 listed topics.
- Setting new config
  - Example Command 1 (add config):
    bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-default --alter --add-config 'metrics.verbosity=[{"level": "high", "names": ".*", "filters": [{"topics": ["avro-*"]}]}]'
    Expected behavior after the change: partition-level values will be added to all 10 metric series for all the topics matching the name regex `avro-*`.
  - Example Command 2 (alter config):
    bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-default --alter --add-config 'metrics.verbosity=[{"level": "low", "names": ".*"}]'
    Expected behavior after the change: partition-level values will be removed from all the metrics for all the topics.
Performance Consideration
...
Metrics Name | Meaning |
| Messages entering the partition, per second |
| Bytes entering the partition, per second |
| Bytes retrieved from the partition, per second |
| Bytes exceeding max message size in a partition, per second |
| Produce request count for a partition, per second |
| Fetch request count for a partition, per second |
| Failed produce request count for a partition, per second |
| Failed fetch request count for a partition, per second |
| Broker-side conversions (decompressions) for a partition, per second |
| Broker-side conversions (compressions) for a partition, per second |
Compatibility, Deprecation, and Migration Plan
...
- How many topic-partitions can have this feature enabled without significantly overloading the broker?
- What is the worst-case performance impact if this feature is enabled under the most severe conditions?
- Most severe cases: high-traffic broker, high partition count, low metrics emission interval, high percentage of topic-partitions with this feature enabled
...
- Topic-partitions: 10% - 100% of topic-partitions of the broker, 10% step
- Metrics: 1-3 of the names listed in the Metrics Name table
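For illustration, the two dimensions listed above (share of topic-partitions and number of metric names) can be enumerated as below. This is a sketch only: the class and method names are hypothetical, and any other test dimensions are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

class PerfTestMatrix {
    // Each element is {percentOfTopicPartitions, numberOfMetricNames}.
    static List<int[]> combinations() {
        List<int[]> out = new ArrayList<>();
        for (int pct = 10; pct <= 100; pct += 10) {   // 10% to 100%, 10% step
            for (int metrics = 1; metrics <= 3; metrics++) {
                out.add(new int[] {pct, metrics});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> runs = combinations();
        System.out.println(runs.size() + " combinations"); // 30 combinations
        for (int[] run : runs) {
            System.out.println(run[0] + "% of topic-partitions, " + run[1] + " metric name(s)");
        }
    }
}
```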
Rejected Alternatives
Using the Existing RecordingLevel Config
The RecordingLevel config also controls the verbosity of the metrics emitted by various Kafka components. However, we decided not to build on top of this config for the following reasons:
- This config currently has only one string value and can only control metrics emission at the Sensor level. This does not meet our need for detailed fan-out rate control.
- This config has to be applied to all new configs if we want to use it to control the recording level. This means we first need to create a separate sensor, which introduces unnecessary duplications.
- This config lacks the dynamic adjustment capability demanded by this KIP.
To further reduce the overall duplication and move towards unified control, this KIP proposes the following changes to merge/clean up the usage of RecordingLevel:
- The recording levels can be mapped to the config values defined above. For example, RecordingLevel.TRACE can be defined as {"level":"high", "names":"<trace-level-metrics>"}
- The existing metric sensors can be updated to remove the RecordingLevel and accept the control of metrics.verbosity. Note: This will bring the config to the client side; it is currently a server-side-only config.
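A rough sketch of such a mapping is shown below. Only the TRACE to high pairing comes from the example above; the INFO to low and DEBUG to medium pairings are assumptions, the enum is a local stand-in for Kafka's Sensor.RecordingLevel values, the helper names are hypothetical, and the entry uses the names key from the key table.

```java
class RecordingLevelMapping {
    // Local stand-in mirroring the values of Kafka's Sensor.RecordingLevel.
    enum RecordingLevel { INFO, DEBUG, TRACE }

    static String verbosityLevel(RecordingLevel level) {
        switch (level) {
            case TRACE:
                return "high";   // taken from the example in the text
            case DEBUG:
                return "medium"; // assumption, not stated in the text
            default:
                return "low";    // assumption, not stated in the text
        }
    }

    // Builds one metrics.verbosity entry as a JSON object string.
    // The pattern is inserted verbatim; no JSON escaping is attempted in this sketch.
    static String toVerbosityEntry(RecordingLevel level, String namesPattern) {
        return "{\"level\":\"" + verbosityLevel(level) + "\", \"names\":\"" + namesPattern + "\"}";
    }

    public static void main(String[] args) {
        // prints {"level":"high", "names":"<trace-level-metrics>"}
        System.out.println(toVerbosityEntry(RecordingLevel.TRACE, "<trace-level-metrics>"));
    }
}
```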
Topic Allow Listing for Partition-Level Reporting
...