Comment: Clarify throttling behaviour

...

GetTelemetrySubscriptionsRequestV0 {
    ClientInstanceId uuid                 // UUID4 unique for this client instance.
                                          // Must be set to Null on the first request, and to the
                                          // ClientInstanceId returned in the first response
                                          // for all subsequent requests to any broker.
}

GetTelemetrySubscriptionsResponseV0 {
    ThrottleTime int32                    // Standard throttling.
    ErrorCode int16                       // Error code.
    ClientInstanceId uuid                 // Assigned client instance id if ClientInstanceId was Null in the request, else Null.
    SubscriptionId int32                  // Unique identifier for the current subscription set for this client instance.
    AcceptedCompressionTypes Array[int8]  // The compression types the broker accepts for PushTelemetryRequest.CompressionType.
                                          // Calculated as a bitmask of (1 << MessageHeaderV2.Attributes.CompressionType).
    PushIntervalMs int32                  // Configured push interval, which is the lowest configured interval in the current subscription set.
    DeltaTemporality bool                 // If True, monotonic/counter metrics are to be emitted as deltas to the previous sample.
                                          // If False, monotonic/counter metrics are to be emitted as cumulative absolute values.
    RequestedMetrics Array[string]        // Requested metrics, matched by name prefix:
                                          //   Empty array: no metrics subscribed.
                                          //   Array[0] is an empty string: all metrics subscribed.
                                          //   Otherwise: metrics whose names start with any listed prefix.
}

PushTelemetryRequestV0 {
    ClientInstanceId uuid                 // UUID4 unique for this client instance, as returned in the first GetTelemetrySubscriptionsResponse.
    SubscriptionId int32                  // SubscriptionId from the GetTelemetrySubscriptionsResponse for the collected metrics.
    Terminating bool                      // Client is terminating.
    CompressionType int8                  // Compression codec used for .Metrics (ZSTD, LZ4, Snappy, GZIP, None).
                                          // Same values as those of the current MessageHeaderV2.Attributes.
    Metrics binary                        // Format specified by ContentType, possibly compressed.
}

PushTelemetryResponseV0 {
    ThrottleTime int32                    // Standard and metric-specific throttling.
    ErrorCode int16                       // Error code.
}
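To make the RequestedMetrics and DeltaTemporality semantics concrete, here is a minimal Python sketch of how a client might apply them when collecting metrics. The function names `is_metric_subscribed` and `emit_counters`, and the representation of samples as a dict from metric name to cumulative counter value, are assumptions for illustration only; counter resets are not handled.

```python
from typing import Dict, List


def is_metric_subscribed(metric_name: str, requested_metrics: List[str]) -> bool:
    """Hypothetical helper: apply the RequestedMetrics prefix-match rules to one name."""
    if not requested_metrics:
        # Empty array: no metrics subscribed.
        return False
    # Every name starts with "", so a single empty-string entry subscribes all
    # metrics; otherwise a metric matches if any listed prefix matches.
    return any(metric_name.startswith(prefix) for prefix in requested_metrics)


def emit_counters(previous: Dict[str, float], current: Dict[str, float],
                  delta_temporality: bool) -> Dict[str, float]:
    """Hypothetical helper: return counter values in the requested temporality.

    `previous` and `current` map metric names to cumulative counter values.
    """
    if not delta_temporality:
        # DeltaTemporality False: send cumulative absolute values as-is.
        return dict(current)
    # DeltaTemporality True: send the change since the previous sample.
    # Assumption: a counter absent from the previous sample started at 0.
    return {name: value - previous.get(name, 0.0) for name, value in current.items()}
```

With `requested_metrics = ["client.producer."]` (a made-up prefix), only names beginning with that prefix are collected; with `[""]` every metric is.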

...

Validation of the encoded metrics is the task of the ClientMetricsReceiver. If the compression type is unsupported, the response is returned with ErrorCode set to UnsupportedCompressionType. If decoding or validation of the binary metrics blob fails, ErrorCode is set to InvalidRecord.
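A minimal sketch of that check order on the receiving side, assuming the compression codec ids from MessageHeaderV2.Attributes (0 = none, 1 = gzip, 2 = snappy, 3 = lz4, 4 = zstd). `validate_push`, `ErrorCode` and the `decode` callback are hypothetical: `decode` stands in for whatever parser the metrics format requires, `accepted` plays the role of AcceptedCompressionTypes, and only codecs available in the Python standard library are wired up, so the others are treated as unsupported in this sketch.

```python
import gzip
from enum import Enum
from typing import Callable, Collection, Dict

# Compression codec ids as used in MessageHeaderV2.Attributes.
NONE, GZIP, SNAPPY, LZ4, ZSTD = 0, 1, 2, 3, 4

# Codecs this sketch can actually decompress (standard library only).
DECOMPRESSORS: Dict[int, Callable[[bytes], bytes]] = {
    NONE: lambda data: data,
    GZIP: gzip.decompress,
}


class ErrorCode(Enum):
    NONE = "None"
    UNSUPPORTED_COMPRESSION_TYPE = "UnsupportedCompressionType"
    INVALID_RECORD = "InvalidRecord"


def validate_push(compression_type: int, accepted: Collection[int], blob: bytes,
                  decode: Callable[[bytes], object]) -> ErrorCode:
    """Hypothetical ClientMetricsReceiver check for one PushTelemetryRequest."""
    # 1. Reject codecs the broker does not accept (or this sketch cannot decode).
    if compression_type not in accepted or compression_type not in DECOMPRESSORS:
        return ErrorCode.UNSUPPORTED_COMPRESSION_TYPE
    # 2. Decompress and decode; any failure means the blob is invalid.
    try:
        decode(DECOMPRESSORS[compression_type](blob))
    except Exception:
        return ErrorCode.INVALID_RECORD
    return ErrorCode.NONE
```

Checking the codec before touching the payload means an unsupported codec is reported as such, rather than surfacing later as a decode failure.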

...

Throttling and rate-limiting

Two mechanisms protect brokers from rogue or buggy clients:

  1. Standard request throttling - mutes the client connection if user quotas (size and/or request rate) are exceeded.
  2. Metrics PushIntervalMs rate-limiting - ensures the client does not push telemetry more often than the configured PushIntervalMs (subscription interval). Because each broker the client sends telemetry requests to maintains its own rate-limiting state, the client can have at most one out-of-profile push accepted per connection before the rate limiter kicks in. The metrics plugin itself may also constrain the maximum allowed metrics payload.

The receiving broker’s standard quota-based throttling should operate as usual for PushTelemetryRequest. In addition, PushTelemetryRequest is subject to rate-limiting based on the next desired push time, calculated from the PushIntervalMs derived from the configured metrics subscriptions. Should the client send a push request before the previously calculated PushIntervalMs has expired, the broker will discard the metrics and return a PushTelemetryResponse with ErrorCode set to THROTTLING_QUOTA_EXCEEDED.

The one exception to this rule is when the client sets the PushTelemetryRequest.Terminating field to true, indicating that the client is terminating. In this case the broker should accept the metrics, but for a consecutive request it must ignore the Terminating field and apply rate-limiting as if the field were not set. The Terminating flag may be used again once the current PushIntervalMs has expired.
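The broker-side rule, including the Terminating exception, can be sketched as a small per-client state machine. This is one reading of the text, not the actual broker implementation: `PushRateLimiter` and `PushResult` are hypothetical names, times are plain millisecond integers, an accepted out-of-profile Terminating push is assumed not to move the next allowed push time, and a Terminating push that arrives on schedule is assumed to use up the exception for that interval.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PushResult(Enum):
    ACCEPTED = "NONE"                        # metrics kept
    THROTTLED = "THROTTLING_QUOTA_EXCEEDED"  # metrics discarded


@dataclass
class PushRateLimiter:
    """Hypothetical per-client-instance PushIntervalMs state held by one broker."""
    push_interval_ms: int
    next_allowed_ms: Optional[int] = None  # None until the first accepted push
    terminating_used: bool = False         # Terminating already honoured this interval

    def on_push(self, now_ms: int, terminating: bool) -> PushResult:
        early = self.next_allowed_ms is not None and now_ms < self.next_allowed_ms
        if early:
            # Out-of-profile push: only a first Terminating push is let through.
            if terminating and not self.terminating_used:
                self.terminating_used = True
                return PushResult.ACCEPTED
            # Consecutive requests ignore Terminating and are rate-limited.
            return PushResult.THROTTLED
        # In-profile push (or the very first one): accept and start a new interval.
        self.next_allowed_ms = now_ms + self.push_interval_ms
        # Assumption: an on-schedule Terminating push also consumes the exception.
        self.terminating_used = terminating
        return PushResult.ACCEPTED
```

Because each broker keeps its own instance of this state, a client pushing to two brokers could have one out-of-profile push accepted by each, which is the per-connection allowance described for PushIntervalMs rate-limiting.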

Should the cluster load induced by metrics requests become unmanageable, the remedy is to temporarily remove or limit the configured metrics subscriptions.

Metrics subscription

Metrics subscriptions are configured through the standard Kafka Admin API configuration interface using the new resource type CLIENT_METRICS. The resource name can be any string; it has no significance to the metrics system other than grouping metrics subscriptions in the configuration interface. The configuration is made up of the following ConfigEntry names:

...