Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Simplify metrics

...

To effectively monitor the admin request forwarding status, we would add two metrics: num-client-forwarding-to-controller-rate and num-client-fowarding-to-controller-count to visualize the progress. They are set up to track the number of clients being on a pending forward request for each broker.

In terms of the usage for Envelope RPC, we could expect that not all forwarding requests are necessarily due to the old admin client or even to the controller. To better track the forwarding load, we shall also monitor the total message being redirected as metrics num-messages-redirected-rate and num-messages-redirected-count.

We could add an alerting metrics called request-forwarding-to-controller-authorization-fail-count in an effort to help administrator detect wrong security setup sooner. There should already be metrics monitoring request failures, so this metric could be optional.

In the long term after bridge release, we could potentially perform some incompatible changes to the Raft Quorum. To monitor the old client connections to better capture the timing for a major version bump,  KIP-511 also has already exposed metrics like an "unknown" software name and an "unknown" software version.

Compatibility, Deprecation, and Migration Plan

The upgrade path shall be guarded by the inter.broker.protocol (IBP) to make sure the routing behavior is consistent. After first rolling bounce to upgrade the binary version, all fellow brokers are still handling ZK mutation requests by themselves. With the second IBP bump rolling bounce, all upgraded brokers will be using the new routing algorithm effectively described in this KIP.

As we discussed in the request routing section, to work with an older client, the first contacted broker need to act as a proxy to redirect the write request to the controller. To support the proxy of requests, we need to build a channel for brokers to talk directly to the controller. This part of the design is internal change only and won’t block the KIP progress.

Rejected Alternatives

MBean:kafka.server:type=RequestMetrics,name=NumRequestForwardingToControllerPerSec,clientId=([-.\w]+)
MBean:kafka.server:type=RequestMetrics,name=NumRequestForwardingToControllerTotal,clientId=([-.\w]+)

to visualize how many RPC are inflight from each admin client. They will be added via Yammer metrics.

Compatibility, Deprecation, and Migration Plan

The upgrade path shall be guarded by the inter.broker.protocol (IBP) to make sure the routing behavior is consistent. After first rolling bounce to upgrade the binary version, all fellow brokers are still handling ZK mutation requests by themselves. With the second IBP bump rolling bounce, all upgraded brokers will be using the new routing algorithm effectively described in this KIP.

As we discussed in the request routing section, to work with an older client, the first contacted broker need to act as a proxy to redirect the write request to the controller. To support the proxy of requests, we need to build a channel for brokers to talk directly to the controller. This part of the design is internal change only and won’t block the KIP progress.

Rejected Alternatives

  • We discussed about the possibility of immediately building a metadata topic to propagate the changes. This seems aligned with the eventual metadata quorum path, but at a cost of blocking the current API migration towards the bridge release, since the metadata quorum design is much more complicated and requires more iterations. To avoid this extra dependency on other tracks, we should go ahead and migrate existing protocols to meet the bridge release goal sooner.
  • We thought about adding an alerting metrics called request-forwarding-to-controller-authorization-fail-count in an effort to help administrator detect wrong security setup sooner. However, there should already be metrics monitoring request failures, so this metric could be optional.

  • We thought about monitoring older client connections in the long term after bridge release, when we perform some incompatible changes to the Raft Quorum, to better capture the timing for a major version bump. However, KIP-511 also has already exposed metrics like an "unknown" software name and an "unknown" software version which could serve for this purpose

    We discussed about the possibility of immediately building a metadata topic to propagate the changes. This seems aligned with the eventual metadata quorum path, but at a cost of blocking the current API migration towards the bridge release, since the metadata quorum design is much more complicated and requires more iterations. To avoid this extra dependency on other tracks, we should go ahead and migrate existing protocols to meet the bridge release goal sooner

    .

  • We also had an almost complete proposal around forwarding request. Just keep it here for future reference. Although the Envelope API provides certain privileges like data embedding and principal embedding, it creates a security hole by letting a malicious user impersonate any forwarding broker. Passing the principal around also increases the vulnerability, compared with other standard ways such as passing a verified token, but it is unfortunately not fully supported with Kafka security. So for the security concerns, we are abandoning the Envelope approach and fallback to just forward the raw admin requests.

...