Comment: address Colin's suggestion

...

Parent KIP

KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum (Accepted)

...

One thing to note is that at the moment the direct ZK access bypasses the CreateTopicPolicy. To maintain the same guarantee, we would add an allow list to the broker for the two internal topics to bypass the topic policy:

...

For older requests that need redirection, the forwarding broker will just use its own authorizer to verify the principals. When the request looks good, it will forward the request with its own credentials, so that the controller broker only validates the broker principal in the forwarded request. The only exceptional case is the controller audit log, which needs the principal name of the request, so we will add an optional tag called "PrincipalName" to the header when sending the proxy request.
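
The forwarding rule above can be sketched as follows. This is a minimal illustration, not the broker implementation: `ForwardedRequest`, `ForwardingBroker`, and the set-based authorizer are hypothetical stand-ins, and only the "PrincipalName" tag comes from this KIP.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for a request sent from a forwarding broker to the controller.
final class ForwardedRequest {
    final String apiName;
    final String senderPrincipal;         // the only principal the controller validates
    final Optional<String> principalName; // optional "PrincipalName" tag, used for audit logging

    ForwardedRequest(String apiName, String senderPrincipal, Optional<String> principalName) {
        this.apiName = apiName;
        this.senderPrincipal = senderPrincipal;
        this.principalName = principalName;
    }
}

// Hypothetical forwarding broker with a toy authorizer (a set of allowed client principals).
final class ForwardingBroker {
    private final String brokerPrincipal;
    private final Set<String> authorizedClients;

    ForwardingBroker(String brokerPrincipal, Set<String> authorizedClients) {
        this.brokerPrincipal = brokerPrincipal;
        this.authorizedClients = authorizedClients;
    }

    // Returns the request to send to the controller, or empty if local authorization fails.
    Optional<ForwardedRequest> forward(String apiName, String clientPrincipal) {
        if (!authorizedClients.contains(clientPrincipal)) {
            return Optional.empty(); // rejected locally; nothing reaches the controller
        }
        // Forward under the broker's own credentials and carry the client name only as a tag.
        return Optional.of(new ForwardedRequest(apiName, brokerPrincipal, Optional.of(clientPrincipal)));
    }
}
```

In this sketch the client principal never acts as a credential on the controller side; it travels only as audit metadata.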

In addition, to avoid exposing this forwarding power to admin clients, the routed request shall be forwarded to the controller broker's internal endpoint, which should be visible only to other brokers inside the cluster. Any request with a broker principal should not go through the public endpoint and will be rejected for security purposes.
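
A minimal sketch of the endpoint check described above. The class name, the `INTERNAL` listener name, and the boolean flag are assumptions made for illustration; the KIP does not define how the broker distinguishes endpoints.

```java
// Hypothetical guard; listener names and the broker-principal flag are illustrative only.
final class EndpointGuard {
    static final String INTERNAL_LISTENER = "INTERNAL";

    // A request carrying a broker principal is accepted only on the internal endpoint.
    // Requests from ordinary clients are not affected by this check.
    static boolean accept(String listenerName, boolean hasBrokerPrincipal) {
        return !hasBrokerPrincipal || INTERNAL_LISTENER.equals(listenerName);
    }
}
```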

Public Interfaces

Protocol Bumps

We are going to bump all of the mutation APIs mentioned above by one version, and the new admin client is expected to talk only to the controller. For example, we bump the AlterConfig API to v2.

Code Block
titleAlterConfigRequest.json
{
  "apiKey": 44,
  "type": "request",
  "name": "IncrementalAlterConfigsRequest",
  // Version 1 is the first flexible version. For new binary deploy, this should always be forwarded to the controller.
  //
  // Version 2 the request shall always route to the controller.
  "validVersions": "0-2",
  "flexibleVersions": "1+",
  "fields": [
    { "name": "Resources", "type": "[]AlterConfigsResource", "versions": "0+",
      "about": "The incremental updates for each resource.", "fields": [
      { "name": "ResourceType", "type": "int8", "versions": "0+", "mapKey": true,
        "about": "The resource type." },
      { "name": "ResourceName", "type": "string", "versions": "0+", "mapKey": true,
        "about": "The resource name." },
      { "name": "Configs", "type": "[]AlterableConfig", "versions": "0+",
        "about": "The configurations.",  "fields": [
        { "name": "Name", "type": "string", "versions": "0+", "mapKey": true,
          "about": "The configuration key name." },
        { "name": "ConfigOperation", "type": "int8", "versions": "0+", "mapKey": true,
          "about": "The type (Set, Delete, Append, Subtract) of operation." },
        { "name": "Value", "type": "string", "versions": "0+", "nullableVersions": "0+",
          "about": "The value to set for the configuration key."}
      ]}
    ]}
  ]
}

...

Code Block
languagejava
titleRequestHeader.json
{
  "type": "header",
  "name": "RequestHeader",
  // Version 0 of the RequestHeader is only used by v0 of ControlledShutdownRequest.
  //
  // Version 1 is the first version with ClientId.
  //
  // Version 2 is the first flexible version.
  "validVersions": "0-2",
  "flexibleVersions": "2+",
  "fields": [
    { "name": "RequestApiKey", "type": "int16", "versions": "0+",
      "about": "The API key of this request." },
    { "name": "RequestApiVersion", "type": "int16", "versions": "0+",
      "about": "The API version of this request." },
    { "name": "CorrelationId", "type": "int32", "versions": "0+",
      "about": "The correlation ID of this request." },
    ...
    // ----- new optional field ----
    { "name": "PrincipalName", "type": "string", "tag": 0, "taggedVersions": "2+", "ignorable": true,
      "about": "Optional value of the principal name when the request is redirected by a broker." }
    // ----- end new field ---------
  ]
}

...

Monitoring Metrics

To effectively monitor the admin request forwarding status, we would add the following metered metric:

MBean:kafka.server:type=RequestMetrics,name=NumRequestsForwardingToControllerPerSec,clientId=([-.\w]+)

to visualize how many RPCs are inflight from each admin client. It will be added via Yammer metrics.
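
As a sketch of how the MBean name above resolves for a single client, the snippet below builds the JMX object name with the standard `javax.management.ObjectName` class. The helper class `ForwardingMetricName` is hypothetical, and the snippet does not show the Yammer registration itself.

```java
import java.util.regex.Pattern;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper that resolves the per-client MBean name of the proposed metric.
final class ForwardingMetricName {
    // Same character class as the clientId group in the MBean pattern above.
    private static final Pattern CLIENT_ID = Pattern.compile("[-.\\w]+");

    static ObjectName forClient(String clientId) {
        if (!CLIENT_ID.matcher(clientId).matches()) {
            throw new IllegalArgumentException("Unsupported clientId: " + clientId);
        }
        try {
            return new ObjectName("kafka.server:type=RequestMetrics,"
                    + "name=NumRequestsForwardingToControllerPerSec,clientId=" + clientId);
        } catch (MalformedObjectNameException e) {
            // Not expected for ids that pass the pattern check above.
            throw new IllegalStateException(e);
        }
    }
}
```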

Compatibility, Deprecation, and Migration Plan

The upgrade path shall be guarded by inter.broker.protocol.version (IBP) to make sure the routing behavior is consistent. After the first rolling bounce, which upgrades the binary version, all brokers still handle ZK mutation requests by themselves. After the second rolling bounce, which bumps the IBP, all upgraded brokers will be using the new routing algorithm described in this KIP.
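
The two-bounce gating can be sketched as below. The `Ibp` levels and the `RoutingGate` class are hypothetical; the KIP does not name the IBP version that enables forwarding.

```java
// Hypothetical gate: the IBP levels are placeholders, not real Kafka version constants.
final class RoutingGate {
    enum Ibp { PRE_FORWARDING, FORWARDING }
    enum Action { HANDLE_LOCALLY, FORWARD_TO_CONTROLLER }

    // After the first bounce the binary is new but the IBP is still old, so the broker keeps
    // handling ZK mutations itself. Only after the IBP bump does it start forwarding.
    static Action route(Ibp configuredIbp) {
        return configuredIbp.compareTo(Ibp.FORWARDING) >= 0
                ? Action.FORWARD_TO_CONTROLLER
                : Action.HANDLE_LOCALLY;
    }
}
```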

As discussed in the request routing section, to work with an older client, the first contacted broker needs to act as a proxy and redirect the write request to the controller. To support proxying requests, we need to build a channel for brokers to talk directly to the controller. This part of the design is an internal change only and won't block the KIP progress.

Rejected Alternatives

  • We discussed the possibility of immediately building a metadata topic to propagate the changes. This seems aligned with the eventual metadata quorum path, but it would block the current API migration towards the bridge release, since the metadata quorum design is much more complicated and requires more iterations. To avoid this extra dependency on other tracks, we should go ahead and migrate existing protocols to meet the bridge release goal sooner.
  • We thought about adding an alerting metric called request-forwarding-to-controller-authorization-fail-count to help administrators detect a wrong security setup sooner. However, there should already be metrics monitoring request failures, so this metric could be optional.

  • We thought about monitoring older client connections in the long term after the bridge release, when we make some incompatible changes to the Raft quorum, to better time a major version bump. However, KIP-511 has already exposed metrics like an "unknown" software name and an "unknown" software version, which could serve this purpose.

  • We also had an almost complete proposal that added a new RPC type called Envelope to wrap the original request during forwarding; it is kept here for future reference. Although the Envelope API provides certain privileges such as data embedding and principal embedding, it creates a security hole by letting a malicious user impersonate any forwarding broker. Passing the principal around also increases the vulnerability compared with standard alternatives such as passing a verified token, which unfortunately is not fully supported by Kafka security. Because of these security concerns, we are abandoning the Envelope approach and falling back to forwarding the raw admin requests.

==================================================== Start Old Proposal  ======================================================== 

New Envelope RPC

We are also going to add a new RPC type to wrap the original request during forwarding. We will make corresponding changes to the `ApiMessageTypeGenerator` class to recognize the new field types `Header` and `ApiMessage` during auto-generation. This request fully wraps an older-version client request, including its header, security information, actual data fields, etc. The request requires ClusterAction on CLUSTER.

Routing Request Security

For audit logging purposes, we propose to add the following fields:

  1. Serialized Principal information
  2. Client host ip address
  3. Listener name
  4. Security protocol being used
Code Block
titleEnvelopeRequest.json
{
  "apiKey": N,
  "type": "request",
  "name": "EnvelopeRequest",
  "validVersions": "0",
  "flexibleVersions": "0+",
  "fields": [
    { "name": "RequestHeader", "type": "Header", "versions": "0+",
      "about": "The embedded request header." },
    { "name": "RequestData", "type": "ApiMessage", "versions": "0+",
      "about": "The embedded request data."},
    { "name": "PrincipalInfo", "type": "bytes", "versions": "0+",
      "about": "The serialized principal information."},
    { "name": "ClientHostIP", "type": "string", "versions": "0+"},
    { "name": "ListenerName", "type": "string", "versions": "0+"},
    { "name": "SecurityProtocol", "type": "string", "versions": "0+"}
  ]
}

EnvelopeRequest Handling

When receiving an EnvelopeRequest, the broker shall authorize the request with the forwarding broker's principal. If the outer request is verified, the broker will unwrap the inner request and handle it as normal, which means it continues to perform authorization for the inner-layer principal. Within the scope of KIP-590, the possible top-level error codes are:

  • NOT_CONTROLLER as we are only forwarding admin write requests.
  • CLUSTER_AUTHORIZATION_FAILED if the inter-broker verification failed.

The CLUSTER authorization for EnvelopeRequest takes place during request handling, similar to LeaderAndIsrRequest. This ensures the EnvelopeRequest is not sent by a malicious client pretending to be a fellow broker. Any inner request error will still be embedded inside the `ResponseData` struct defined in EnvelopeResponse below.
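
The two-layer check can be sketched as follows. All types here are hypothetical, the authorizers are reduced to principal sets, and the order of the two top-level checks is an assumption because the text does not specify it.

```java
import java.util.Set;

// Hypothetical handler illustrating outer (broker) and inner (client) authorization.
final class EnvelopeRequestHandler {
    enum TopLevelError { NONE, NOT_CONTROLLER, CLUSTER_AUTHORIZATION_FAILED }
    enum InnerError { NONE, AUTHORIZATION_FAILED } // placeholder for errors embedded in ResponseData

    static final class Result {
        final TopLevelError topLevel;
        final InnerError inner; // null when the envelope was never unwrapped

        Result(TopLevelError topLevel, InnerError inner) {
            this.topLevel = topLevel;
            this.inner = inner;
        }
    }

    private final boolean isController;
    private final Set<String> brokerPrincipals;  // principals holding ClusterAction on CLUSTER
    private final Set<String> authorizedClients; // principals allowed to run the inner request

    EnvelopeRequestHandler(boolean isController, Set<String> brokerPrincipals, Set<String> authorizedClients) {
        this.isController = isController;
        this.brokerPrincipals = brokerPrincipals;
        this.authorizedClients = authorizedClients;
    }

    Result handle(String forwardingPrincipal, String innerPrincipal) {
        // Outer layer: the sender must be a fellow broker (assumed to be checked first).
        if (!brokerPrincipals.contains(forwardingPrincipal)) {
            return new Result(TopLevelError.CLUSTER_AUTHORIZATION_FAILED, null);
        }
        if (!isController) {
            return new Result(TopLevelError.NOT_CONTROLLER, null);
        }
        // Inner layer: a failure here is embedded in the response data, not raised at the top level.
        InnerError inner = authorizedClients.contains(innerPrincipal)
                ? InnerError.NONE
                : InnerError.AUTHORIZATION_FAILED;
        return new Result(TopLevelError.NONE, inner);
    }
}
```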

Code Block
titleEnvelopeResponse.json
{
  // Possible top level error code:
  //
  // NOT_CONTROLLER
  // CLUSTER_AUTHORIZATION_FAILED
  //
  "apiKey": N,
  "type": "response",
  "name": "EnvelopeResponse",
  "validVersions": "0",
  "flexibleVersions": "0+",
  "fields": [
    { "name": "ResponseHeader", "type": "Header", "versions": "0+",
      "about": "The embedded response header." },
    { "name": "ResponseData", "type": "ApiMessage", "versions": "0+",
      "about": "The embedded response data."},
    { "name": "ErrorCode", "type": "int16", "versions": "0+",
      "about": "The error code, or 0 if there was no error." }
  ]
}

EnvelopeResponse Handling

When the response contains the NOT_CONTROLLER error code, the forwarding broker will keep looking for the correct controller until the request eventually times out. A CLUSTER_AUTHORIZATION_FAILED error indicates an internal problem with the broker security setup that has nothing to do with the client, so we have no option but to return UNKNOWN_SERVER_ERROR to the admin client.

The forwarding broker won't inspect whatever result the controller returns for the inner request. As long as the top level has no error, the forwarding broker treats the request as successful and passes the inner response back to the admin client, which performs the rest of the error handling.
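
The response handling rules above can be summarized in a small decision function. The class, the `Outcome` names, and the millisecond deadline parameters are hypothetical and chosen only for illustration.

```java
// Hypothetical decision function for the forwarding broker's handling of an EnvelopeResponse.
final class EnvelopeResponseHandler {
    enum TopLevelError { NONE, NOT_CONTROLLER, CLUSTER_AUTHORIZATION_FAILED }
    enum Outcome { RETRY_WITH_NEW_CONTROLLER, REQUEST_TIMED_OUT, UNKNOWN_SERVER_ERROR, RETURN_INNER_RESPONSE }

    static Outcome onResponse(TopLevelError error, long nowMs, long deadlineMs) {
        switch (error) {
            case NOT_CONTROLLER:
                // Keep looking for the controller until the request deadline passes.
                return nowMs < deadlineMs ? Outcome.RETRY_WITH_NEW_CONTROLLER : Outcome.REQUEST_TIMED_OUT;
            case CLUSTER_AUTHORIZATION_FAILED:
                // Broker-side misconfiguration; the client only sees a generic server error.
                return Outcome.UNKNOWN_SERVER_ERROR;
            default:
                // No top-level error: the inner response is passed through without inspection.
                return Outcome.RETURN_INNER_RESPONSE;
        }
    }
}
```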

KafkaPrincipal Serialization

We shall also bring in a backward-incompatible change to add serializability to the KafkaPrincipalBuilder interface:

Code Block
titleKafkaPrincipalBuilder.java
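import java.nio.ByteBuffer;

public interface KafkaPrincipalBuilder {
    ...
    // Serializes the principal information for embedding into the forwarding request.
    ByteBuffer serialize();

    // Rebuilds the principal from its serialized form.
    KafkaPrincipal deserialize(ByteBuffer serializedPrincipal);
}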

which requires a non-default implementation to ensure the principal information gets properly compacted into the forwarding request.
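
As an illustration of what a non-default implementation might do, the sketch below round-trips a principal type and name through a `ByteBuffer`. The length-prefixed layout and the `SimplePrincipalSerde` class are assumptions, not a format defined by this KIP, and a plain string pair stands in for `KafkaPrincipal`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical serde; assumed layout: [int32 typeLen][type bytes][int32 nameLen][name bytes].
final class SimplePrincipalSerde {
    static ByteBuffer serialize(String principalType, String name) {
        byte[] typeBytes = principalType.getBytes(StandardCharsets.UTF_8);
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + typeBytes.length + 4 + nameBytes.length);
        buf.putInt(typeBytes.length).put(typeBytes).putInt(nameBytes.length).put(nameBytes);
        buf.flip(); // switch from writing to reading
        return buf;
    }

    // Returns {principalType, name}.
    static String[] deserialize(ByteBuffer serializedPrincipal) {
        ByteBuffer buf = serializedPrincipal.duplicate(); // leave the caller's position untouched
        byte[] typeBytes = new byte[buf.getInt()];
        buf.get(typeBytes);
        byte[] nameBytes = new byte[buf.getInt()];
        buf.get(nameBytes);
        return new String[] {
            new String(typeBytes, StandardCharsets.UTF_8),
            new String(nameBytes, StandardCharsets.UTF_8)
        };
    }
}
```

A custom principal with extra attributes would need to extend this layout, which is why a default implementation cannot cover every builder.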

Compatibility Breakage

The KafkaPrincipal builder serializability is a binary-incompatible change stated in the KIP. For a smooth rollout of this KIP, we will defer this part of the implementation until we hit the next major incompatible release, i.e., Apache Kafka 3.0. This means we will break down the effort as:

  1. For next 2.x release:
    1. Get new admin client forwarding changes
    2. Get the Envelope RPC implementation
    3. Get the forwarding path working and validate the function with fake principals in testing environment, without actual triggering in the production system
  2. For next 3.0 release:
    1. Introduce serializability to PrincipalBuilder
    2. Turn on forwarding path in production and perform end-to-end testing

==================================================== End Old Proposal  ======================================================== 

Future Works

We have also discussed migrating the metadata read path to controller-only for read-after-write consistency. This sounds like a nice improvement, but it needs more discussion on the trade-off between overloading the controller and metadata consistency, and it also depends on the progress of the Raft quorum design.

New Secure Endpoint

To maintain the same level of security in the post-ZK world, the broker-controller communication should have an extra security guarantee. To make that happen, we will introduce a separate `ControllerEndpoint` that users can configure so that forwarded requests go exclusively through this tunnel. Having a separate communication channel also helps differentiate whether a request comes from an admin client or was forwarded, which means the forwarding brokers don't have to bump the request version unnecessarily.

This part of the design depends on the Controller refactoring effort, and more details will be revealed in subsequent KIPs. It won't block the acceptance of this KIP either, since the forwarding behavior will be the same.