
...

Current state: Accepted

Discussion thread: here

JIRAs: KAFKA-9705, KAFKA-10674

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

We will also add a new RPC type that wraps the original request during forwarding, and make corresponding changes to the `ApiMessageTypeGenerator` class so that it recognizes the new `Header` and `ApiMessage` fields during code generation. For authentication and audit logging, we propose the following fields:

  1. Serialized request data
  2. Request principal, for authentication and authorization
  3. Client hostname, for authentication


EnvelopeRequest.json:
{
  "apiKey": N,
  "type": "request",
  "name": "EnvelopeRequest",
  "validVersions": "0",
  "flexibleVersions": "0+",
  "fields": [
    { "name": "RequestData", "type": "bytes", "versions": "0+", "zeroCopy": true,
      "about": "The embedded request header and data." },
    { "name": "RequestPrincipal", "type": "bytes", "versions": "0+", "zeroCopy": true,
      "ignorable": true, "nullableVersions": "0+", "default": "null",
      "about": "Value of the initial client principal when the request is redirected by a broker." },
    { "name": "ClientHostName", "type": "string", "versions": "0+", "default": "",
      "about": "The original client's hostname." }
  ]
}

When it receives an EnvelopeRequest, the broker first authorizes the outer request against the forwarding broker's principal. If that check passes, the broker unwraps the inner request and handles it as normal, which includes authorizing it against the inner-layer principal. Within the scope of KIP-590, the possible top-level error codes are:

  • NOT_CONTROLLER as we are only forwarding admin write requests.
  • CLUSTER_AUTHORIZATION_FAILED if the inter-broker verification failed.

The CLUSTER authorization for EnvelopeRequest takes place during request handling, as it does for LeaderAndIsrRequest. This ensures that the EnvelopeRequest was not sent by a malicious client pretending to be a fellow broker. Errors from the inner request are still embedded in the `ResponseData` field defined in EnvelopeResponse below.

EnvelopeResponse.json:
{
  // Possible top level error code:
  //
  // NOT_CONTROLLER
  // CLUSTER_AUTHORIZATION_FAILED
  //
  "apiKey": N,
  "type": "response",
  "name": "EnvelopeResponse",
  "validVersions": "0",
  "flexibleVersions": "0+",
  "fields": [
    { "name": "ThrottleTimeMs", "type": "int32", "versions": "0+",
      "about": "The duration in milliseconds for which the request was throttled due to a quota violation, or zero if the request did not violate any quota." },
    { "name": "ResponseData", "type": "bytes", "versions": "0+", "nullableVersions": "0+",
      "zeroCopy": true, "default": "null",
      "about": "The embedded response header and data."},
    { "name": "ErrorCode", "type": "int16", "versions": "0+",
      "about": "The error code, or 0 if there was no error." }
  ]
}
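
The order of the top-level checks described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `EnvelopeHandlingSketch`, `checkEnvelope`, and the set of principals holding CLUSTER_ACTION are hypothetical stand-ins, not Kafka classes or the actual broker code, and principals are modelled as plain strings.

```java
import java.util.Set;

/** Hypothetical sketch of the controller-side envelope check; none of these names are Kafka APIs. */
final class EnvelopeHandlingSketch {

    /** Top-level outcomes named in this section. */
    enum TopLevelError { NONE, NOT_CONTROLLER, CLUSTER_AUTHORIZATION_FAILED }

    /**
     * Decides the top-level error code for an incoming envelope.
     *
     * @param senderPrincipal         principal of the connection that sent the envelope
     * @param clusterActionPrincipals principals granted CLUSTER_ACTION (stand-in for an authorizer call)
     * @param isActiveController      whether this broker is currently the active controller
     */
    static TopLevelError checkEnvelope(String senderPrincipal,
                                       Set<String> clusterActionPrincipals,
                                       boolean isActiveController) {
        // The outer request is authorized against the forwarding broker's principal first.
        if (!clusterActionPrincipals.contains(senderPrincipal)) {
            return TopLevelError.CLUSTER_AUTHORIZATION_FAILED;
        }
        // Only the active controller goes on to process the inner request.
        if (!isActiveController) {
            return TopLevelError.NOT_CONTROLLER;
        }
        // NONE means the inner request is unwrapped and handled as normal;
        // any inner error then travels inside ResponseData, not at the top level.
        return TopLevelError.NONE;
    }
}
```

For example, an envelope sent by a principal without CLUSTER_ACTION yields CLUSTER_AUTHORIZATION_FAILED even when the receiver is the active controller.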

EnvelopeResponse Handling

When the response contains the NOT_CONTROLLER error code, the forwarding broker keeps looking for the correct controller until the request eventually times out. CLUSTER_AUTHORIZATION_FAILED indicates an internal error in the broker security setup that has nothing to do with the client, so we have no choice but to return an UNKNOWN_SERVER_ERROR to the admin client.

The forwarding broker does not inspect the controller's reply to the inner request. As long as the top level carries no error, the forwarding broker considers the request successful and returns the inner response to the admin client, which handles any remaining errors.
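
A minimal sketch of this decision on the forwarding broker is shown below. The `Action` enum, the deadline parameters, and the class name are assumptions made for illustration; the real broker code is not shown in this document.

```java
/** Hypothetical sketch of how a forwarding broker could react to an EnvelopeResponse. */
final class EnvelopeResponseHandlingSketch {

    enum TopLevelError { NONE, NOT_CONTROLLER, CLUSTER_AUTHORIZATION_FAILED }

    /** What the forwarding broker does next (illustrative names only). */
    enum Action { RETRY, FAIL_WITH_TIMEOUT, FAIL_WITH_UNKNOWN_SERVER_ERROR, RETURN_INNER_RESPONSE }

    static Action onEnvelopeResponse(TopLevelError error, long nowMs, long deadlineMs) {
        switch (error) {
            case NOT_CONTROLLER:
                // Keep looking for the active controller until the request times out.
                return nowMs < deadlineMs ? Action.RETRY : Action.FAIL_WITH_TIMEOUT;
            case CLUSTER_AUTHORIZATION_FAILED:
                // Internal security misconfiguration, unrelated to the client.
                return Action.FAIL_WITH_UNKNOWN_SERVER_ERROR;
            default:
                // No top-level error: pass the inner response through without inspecting it.
                return Action.RETURN_INNER_RESPONSE;
        }
    }
}
```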

Routing Request Security

For ZK mutation requests that need redirection, the forwarding broker uses its own authorizer to verify the principals. If the request passes, the broker forwards it as an Envelope with its own credentials, so the controller broker only validates the broker principal of the forwarded request. The one exception is the controller audit log, which needs the principal name of the original request, so we will add an optional field called "InitialPrincipalName" as stated in the Envelope template.

To better understand how the security check works, take AlterConfig as an example. The intended workflow for a KIP-590 broker would be:

Step 1. Select the resources that are authorized
        1.1 Use the traditional principals to verify first
        1.2 If the resource is authorized and this broker is the active controller, process it
        1.3 Otherwise, package the authorized resources and send them to the active controller as an Envelope

Step 2. Check whether the Envelope request is a forwarding request, by checking that it sets the initial principal fields and comes from a privileged listener
        2.1 Use CLUSTER_ACTION to verify; if the resource is not authorized, return CLUSTER_AUTHORIZATION_FAILED, to be propagated back to the original client through the forwarding broker
        2.2 If the resource is authorized but this broker is not the active controller, return NOT_CONTROLLER to the sender (the forwarding broker) for retry
        2.3 Process the resource

Step 3. Handle the returned EnvelopeResponse
        3.1 If the top-level error code is NOT_CONTROLLER, retry until timeout
        3.2 If the error is CLUSTER_AUTHORIZATION_FAILED, set the top-level or resource-level error code in the original RPC response
        3.3 Merge with the other unauthorized resources and return the result to the admin client
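
Steps 1 and 3.3 amount to splitting the requested resources by the local authorization result and later merging the controller's per-resource results with the locally rejected ones. A small sketch of that bookkeeping follows, assuming resources are identified by plain strings. The `AUTHORIZATION_FAILED` label is a placeholder, not an error code defined by this KIP, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical sketch of the per-resource bookkeeping on the forwarding broker. */
final class AlterConfigForwardingSketch {

    /** Placeholder label for a locally rejected resource; not a real Kafka error name. */
    static final String AUTHORIZATION_FAILED = "AUTHORIZATION_FAILED";

    /** Step 1: split resources by the local authorizer result (true = authorized). */
    static Map<Boolean, List<String>> partitionByAuthorization(List<String> resources,
                                                               Predicate<String> isAuthorized) {
        return resources.stream().collect(Collectors.partitioningBy(isAuthorized));
    }

    /** Step 3.3: merge the controller's per-resource results with the locally rejected resources. */
    static Map<String, String> mergeResults(Map<String, String> controllerResults,
                                            List<String> unauthorized) {
        Map<String, String> merged = new LinkedHashMap<>(controllerResults);
        for (String resource : unauthorized) {
            merged.put(resource, AUTHORIZATION_FAILED);
        }
        return merged;
    }
}
```

Only the `true` partition would be packaged into the Envelope; the `false` partition never leaves the forwarding broker.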
   

As the process above suggests, two new error codes shall be implemented for internal failures during forwarding:

Errors.java:
BROKER_AUTHORIZATION_FAILURE(92, "Authorization failed for the request during forwarding. " +
    "This indicates an internal error on the broker cluster security setup.", BrokerAuthorizationFailureException::new),
PRINCIPAL_DESERIALIZATION_FAILURE(93, "Request principal deserialization failed during forwarding. " +
    "This indicates an internal error on the broker cluster security setup.", PrincipalDeserializationFailureException::new);

Unfortunately, older admin clients cannot interpret these codes, so they will see an UNKNOWN_SERVER_ERROR instead. This is less than ideal, but still enough to prompt users to check the broker-side log for the authorization failure. We deliberately avoid returning an AUTHORIZATION failure to old clients so that users do not waste time debugging their client-side security setup.

To identify forwarded requests, the controller differentiates between requests arriving on the inter-broker listener and those arriving on the advertised listener. A request from the inter-broker listener is treated as a forwarding request, and the controller performs the override authentication.

Some users may configure the same listener name for both client and inter-broker communication, which defeats this differentiation. Even so, the override approach introduces no additional security exposure, since CLUSTER_ACTION implies that the sender is either a broker or a super user.

Principal Serialization

In Kafka, principals are represented by the KafkaPrincipal type. Users are allowed to provide their own extensions through the use of a KafkaPrincipalBuilder. Extensions may include additional fields (such as a tenant ID), so we need a new mechanism to serialize and deserialize a principal. For this, we will use a new KafkaPrincipalSerde type:

KafkaPrincipalSerde.java:
interface KafkaPrincipalSerde {
  byte[] serialize(KafkaPrincipal principal);
  KafkaPrincipal deserialize(byte[] bytes);
}

Until 3.0, this type will be optional. If the KafkaPrincipalBuilder type does not implement it, the broker will log a warning at startup and disable redirection. After 3.0, it will be required and the broker will not start without it.
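
To make the serialization contract concrete, here is one possible shape of a serde for a principal that carries only a type and a name. It is a sketch under assumptions: `SketchPrincipal` is a stand-in for KafkaPrincipal, the length-prefixed layout is an arbitrary choice rather than the wire format Kafka uses, and a custom principal with extra fields (such as a tenant ID) would have to encode those as well.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical serde sketch; layout: [int typeLen][type bytes][int nameLen][name bytes]. */
final class PrincipalSerdeSketch {

    /** Stand-in for KafkaPrincipal, reduced to a type and a name. */
    static final class SketchPrincipal {
        final String principalType;
        final String name;

        SketchPrincipal(String principalType, String name) {
            this.principalType = principalType;
            this.name = name;
        }
    }

    static byte[] serialize(SketchPrincipal principal) {
        byte[] type = principal.principalType.getBytes(StandardCharsets.UTF_8);
        byte[] name = principal.name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(4 + type.length + 4 + name.length);
        buffer.putInt(type.length).put(type).putInt(name.length).put(name);
        return buffer.array();
    }

    static SketchPrincipal deserialize(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        String type = readString(buffer);
        String name = readString(buffer);
        return new SketchPrincipal(type, name);
    }

    private static String readString(ByteBuffer buffer) {
        int length = buffer.getInt();
        // Reject lengths that cannot be satisfied by the remaining bytes.
        if (length < 0 || length > buffer.remaining()) {
            throw new IllegalArgumentException("Malformed principal bytes");
        }
        byte[] raw = new byte[length];
        buffer.get(raw);
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

A deserialization failure such as the `IllegalArgumentException` above is the kind of condition the PRINCIPAL_DESERIALIZATION_FAILURE error code is meant to report.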


Some users may want to restrict impersonation for certain APIs even further than the limited set of supported APIs already does. For example, a user might prefer to allow ACL changes only from within a private network. For this use case, we extend the Authorizer so that it can distinguish impersonated requests. Specifically, we propose to include the principal that is forwarding the request in the authorization context:

AuthorizableRequestContext.java:
public interface AuthorizableRequestContext {
  Optional<KafkaPrincipal> forwardingPrincipal();
}

An Authorizer can reject impersonated requests by checking whether the forwarding principal is present. This information is useful for auditing as well.
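
As an illustration of such a check, the sketch below refuses ACL changes whenever a forwarding principal is present. The `RequestContext` interface is a simplified stand-in for the proposed AuthorizableRequestContext (principals are plain strings here), and `allowAclChange` is a hypothetical policy method, not part of the Authorizer interface.

```java
import java.util.Optional;

/** Hypothetical sketch of an authorizer policy that rejects forwarded (impersonated) requests. */
final class ForwardingAwareAuthorizerSketch {

    /** Simplified stand-in for the proposed AuthorizableRequestContext extension. */
    interface RequestContext {
        String principal();
        Optional<String> forwardingPrincipal();
    }

    /** Builds a context; pass null as the forwarding principal for a direct request. */
    static RequestContext context(String principal, String forwardingPrincipalOrNull) {
        Optional<String> forwarding = Optional.ofNullable(forwardingPrincipalOrNull);
        return new RequestContext() {
            @Override public String principal() { return principal; }
            @Override public Optional<String> forwardingPrincipal() { return forwarding; }
        };
    }

    /** Example policy: ACL changes are allowed only for requests that were not forwarded. */
    static boolean allowAclChange(RequestContext context) {
        return !context.forwardingPrincipal().isPresent();
    }
}
```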

ApiVersion Consistency

Admin clients send an ApiVersions request to the broker when the first connection is established. Once forwarding is enabled, the tricky part is that, for forwardable APIs, the admin client needs to know a commonly agreed range of ApiVersions among the handling broker, the active controller, and itself.

Right now, version compatibility for inter-broker APIs is guaranteed by IBP constraints, but this is not the case for forwardable APIs. A compromise would be to put all forwardable APIs under the IBP, but that is brittle and makes consistency hard to maintain.

Instead, any broker connecting to the active controller should send an ApiVersions request at the start of the connection, so that it can easily compute the commonly supported versions and return them to admin clients in response to their ApiVersions requests. Any change of the active controller triggers a reconnection between broker and controller, which guarantees refreshed ApiVersions between the two. This approach avoids a tight coupling with the IBP: the broker can simply close its connection to the admin client to trigger the client's retry logic and a refresh of its ApiVersions. Since this failure should be rare, the two round trips and the timeout delays are well compensated by the reduced engineering work.
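
The computation on the broker can be pictured as a per-API range intersection. The sketch below is an assumption-laden illustration: API names are plain strings, `VersionRange` and `advertisedVersions` are hypothetical names, and a forwardable API with no overlapping range is simply left out of the advertised set, which is one possible policy rather than a rule stated in this KIP.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of computing the versions a broker advertises to admin clients. */
final class ApiVersionIntersectionSketch {

    /** Inclusive [min, max] version range for one API. */
    static final class VersionRange {
        final int min;
        final int max;

        VersionRange(int min, int max) {
            this.min = min;
            this.max = max;
        }

        /** Returns the overlap of two ranges, or empty if they do not overlap. */
        Optional<VersionRange> intersect(VersionRange other) {
            int lo = Math.max(min, other.min);
            int hi = Math.min(max, other.max);
            return lo <= hi ? Optional.of(new VersionRange(lo, hi)) : Optional.empty();
        }
    }

    static Map<String, VersionRange> advertisedVersions(Map<String, VersionRange> brokerVersions,
                                                        Map<String, VersionRange> controllerVersions,
                                                        Set<String> forwardableApis) {
        Map<String, VersionRange> advertised = new TreeMap<>();
        for (Map.Entry<String, VersionRange> entry : brokerVersions.entrySet()) {
            String api = entry.getKey();
            VersionRange brokerRange = entry.getValue();
            if (!forwardableApis.contains(api)) {
                // Non-forwardable APIs are handled locally, so the broker's own range applies.
                advertised.put(api, brokerRange);
                continue;
            }
            // Forwardable APIs must be understood by both the broker and the active controller.
            VersionRange controllerRange = controllerVersions.get(api);
            if (controllerRange != null) {
                brokerRange.intersect(controllerRange).ifPresent(range -> advertised.put(api, range));
            }
        }
        return advertised;
    }
}
```

With a broker supporting AlterConfigs versions 0 to 2 and a controller supporting 0 to 1, the admin client would be told 0 to 1 for AlterConfigs, while non-forwardable APIs keep the broker's own range.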

Routing in KIP-500 

In addition, to avoid exposing this forwarding power to admin clients, with the KIP-500 controller the routed request shall be forwarded to the controller broker's internal endpoint, which should be visible only to other brokers inside the cluster. Any admin configuration request with a broker principal should not go through the public endpoint and will be rejected for security reasons. For the pre-KIP-500 controller, we allow the broker principal only when the message comes in on the inter-broker listener, which indicates a forwarding request. A pre-KIP-500 cluster cannot fully prevent a malicious client from posing as a forwarding broker, but the attacker must have super user access to gain CLUSTER_ACTION.

...