...

This update strategy change applies to all the direct ZK mutation paths, including:

  • AlterConfig

  • IncrementalAlterConfig 

  • CreateAcls

  • DeleteAcls

  • AlterClientQuotas

  • CreateDelegationToken

...

  • RenewDelegationToken

  • ExpireDelegationToken

Internal CreateTopicsRequest Routing 

...

In addition, to avoid exposing this forwarding capability to admin clients, the routed request shall be sent to the controller broker's internal endpoint, which should be visible only to other brokers inside the cluster. Any admin configuration request carrying a broker principal should not go through the public endpoint and will be rejected for security reasons.

Public Interfaces

Protocol Bumps

We are going to bump all the mutation APIs mentioned above by one version, and new admin clients are expected to talk only to the controller. For example, we bump the IncrementalAlterConfig API to v2.

Code Block
title: AlterConfigRequest.json
{
  "apiKey": 44,
  "type": "request",
  "name": "IncrementalAlterConfigsRequest",
  // Version 1 is the first flexible version. A broker running the new binary always forwards it to the controller.
  //
  // Version 2 requests must always be routed to the controller.
  "validVersions": "0-2",
  "flexibleVersions": "1+",
  "fields": [
    { "name": "Resources", "type": "[]AlterConfigsResource", "versions": "0+",
      "about": "The incremental updates for each resource.", "fields": [
      { "name": "ResourceType", "type": "int8", "versions": "0+", "mapKey": true,
        "about": "The resource type." },
      { "name": "ResourceName", "type": "string", "versions": "0+", "mapKey": true,
        "about": "The resource name." },
      { "name": "Configs", "type": "[]AlterableConfig", "versions": "0+",
        "about": "The configurations.",  "fields": [
        { "name": "Name", "type": "string", "versions": "0+", "mapKey": true,
          "about": "The configuration key name." },
        { "name": "ConfigOperation", "type": "int8", "versions": "0+", "mapKey": true,
          "about": "The type (Set, Delete, Append, Subtract) of operation." },
        { "name": "Value", "type": "string", "versions": "0+", "nullableVersions": "0+",
          "about": "The value to set for the configuration key."}
      ]}
    ]}
  ]
}

If the request is v1, a broker with the new AlterConfig protocol enabled should always proxy the request to the controller. If the request is v2, the recipient broker should handle it if it is the controller; otherwise it responds with NOT_CONTROLLER so that the client rediscovers the controller.
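The routing rule above can be sketched as a small decision function. This is only an illustration: `ForwardingDecision`, `Action`, and `decide` are hypothetical names rather than Kafka classes, and the sketch assumes that a broker which is itself the controller serves the request locally whatever its version.

```java
// Hypothetical sketch; none of these names exist in the Kafka code base.
final class ForwardingDecision {
    enum Action { HANDLE_LOCALLY, FORWARD_TO_CONTROLLER, RESPOND_NOT_CONTROLLER }

    /**
     * @param requestVersion        version of the incoming request
     * @param controllerOnlyVersion first version that clients must send to the controller (2 in the example above)
     * @param isController          whether the receiving broker is the active controller
     */
    static Action decide(int requestVersion, int controllerOnlyVersion, boolean isController) {
        if (isController) {
            // Assumption: the controller handles the request itself regardless of version.
            return Action.HANDLE_LOCALLY;
        }
        if (requestVersion >= controllerOnlyVersion) {
            // New clients are expected to locate the controller, so ask them to rediscover it.
            return Action.RESPOND_NOT_CONTROLLER;
        }
        // Older clients may contact any broker, so this broker proxies on their behalf.
        return Action.FORWARD_TO_CONTROLLER;
    }
}
```

For instance, `ForwardingDecision.decide(1, 2, false)` yields `FORWARD_TO_CONTROLLER`, while `ForwardingDecision.decide(2, 2, false)` yields `RESPOND_NOT_CONTROLLER`.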

The same applies to all the other requests mentioned above. A request on the new version should always go to the controller. If the request is on an older version, the broker shall redirect it to the controller. The version bumps are:

  • AlterConfig to v2
  • IncrementalAlterConfig to v2
  • CreateAcls to v3
  • DeleteAcls to v3
  • AlterClientQuotas to v1
  • CreateDelegationToken to v3
  • RenewDelegationToken to v3
  • ExpireDelegationToken to v3
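As a rough illustration, the bumps listed above could be kept in a lookup table that tells a broker whether an incoming request predates the controller-only version and therefore needs forwarding. The class and method names below are hypothetical; only the API names and version numbers come from the list.

```java
import java.util.Map;

// Hypothetical sketch; the table contents mirror the list above, the names are illustrative.
final class ControllerOnlyVersions {
    // First version of each API that clients must send directly to the controller.
    static final Map<String, Integer> FIRST_CONTROLLER_ONLY_VERSION = Map.of(
        "AlterConfig", 2,
        "IncrementalAlterConfig", 2,
        "CreateAcls", 3,
        "DeleteAcls", 3,
        "AlterClientQuotas", 1,
        "CreateDelegationToken", 3,
        "RenewDelegationToken", 3,
        "ExpireDelegationToken", 3);

    // True when the request uses an older version, so the receiving broker must redirect it.
    static boolean needsBrokerForwarding(String api, int requestVersion) {
        Integer first = FIRST_CONTROLLER_ONLY_VERSION.get(api);
        if (first == null) {
            throw new IllegalArgumentException("Not a listed mutation API: " + api);
        }
        return requestVersion < first;
    }
}
```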

The CreateTopic routing change is purely inter-broker. Since CreateTopicRequest is already handled only by the controller, no change is needed on this side.

Deprecate Client Side Controller Access 

Starting from the first release of KIP-590, the following RPCs shall be forwarded to the controller:

  • AlterPartitionReassignment
  • CreatePartition
  • CreateTopics
  • DeleteTopics
  • UpdateFeatures (KIP-584)

They follow the same forwarding strategy as the configuration requests discussed in the previous section.

The reason is that we shall remove "ControllerNodeProvider" from the admin client, so that clients no longer have direct access to the controller. The active controller is thus properly isolated from the outside world, in line with KIP-631.

Protocol Bump

We also need to bump the Metadata RPC to v10 to propagate internal topic creation policy violations. Specifically:

1. For newer clients, return POLICY_VIOLATION when the topic creation policy is violated. At the application level, the error message should be replaced with the actual failure reason, such as "violation of topic creation policy when attempting to auto create internal topic through MetadataRequest."

2. For older clients, return AUTHORIZATION_FAILED so that the client also fails quickly. This is not a perfect solution, since there is no notification path for older clients, but the system admin can at least check the broker log when hitting this issue.
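The two cases above amount to a version check on the incoming Metadata request. A minimal sketch, assuming a hypothetical helper class (the error names and the v10 threshold are taken from the text, everything else is illustrative):

```java
// Hypothetical sketch; MetadataPolicyError is not a Kafka class.
final class MetadataPolicyError {
    enum Error { POLICY_VIOLATION, AUTHORIZATION_FAILED }

    // Metadata v10 is the first version able to carry the policy violation error.
    static final int FIRST_POLICY_AWARE_VERSION = 10;

    // Chooses the error returned when auto-creating an internal topic violates the creation policy.
    static Error errorFor(int metadataRequestVersion) {
        return metadataRequestVersion >= FIRST_POLICY_AWARE_VERSION
            ? Error.POLICY_VIOLATION       // newer clients understand the precise error
            : Error.AUTHORIZATION_FAILED;  // older clients fail fast with an error they already know
    }
}
```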

Security Access Changes

Broker Authorization Override During Forwarding

To support the authorization of RPCs during redirection, we would let CLUSTER_ACTION override the following operations:

| Operation | Resource | API |
| --- | --- | --- |
| ALTER | Cluster | CreateAcls/DeleteAcls/AlterPartitionReassignments/UpdateFeatures |
| ALTER | Topic | CreatePartitions |
| ALTER_CONFIGS | Topic/Cluster | AlterConfig/IncrementalAlterConfig |
| CREATE | Topic | CreateTopics |
| token authentication | token | Create/Renew/DeleteToken |
| DELETE | Topic | DeleteTopics |

This ensures that the forwarding broker can use its own principal to authenticate and proceed with certain ZK mutation operations. To determine which requests are forwarded, the controller differentiates requests arriving on the inter-broker listener from those arriving on the advertised listener. If the request comes from the inter-broker listener, we treat it as a forwarded request and apply the authorization override.

Some users may configure the same listener name for both client and inter-broker communication, which defeats this differentiation. Even then, the override approach introduces no additional security exposure, since CLUSTER_ACTION implies either the broker or a super user.
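The override described above can be sketched as follows. This is a simplified model under stated assumptions: `ForwardingAuthorizer` and its parameters are hypothetical, the ACL lookups are reduced to booleans supplied by the caller, and falling back to the regular operation check when the override does not apply is an assumption rather than something the proposal spells out.

```java
// Hypothetical sketch; not the Kafka authorizer API.
final class ForwardingAuthorizer {
    /**
     * @param requestListener      name of the listener the request arrived on
     * @param interBrokerListener  name of the configured inter-broker listener
     * @param hasClusterAction     whether the request principal holds CLUSTER_ACTION
     * @param hasRequiredOperation whether the principal holds the operation normally required
     *                             (for example ALTER_CONFIGS on the target resource)
     */
    static boolean authorize(String requestListener, String interBrokerListener,
                             boolean hasClusterAction, boolean hasRequiredOperation) {
        // A request arriving on the inter-broker listener is treated as forwarded.
        boolean forwarded = requestListener.equals(interBrokerListener);
        if (forwarded && hasClusterAction) {
            // CLUSTER_ACTION overrides the operation-specific permission.
            return true;
        }
        // Assumption: otherwise the regular permission check decides.
        return hasRequiredOperation;
    }
}
```

Note that when both kinds of traffic share one listener name, `forwarded` is true for client requests as well, yet the override still only takes effect for principals holding CLUSTER_ACTION.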

If the authorization still fails on the controller side, it indicates an internal security setup error that should be addressed on the broker cluster, not on the client. We shall propagate a new error code to the original client so that users know where to look:

Code Block
language: java
title: Errors.java
BROKER_AUTHORIZATION_FAILURE(92, "Authorization failed for the request during forwarding, this indicates an internal error on the broker cluster security setup.", BrokerAuthorizationFailureException::new);

Unfortunately, older admin clients cannot interpret this code, so they will see an UNKNOWN_SERVER_ERROR instead. This is less than ideal but still good enough to prompt users to check the broker-side log for the authorization failure. We intentionally avoid returning an authorization failure to old clients so that users don't waste time debugging their client-side security setup.
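The fallback behaviour for old clients can be illustrated with a toy error table. This is a simplified stand-in, not the real `Errors` enum: `ClientErrorLookup` is a hypothetical name, and only code 92 and the two error names are taken from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of how a client resolves an error code it may not know.
final class ClientErrorLookup {
    private final Map<Integer, String> known = new HashMap<>();

    ClientErrorLookup(boolean knowsBrokerAuthorizationFailure) {
        known.put(0, "NONE");
        if (knowsBrokerAuthorizationFailure) {
            // Only clients built with the new error table can name code 92.
            known.put(92, "BROKER_AUTHORIZATION_FAILURE");
        }
    }

    // Codes missing from the client's table surface as UNKNOWN_SERVER_ERROR.
    String nameFor(int code) {
        return known.getOrDefault(code, "UNKNOWN_SERVER_ERROR");
    }
}
```

Here `new ClientErrorLookup(false).nameFor(92)` models an old client and resolves to `UNKNOWN_SERVER_ERROR`, while `new ClientErrorLookup(true).nameFor(92)` resolves to `BROKER_AUTHORIZATION_FAILURE`.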

New Tag for Principal Name

We are also going to add a tagged field to the request header that carries the original request principal name, for controller audit logging purposes.

...