...

Operators of Apache Kafka clusters have little information about the type of clients connected to their clusters besides the `clientId`. Having more information about the connected clients, such as their software name and version, could tremendously help them to (1) troubleshoot misbehaving clients; and (2) understand the impact of a broker upgrade on their clients and proactively reach out to inform them.

Public Interfaces

...

ApiVersionsRequest is bumped to version 3 with two new fields. ApiVersionsRequest version 3 is a flexible version (KIP-482: The Kafka Protocol should Support Optional Tagged Fields).

Code Block
{
  "apiKey": 18,
  "type": "request",
  "name": "ApiVersionsRequest",
  "validVersions": "0-3",
  "flexibleVersions": "3+",
  // Versions 0 through 2 of ApiVersionsRequest are the same.
  // Version 3 is the first flexible version and adds ClientSoftwareName and ClientSoftwareVersion.
  "fields": [
    { "name": "ClientSoftwareName", "type": "string", "versions": "3+", "about": "The name of the client." },
    { "name": "ClientSoftwareVersion", "type": "string", "versions": "3+", "about": "The version of the client." }
  ]
}
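The Java sketch below shows roughly what the body of an ApiVersionsRequest v3 looks like on the wire. It assumes the KIP-482 flexible-version encoding:

- each string is a COMPACT_STRING, meaning its length plus one as an unsigned varint, followed by the UTF-8 bytes;
- the body ends with a tagged-field section, here empty (a single varint 0).

The class name `ApiVersionsV3BodyEncoder` is hypothetical. The request header is deliberately left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: encodes only the body of an ApiVersionsRequest v3,
// assuming the KIP-482 flexible-version rules. The request header is not included.
final class ApiVersionsV3BodyEncoder {

    // Unsigned varint: 7 bits per byte, high bit set while more bytes follow.
    static void writeUnsignedVarint(int value, ByteArrayOutputStream out) {
        while ((value & 0xFFFFFF80) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // COMPACT_STRING: (length + 1) as an unsigned varint, then the UTF-8 bytes.
    static void writeCompactString(String s, ByteArrayOutputStream out) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        writeUnsignedVarint(bytes.length + 1, out);
        out.write(bytes, 0, bytes.length);
    }

    static byte[] encode(String clientSoftwareName, String clientSoftwareVersion) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeCompactString(clientSoftwareName, out);
        writeCompactString(clientSoftwareVersion, out);
        writeUnsignedVarint(0, out); // empty tagged-field section
        return out.toByteArray();
    }
}
```

For example, `encode("java", "2.2.0")` produces 12 bytes: `5`, "java", `6`, "2.2.0", `0`.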

ApiVersionsResponse is bumped to version 3 but has no changes in its schema. Note that ApiVersionsResponse version 3 is not a flexible version. This is necessary because, to remain backward compatible, the client must find the error code at a fixed offset regardless of the response version.

Code Block
{
  "apiKey": 18,
  "type": "response",
  "name": "ApiVersionsResponse",
  // Version 1 adds throttle time to the response.
  // Starting in version 2, on quota violation, brokers send out responses before throttling.
  // Version 3 is the same as version 2.
  "validVersions": "0-3",
  "flexibleVersions": "none",
  "fields": [
    { "name": "ErrorCode", "type": "int16", "versions": "0+",
      "about": "The top-level error code." },
    { "name": "ApiKeys", "type": "[]ApiVersionsResponseKey", "versions": "0+",
      "about": "The APIs supported by the broker.", "fields": [
      { "name": "Index", "type": "int16", "versions": "0+", "mapKey": true,
        "about": "The API index." },
      { "name": "MinVersion", "type": "int16", "versions": "0+",
        "about": "The minimum supported version, inclusive." },
      { "name": "MaxVersion", "type": "int16", "versions": "0+",
        "about": "The maximum supported version, inclusive." }
    ]},
    { "name": "ThrottleTimeMs", "type": "int32", "versions": "1+", "ignorable": true,
      "about": "The duration in milliseconds for which the request was throttled due to a quota violation, or zero if the request did not violate any quota." }
  ]
}
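The fixed-offset constraint can be illustrated with a minimal Java sketch. It assumes the response buffer starts right after the 4-byte size prefix and that the response header is the non-flexible v0 header, which holds only the int32 correlation id. Because `ErrorCode` is the first body field in every version, a client can read it without knowing which version the broker answered with. `ApiVersionsErrorPeek` is a hypothetical name.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: peeks at the error code of an ApiVersionsResponse.
// Assumes `response` starts after the size prefix and uses response header v0.
final class ApiVersionsErrorPeek {
    static final short UNSUPPORTED_VERSION = 35;

    static short errorCode(ByteBuffer response) {
        // Offset 0: int32 correlation id (response header v0).
        // Offset 4: int16 ErrorCode, the first field of every response version.
        return response.getShort(4);
    }
}
```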

`Errors.INVALID_REQUEST` is added.

Code Block
public enum Errors {
    ...
    INVALID_REQUEST(XX, "The validation of the request has failed.", InvalidRequestException::new);
    ...
}
 
public class InvalidRequestException extends ApiException {
    public InvalidRequestException(String message) {
        super(message);
    }
}

Metrics

We will add a few metrics in the broker to surface information about the connected clients.

| Metric | Type | Description | Can be plotted? |
| --- | --- | --- | --- |
| kafka.server:type=ClientMetrics,name=ConnectedClients | Gauge<Integer> | The total number of clients connected. | Yes |
| kafka.server:type=ClientMetrics,name=ConnectedClients,clientsoftwarename=([\.\-_a-zA-Z0-9])+,clientsoftwareversion=([\.\-_a-zA-Z0-9])+ | Gauge<Integer> | The number of clients connected, broken down by clientsoftwarename and clientsoftwareversion. It gives an overview of the clients. The metric will be removed when it goes back to zero, i.e. when all the clients with a given name and version have disconnected. | Yes |
| kafka.server:type=ClientMetrics,name=Connections | Gauge<List<Map<String, String>>> | The clients connected to the broker, where each Map represents a connection with the following metadata: ClientId, ClientSoftwareName, ClientSoftwareVersion, ClientAddress, Principal, Listener, SecurityProtocol. | No. Operators can get the active connections via JMX by using a tool such as jmxterm. |
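As a sketch of how the per-client metric name could be assembled, the Java helper below builds the JMX `ObjectName` from the table with `javax.management.ObjectName`. The helper `ClientMetricNames` is hypothetical. One design point makes this simple: values that match the validation regex contain none of the characters JMX reserves in unquoted property values (`,` `=` `:` `"` `*` `?`), so they can be embedded without quoting.

```java
import java.util.regex.Pattern;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper: builds the per-(name, version) MBean name from the table.
final class ClientMetricNames {
    private static final Pattern VALID = Pattern.compile("([\\.\\-_a-zA-Z0-9])+");

    static ObjectName connectedClients(String softwareName, String softwareVersion) {
        if (!VALID.matcher(softwareName).matches() || !VALID.matcher(softwareVersion).matches()) {
            throw new IllegalArgumentException("invalid client software name or version");
        }
        try {
            // Validated values are safe to use unquoted as property values.
            return new ObjectName("kafka.server:type=ClientMetrics,name=ConnectedClients"
                    + ",clientsoftwarename=" + softwareName
                    + ",clientsoftwareversion=" + softwareVersion);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```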

...

Code Block
[2019-07-02 14:11:16,137] DEBUG Completed request:RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=2, clientId=consumer-1, correlationId=11) -- {coordinator_key=console-consumer-17661,coordinator_type=0},response:{throttle_time_ms=0,error_code=15,error_message=null,coordinator={node_id=-1,host=,port=-1}} from connection 192.168.12.241:9092-192.168.12.241:52149-3;totalTime:3.187,requestQueueTime:0.137,localTime:2.899,remoteTime:0.0,throttleTime:0.098,responseQueueTime:0.048,sendTime:0.124,securityProtocol:PLAINTEXT,principal:User:ANONYMOUS,listener:PLAINTEXT,clientSoftwareName:java,clientSoftwareVersion:2.2.0 (kafka.request.logger)

...

The client does not know which versions of ApiVersionsRequest the broker supports, since ApiVersions is itself the API used to discover supported versions. Today, the client sends an ApiVersionsRequest (AVR) with the latest schema it is aware of. If the broker knows that version, it handles the request with it; otherwise, it sends back an ApiVersionsResponse v0 with an `UNSUPPORTED_VERSION` error. When the client receives such an error, it retries the whole process with ApiVersionsRequest v0. As a result, the broker gets no additional information about the client whenever the client uses a version that the broker doesn't know about. To circumvent this, we propose that the broker provide the supported versions of the ApiVersionsRequest in the response sent back to the client alongside the error, by populating the existing `api_versions` field when the version is not supported (`UNSUPPORTED_VERSION`). This enables the client to send the latest version supported by the broker instead of falling all the way back to version 0.
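The client-side half of this proposal could look roughly like the Java sketch below. It assumes the client has already decoded the response `api_versions` into a map from API key to `MaxVersion`. It then retries with the smaller of its own maximum and the broker's maximum for ApiVersions (API key 18). If the broker left `api_versions` empty, as older brokers do, it falls back to v0. The class and method names are hypothetical.

```java
import java.util.Map;

// Hypothetical client-side helper sketching the proposed version fallback.
final class ApiVersionsNegotiation {
    static final short API_VERSIONS_KEY = 18;
    static final short UNSUPPORTED_VERSION = 35;

    /**
     * @param clientMaxVersion  highest ApiVersionsRequest version the client knows
     * @param errorCode         error code of the broker's ApiVersionsResponse
     * @param brokerMaxVersions ApiKey -> MaxVersion, decoded from api_versions
     * @return the version to retry with, or -1 if no retry is needed
     */
    static int retryVersion(int clientMaxVersion, short errorCode, Map<Short, Short> brokerMaxVersions) {
        if (errorCode != UNSUPPORTED_VERSION) {
            return -1; // the broker accepted the request as sent
        }
        Short brokerMax = brokerMaxVersions.get(API_VERSIONS_KEY);
        if (brokerMax == null) {
            return 0; // older broker that does not populate api_versions on error
        }
        return Math.min(clientMaxVersion, brokerMax);
    }
}
```

For instance, a client that knows v3 and talks to a broker advertising a maximum of v2 retries directly with v2, so the broker still receives whatever that version carries.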

At the moment, the ApiVersionsRequest is handled in two different places in the broker: 1) in the SaslServerAuthenticator (when used); and 2) in the KafkaApis. Both places will be updated to ensure that all clients are supported. We have decided not to refactor the handling of the ApiVersionsRequest for now and to leave it for future improvements.

...

We propose to validate the client software name and the client software version with the following regular expression: ([\.\-_a-zA-Z0-9])+, and to close the connection and log the error if either of them is invalid. The validation may sound harsh, but as these metadata are fixed in the client, the error should only happen during the development of the client. The `INVALID_REQUEST` error is returned to the client if the validation fails. When the client receives an `INVALID_REQUEST`, it must error out and close the connection.
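A minimal Java sketch of the broker-side check follows, using the regular expression given above with `java.util.regex.Pattern`. `Matcher.matches()` requires the whole value to match, so empty strings and values containing spaces or slashes are rejected. The class name `ClientSoftwareValidator` is hypothetical. In this proposal, a `false` result would lead to `INVALID_REQUEST` and a closed connection.

```java
import java.util.regex.Pattern;

// Hypothetical broker-side check for the two new ApiVersionsRequest v3 fields.
final class ClientSoftwareValidator {
    // Same expression as in the text; '.' and '-' are escaped inside the class.
    private static final Pattern SOFTWARE_PATTERN = Pattern.compile("([\\.\\-_a-zA-Z0-9])+");

    static boolean isValid(String clientSoftwareName, String clientSoftwareVersion) {
        return clientSoftwareName != null && clientSoftwareVersion != null
                && SOFTWARE_PATTERN.matcher(clientSoftwareName).matches()
                && SOFTWARE_PATTERN.matcher(clientSoftwareVersion).matches();
    }
}
```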

Metrics & Log

The various metrics described above will be created based on the metadata available in the connection registry. Metrics will be removed when they become inactive (i.e. when the gauge drops to zero). The request log will be extended to include the collected metadata.
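The "remove at zero" behaviour can be sketched with a simplified, hypothetical registry in Java. It keeps one counter per (software name, software version) pair and drops the entry when the last connection for that pair goes away. In a real broker, that is where the corresponding gauge would also be unregistered; that part is not shown. The `/` separator in the key is unambiguous because the validation regex does not allow `/`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified registry backing the per-(name, version) gauge.
final class ConnectedClientsRegistry {
    private final Map<String, Integer> counts = new HashMap<>();

    // '/' cannot appear in validated values, so the composite key is unambiguous.
    private static String key(String name, String version) {
        return name + "/" + version;
    }

    synchronized void onConnect(String name, String version) {
        counts.merge(key(name, version), 1, Integer::sum);
    }

    synchronized void onDisconnect(String name, String version) {
        // Returning null from the remapping function removes the entry, which is
        // the point where the gauge would be removed as well.
        counts.computeIfPresent(key(name, version), (k, v) -> v > 1 ? v - 1 : null);
    }

    synchronized int connected(String name, String version) {
        return counts.getOrDefault(key(name, version), 0);
    }

    synchronized int trackedSeries() {
        return counts.size();
    }
}
```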

...

When SASL is used, the (Java) client sends two ApiVersionsRequests to the broker. The first one is sent by the SaslClientAuthenticator and the second one is sent by the NetworkClient once the KafkaChannel is established. The SaslClientAuthenticator always sends version 0 of the AVR. We have decided not to change this for now and to only update the second call, which always happens. The reasoning behind this choice is to avoid extra round trips when the client uses a version unknown to the broker: version 0 always works.

...

ClientSoftwareName and ClientSoftwareVersion

The client uses the version provided in the `kafka/kafka-version.properties` file and the name `apache-kafka-java`.
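The Java sketch below shows one way a client could derive these two values. It assumes:

- the properties file is on the classpath at `/kafka/kafka-version.properties`;
- the file stores the version under a `version` key (an assumption, not stated above).

It falls back to `"unknown"` when the file is missing. The class `ClientSoftwareInfo` is hypothetical and is not the Kafka client's actual code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper; the "version" property key is an assumption.
final class ClientSoftwareInfo {
    static final String SOFTWARE_NAME = "apache-kafka-java";

    static String softwareVersion() {
        Properties props = new Properties();
        // try-with-resources tolerates a null resource (file not on the classpath).
        try (InputStream in = ClientSoftwareInfo.class.getResourceAsStream("/kafka/kafka-version.properties")) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            // Ignore and use the default below.
        }
        return props.getProperty("version", "unknown");
    }
}
```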

...

Existing users extracting and parsing the Request Log may have to update their parsing logic to accommodate the new fields.
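For users who parse the request log, the Java sketch below extracts the two new fields with a regular expression. It is based on the format of the sample log line shown earlier and returns `null` for lines written before the change. `RequestLogFields` is a hypothetical name, and the exact log layout may differ between broker versions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the new request-log fields, based on the sample line.
final class RequestLogFields {
    private static final Pattern FIELDS = Pattern.compile(
            "clientSoftwareName:([\\.\\-_a-zA-Z0-9]+),clientSoftwareVersion:([\\.\\-_a-zA-Z0-9]+)");

    /** Returns {name, version}, or null if the line has no such fields. */
    static String[] parse(String line) {
        Matcher m = FIELDS.matcher(line);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }
}
```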

Rejected Alternatives

...

Put ClientSoftwareName and ClientSoftwareVersion in the RequestHeader

ClientSoftwareName and ClientSoftwareVersion could be sent in every request alongside the clientId in the header. While this would be fairly simple to implement once KIP-482 is implemented, we believe it is not suitable if we want to collect more information in the future, and it would waste a few bytes in every request for something that does not change within a session. It also makes error handling awkward, as a request could be rejected because of its headers. Another issue is that we haven't found a way to evolve the header of the ApiVersionsResponse to support tagged fields.

Put ClientSoftwareName and ClientSoftwareVersion in the RequestHeader but provide it only once

ClientSoftwareName and ClientSoftwareVersion could be added to the RequestHeader but sent only in the first request to save bytes in subsequent requests. Ideally, they would be in the ApiVersionsRequest's header, but that is impossible (see previous point). Having the information in arbitrary requests would be odd and could make clients inconsistent.

...