Status

Current state: Under Discussion

Discussion thread:

JIRA:

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

The Kafka RPC protocol currently supports a single version number per message type. This number determines which version of the schema should be used when writing or reading the message. While this versioning scheme gives us the flexibility to add new fields to the schema over time, there are many scenarios that it doesn't support well.

One scenario that isn't well supported is data that should be included in some contexts but not others. For example, when a MetadataRequest is made with IncludeClusterAuthorizedOperations set to true, the response must include the authorized operations. However, even when IncludeClusterAuthorizedOperations is set to false, we still spend bandwidth sending the authorized-operations fields in the response.

The current scheme assumes that once a version adds a field, that field will always be used going forward. As a result, new features included in a request or response impose CPU and network costs even on clients that don't use them. Even a field set to null can take up a significant amount of space when serialized. For example, a null array takes 4 bytes, because its INT32 length (-1) is still written.
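To make the 4-byte figure concrete, here is a small Java sketch of how a null nullable ARRAY is framed in the existing encoding: the INT32 length is written as -1 and no elements follow. The class name NullArrayCost is hypothetical and exists only for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration: framing of a null nullable ARRAY in the
// existing (non-optional) encoding. The INT32 length is still written,
// as -1, and no elements follow it.
final class NullArrayCost {
    static byte[] encodeNullArray() {
        ByteBuffer buf = ByteBuffer.allocate(4);
        buf.putInt(-1); // INT32 length; -1 marks the array as null
        return buf.array();
    }
}
```

encodeNullArray() returns four 0xFF bytes: the cost of "nothing" in a field that every message in the version must carry.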

Another scenario is when we want to attach an extra field to a message in a manner that is orthogonal to the normal versioning scheme. For example, we might want to attach a trace ID, a "forwarded-by" field, or a "user-agent" field. It wouldn't make sense to add all these fields to the message schema on the off chance that someone might use them.

Public Interfaces

JSON Schemas

versionsWithOptional

Each Kafka RPC will have a new top-level version range field named "versionsWithOptional". This field will contain a version range, such as "1+". All message versions in this range will support optional fields.

As part of this KIP, we will create a new version of all the existing RPCs. This new version will support optional fields.
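As an illustration only, the following Java sketch shows one way a code generator or serializer might interpret a versionsWithOptional range. The class VersionRange and the accepted forms ("N+", "N-M", or a single "N") are assumptions modeled on the "validVersions" strings in the schema example below, not an existing Kafka API.

```java
// Hypothetical helper: parses a version range such as "9+", "0-9" or "3"
// and reports whether a given message version falls inside it.
final class VersionRange {
    final short lowest;
    final short highest;

    private VersionRange(short lowest, short highest) {
        this.lowest = lowest;
        this.highest = highest;
    }

    static VersionRange parse(String spec) {
        String s = spec.trim();
        if (s.endsWith("+")) {
            // "N+" means version N and every later version
            short lo = Short.parseShort(s.substring(0, s.length() - 1));
            return new VersionRange(lo, Short.MAX_VALUE);
        }
        int dash = s.indexOf('-');
        if (dash >= 0) {
            // "N-M" is an inclusive range
            short lo = Short.parseShort(s.substring(0, dash));
            short hi = Short.parseShort(s.substring(dash + 1));
            if (hi < lo) {
                throw new IllegalArgumentException("Invalid range: " + spec);
            }
            return new VersionRange(lo, hi);
        }
        // A single number means exactly that version
        short v = Short.parseShort(s);
        return new VersionRange(v, v);
    }

    boolean contains(short version) {
        return version >= lowest && version <= highest;
    }
}
```

A serializer could then emit the optional field buffer only when versionsWithOptional contains the version being written, leaving older versions byte-for-byte unchanged.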

Specifying Optional Fields

Optional fields can appear at the top level of a message, or inside any structure.

Each optional field has a 16-bit tag number. This number must be unique within the context it appears in. Any value from 0 to 65535, inclusive, is a legal tag.

Unlike mandatory fields, optional fields do not specify a version range or a nullable version range. Instead, an optional field is nullable if "nullable" is set to true. As with mandatory fields, only certain types can be nullable (arrays, strings, bytes, etc.); integers and booleans cannot.

Optional fields can have any type, except for an array of structures.
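The rules above could be enforced when the JSON schemas are processed. The following Java sketch is hypothetical: OptionalFieldValidator and its Spec class are invented for illustration. It checks the tag range, tag uniqueness within one context, the ban on arrays of structures, and nullability. Because the text leaves the list of nullable types open ("etc."), the sketch assumes that only arrays, strings, and bytes may be nullable, and treats single-structure fields as non-nullable.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical spec-time checks for optional fields in one context
// (one message level or one structure). Type names follow the JSON
// schema style: "int16", "string", "[]Foo", ...
final class OptionalFieldValidator {
    static final class Spec {
        final String name;
        final String type;
        final boolean nullable;
        final int tag;

        Spec(String name, String type, boolean nullable, int tag) {
            this.name = name;
            this.type = type;
            this.nullable = nullable;
            this.tag = tag;
        }
    }

    private static final Set<String> PRIMITIVES =
        Set.of("bool", "int8", "int16", "int32", "int64", "string", "bytes");

    /** Throws IllegalArgumentException if any rule is violated. */
    static void validate(List<Spec> specs) {
        Set<Integer> seenTags = new HashSet<>();
        for (Spec s : specs) {
            if (s.tag < 0 || s.tag > 65535) {
                throw new IllegalArgumentException(s.name + ": tag out of range 0-65535");
            }
            if (!seenTags.add(s.tag)) {
                throw new IllegalArgumentException(s.name + ": duplicate tag " + s.tag);
            }
            boolean isArray = s.type.startsWith("[]");
            if (isArray && !PRIMITIVES.contains(s.type.substring(2))) {
                throw new IllegalArgumentException(s.name + ": arrays of structures cannot be optional");
            }
            // Assumption: only arrays, strings and bytes may be nullable.
            boolean nullableType = isArray || s.type.equals("string") || s.type.equals("bytes");
            if (s.nullable && !nullableType) {
                throw new IllegalArgumentException(s.name + ": type " + s.type + " cannot be nullable");
            }
        }
    }
}
```

Since tags only need to be unique within one context, validate would be called once per message level and once per structure, which is why UserAgent and Bar in the example below may both use tag 0x0001.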

Here is an example of a message spec with optional fields both at the top level and inside the structures of an array:

{
  "apiKey": 9000,
  "type": "response",
  "name": "FooResponse",
  "validVersions": "0-9",
  "versionsWithOptional": "9+",
  "optionalFields": [
    { "name": "UserAgent", "type": "string", "nullable": true, "tag": "0x0001",
      "about": "The user-agent that sent this request." }
  ],
  "fields": [
    { "name": "Foos", "type": "[]Foo", "versions": "0+",
      "about": "Each foo.", "optionalFields": [
        { "name": "Bar", "type": "string", "nullable": false, "tag": "0x0001",
          "default": "hello world", "about": "The bar associated with this foo, if any." }
      ], "fields": [
        { "name": "Baz", "type": "int16", "versions": "0+",
          "about": "The baz associated with this foo." },
  ...
  ]
}

Schema Class

We will add a new constructor to the org.apache.kafka.common.protocol.types.Schema class which will support optional fields.

    /**
     * Construct a schema with the given optional and mandatory fields.
     *
     * @param optionalFields   The optional fields of this schema, keyed by tag.
     * @param fields           The mandatory fields of this schema.
     *
     * @throws SchemaException If the given fields contain duplicates.
     */
    public Schema(Map<Short, Field> optionalFields, Field... fields);

Proposed Changes

Serializing Optional Fields

An "optional field buffer" contains a sequence of optional fields. The fields must appear in ascending order, from the lowest-valued tag to the highest-valued tag.

Each entry in the buffer contains a field length, followed by a two-byte tag, followed by the field itself.

Field          Type
Field Length   VARINT
Field Tag      INT16
Field Value    <FIELD TYPE>

The sequence of optional fields is terminated by an entry with a field length of 0. The terminating entry will not contain a tag or value.
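The layout above can be sketched in Java as follows. This is a minimal, hypothetical writer (OptionalFieldBufferWriter is not an existing Kafka class) that takes already-serialized field values keyed by tag. Two details are assumptions not settled by the text: the VARINT length is taken to count only the value bytes (not the tag), and the tag is written as two big-endian bytes interpreted as unsigned, so that tags up to 65535 fit. The VARINT itself uses zig-zag encoding, as Kafka's VARINT type does.

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical writer for an optional field buffer. Assumptions: the VARINT
// length counts only the value bytes, and the tag is two big-endian bytes
// read as unsigned (0-65535).
final class OptionalFieldBufferWriter {
    /** Zig-zag VARINT, the encoding Kafka's VARINT type uses. */
    static void writeVarint(int value, ByteArrayOutputStream out) {
        int v = (value << 1) ^ (value >> 31);
        while ((v & 0xFFFFFF80) != 0) {
            out.write((v & 0x7F) | 0x80); // low 7 bits plus continuation bit
            v >>>= 7;
        }
        out.write(v);
    }

    /** Writes the present fields in ascending tag order, then the terminator. */
    static byte[] write(Map<Integer, byte[]> valuesByTag) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // A TreeMap iterates in ascending key order, as the format requires.
        for (Map.Entry<Integer, byte[]> e : new TreeMap<>(valuesByTag).entrySet()) {
            int tag = e.getKey();
            byte[] value = e.getValue();
            if (tag < 0 || tag > 65535) {
                throw new IllegalArgumentException("tag out of range: " + tag);
            }
            if (value.length == 0) {
                // Under this sketch's length assumption, an empty value would
                // be indistinguishable from the terminator.
                throw new IllegalArgumentException("empty value for tag " + tag);
            }
            writeVarint(value.length, out);
            out.write((tag >>> 8) & 0xFF);
            out.write(tag & 0xFF);
            out.write(value, 0, value.length);
        }
        writeVarint(0, out); // terminating entry: length 0, no tag, no value
        return out.toByteArray();
    }
}
```

With no optional fields present, write returns the single byte 0x00, which is the single zero byte that requests, responses, and structures carry when they have nothing optional to send.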

Requests and Responses

In versions that support optional fields, all requests and responses will begin with an optional field buffer. If no optional fields are present, this buffer is just a single zero byte (the terminating entry).

Structures

In those versions, every structure will also begin with an optional field buffer. This is a single zero byte unless optional fields are present.

Compatibility, Deprecation, and Migration Plan

As noted above, existing RPC versions will not be changed to support optional fields. Only the new versions created by this KIP, and later versions, will support them.

Adding or removing an optional field is a compatible operation, provided that we never reuse a tag that was used for something else in a previous release. Changing the type or nullability of an existing optional field, however, is an incompatible change.
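Compatibility rests on readers being able to skip tags they don't recognize. The following hypothetical Java reader (OptionalFieldBufferReader is invented for illustration) assumes that the VARINT length counts only the value bytes and that the two-byte tag is big-endian and unsigned. It keeps the values of known tags and skips unknown ones by advancing past their length, which is why an old reader tolerates a field added in a newer release.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical reader for an optional field buffer. Assumptions: the VARINT
// length covers only the value, and the tag is an unsigned big-endian short.
final class OptionalFieldBufferReader {
    /** Decodes a zig-zag VARINT of at most 5 bytes. */
    static int readVarint(ByteBuffer buf) {
        int raw = 0;
        int shift = 0;
        byte b;
        do {
            if (shift > 28) {
                throw new IllegalArgumentException("VARINT too long");
            }
            b = buf.get();
            raw |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1); // undo zig-zag
    }

    /** Returns the raw value bytes of every known tag; unknown tags are skipped. */
    static Map<Integer, byte[]> read(ByteBuffer buf, Set<Integer> knownTags) {
        Map<Integer, byte[]> result = new TreeMap<>();
        while (true) {
            int length = readVarint(buf);
            if (length == 0) {
                return result; // terminating entry
            }
            if (length < 0) {
                throw new IllegalArgumentException("negative field length");
            }
            int tag = Short.toUnsignedInt(buf.getShort());
            if (knownTags.contains(tag)) {
                byte[] value = new byte[length];
                buf.get(value);
                result.put(tag, value);
            } else {
                buf.position(buf.position() + length); // skip a field we don't know
            }
        }
    }
}
```

Reusing a tag breaks this scheme: an old reader would decode the new field's bytes as if they had the old field's type, which is why the tag-reuse and type-change restrictions above matter.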

Rejected Alternatives

Optional Field Buffer Serialization Alternatives

We could serialize optional fields as a tag and a type, rather than a tag and a length. However, this would prevent us from adding new types in the future, since old deserializers would not understand them.

We could allow optional fields that are arrays of structures. However, this would require two-pass serialization rather than single-pass serialization, because each field's length is written before its value. The first pass would have to compute and cache the serialized lengths of all the optional structure arrays. This doesn't seem necessary for now, and we can add it later in a compatible fashion if needed.

Make all Fields Optional

Rather than supporting both mandatory and optional fields, we could make all fields optional. For fields that we always expect to use, however, this would take more space when serialized, since every field would carry its own length and tag. There are also situations where it is useful for the recipient to know which features the sender supports, and the mandatory field mechanism, tied to the message version, handles these situations well.
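A back-of-the-envelope Java sketch of that space cost, comparing an INT32 sent as a mandatory field with the same INT32 sent as an optional field entry. It assumes the VARINT length counts only the value bytes; the class and method names are hypothetical.

```java
// Hypothetical size comparison. Assumption: an optional entry is a zig-zag
// VARINT length (of the value only), a two-byte tag, then the value.
final class FieldOverhead {
    /** Number of bytes in the zig-zag VARINT encoding of value. */
    static int varintSize(int value) {
        int v = (value << 1) ^ (value >> 31);
        int bytes = 1;
        while ((v & 0xFFFFFF80) != 0) {
            bytes++;
            v >>>= 7;
        }
        return bytes;
    }

    static int mandatorySize(int valueBytes) {
        return valueBytes; // mandatory fields carry no per-field framing
    }

    static int optionalEntrySize(int valueBytes) {
        return varintSize(valueBytes) + 2 + valueBytes; // length + tag + value
    }
}
```

For a 4-byte INT32, mandatorySize(4) is 4 bytes while optionalEntrySize(4) is 7 (a 1-byte length, a 2-byte tag, and the 4-byte value), so making every field optional would pay this framing cost on every field of every message.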

The Kafka Protocol should Support Optional Fields
