Status

Current state: Accepted

Discussion thread: here

JIRA: KAFKA-8885

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

It would be tedious to update the JSON message specifications to add tagged fields to each structure.  Similarly, we don't want to manually annotate each string, buffer, or array that should now be serialized in a more efficient way.  Instead, we should simply have the concept of "flexible versions."  Any version of a message that is a "flexible version" has the changes described above.

In order to have flexible version support across all requests and responses, we will bump the version of all requests and responses.  The new versions will be flexible.  (This version bump may be implemented earlier for some message types than others, depending on implementation considerations.)

RequestHeader Version 1

Requests within a "flexible version" will have a new version of the request header.  The new RequestHeader version will be version 1, superseding version 0.  In this new version, the RequestHeader's ClientId string will be a COMPACT_STRING rather than a STRING.  Additionally, the header will contain space for tagged fields at the end.  Supporting tagged fields in the request header will give us a natural place to put additional information that is common to all requests.
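
As a rough illustration of the header change described above, the following Python sketch encodes a version 1 request header. It assumes the header keeps its usual leading fields (a 16-bit API key, a 16-bit API version, and a 32-bit correlation ID, all big-endian), which this section does not spell out; the function names are hypothetical and this is not the Kafka implementation.

```python
import struct


def write_unsigned_varint(value: int) -> bytes:
    """Encode a non-negative 32-bit integer using 7 bits per byte, low bits first."""
    if not 0 <= value < 2**32:
        raise ValueError("value does not fit in an unsigned 32-bit varint")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # set the continuation bit
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_request_header_v1(api_key: int, api_version: int,
                             correlation_id: int, client_id: str) -> bytes:
    """Hypothetical sketch of a version 1 request header."""
    client_bytes = client_id.encode("utf-8")
    fixed = struct.pack(">hhi", api_key, api_version, correlation_id)  # assumed layout
    compact_client_id = write_unsigned_varint(len(client_bytes) + 1) + client_bytes
    empty_tag_section = write_unsigned_varint(0)  # no tagged fields: one zero byte
    return fixed + compact_client_id + empty_tag_section
```

With no tagged fields, the only overhead of the new header is the single zero byte at the end, while the ClientId length prefix shrinks from two bytes to one for short client IDs.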

ResponseHeader Version 1

Responses within a "flexible version" will have a new version of the response header.  The new ResponseHeader version will be version 1, superseding version 0.  In this new version, the header will contain space for tagged fields at the end. Supporting tagged fields in the response header will give us a natural place to put additional information that is common to all responses.

Public Interfaces

JSON Schemas

flexibleVersions

The flexible versions will be described by a new top-level field in each request and response.  The format will be the same as that of existing version fields.  The flexible versions must be specified in each JSON file.

Note that adding support for tagged versions to an RPC requires bumping the protocol version number.

Specifying Tagged Fields

Tagged fields can appear at the top level of a message, or inside any structure.

Each optional field has a 31-bit tag number. This number must be unique within the context it appears in.  For example, we could use the tag number "1" both at the top level and within a particular substructure without creating ambiguity, since the contexts are separate.  Tagged fields can have any type.

Here is an example of a message spec which has tagged fields at both the top level and the array level:

Code Block
languagejs
{
  "apiKey": 9000,
  "type": "response",
  "name": "FooResponse",
  "validVersions": "0-9",
  "flexibleVersions": "9+",
  "fields": [
    { "name": "UserAgent", "type": "string", "tag": 0, "taggedVersions": "9+",
      "about": "The user-agent that sent this request." },
    { "name": "Foos", "type": "[]Foo", "versions": "0+",
      "about": "Each foo.", "fields": [
        { "name": "Bar", "type": "string", "tag": 0, "taggedVersions": "9+",
          "default": "hello world", "about": "The bar associated with this foo, if any." },
        { "name": "Baz", "type": "int16", "versions": "0+",
          "about": "The baz associated with this foo." },
  ...
  ]
}
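
The rule that a tag number must be unique within its context, but may repeat across contexts, can be checked mechanically. The sketch below is a hypothetical validator written for illustration (`check_tags` and `spec_fields` are not part of Kafka's message generator); it walks a field list shaped like the spec above and treats each nested "fields" array as its own context.

```python
MAX_TAG = 2**31 - 1  # tags are 31-bit numbers


def check_tags(fields, context="top level"):
    """Return a list of problems: out-of-range tags, or tags reused within one context."""
    problems = []
    seen = {}  # tag number -> name of the field that first used it
    for field in fields:
        tag = field.get("tag")
        if tag is not None:
            if not 0 <= tag <= MAX_TAG:
                problems.append(f"{context}: tag {tag} on {field['name']} is out of range")
            elif tag in seen:
                problems.append(
                    f"{context}: tag {tag} used by both {seen[tag]} and {field['name']}")
            else:
                seen[tag] = field["name"]
        # A nested structure is a separate context with its own tag numbering.
        problems.extend(check_tags(field.get("fields", []), context=field["name"]))
    return problems


# Trimmed-down copy of the example spec: tag 0 appears once per context.
spec_fields = [
    {"name": "UserAgent", "type": "string", "tag": 0},
    {"name": "Foos", "type": "[]Foo", "fields": [
        {"name": "Bar", "type": "string", "tag": 0},
        {"name": "Baz", "type": "int16"},
    ]},
]
```

Calling `check_tags(spec_fields)` returns an empty list, because the two uses of tag 0 live in different contexts. Adding a second top-level field with tag 0 would be reported as a conflict.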

Type Classes

We will add several new subclasses of org.apache.kafka.common.protocol.types.Type and org.apache.kafka.common.protocol.types.Field.

| Type Class Name | Field Class Name | Description |
|---|---|---|
| CompactArrayOf | CompactArray | Represents an array whose length is expressed as a variable-length integer rather than a fixed 4-byte length. |
| COMPACT_STRING | CompactString | Represents a string whose length is expressed as a variable-length integer rather than a fixed 2-byte length. |
| COMPACT_NULLABLE_STRING | CompactNullableString | Represents a nullable string whose length is expressed as a variable-length integer rather than a fixed 2-byte length. |
| COMPACT_BYTES | CompactBytes | Represents a byte buffer whose length is expressed as a variable-length integer rather than a fixed 4-byte length. |
| COMPACT_NULLABLE_BYTES | CompactNullableBytes | Represents a nullable byte buffer whose length is expressed as a variable-length integer rather than a fixed 4-byte length. |
| TaggedFields | TaggedFieldsSection | Represents a section containing optional tagged fields. |

Tagged Fields and Version Compatibility

A tagged field can be retroactively added to an existing message version without breaking compatibility, provided, of course, that the version in question was a "flexible version."  We cannot add any tagged fields to an inflexible version, and we cannot retroactively change an inflexible version to a flexible one.

Tag numbers must never be reused, nor can we alter the format of a tagged field.  This includes changing a nullable field to a non-nullable one, or vice versa.  When you create the tagged field, you must decide if it will be nullable or not, and stick with that decision forever.

A field can be specified as tagged in some versions and non-tagged in others.  The main use-case for this is to gracefully migrate fields which were previously mandatory to tagged fields, where appropriate.

For convenience, if a field is specified as having a tag, we will assume by default that the tag can appear in all flexible versions.  Therefore, it isn't usually required to specify "versions" or "taggedVersions."  If "taggedVersions" does appear, then it must be a subset of "versions," which must also be specified.
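
The defaulting and subset rules above can be sketched as follows. This is only an illustration under stated assumptions: version strings are taken to have the forms "3", "0-9", or "9+" seen in the example spec, and the helper names (`parse_versions`, `is_subset`, `effective_tagged_versions`) are hypothetical rather than taken from the Kafka code base.

```python
import math


def parse_versions(text):
    """Parse "3", "0-9" or "9+" into an inclusive (low, high) pair."""
    text = text.strip()
    if text.endswith("+"):
        return int(text[:-1]), math.inf  # open-ended range
    if "-" in text:
        low, high = text.split("-")
        return int(low), int(high)
    return int(text), int(text)


def is_subset(inner, outer):
    """True if every version in `inner` is also in `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def effective_tagged_versions(field, flexible_versions):
    """Versions in which `field` is tagged, or None if it has no tag."""
    if "tag" not in field:
        return None
    if "taggedVersions" not in field:
        # Default: the tag can appear in all flexible versions.
        return parse_versions(flexible_versions)
    if "versions" not in field:
        raise ValueError('"taggedVersions" requires "versions" to be specified')
    tagged = parse_versions(field["taggedVersions"])
    if not is_subset(tagged, parse_versions(field["versions"])):
        raise ValueError('"taggedVersions" must be a subset of "versions"')
    return tagged
```

For a message whose flexible versions are "9+", a field declared only with a tag is treated as tagged in versions 9 and later, while a field with "versions": "0-5" and "taggedVersions": "9+" is rejected.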

Serialization

Tag Sections

In a flexible version, each structure ends with a tag section.  This section stores all of the tagged fields in the structure.

The tag section begins with the number of tagged fields, serialized as a variable-length integer.  If this number is 0, there are no tagged fields present.  In that case, the tag section takes up only one byte.

If the number of tagged fields is greater than zero, the tagged fields follow.  They are serialized in ascending order of their tag.  Each tagged field begins with a tag header.

Tag Headers

The tag header contains two unsigned variable-length integers.  The first one contains the field's tag.  The second one is the length of the field.

| The number of tagged fields | Field 1 Tag | Field 1 Length | Field 1 Data | Field 2 Tag | Field 2 Length | Field 2 Data | ... |
|---|---|---|---|---|---|---|---|
| UNSIGNED_VARINT | UNSIGNED_VARINT | UNSIGNED_VARINT | <field 1 type> | UNSIGNED_VARINT | UNSIGNED_VARINT | <field 2 type> | ... |
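
The layout above can be sketched in a few lines of Python. The example treats each tagged field's data as an already-serialized byte string, and the function names are hypothetical; it is meant as a minimal illustration of the wire layout, not as the Kafka implementation.

```python
def write_unsigned_varint(value: int) -> bytes:
    """Encode a non-negative 32-bit integer using 7 bits per byte, low bits first."""
    if not 0 <= value < 2**32:
        raise ValueError("value does not fit in an unsigned 32-bit varint")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # set the continuation bit
        value >>= 7
    out.append(value)
    return bytes(out)


def read_unsigned_varint(buf: bytes, pos: int):
    """Decode an unsigned varint at `pos`; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def encode_tag_section(fields: dict) -> bytes:
    """Serialize {tag: data bytes} as count, then (tag, length, data) in ascending tag order."""
    out = bytearray(write_unsigned_varint(len(fields)))
    for tag in sorted(fields):
        data = fields[tag]
        out += write_unsigned_varint(tag)
        out += write_unsigned_varint(len(data))
        out += data
    return bytes(out)


def decode_tag_section(buf: bytes, pos: int = 0):
    """Parse a tag section; return ({tag: data bytes}, next position)."""
    count, pos = read_unsigned_varint(buf, pos)
    fields = {}
    for _ in range(count):
        tag, pos = read_unsigned_varint(buf, pos)
        length, pos = read_unsigned_varint(buf, pos)
        fields[tag] = buf[pos:pos + length]  # unknown tags can be skipped using the length
        pos += length
    return fields, pos
```

Because every field carries its own length, a reader that does not recognize a tag can still step over its data and continue with the next field.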

Compact Arrays

A compact array begins with an unsigned varint holding the 32-bit array length plus one, followed by the array elements.

| 32-bit length (plus one) | Element 0 | Element 1 | ... |
|---|---|---|---|
| UNSIGNED_VARINT | <array element type> | <array element type> | ... |
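
As a small illustration of this layout, the sketch below encodes a non-null array of int16 elements. The big-endian element encoding and the function names are assumptions made for the example; only the "length plus one" prefix comes from the table above.

```python
import struct


def write_unsigned_varint(value: int) -> bytes:
    """Encode a non-negative 32-bit integer using 7 bits per byte, low bits first."""
    if not 0 <= value < 2**32:
        raise ValueError("value does not fit in an unsigned 32-bit varint")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # set the continuation bit
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_compact_int16_array(values) -> bytes:
    """Encode a non-null list of int16 values as a compact array."""
    out = bytearray(write_unsigned_varint(len(values) + 1))  # length plus one
    for value in values:
        out += struct.pack(">h", value)  # assumed big-endian int16 element
    return bytes(out)
```

An empty array is therefore a single byte with value 1, and a two-element array starts with the byte 3.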

...

| 32-bit length (plus one) | Payload |
|---|---|
| UNSIGNED_VARINT | Bytes |

If the length field is 0, the bytes field is null.  If the length field is 1, the length is 0.  If the length field is 2, the length is 1, etc.
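
The mapping from the stored length field to null, empty, and non-empty buffers can be sketched as a decoder. The function names are hypothetical, and the example assumes the buffer is well formed.

```python
def read_unsigned_varint(buf: bytes, pos: int):
    """Decode an unsigned varint at `pos`; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def decode_compact_nullable_bytes(buf: bytes, pos: int = 0):
    """Return (payload or None, next position) for a compact nullable byte buffer."""
    stored, pos = read_unsigned_varint(buf, pos)
    if stored == 0:
        return None, pos          # length field 0 means null
    length = stored - 1           # length field N means N - 1 payload bytes
    return buf[pos:pos + length], pos + length
```

Shifting the length by one lets null be represented without a negative number, which an unsigned varint could not encode compactly.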

...

| 32-bit length (plus one) | String |
|---|---|
| UNSIGNED_VARINT | Bytes |

If the length field is 0, the string field is null.  If the length field is 1, the length is 0.  If the length field is 2, the length is 1, etc.
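
The following sketch encodes a compact nullable string. It assumes the length counts UTF-8 bytes rather than characters, and the function names are hypothetical.

```python
def write_unsigned_varint(value: int) -> bytes:
    """Encode a non-negative 32-bit integer using 7 bits per byte, low bits first."""
    if not 0 <= value < 2**32:
        raise ValueError("value does not fit in an unsigned 32-bit varint")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # set the continuation bit
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_compact_nullable_string(text) -> bytes:
    """Encode a str (or None) with a 'length plus one' unsigned varint prefix."""
    if text is None:
        return write_unsigned_varint(0)        # 0 means null
    data = text.encode("utf-8")                # assumed: length is in UTF-8 bytes
    return write_unsigned_varint(len(data) + 1) + data
```

Strings of up to 126 bytes need only a one-byte prefix. A 127-byte string has a stored length of 128, which is the first value that needs a second varint byte.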

...

Unlike signed varints, unsigned varints do not use "zig-zag encoding," so they cannot efficiently represent negative numbers.  However, an unsigned varint can represent positive numbers in the same or fewer bits than a signed varint.

Unsigned Varlongs

The UNSIGNED_VARLONG type is exactly like the UNSIGNED_VARINT type, but it can hold 64 bits instead of just 32.
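
To make the size trade-off between unsigned varints and zig-zag-encoded signed varints concrete, the sketch below counts the encoded bytes for a few 32-bit values. The helper names are hypothetical, and the zig-zag mapping shown is the standard one (0, -1, 1, -2, ... map to 0, 1, 2, 3, ...).

```python
def varint_size(value: int) -> int:
    """Number of bytes needed to store a non-negative integer, 7 bits per byte."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def zigzag32(value: int) -> int:
    """Map a signed 32-bit integer to an unsigned one so small magnitudes stay small."""
    return (value << 1) ^ (value >> 31)


def signed_varint_size(value: int) -> int:
    """Bytes used by a zig-zag-encoded signed varint."""
    return varint_size(zigzag32(value))


def unsigned_varint_size(value: int) -> int:
    """Bytes used by an unsigned varint; negatives are viewed as 32-bit two's complement."""
    return varint_size(value & 0xFFFFFFFF)
```

The value 64 fits in one byte as an unsigned varint but needs two bytes with zig-zag encoding, because zig-zag spends one bit on the sign. Conversely, -1 takes one byte with zig-zag encoding but five bytes if its two's complement bits are written as an unsigned varint.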

Requests and Responses

All requests and responses will end with a tagged field buffer.  If there are no tagged fields, this will only be a single zero byte.

...