...

We will add several new subclasses of org.apache.kafka.common.protocol.types.Type and org.apache.kafka.common.protocol.types.Field.

| Type Class Name | Field Class Name | Description |
|---|---|---|
| CompactArrayOf | CompactArray | Represents an array whose length is expressed as a variable-length integer rather than a fixed 4-byte length. |
| COMPACT_STRING | CompactString | Represents a string whose length is expressed as a variable-length integer rather than a fixed 2-byte length. |
| COMPACT_NULLABLE_STRING | CompactNullableString | Represents a nullable string whose length is expressed as a variable-length integer rather than a fixed 2-byte length. |
| COMPACT_BYTES | CompactBytes | Represents a byte buffer whose length is expressed as a variable-length integer rather than a fixed 4-byte length. |
| COMPACT_NULLABLE_BYTES | CompactNullableBytes | Represents a nullable byte buffer whose length is expressed as a variable-length integer rather than a fixed 4-byte length. |
| TagSection | TagSection | Represents a section containing optional tagged fields. |

Tagged Fields and Version Compatibility

...

If the number of tagged fields is greater than zero, the tagged fields follow.  They are serialized in ascending order of their tag.  Each tagged field begins with a tag header, serialized as a variable-length integer.  After the tag header, the field data follows.

| The number of tagged fields | Tag Header 1 | Tag Data 1 | Tag Header 2 | Tag Data 2 | ... |
|---|---|---|---|---|---|
| UNSIGNED_VARINT | UNSIGNED_VARLONG | <field 1 type> | UNSIGNED_VARLONG | <field 2 type> | ... |

Tag Headers

The tag header is a 64-bit integer containing the 32-bit tag and the 32-bit field length of the tagged field.  The bits for these fields are interleaved: counting from the least significant bit (bit 0), the even-indexed bits hold the length bits, and the odd-indexed bits hold the tag bits.

To give an example, let's say that the length was 4 and the tag was 5.  In binary, these numbers would be 0b100 and 0b101, respectively.  Then the end of the varlong would be:

| ... | T2 | L2 | T1 | L1 | T0 | L0 |
|---|---|---|---|---|---|---|
| ... | 1 | 1 | 0 | 0 | 1 | 0 |

The reason for interleaving the bits is that in the common case where both numbers are small, we want the varlong to take up as few bytes as possible.
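The interleaving described above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class `TagHeaderSketch` and its methods are hypothetical names, not part of the proposal, and the step of writing the resulting 64-bit value as a varlong is left out.

```java
/** Hypothetical helper; class and method names are illustrative only. */
final class TagHeaderSketch {
    /** Builds the 64-bit header: length bits go to even positions, tag bits to odd positions. */
    static long encode(int tag, int length) {
        long header = 0L;
        for (int i = 0; i < 32; i++) {
            long lengthBit = (length >>> i) & 1L;
            long tagBit = (tag >>> i) & 1L;
            header |= lengthBit << (2 * i);     // even-indexed bit
            header |= tagBit << (2 * i + 1);    // odd-indexed bit
        }
        return header;
    }

    /** Recovers the tag from the odd-indexed bits. */
    static int tagOf(long header) {
        return extractEvenBits(header >>> 1);
    }

    /** Recovers the length from the even-indexed bits. */
    static int lengthOf(long header) {
        return extractEvenBits(header);
    }

    /** Collects bits 0, 2, 4, ... of the input into a 32-bit value. */
    private static int extractEvenBits(long bits) {
        int value = 0;
        for (int i = 0; i < 32; i++) {
            value |= (int) ((bits >>> (2 * i)) & 1L) << i;
        }
        return value;
    }
}
```

With a tag of 5 and a length of 4, `TagHeaderSketch.encode(5, 4)` yields 0b110010, matching the example above. Because both inputs are small, the high bits of the header stay zero, which is what keeps the serialized varlong short.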

...

A compact array contains a 32-bit unsigned varint, followed by the array elements.

| 32-bit length (plus one) | Element 0 | Element 1 | ... |
|---|---|---|---|
| UNSIGNED_VARINT | <array element type> | <array element type> | ... |

If the length field is 0, the array is null.  If the length field is 1, the length is 0.  If the length field is 2, the length is 1, etc.

...

A compact bytes field contains a 32-bit unsigned varint, followed by the bytes.

| 32-bit length (plus one) | Payload |
|---|---|
| UNSIGNED_VARINT | Bytes |

If the length field is 0, the bytes field is null.  If the length field is 1, the length is 0.  If the length field is 2, the length is 1, etc.

...

A compact string field contains a 32-bit unsigned varint, followed by the string bytes.

| 32-bit length (plus one) | String |
|---|---|
| UNSIGNED_VARINT | Bytes |

If the length field is 0, the string field is null.  If the length field is 1, the length is 0.  If the length field is 2, the length is 1, etc.
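As a rough illustration of the "length plus one" convention, the following Java sketch encodes and decodes a nullable compact string. The class `CompactStringSketch` and its methods are hypothetical names chosen for this example, and UTF-8 is assumed as the string encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; names are illustrative, and UTF-8 is an assumption. */
final class CompactStringSketch {
    /** Writes the length-plus-one prefix followed by the string bytes; null is written as 0. */
    static byte[] write(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (s == null) {
            writeUnsignedVarint(0, out);
            return out.toByteArray();
        }
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeUnsignedVarint(utf8.length + 1, out);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    /** Reads the prefix, maps 0 to null, and otherwise reads (prefix - 1) bytes. */
    static String read(ByteBuffer in) {
        int lengthPlusOne = readUnsignedVarint(in);
        if (lengthPlusOne == 0) {
            return null;
        }
        byte[] utf8 = new byte[lengthPlusOne - 1];
        in.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    /** Emits seven bits per byte, least-significant group first. */
    private static void writeUnsignedVarint(int value, ByteArrayOutputStream out) {
        while ((value & 0xFFFFFF80) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    private static int readUnsignedVarint(ByteBuffer in) {
        int value = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.get() & 0xFF;
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
        }
        throw new IllegalArgumentException("varint is longer than 5 bytes");
    }
}
```

Under these assumptions, `write("ab")` produces the three bytes 0x03, 'a', 'b'; an empty string produces the single byte 0x01; and null produces the single byte 0x00. Compact arrays and compact bytes use the same prefix convention, with elements or raw bytes in place of the string bytes.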

...

So, for example, let's say we were trying to serialize 300, which is 0b100101100 in binary.  This would be serialized as the following two-byte sequence:

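| Continuation Bit | B6 | B5 | B4 | B3 | B2 | B1 | B0 | Continuation Bit | B13 | B12 | B11 | B10 | B9 | B8 | B7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |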

Unlike signed varints, unsigned varints do not use "zig-zag encoding," so they cannot efficiently represent negative numbers.
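The byte layout above can be reproduced with a short Java sketch. The class `UnsignedVarintSketch` and its method names are hypothetical and used only for illustration; the logic simply emits seven bits per byte, least-significant group first, and sets the continuation bit on every byte except the last.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Hypothetical helper; class and method names are illustrative only. */
final class UnsignedVarintSketch {
    /** Serializes a 32-bit value, treated as unsigned, into 1 to 5 bytes. */
    static byte[] write(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & 0xFFFFFF80) != 0) {
            out.write((value & 0x7F) | 0x80); // more bytes follow: continuation bit set
            value >>>= 7;
        }
        out.write(value);                     // last byte: continuation bit clear
        return out.toByteArray();
    }

    /** Reads bytes until one arrives with the continuation bit clear. */
    static int read(ByteBuffer in) {
        int value = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.get() & 0xFF;
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
        }
        throw new IllegalArgumentException("varint is longer than 5 bytes");
    }
}
```

For the value 300, `write` returns the two bytes 0xAC and 0x02, which are the bit patterns 10101100 and 00000010 shown in the table. A negative `int` has its top bit set and therefore always takes five bytes in this scheme, which is the inefficiency mentioned above.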

...

In general, adding a tagged field is always a compatible operation.  However, we do not want to reuse a tag that was previously used for something else.  Changing the type or nullability of an existing optional field is also an incompatible change.

Rejected Alternatives

...

Tagged Field Buffer Serialization Alternatives

  • We could serialize optional fields as a tag and a type, rather than a tag and a length.  However, this would prevent us from adding new types in the future, since the old deserializers would not understand them.
  • We could allow the serialization of arrays of objects.  However, this would require a two-pass serialization rather than a single-pass serialization.  The first pass would have to cache the lengths of all the optional object arrays.  We might support this eventually, but for now, it doesn't seem necessary.  We can add it later in a compatible fashion if we decide to.

Make all Fields Tagged

Rather than supporting both mandatory and optional fields, we could make all fields optional.  For fields which we always expect to use, however, this would take more space when serialized.  There are also situations where it is useful for the recipient to know which features the sender supports, and the mandatory field mechanism handles these situations well.

Use Escape Bytes to Minimize Space Usage

We could use escaping to make the size of a tag buffer zero bytes in some cases.  However, this would greatly complicate encoding and decoding the protocol.  It is better to make variable-length fields more efficient in general, to offset the extra space of tagged field buffers.