Status

Current state: Under Discussion

...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

The Kafka protocol already supports variable length encodings for integers. Specifically, KIP-482 added support for using an unsigned variable length integer encoding for the length of variable length data (strings, arrays, bytes) and for integer quantities in tagged fields. However, it is currently not possible to use a variable length encoding for regular fields with an integral (short, integer or long) type.

...

Using variable length encodings for these quantities in Kafka protocol messages would make those messages smaller. For some RPCs the messages could be substantially smaller.
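To illustrate where the saving comes from, the following is a minimal sketch of an unsigned variable length integer encoding (7 bits of payload per byte, with the high bit marking continuation), compared against a fixed 4-byte int32. The function names and the sample values are assumptions chosen for illustration; they are not part of this proposal or of the Kafka code base.

```python
def encode_unsigned_varint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, least significant group first."""
    if value < 0:
        raise ValueError("an unsigned varint cannot encode a negative value")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        value >>= 7
    out.append(value)  # final byte, continuation bit clear
    return bytes(out)


def decode_unsigned_varint(data: bytes) -> int:
    """Decode a single unsigned varint from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result
        shift += 7
    raise ValueError("truncated varint")


# Hypothetical sample values, e.g. small broker IDs or partition indexes.
for sample in (0, 5, 127, 128, 1000, 2**31 - 1):
    size = len(encode_unsigned_varint(sample))
    print(f"{sample}: fixed int32 = 4 bytes, unsigned varint = {size} byte(s)")
```

Small values, which dominate fields such as broker IDs and partition indexes, fit in one or two bytes, while the largest 32-bit values cost five bytes rather than four. This is one reason the trade-off is best judged per field.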

An example: MetadataResponse

Taking MetadataResponseData as an example, and looking just at the deeply nested MetadataResponsePartition, the current schema is:

...

Since most of the data in a typical MetadataResponse is partition data, such a change would make typical responses substantially smaller.

Scope

This KIP proposes a mechanism for allowing RPCs (including new versions of existing RPCs) to use varints.
It does not propose any changes to existing RPC messages to make use of the new encoding.
It is envisaged that RPCs will adopt this functionality as they are changed under other KIPs, guided by benchmarking of the costs and benefits.

Public Interfaces

This could be done in two ways, either by making the existing type property of fields support version-dependent types, or by introducing a separate encoding property.

Making FieldSpec's type version-dependent

The existing type property of fields would be allowed to be either the JSON String type or the JSON object type.
A String-valued type would mean that the field has the named type in all versions of the field.
When type is an object, the keys would be version ranges and the values would be the type of the field in messages within that range.
Support would be added for new types: varint16, varint32, varint64 and unsigned_varint16 etc.
The Java type of the property generated for the field spec would be the widest of the corresponding Java types.
This would allow, in addition to variable length encodings, the possibility for 32-bit fields to evolve to 64-bit quantities between message versions.
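The text above does not say how the signed types (varint16, varint32, varint64) would be laid out on the wire. The sketch below assumes zig-zag encoding followed by an unsigned varint, as Protocol Buffers does for its sint32 type; that choice, and all function names, are assumptions for illustration only. It shows why the distinction matters for fields that can hold small negative values such as -1.

```python
def zigzag_encode32(value: int) -> int:
    """Map a signed 32-bit integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    if not -2**31 <= value < 2**31:
        raise ValueError("value does not fit in a signed 32-bit integer")
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def zigzag_decode32(encoded: int) -> int:
    """Invert zigzag_encode32."""
    return (encoded >> 1) ^ -(encoded & 1)


def unsigned_varint_length(value: int) -> int:
    """Number of bytes an unsigned varint needs for a non-negative value."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


for sample in (-1, 0, 1, 63, -64, 64, 2**31 - 1, -2**31):
    with_zigzag = unsigned_varint_length(zigzag_encode32(sample))
    # Without zig-zag, a negative value is reinterpreted as its 32-bit two's complement.
    without_zigzag = unsigned_varint_length(sample & 0xFFFFFFFF)
    print(f"{sample}: zig-zag = {with_zigzag} byte(s), raw two's complement = {without_zigzag} byte(s)")
```

Under this assumption a value of -1 costs one byte with zig-zag but five bytes when its two's complement bits are written as an unsigned varint, so the choice between the signed and unsigned types would depend on the range of values a field actually takes.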

Example

Focussing specifically on the LeaderId of the MetadataResponsePartition previously described:

{ "name": "LeaderId",
  "type": { 
    "0-9": "int32",
    "10+": "unsigned_varint32" 
  },
  "versions": "0+",
  "entityType": "brokerId",
  "about": "The ID of the leader broker."
}
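As a rough sketch of how a message generator or serializer might interpret such a field spec, the following resolves the type of a field for a given message version. It assumes the version range keys use the same syntax as the existing versions property ("N", "N-M" or "N+"); the helper names are hypothetical and not part of the Kafka code base.

```python
def parse_version_range(spec: str):
    """Parse 'N', 'N-M' or 'N+' into an inclusive (low, high) pair."""
    if spec.endswith("+"):
        return int(spec[:-1]), float("inf")
    if "-" in spec:
        low, high = spec.split("-")
        return int(low), int(high)
    return int(spec), int(spec)


def resolve_type(type_spec, version: int) -> str:
    """Return the type name that applies to the given message version."""
    if isinstance(type_spec, str):
        # A plain string means the same type in every version.
        return type_spec
    for range_spec, type_name in type_spec.items():
        low, high = parse_version_range(range_spec)
        if low <= version <= high:
            return type_name
    raise ValueError(f"no type defined for version {version}")


leader_id = {
    "name": "LeaderId",
    "type": {"0-9": "int32", "10+": "unsigned_varint32"},
    "versions": "0+",
}

for version in (0, 9, 10, 12):
    print(version, resolve_type(leader_id["type"], version))
```

A real implementation would presumably also validate that the ranges do not overlap and that together they cover every version in the field's versions property.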

Alternative: Adding a separate encoding property

Field specs in the protocol message JSON format will get support for a new encoding property, which will define, for each version of the field, how the value should be encoded. This approach makes encoding a first-class concept, separating the logical type of a field from how it is encoded on the wire. While it is more verbose, it is potentially more flexible than conflating type and encoding within the type property, since it would be easy to add further named encodings in the future.

Example

{ "name": "LeaderId",
  "type": "int32",
  "versions": "0+",
  "entityType": "brokerId",
  "about": "The ID of the leader broker.",
  "encoding": {
    "0-9": "fixed32",
    "10+": "unsigned32"
  }
}
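To make the separation between logical type and wire encoding concrete, here is a minimal sketch of a serializer that keeps the logical type fixed and picks the encoder by message version. It assumes that "fixed32" means the existing 4-byte big-endian encoding and that "unsigned32" means an unsigned variable length encoding of a 32-bit value; neither meaning is defined in the text above, and all names below are hypothetical.

```python
import struct


def encode_unsigned_varint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, least significant group first."""
    if value < 0:
        raise ValueError("an unsigned varint cannot encode a negative value")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


# Assumed meaning of the encoding names used in the example field spec.
ENCODERS = {
    "fixed32": lambda value: struct.pack(">i", value),  # 4 bytes, big-endian
    "unsigned32": encode_unsigned_varint,
}


def version_in_range(range_spec: str, version: int) -> bool:
    """Check a version against 'N', 'N-M' or 'N+'."""
    if range_spec.endswith("+"):
        return version >= int(range_spec[:-1])
    if "-" in range_spec:
        low, high = range_spec.split("-")
        return int(low) <= version <= int(high)
    return version == int(range_spec)


def write_field(field_spec: dict, version: int, value: int) -> bytes:
    """Serialize value using whichever encoding applies to this version."""
    for range_spec, encoding in field_spec["encoding"].items():
        if version_in_range(range_spec, version):
            return ENCODERS[encoding](value)
    raise ValueError(f"no encoding defined for version {version}")


leader_id = {
    "name": "LeaderId",
    "type": "int32",
    "versions": "0+",
    "encoding": {"0-9": "fixed32", "10+": "unsigned32"},
}

print(write_field(leader_id, 9, 5).hex())   # fixed-size encoding
print(write_field(leader_id, 10, 5).hex())  # variable length encoding
```

Adding a further named encoding under this design would mean registering one more entry in the encoder table, without touching the field's logical type.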

Proposed Changes

TBC based on selection of the preferred alternative via discussion.

Compatibility, Deprecation, and Migration Plan

The proposal is backwards compatible: clients using existing API versions will continue to use fixed-size encoding.

New versions of existing RPC messages will be able to use variable length encoding on a per-field basis.

Rejected Alternatives

  • Simply adding support for varint32 types would not, on its own, allow these types to be used for existing fields.

...