...

More concretely, benchmarking a MetadataResponse (just the body, excluding the header) containing a single 100-partition topic replicated across two brokers suggests that:

  • Fixed length encoding is 3216 bytes, taking on average 56,458 µs to serialize and 14,416 µs to deserialize.
  • In the best case, variable length encoding is 1170 bytes, taking on average 65,226 µs to serialize and 14,713 µs to deserialize.
  • In the worst case, variable length encoding is 4026 bytes, taking on average 81,328 µs to serialize and 14,755 µs to deserialize.

Encoding | Size/byte | Struct Serialize/µs | Struct Deserialize/µs | Buffer Serialize/µs | Buffer Deserialize/µs
fixed | 3216 | 56,458 | 14,416 | 7,215 | 11,508
variable (best case) | 1170 | 65,226 | 14,713 | 7,667 | 10,155
variable (worst case) | 4026 | 81,328 | 14,755 | 21,400 | 17,681

The worst case would occur if the cluster had brokers with ids greater than 134,217,727, topics with more than that many partitions, and error codes greater than 255.
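The 134,217,727 threshold can be checked with a small size calculation. The following is a minimal sketch, assuming a Protocol Buffers style varint (7 payload bits per byte) with zigzag mapping for signed values; the class and method names are hypothetical and are not taken from the Kafka code base.

```java
// Hypothetical helper for illustration only; not Kafka's actual implementation.
final class VarintSizeSketch {

    /** Zigzag mapping (an assumption here): interleaves negative and positive values. */
    static int zigZag(int value) {
        return (value << 1) ^ (value >> 31);
    }

    /** Bytes needed for an unsigned varint carrying 7 payload bits per byte. */
    static int sizeOfUnsignedVarint(int value) {
        int bytes = 1;
        while ((value & 0xFFFFFF80) != 0) {
            bytes++;
            value >>>= 7;
        }
        return bytes;
    }

    /** Bytes needed for a signed varint: zigzag first, then the unsigned size. */
    static int sizeOfSignedVarint(int value) {
        return sizeOfUnsignedVarint(zigZag(value));
    }

    public static void main(String[] args) {
        // A fixed int32 always occupies 4 bytes; compare with the varint sizes.
        System.out.println(sizeOfSignedVarint(134_217_727));
        System.out.println(sizeOfSignedVarint(134_217_728));
    }
}
```

Under these assumptions a value of 134,217,727 still fits in 4 bytes, while 134,217,728 needs 5 bytes, one more than the fixed int32 encoding.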

...

This KIP proposes a mechanism for allowing RPCs (including new versions of existing RPCs) to use varints.
It does not propose any changes to existing RPC messages to make use of the new encoding.
It is envisaged that RPCs will make use of this functionality as those RPCs get changed under other KIPs and guided by benchmarking about the costs and benefits.

Public Interfaces

This could be done in two ways, either by making the existing type property of fields support version-dependent types, or by introducing a separate encoding property.

Making FieldSpec's type version-dependent

The existing type property of fields would be allowed to be either the JSON String type or the JSON object type.
The interpretation of a String-typed property would be that the property has the named type in all versions of the property.
When type was an object, the keys would be version ranges and the values would be the type of the property in messages within that range.
Support would be added for new types: varint16, varint32, varint64 and unsigned_varint16 etc.
The Java type of the property corresponding to the field spec would be the widest corresponding Java type.
This would allow, in addition to variable length encodings, for the possibility of 32-bit fields evolving to 64-bit quantities between message versions.

Example

Focussing specifically on the LeaderId of the MetadataResponsePartition previously described:

Code Block
{ "name": "LeaderId",
  "type": { 
    "0-9": "int32",
    "10+": "unsigned_varint32" 
  },
  "versions": "0+",
  "entityType": "brokerId",
  "about": "The ID of the leader broker."
}
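To illustrate how a version-dependent type might be resolved, the following sketch looks up the type for a given message version from a map of version ranges. It assumes the range syntax used in the example ("0-9", "10+", or a single version such as "3"); the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the real message generator may model this differently.
final class VersionedTypeSketch {

    /** Returns true if the version falls within a range such as "3", "0-9" or "10+". */
    static boolean contains(String range, int version) {
        if (range.endsWith("+")) {
            return version >= Integer.parseInt(range.substring(0, range.length() - 1));
        }
        int dash = range.indexOf('-');
        if (dash < 0) {
            return version == Integer.parseInt(range);
        }
        int low = Integer.parseInt(range.substring(0, dash));
        int high = Integer.parseInt(range.substring(dash + 1));
        return low <= version && version <= high;
    }

    /** Finds the type whose version range contains the given version. */
    static String typeFor(Map<String, String> typeByRange, int version) {
        for (Map.Entry<String, String> entry : typeByRange.entrySet()) {
            if (contains(entry.getKey(), version)) {
                return entry.getValue();
            }
        }
        throw new IllegalArgumentException("No type defined for version " + version);
    }

    /** The "type" object of the LeaderId example, expressed as a map. */
    static Map<String, String> leaderIdTypes() {
        Map<String, String> types = new LinkedHashMap<>();
        types.put("0-9", "int32");
        types.put("10+", "unsigned_varint32");
        return types;
    }

    public static void main(String[] args) {
        System.out.println(typeFor(leaderIdTypes(), 9));
        System.out.println(typeFor(leaderIdTypes(), 10));
    }
}
```

With the LeaderId spec above, versions 0 to 9 resolve to int32 and version 10 onwards resolves to unsigned_varint32.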

Alternative: Adding a separate encoding property

Field specs in the protocol message JSON format will get support for a new encoding property, which will define, for each version of the field, how the value should be encoded. This approach makes encoding a first-class concept, separating the logical type of a field from how it is encoded on the wire.

The value of the encoding property will be either a JSON object or a JSON string:

  • When it is an object, each key defines a version range and the corresponding value is the named encoding used for the field for those versions.
  • When it is a string the value is the named encoding to be used for all versions defined in the FieldSpec's versions property.

It will be a generation-time error if:

  • encoding is present on a field spec with type other than int16, int32 or int64.
  • the union of the versions defined by encoding does not exactly equal the versions of the field.
  • any pair of version ranges defined by encoding have a nonempty intersection.
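The three checks above could be implemented roughly as follows. This is a sketch under stated assumptions: version ranges are written as "3", "0-9" or "10+", and the class name, method names and error messages are all hypothetical rather than part of the existing generator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical validation sketch; not the actual message generator code.
final class EncodingValidationSketch {
    static final int OPEN_ENDED = Integer.MAX_VALUE;
    static final Set<String> INTEGER_TYPES = Set.of("int16", "int32", "int64");

    /** Parses "3", "0-9" or "10+" into an inclusive {low, high} pair. */
    static int[] parseRange(String range) {
        String s = range.trim();
        if (s.endsWith("+")) {
            return new int[] {Integer.parseInt(s.substring(0, s.length() - 1)), OPEN_ENDED};
        }
        int dash = s.indexOf('-');
        if (dash < 0) {
            int single = Integer.parseInt(s);
            return new int[] {single, single};
        }
        return new int[] {
            Integer.parseInt(s.substring(0, dash)),
            Integer.parseInt(s.substring(dash + 1))
        };
    }

    /** Throws IllegalArgumentException if any of the three rules is violated. */
    static void validate(String type, String fieldVersions, Map<String, String> encoding) {
        if (!INTEGER_TYPES.contains(type)) {
            throw new IllegalArgumentException("encoding is not allowed on type " + type);
        }
        int[] field = parseRange(fieldVersions);
        List<int[]> ranges = new ArrayList<>();
        for (String key : encoding.keySet()) {
            ranges.add(parseRange(key));
        }
        ranges.sort(Comparator.comparingInt((int[] r) -> r[0]));

        // Walk the sorted ranges; each must start exactly where the previous one ended.
        long expectedLow = field[0];
        for (int[] r : ranges) {
            if (r[0] < expectedLow) {
                throw new IllegalArgumentException(
                    "range overlaps another range or starts before the field's versions");
            }
            if (r[0] > expectedLow) {
                throw new IllegalArgumentException("versions are not fully covered");
            }
            expectedLow = (long) r[1] + 1;
        }
        if (expectedLow != (long) field[1] + 1) {
            throw new IllegalArgumentException("ranges do not equal the field's versions");
        }
    }
}
```

For example, a field with versions "0+" and encoding ranges "0-9" and "10+" passes, whereas "0-9" with "9+" is rejected as overlapping and "0-9" alone is rejected as incomplete.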

The names of the supported encodings match the regular expression (fixed|packed|upacked)(16|32|64). "upacked" is short for "unsigned packed". For example:

  • fixed32 is the fixed-size encoding of a 32 bit integer
  • packed32 is the variable-length signed encoding of a 32 bit integer

Any other value for an encoding name will be a generation-time error.
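A generator could check encoding names directly against the regular expression above. The sketch below shows one way to do so and to split a name into its kind and bit width; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of encoding-name validation.
final class EncodingNameSketch {
    // The regular expression given in the text above.
    static final Pattern NAME = Pattern.compile("(fixed|packed|upacked)(16|32|64)");

    /** Returns "fixed", "packed" or "upacked"; throws for any other name. */
    static String kind(String name) {
        return match(name).group(1);
    }

    /** Returns the bit width (16, 32 or 64); throws for any other name. */
    static int bits(String name) {
        return Integer.parseInt(match(name).group(2));
    }

    private static Matcher match(String name) {
        Matcher matcher = NAME.matcher(name);
        // matches() requires the whole name to match, so "fixed320" is rejected.
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Unknown encoding: " + name);
        }
        return matcher;
    }
}
```

Here the IllegalArgumentException stands in for the generation-time error; how the real generator would report it is not specified by this proposal.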

Info

Including the number of bits in the name of the encoding (when it's already present in the field's type) provides a path to evolving field schemas from 32 to 64 bits. Specifically, a field might originally have been defined as:

Code Block
{ "name": "tooSmall", "type": "int32", ... }

This could be changed (e.g. in version 2 of the message) to:

Code Block
{ "name": "tooSmall", "type": "int64",
  "encoding": { "0-1": "fixed32", "2+": "fixed64" } }

This change would result in the type of the tooSmall property in the Java representation changing from int to long (a one-time refactoring). But protocol compatibility would be maintained because:

  • In versions 0 and 1 an int (using fixed encoding) would be read from the buffer and promoted to a long. When writing, the long value would be range-checked prior to downcasting to an int and writing using the fixed encoding.
  • In versions 2 and above a long (using fixed encoding) would be read from the buffer. When writing, the long value would be written using the fixed encoding.
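The read and write behaviour described in these two points might look like the following in generated code. This is only a sketch: the class name is hypothetical, and using Math.toIntExact for the range check is an assumption about how the check could be done.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of generated read/write code for the tooSmall field.
final class TooSmallFieldSketch {

    /** Reads tooSmall; the Java type is long in every version. */
    static long read(ByteBuffer buffer, int version) {
        if (version <= 1) {
            // fixed32: read 4 bytes; the int is widened to long on return.
            return buffer.getInt();
        }
        // fixed64: read 8 bytes.
        return buffer.getLong();
    }

    /** Writes tooSmall, range-checking values destined for the 32 bit encoding. */
    static void write(ByteBuffer buffer, int version, long value) {
        if (version <= 1) {
            // Math.toIntExact throws ArithmeticException if the value overflows an int.
            buffer.putInt(Math.toIntExact(value));
        } else {
            buffer.putLong(value);
        }
    }
}
```

A value such as 2^40 can therefore be written in version 2 and above, but attempting to write it in version 0 or 1 fails rather than being silently truncated.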

Using this mechanism:

  • fields can evolve between fixed and variable length encodings without any refactoring of the Java code, requiring only an RPC version change.
  • fields can evolve from fewer to more bits between versions, requiring only a one-off refactoring. Contrast this to having two distinct fields of different types (and thus different names) existing in different versions of a message.
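As an illustration of the first point, the sketch below writes and reads the same Java int using whichever encoding applies to a version. It assumes a Protocol Buffers style varint with zigzag mapping for the signed (packed) form; the class, its methods and the version cut-over in encodingFor are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: one Java int, three possible wire encodings.
final class Int32EncodingSketch {

    static void writeUnsignedVarint(ByteBuffer buffer, int value) {
        // Emit 7 bits at a time, setting the high bit while more bytes follow.
        while ((value & 0xFFFFFF80) != 0) {
            buffer.put((byte) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        buffer.put((byte) value);
    }

    static int readUnsignedVarint(ByteBuffer buffer) {
        int value = 0;
        int shift = 0;
        byte b;
        while (((b = buffer.get()) & 0x80) != 0) {
            value |= (b & 0x7F) << shift;
            shift += 7;
            if (shift > 28) {
                throw new IllegalArgumentException("Varint is too long for 32 bits");
            }
        }
        return value | (b << shift);
    }

    /** Assumed example schedule: fixed up to version 9, unsigned packed afterwards. */
    static String encodingFor(int version) {
        return version <= 9 ? "fixed32" : "upacked32";
    }

    static void write(ByteBuffer buffer, String encoding, int value) {
        switch (encoding) {
            case "fixed32":
                buffer.putInt(value);
                break;
            case "packed32":
                // Zigzag maps small negative numbers to small unsigned ones.
                writeUnsignedVarint(buffer, (value << 1) ^ (value >> 31));
                break;
            case "upacked32":
                writeUnsignedVarint(buffer, value);
                break;
            default:
                throw new IllegalArgumentException("Unknown encoding: " + encoding);
        }
    }

    static int read(ByteBuffer buffer, String encoding) {
        switch (encoding) {
            case "fixed32":
                return buffer.getInt();
            case "packed32": {
                int raw = readUnsignedVarint(buffer);
                return (raw >>> 1) ^ -(raw & 1); // undo zigzag
            }
            case "upacked32":
                return readUnsignedVarint(buffer);
            default:
                throw new IllegalArgumentException("Unknown encoding: " + encoding);
        }
    }
}
```

Callers of write and read always deal in int, so moving a field from fixed32 to upacked32 changes only which branch runs for a given version.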

The default when no encoding is present on a field is to use the fixed encoding of the appropriate type. While this approach is more verbose, it is potentially more flexible than conflating type and encoding within the type property, since it would be easy to add further named encodings in the future.

Example

Code Block
{ "name": "LeaderId",
  "type": "int32",
  "versions": "0+",
  "entityType": "brokerId",
  "about": "The ID of the leader broker.",
  "encoding": {
    "0-9": "fixed32",
    "10+": "upacked32"
}}

Proposed Changes

TBC based on selection of the preferred alternative via discussion. The message generator will be modified to encode values using the encoding defined for the message's version and relevant type.

Compatibility, Deprecation, and Migration Plan

...