Status

Current state: Under Discussion

Discussion thread: tbd

JIRA: KAFKA-4932

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Many users use UUIDs (universally unique identifiers) as keys for their Kafka messages. However, since Kafka has no built-in UUID Serializer / Deserializer, UUIDs cannot be used out of the box and must first be converted to either String or byte[]. A built-in UUID Serializer / Deserializer would remove this manual conversion step.

Proposed Change

This KIP proposes adding new UUIDSerializer and UUIDDeserializer classes, along with support for them in the Serdes class. This will allow UUIDs to be used directly with Consumers, Producers and Streams.

Serialization converts the UUID to its String representation and then to a 36-byte array; deserialization reverses these steps. The String representation of a UUID is common across platforms and programming languages. (See Rejected Alternatives for why the String representation is used rather than the binary representation.)
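The round trip described above can be sketched with the JDK alone. This is a minimal illustration, not the proposed implementation: the class name UuidStringCodec and its method names are hypothetical, and UTF-8 is an assumed encoding (the KIP does not name one; the canonical UUID string is pure ASCII, so it occupies 36 bytes in UTF-8).

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper, not part of the KIP: shows the String-based round trip only.
final class UuidStringCodec {

    // UUID -> canonical 36-character string -> bytes (UTF-8 is an assumption).
    public static byte[] serialize(UUID uuid) {
        return uuid == null ? null : uuid.toString().getBytes(StandardCharsets.UTF_8);
    }

    // bytes -> string -> UUID; null is passed through unchanged.
    public static UUID deserialize(byte[] data) {
        return data == null ? null : UUID.fromString(new String(data, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        UUID original = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        byte[] bytes = serialize(original);
        System.out.println(bytes.length);                          // 36
        System.out.println(original.equals(deserialize(bytes)));   // true
    }
}
```

Passing null through in both directions is a design choice of this sketch, made so that records without a key or value would survive the round trip.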

Public Interfaces

  • New class org.apache.kafka.common.serialization.UUIDSerializer which implements the Serializer<UUID> interface
  • New class org.apache.kafka.common.serialization.UUIDDeserializer which implements the Deserializer<UUID> interface
  • New method static public Serde<UUID> UUID() in the org.apache.kafka.common.serialization.Serdes class
  • New nested class UUIDSerde in org.apache.kafka.common.serialization.Serdes, which creates a new serde based on the UUIDSerializer and UUIDDeserializer classes
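The following sketch shows how the classes listed above might fit together. To keep it compilable without the Kafka client library, it declares local stand-ins for the Serializer and Deserializer interfaces, reduced to a single method each; the outer class UuidSerdeSketch, the uuid() factory and the UTF-8 encoding are all assumptions made for illustration and are not taken from the KIP.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical container so the sketch compiles on its own.
final class UuidSerdeSketch {

    // Local stand-ins for Kafka's interfaces (assumption: one method each).
    interface Serializer<T> { byte[] serialize(String topic, T data); }
    interface Deserializer<T> { T deserialize(String topic, byte[] data); }

    static final class UUIDSerializer implements Serializer<UUID> {
        @Override
        public byte[] serialize(String topic, UUID data) {
            return data == null ? null : data.toString().getBytes(StandardCharsets.UTF_8);
        }
    }

    static final class UUIDDeserializer implements Deserializer<UUID> {
        @Override
        public UUID deserialize(String topic, byte[] data) {
            return data == null ? null : UUID.fromString(new String(data, StandardCharsets.UTF_8));
        }
    }

    // Plays the role of the proposed UUIDSerde: pairs the two classes.
    static final class UUIDSerde {
        final Serializer<UUID> serializer = new UUIDSerializer();
        final Deserializer<UUID> deserializer = new UUIDDeserializer();
    }

    // Plays the role of the proposed Serdes.UUID() factory method.
    static UUIDSerde uuid() {
        return new UUIDSerde();
    }

    public static void main(String[] args) {
        UUIDSerde serde = uuid();
        UUID key = UUID.randomUUID();
        byte[] wire = serde.serializer.serialize("example-topic", key);
        System.out.println(key.equals(serde.deserializer.deserialize("example-topic", wire)));
    }
}
```

With the real classes in place, client code would obtain the serde from Serdes.UUID() instead of the stand-in factory used here.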

Migration Plan and Compatibility

This KIP only adds new functionality, so it raises no backward compatibility issues and imposes no special requirements for migrating from older versions.

Rejected Alternatives

A UUID can also be represented in binary format as a 16-byte array. However, different platforms and programming languages interpret the binary format differently (little endian, big endian, etc.). Because of these complications, I suggest supporting only the String representation, which is more likely to stay compatible across different clients without complicated configuration.
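The byte-order problem can be illustrated by encoding one UUID in two 16-byte layouts. This is a sketch for illustration only: the class UuidBinaryLayouts and its methods are hypothetical, and the mixed-endian variant (first three fields little endian, remainder unchanged) is included as one example of a layout that some GUID implementations use.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.UUID;

// Hypothetical helper, not part of the KIP: two binary encodings of the same UUID.
final class UuidBinaryLayouts {

    // Big-endian layout: bytes appear in the same order as the hex digits of the string form.
    public static byte[] bigEndian(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Mixed-endian layout: the first three fields are byte-swapped, the last 8 bytes are not.
    public static byte[] mixedEndian(UUID uuid) {
        long msb = uuid.getMostSignificantBits();
        ByteBuffer buf = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt((int) (msb >>> 32));       // first field, 4 bytes
        buf.putShort((short) (msb >>> 16));   // second field, 2 bytes
        buf.putShort((short) msb);            // third field, 2 bytes
        buf.order(ByteOrder.BIG_ENDIAN).putLong(uuid.getLeastSignificantBits());
        return buf.array();
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("00112233-4455-6677-8899-aabbccddeeff");
        // The two arrays differ in their first 8 bytes, so a reader must know the writer's layout.
        System.out.println(Arrays.equals(bigEndian(uuid), mixedEndian(uuid))); // false
    }
}
```

A consumer that assumes the wrong layout would decode a different, still valid-looking UUID, which is why the String form avoids this class of error.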
