...

  • Transmitting the schema with every message (high overhead).
  • Registering schemas centrally and tagging each message with a schema ID, so that readers can look up schemas on demand.
  • Manually specifying a fixed schema for each connector (e.g., passing it via the connector config and having the deserializer interface accept a schema string).
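
The second option, registering schemas centrally and tagging each message with an ID, can be sketched roughly as follows. This is only an illustration under stated assumptions: the `InMemorySchemaRegistry` class, the `tag_message`/`untag_message` helpers, and the 4-byte big-endian ID prefix are all hypothetical and do not describe any existing registry's API or wire format.

```python
import struct


class InMemorySchemaRegistry:
    """Hypothetical stand-in for a central schema registry service."""

    def __init__(self):
        self._schema_by_id = {}
        self._id_by_schema = {}

    def register(self, schema: str) -> int:
        """Return the ID for a schema, assigning a new one on first sight."""
        if schema not in self._id_by_schema:
            schema_id = len(self._schema_by_id) + 1
            self._id_by_schema[schema] = schema_id
            self._schema_by_id[schema_id] = schema
        return self._id_by_schema[schema]

    def lookup(self, schema_id: int) -> str:
        """Readers call this on demand when they see an unfamiliar ID."""
        return self._schema_by_id[schema_id]


# Assumed framing: a 4-byte big-endian schema ID followed by the payload.
_HEADER = struct.Struct(">I")


def tag_message(schema_id: int, payload: bytes) -> bytes:
    """Prefix the serialized payload with its schema ID."""
    return _HEADER.pack(schema_id) + payload


def untag_message(message: bytes):
    """Split a tagged message into (schema_id, payload)."""
    (schema_id,) = _HEADER.unpack_from(message)
    return schema_id, message[_HEADER.size:]
```

Compared with sending the full schema in every message, the per-message overhead in this sketch is a fixed four bytes, at the cost of a registry lookup the first time a reader encounters an ID.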

This isn't an issue for many applications because they don't work with dynamic schemas. Normally they are compiled against a specific schema (or code auto-generated from that schema) because they handle only that one type of data. They can avoid any central registration service because both the reader and the writer are assumed to have a copy of the schema (or at least compatible schemas).

...

  • Use one of the existing Avro (or generic) schema registries, or manual schema specification. This is already incorporated into some existing serializers (e.g., Confluent's).
  • Converter translates between Copycat types and Avro types (primitive types, GenericRecord, Map, Collection). This implementation is straightforward because Avro has good built-in support for handling schemas dynamically (GenericRecord).

Thrift

  • Requires a schema registry or manual schema specification. Thrift is highly dependent on not changing field IDs between different versions of the same schema, so it may be necessary to look up previous versions of schemas to ensure compatibility (if reader/writer schemas might not match).
  • Converter can be a no-op. Thrift doesn't have an intermediate format that supports dynamic schemas. (Alternatively, one could build a TBase implementation similar to Avro's GenericRecord.)
  • Serializer implementation will need to be custom, but can reuse TProtocol implementations. Thrift doesn't have built-in support for parsing schemas, but third-party libraries have implemented this (e.g., this one from Facebook). By combining these, the serializer should be a straightforward implementation.
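
To illustrate why stable field IDs matter, the toy sketch below encodes a record keyed by field ID and decodes it with two different reader schemas. It is not Thrift's actual wire format or API; the schemas, the `encode`/`decode` helpers, and the dict-based "wire" representation are all assumptions made for illustration.

```python
# Hypothetical schemas: each maps a numeric field ID to a field name.
WRITER_V1 = {1: "id", 2: "name"}
READER_V2_COMPATIBLE = {1: "id", 2: "name", 3: "email"}  # only adds a new ID
READER_V2_BROKEN = {1: "id", 2: "email", 3: "name"}      # reassigns ID 2


def encode(record: dict, schema: dict) -> dict:
    """Replace field names with field IDs, as an ID-keyed format would."""
    id_by_name = {name: field_id for field_id, name in schema.items()}
    return {id_by_name[name]: value for name, value in record.items()}


def decode(wire: dict, schema: dict) -> dict:
    """Map field IDs back to names, skipping IDs the reader doesn't know."""
    return {schema[fid]: value for fid, value in wire.items() if fid in schema}


wire = encode({"id": 7, "name": "alice"}, WRITER_V1)

# IDs unchanged: the newer reader recovers the original fields.
compatible = decode(wire, READER_V2_COMPATIBLE)

# ID 2 reassigned: the value written as "name" is silently read as "email".
broken = decode(wire, READER_V2_BROKEN)
```

In the broken case nothing fails at decode time; the data is simply misinterpreted. Being able to look up the writer's schema version is what lets a reader detect or avoid this kind of mismatch.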

Protocol Buffers

  • Requires a schema registry or manual schema specification. Protocol Buffers is highly dependent on not changing field IDs (field numbers) between different versions of the same schema, so it may be necessary to look up previous versions of schemas to ensure compatibility (if reader/writer schemas might not match).
  • Converter translates between Copycat types and Descriptor/DynamicMessage.

...