ID: IEP-75
Author:
Sponsor:
Created:
Status: DRAFT


Motivation

Thin clients need a standardized way to serialize data for Table and Key-Value APIs. In 2.x, we used the same Ignite binary format that was used for server communication and data storage. However, in 3.x there are different formats for data storage and transmission (IEP-74 Data Storage), and those formats are not meant to be used by thin clients.

In Ignite 3.0 we want to avoid the cost of writing and supporting our own serialization mechanism.

Non-goals

The thin client protocol (handshake, message format, etc.) will be designed in a separate IEP. Here we only discuss a mechanism to serialize user and system data: primitive and compound values, such as cache entries, configuration objects, and so on.

Description

Requirements

The goal is to find an existing serialization format that satisfies the following requirements:

  • Binary (as opposed to text, like JSON, for performance reasons)
  • Supports nested object graphs
  • Supports primitives, not only objects (for example, an integer or GUID value can be serialized independently)
  • Supports streaming: multiple values one after another in the same buffer / stream.
  • Schemaless: any object of any type can be written without prior setup
    • For Table APIs (when the schema is present), we can use field IDs instead of names for performance reasons
  • Can work without classes ("binary mode" in terms of 2.x): servers should be able to inspect the structure in serialized form
  • Extensible (can add custom types)
  • Well-supported implementations in all languages of interest (Java, C#, C++, Python, JavaScript, PHP)
    • With a compatible license
  • Fast and compact
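
To make several of these requirements concrete (schemaless nested graphs, standalone primitives, streaming, and inspection without classes), here is a minimal sketch of a hand-rolled encoder and decoder for a small subset of the MessagePack wire format. The function names `pack`, `unpack_from`, and `unpack_stream` are hypothetical and exist only for this illustration; a real client would rely on a maintained library rather than code like this. The sketch only handles nil, booleans, integers, short strings, and small arrays and maps.

```python
import struct


def pack(value):
    """Encode nil, bool, int, str, list, or dict (subset of MessagePack)."""
    if value is None:
        return b"\xc0"                                 # nil
    if value is True:
        return b"\xc3"                                 # true
    if value is False:
        return b"\xc2"                                 # false
    if isinstance(value, int):
        if 0 <= value <= 0x7F:
            return bytes([value])                      # positive fixint
        return b"\xd3" + struct.pack(">q", value)      # int 64 (valid, not the most compact)
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) <= 31:
            return bytes([0xA0 | len(data)]) + data    # fixstr
        if len(data) <= 0xFF:
            return b"\xd9" + bytes([len(data)]) + data  # str 8
        raise ValueError("string too long for this sketch")
    if isinstance(value, list) and len(value) <= 15:
        return bytes([0x90 | len(value)]) + b"".join(pack(v) for v in value)  # fixarray
    if isinstance(value, dict) and len(value) <= 15:
        out = bytes([0x80 | len(value)])               # fixmap
        for key, item in value.items():
            out += pack(key) + pack(item)
        return out
    raise TypeError(f"unsupported value in this sketch: {value!r}")


def unpack_from(buf, pos=0):
    """Decode one value starting at pos; return (value, next_pos)."""
    tag = buf[pos]
    pos += 1
    if tag <= 0x7F:
        return tag, pos
    if tag == 0xC0:
        return None, pos
    if tag in (0xC2, 0xC3):
        return tag == 0xC3, pos
    if tag == 0xD3:
        return struct.unpack_from(">q", buf, pos)[0], pos + 8
    if 0xA0 <= tag <= 0xBF or tag == 0xD9:
        if tag == 0xD9:
            length, pos = buf[pos], pos + 1
        else:
            length = tag & 0x1F
        return buf[pos:pos + length].decode("utf-8"), pos + length
    if 0x90 <= tag <= 0x9F:
        items = []
        for _ in range(tag & 0x0F):
            item, pos = unpack_from(buf, pos)
            items.append(item)
        return items, pos
    if 0x80 <= tag <= 0x8F:
        result = {}
        for _ in range(tag & 0x0F):
            key, pos = unpack_from(buf, pos)
            result[key], pos = unpack_from(buf, pos)
        return result, pos
    raise ValueError(f"unsupported type byte 0x{tag:02x}")


def unpack_stream(buf):
    """Yield every value written back to back in the same buffer."""
    pos = 0
    while pos < len(buf):
        value, pos = unpack_from(buf, pos)
        yield value


# A primitive, a string, and a nested object written one after another.
stream = pack(42) + pack("key") + pack({"id": 7, "tags": ["a", "b"], "owner": None})
values = list(unpack_stream(stream))
```

Because every value starts with a type byte, the reader needs neither a schema nor user classes to walk the buffer: `values` comes back as plain integers, strings, lists, and dicts, which is the kind of "binary mode" inspection described in the list above.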

Comparison

Name | Comments | License

MessagePack
  • Schemaless binary format.
  • Compatible with JSON (can be converted directly to and from JSON, which is an important use case).
  • The most popular of the candidates. High-performance, well-maintained implementations exist for many languages.
  • Battle-tested: used by Redis.
  License: Java: Apache 2.0, C#: MIT, C++: MIT (nlohmann/json), Python: Apache 2.0, JavaScript: MIT, PHP: MIT

CBOR
  • Based on MessagePack.
  • Less popular than MessagePack, with fewer implementations and an outdated PHP implementation.
  • Standardized (RFC 7049), but MessagePack is simpler.
  • Included in stdlib in .NET 5.
  • "Use MsgPack instead of CBOR": https://diziet.dreamwidth.org/6568.html
  License: Java: Apache 2.0, C#: CC0, C++: MIT, Python: MIT, JavaScript: MIT, PHP: PHP License

FlexBuffers
  • "Schemaless cousin of Google's FlatBuffers". Can be accessed without parsing, copying, or allocation.
  • Can't serialize arbitrary objects at this point (in Java and C#).
  • Relatively new, has not gained traction.

BSON
  • Designed for MongoDB storage and in-memory manipulation, not for network usage, so it is more verbose than MessagePack/CBOR.

UBJSON
  • Seems to be abandoned; implementations (e.g. C#) are not maintained.
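
The JSON compatibility and compactness noted for MessagePack can be illustrated with a short sketch that parses a JSON document and re-encodes the resulting value using MessagePack type bytes. The function name `json_value_to_msgpack` is hypothetical, the encoder is written by hand only for illustration (it always picks the fixed-size or 32-bit length forms, so it is not the most compact encoding possible), and integers are assumed to fit in a signed 64-bit range.

```python
import json
import struct


def json_value_to_msgpack(value):
    """Encode a value produced by json.loads as MessagePack bytes."""
    if value is None:
        return b"\xc0"                                   # nil
    if isinstance(value, bool):
        return b"\xc3" if value else b"\xc2"             # true / false
    if isinstance(value, int):
        if 0 <= value <= 0x7F:
            return bytes([value])                        # positive fixint
        return b"\xd3" + struct.pack(">q", value)        # int 64
    if isinstance(value, float):
        return b"\xcb" + struct.pack(">d", value)        # float 64
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) <= 31:
            return bytes([0xA0 | len(data)]) + data      # fixstr
        return b"\xdb" + struct.pack(">I", len(data)) + data  # str 32
    if isinstance(value, list):
        head = (bytes([0x90 | len(value)]) if len(value) <= 15
                else b"\xdd" + struct.pack(">I", len(value)))  # fixarray / array 32
        return head + b"".join(json_value_to_msgpack(v) for v in value)
    if isinstance(value, dict):
        head = (bytes([0x80 | len(value)]) if len(value) <= 15
                else b"\xdf" + struct.pack(">I", len(value)))  # fixmap / map 32
        return head + b"".join(
            json_value_to_msgpack(k) + json_value_to_msgpack(v)
            for k, v in value.items()
        )
    raise TypeError(f"not a JSON value: {value!r}")


json_text = '{"compact":true,"schema":0}'
packed = json_value_to_msgpack(json.loads(json_text))

print(len(json_text.encode("utf-8")))  # 27
print(len(packed))                     # 18
print(packed.hex())                    # 82a7636f6d70616374c3a6736368656d6100
```

The same logical document shrinks from 27 bytes of JSON text to 18 bytes, since the map size, string lengths, and small integers are folded into single type bytes instead of being spelled out with quotes, colons, and braces.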

Popular formats like Avro, Thrift, ProtoBuf, FlatBuffers and others are not mentioned, because they don't satisfy one or more of the requirements above (schemaless, etc.).

Conclusion

  • MessagePack and CBOR satisfy all requirements. They are very similar, though not compatible with each other.
  • There seem to be no other contenders.

MessagePack is more widely used and has more mature and well-maintained implementations in all languages of interest.

Discussion Links

// TODO

Reference Links

Tickets
