...
ID | IEP-75 |
Author | |
Sponsor | |
Created | |
Status | |
Motivation
Thin clients need a standardized way to serialize data.
...
In Ignite 3.0 we want to avoid the cost of writing and supporting our own serialization mechanism.
Non-goals
The thin client protocol itself (handshake, message format, etc.) will be designed in a separate IEP. This IEP covers only the mechanism for serializing user and system data: primitive and compound values, such as cache entries, configuration objects, and so on.
Description
Use MsgPack format in the Ignite 3.0 thin client protocol.
MsgPack Example
SQL request (query text + arguments array):

```java
// Assumes the msgpack-java library (org.msgpack.core.MessagePack, MessageBufferPacker).
MessageBufferPacker packer = MessagePack.newDefaultBufferPacker();

packer
    .packString("select * from cars where year > ? and seats = ?")
    .packArrayHeader(2)
    .packInt(2005)
    .packInt(2);
```
...
For comparison, the current Ignite binary protocol encodes the same data in 71 bytes and takes about twice as long to do so (see writeSqlQueryIgnite and writeSqlQueryMsgPack in the benchmarks below).
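To make the byte layout of this request concrete, the sketch below encodes it by hand using only the JDK, following the MessagePack specification (fixstr / str 8 for the text, fixarray for the header, positive fixint / uint 8 / uint 16 for the arguments). The class and method names (`SqlRequestSketch`, `packStr`, `packSmallUInt`, `encodeRequest`) are hypothetical and cover only the formats this one message needs. A real client would use a MessagePack library instead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical hand-rolled encoder: supports only what the SQL request example needs.
class SqlRequestSketch {
    // Writes a string as fixstr (up to 31 bytes) or str 8 (up to 255 bytes).
    static void packStr(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length <= 31) {
            out.write(0xa0 | utf8.length);
        } else if (utf8.length <= 255) {
            out.write(0xd9);
            out.write(utf8.length);
        } else {
            throw new IllegalArgumentException("longer strings are out of scope for this sketch");
        }
        out.write(utf8, 0, utf8.length);
    }

    // Writes a fixarray header (up to 15 elements).
    static void packArrayHeader(ByteArrayOutputStream out, int size) {
        if (size < 0 || size > 15) {
            throw new IllegalArgumentException("larger arrays are out of scope for this sketch");
        }
        out.write(0x90 | size);
    }

    // Writes a small non-negative integer as positive fixint, uint 8, or uint 16 (big-endian).
    static void packSmallUInt(ByteArrayOutputStream out, int v) {
        if (v < 0 || v > 0xffff) {
            throw new IllegalArgumentException("value is out of scope for this sketch");
        }
        if (v <= 0x7f) {
            out.write(v);
        } else if (v <= 0xff) {
            out.write(0xcc);
            out.write(v);
        } else {
            out.write(0xcd);
            out.write(v >>> 8);
            out.write(v & 0xff);
        }
    }

    // Encodes the SQL request: query text followed by an array of two integer arguments.
    static byte[] encodeRequest() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        packStr(out, "select * from cars where year > ? and seats = ?");
        packArrayHeader(out, 2);
        packSmallUInt(out, 2005);
        packSmallUInt(out, 2);
        return out.toByteArray();
    }
}
```

`SqlRequestSketch.encodeRequest().length` gives the encoded size, which can be set against the Ignite figure above.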
Why MsgPack
The goal is to find an existing serialization format that satisfies the following requirements:
- Binary (as opposed to text, like JSON, for performance reasons)
- Supports nested object graphs
- Supports primitives, not only objects (for example, integer or Guid value can be serialized independently)
- Supports streaming: multiple values one after another in the same buffer / stream.
- Schemaless: any object of any type can be written without prior set up
- For Table APIs (when the schema is present), we can use field IDs instead of names for performance reasons
- Can work without classes ("binary mode" in terms of 2.x): servers should be able to inspect the structure in serialized form
- Extensible (can add custom types)
- Well-supported implementations in all languages of interest (Java, C#, C++, Python, JavaScript, PHP)
- Fast and compact
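The "schemaless" and "can work without classes" requirements both rely on every value being self-describing. In MessagePack, the leading byte of a value identifies its type family, so a server can inspect serialized data without any class or schema. The following is a minimal sketch of that lookup based on the format table in the MessagePack specification; `FormatInspector` and `family` are hypothetical names, not part of any library.

```java
// Hypothetical helper: maps the leading byte of a MessagePack value to its type family.
class FormatInspector {
    static String family(byte first) {
        int b = first & 0xff;
        if (b <= 0x7f) return "integer";    // positive fixint
        if (b <= 0x8f) return "map";        // fixmap
        if (b <= 0x9f) return "array";      // fixarray
        if (b <= 0xbf) return "string";     // fixstr
        if (b == 0xc0) return "nil";
        if (b == 0xc1) return "unused";     // never used by the format
        if (b <= 0xc3) return "boolean";    // false, true
        if (b <= 0xc6) return "binary";     // bin 8, bin 16, bin 32
        if (b <= 0xc9) return "extension";  // ext 8, ext 16, ext 32
        if (b <= 0xcb) return "float";      // float 32, float 64
        if (b <= 0xd3) return "integer";    // uint 8..64, int 8..64
        if (b <= 0xd8) return "extension";  // fixext 1, 2, 4, 8, 16
        if (b <= 0xdb) return "string";     // str 8, str 16, str 32
        if (b <= 0xdd) return "array";      // array 16, array 32
        if (b <= 0xdf) return "map";        // map 16, map 32
        return "integer";                   // negative fixint (0xe0..0xff)
    }
}
```

For example, `FormatInspector.family((byte) 0x92)` reports an array, because `0x92` is a fixarray header for two elements.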
Comparison
Name | Comments | License |
---|
MessagePack | - Schemaless binary format.
- Compatible with JSON (can be converted directly to and from JSON, which is an important use case)
- The most popular among all. High-performance, well-maintained implementations exist for many languages.
- Battle tested: used by Tarantool and Redis
| Java: Apache 2.0, C#: MIT, C++: MIT (nlohmann/json), Python: Apache 2.0, JavaScript: MIT, PHP: MIT |
CBOR | - Based on MessagePack.
- Less popular than MessagePack, fewer implementations, outdated PHP implementation.
- Standardized (RFC7049), but MessagePack is simpler.
- Included in stdlib in .NET 5.
- "Use MsgPack instead of CBOR": https://diziet.dreamwidth.org/6568.html
| Java: Apache 2.0, C#: CC0, C++: MIT, Python: MIT, JavaScript: MIT, PHP: PHP License |
FlexBuffers | - "Schemaless cousin of Google's FlatBuffers". Can be accessed without parsing, copying, or allocation.
- Can't serialize arbitrary objects at this point (in Java and C#)
- Relatively new, has not gained traction
|
|
BSON | Designed for MongoDB storage and in-memory manipulation, not for network usage, so it is more verbose than MessagePack/CBOR |
|
UBJSON | Seems to be abandoned, implementations (e.g. C#) are not maintained |
|
Popular formats like Avro, Thrift, ProtoBuf, FlatBuffers and others are not mentioned, because they don't satisfy one or more of the requirements above (schemaless, etc.).
Conclusion
- MessagePack and CBOR satisfy all requirements (and they are very similar, though not compatible).
- There seem to be no other contenders.
MessagePack is more widely used and has more mature and well-maintained implementations in all languages of interest.
Benchmarks
- Code is linked below
- MsgPack is always faster on primitive values
- MsgPack is more compact because integers are written in the smallest format that fits the value (variable-width encoding) rather than at a fixed width
- Ignite is faster on POJOs, because MsgPack uses Jackson integration to handle objects, which is very configurable and nice, but comes at a cost.
- We can develop our own implementation if needed.
- In C# benchmarks (not included here), MsgPack is 4x faster than Ignite on a similar model class, which suggests that a more efficient implementation is possible (see also .NET Serialization Benchmark 2019 Roundup)
```
Benchmark                                                       Mode   Cnt         Score        Error  Units
JmhBinaryMarshallerMsgPackBenchmark.writePrimitivesMsgPackRaw  thrpt  10  16834154.556 ±  85624.143  ops/s
JmhBinaryMarshallerMsgPackBenchmark.writePrimitivesIgnite      thrpt  10  12702562.838 ± 248094.068  ops/s

JmhBinaryMarshallerMsgPackBenchmark.writePojoIgnite            thrpt  10  11590924.790 ±  42061.734  ops/s  // Full footers
JmhBinaryMarshallerMsgPackBenchmark.writePojoMsgPack           thrpt  10   5386377.535 ±  33835.097  ops/s  // Fields with names
JmhBinaryMarshallerMsgPackBenchmark.writePojoMsgPack2          thrpt  10   8505961.494 ± 465369.449  ops/s  // Fields without names

JmhBinaryMarshallerMsgPackBenchmark.readPrimitivesIgnite       thrpt  10  19873521.096 ± 545779.558  ops/s
JmhBinaryMarshallerMsgPackBenchmark.readPrimitivesMsgPack      thrpt  10  29235107.372 ±  85371.004  ops/s

JmhBinaryMarshallerMsgPackBenchmark.readPojoIgnite             thrpt  10   8437054.066 ± 104476.415  ops/s
JmhBinaryMarshallerMsgPackBenchmark.readPojoMsgPack            thrpt  10   6292876.474 ±  73356.915  ops/s

JmhBinaryMarshallerMsgPackBenchmark.writeSqlQueryIgnite        thrpt  10   5756908.336 ±  42079.083  ops/s
JmhBinaryMarshallerMsgPackBenchmark.writeSqlQueryMsgPack       thrpt  10  12380076.956 ± 150712.634  ops/s
```

(Ubuntu 20.04, OpenJDK 1.8.0_292, i7-9700K)
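The compactness noted in the bullets above comes largely from how integers are written: the MessagePack specification recommends that a serializer pick the shortest format that can represent a value. The sketch below computes the encoded size of a `long` under that rule, assuming a writer that always chooses the shortest format. `IntSizeSketch` and `packedSize` are hypothetical names used only for illustration.

```java
// Hypothetical helper: number of bytes MessagePack needs for an integer,
// assuming the writer always picks the shortest applicable format.
class IntSizeSketch {
    static int packedSize(long v) {
        if (v >= -32 && v <= 0x7f) return 1;    // positive or negative fixint
        if (v >= 0) {
            if (v <= 0xffL) return 2;           // uint 8:  1 format byte + 1
            if (v <= 0xffffL) return 3;         // uint 16: 1 format byte + 2
            if (v <= 0xffffffffL) return 5;     // uint 32: 1 format byte + 4
            return 9;                           // uint 64: 1 format byte + 8
        }
        if (v >= Byte.MIN_VALUE) return 2;      // int 8
        if (v >= Short.MIN_VALUE) return 3;     // int 16
        if (v >= Integer.MIN_VALUE) return 5;   // int 32
        return 9;                               // int 64
    }
}
```

Under this rule the two SQL arguments from the earlier example, 2005 and 2, take three bytes and one byte respectively.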
Risks and Assumptions
- There is no true random access to fields by name in MsgPack: offsets are not stored, and values are written sequentially. It is possible, however, to skip values without decoding them.
- Some types, like UUID and date/time, will require custom handling (e.g. UUID is written as string by default, which is not optimal). MsgPack allows up to 128 custom types to be defined.
- To be able to read user objects separately and efficiently without deserializing them (e.g. key and value in a put operation), we'll have to wrap them in one of the following ways:
- As a byte array (MsgPack bin format) - "MsgPack within MsgPack".
- Custom MsgPack type with size in the header
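The custom-type and wrapping ideas above can be sketched with plain `ByteBuffer` code. This is an illustration under stated assumptions, not a proposed implementation: the extension type code `3`, the UUID byte order (most significant bits first), and all names (`WrappingSketch`, `packUuid`, `wrapAsBin`, `skipBin`) are hypothetical. The format bytes (`0xd8` for fixext 16, `0xc4`/`0xc5`/`0xc6` for bin 8/16/32) come from the MessagePack specification.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical sketch of custom-type handling and "MsgPack within MsgPack" wrapping.
class WrappingSketch {
    // Arbitrary application type code chosen for illustration; not an Ignite constant.
    static final byte UUID_EXT_TYPE = 3;

    // UUID as fixext 16: format byte 0xd8, type code, 16 payload bytes (18 bytes total).
    static byte[] packUuid(UUID id) {
        return ByteBuffer.allocate(18)
            .put((byte) 0xd8)
            .put(UUID_EXT_TYPE)
            .putLong(id.getMostSignificantBits())
            .putLong(id.getLeastSignificantBits())
            .array();
    }

    // Wraps an already encoded value in a bin value, so its length is stored in the header.
    static byte[] wrapAsBin(byte[] encoded) {
        int n = encoded.length;
        ByteBuffer buf;
        if (n <= 0xff) {
            buf = ByteBuffer.allocate(2 + n).put((byte) 0xc4).put((byte) n);        // bin 8
        } else if (n <= 0xffff) {
            buf = ByteBuffer.allocate(3 + n).put((byte) 0xc5).putShort((short) n);  // bin 16
        } else {
            buf = ByteBuffer.allocate(5 + n).put((byte) 0xc6).putInt(n);            // bin 32
        }
        return buf.put(encoded).array();
    }

    // Skips one bin-wrapped value without decoding its content; returns the payload length.
    static int skipBin(ByteBuffer buf) {
        int format = buf.get() & 0xff;
        int len;
        switch (format) {
            case 0xc4: len = buf.get() & 0xff; break;
            case 0xc5: len = buf.getShort() & 0xffff; break;
            case 0xc6: len = buf.getInt(); break;
            default:
                throw new IllegalArgumentException("not a bin value: 0x" + Integer.toHexString(format));
        }
        buf.position(buf.position() + len);
        return len;
    }
}
```

With a key and a value each wrapped by `wrapAsBin`, a server could call `skipBin` to step over either one using only the length in the header. The trade-off is a few extra header bytes per wrapped object.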
Discussion Links
...
Reference Links
Tickets
ASF JIRA issues with the label iep-75 (JQL: labels=iep-75)