...
BinaryObject
API should be reworked, as it will not represent actual serialized objects anymore. It should be replaced with something like BinaryRecord
or DataRecord
representing a record in a cache or table. Similarly to the current binary objects, records will provide access to individual fields. A record can also be deserialized into a class with any subset of fields represented in the record.
There are several ways a schema can be defined. The initial entry point to the schema definition is SchemaBuilder
java API:
TBD
The schema builder calls are transparently mapped to DDL statements so that all operations possible via a builder are also possible via DDL and vice versa.
Additionally, we may introduce an API that will infer the schema from a key-value pair using class fields and annotations. The inference happens on the calling site of the node invoking the table modification operation.
Schema-first approach imposes certain natural requirements which are more strict than binary object serialization format:
...
Type | Size | Description |
---|---|---|
Bitmask(n) | ⌈n/8⌉ bytes | A fixed-length bitmask of n bits |
Int8 | 1 byte | 1-byte signed integer |
Uint8 | 1 byte | 1-byte unsigned integer |
Int16 | 2 bytes | 2-byte signed integer |
Uint16 | 2 bytes | 2-byte unsigned integer |
Int32 | 4 bytes | 4-byte signed integer |
Uint32 | 4 bytes | 4-byte unsigned integer |
Int64 | 8 bytes | 8-byte signed integer |
Uint64 | 8 bytes | 8-byte unsigned integer |
Float | 4 bytes | 4-byte floating-point number |
Double | 8 bytes | 8-byte floating-point number |
Number([n]) | Variable | Variable-length number (optionally bound by n bytes in size) |
Decimal | Variable | Variable-length floating-point number |
UUID | 16 bytes | UUID |
String | Variable | String A string encoded with a given Charset |
Date | 3 bytes | A timezone-free date encoded as a year (15 bits), month (4 bits), day (5 bits) |
Time | 4 bytes | A timezone-free time encoded as padding (5 bits), hour (5 bits), minute (6 bits), second (6 bits), millisecond (10 bits) |
Datetime | 7 bytes | A timezone-free datetime encoded as (date, time) |
Instant | 8 bytes | Number of milliseconds since Jan 1, 1970 00:00:00.000 (with no timezone) |
BLOB | Variable | Variable-size byte array |
...
Given schema evolution history, a tuple migration from version N-k to version N is a straightforward operation. We identify fields that were dropped during the last k schema operations and fields that were added (taking into account default field values) and update the tuple based on the field modifications. Afterward, the updated tuple is written in the schema version N layout format. The tuple upgrade may happen on read with an optional writeback or on next update. Additionally, tuple upgrade in background is possible.
...