...

The BinaryObject API should be reworked, because it will no longer represent actual serialized objects. It should be replaced with something like BinaryRecord or DataRecord, representing a record in a cache or table. As with the current binary objects, records will provide access to individual fields. A record can also be deserialized into a class that contains any subset of the fields present in the record.
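A minimal sketch of the two access paths described above: reading individual fields, and deserializing into a class that declares only a subset of the record's fields. The names DataRecordSketch, set, field, as and PersonName are hypothetical and only illustrate the idea; they are not an existing or proposed API, and a real record would be backed by a binary layout rather than a map.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the proposed BinaryRecord/DataRecord (assumption, not a real API).
final class DataRecordSketch {
    // Field name -> value; a real record would read these from a serialized row.
    private final Map<String, Object> fields = new LinkedHashMap<>();

    DataRecordSketch set(String name, Object value) {
        fields.put(name, value);
        return this;
    }

    // Access to an individual field by name.
    Object field(String name) {
        return fields.get(name);
    }

    // Deserializes into a class whose instance fields are a subset of the record's fields.
    <T> T as(Class<T> type) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T obj = ctor.newInstance();
            for (Field f : type.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
                if (!fields.containsKey(f.getName()))
                    throw new IllegalArgumentException("Record has no field: " + f.getName());
                f.setAccessible(true);
                f.set(obj, fields.get(f.getName()));
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        DataRecordSketch rec = new DataRecordSketch()
            .set("id", 1L).set("firstName", "Ada").set("lastName", "Lovelace");
        PersonName name = rec.as(PersonName.class);
        System.out.println(rec.field("id") + ": " + name.firstName + " " + name.lastName);
    }
}

// Declares two of the three record fields; "id" is simply not mapped.
final class PersonName {
    String firstName;
    String lastName;
}
```

The class being deserialized decides which fields it needs, so the same record can back several differently shaped classes.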

Schema Definition API

There are several ways a schema can be defined. The initial entry point to schema definition is the SchemaBuilder Java API:

TBD

The schema builder calls are transparently mapped to DDL statements so that all operations possible via a builder are also possible via DDL and vice versa.
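Since the builder API itself is still to be defined, the following is only an illustration of what a one-to-one mapping between builder calls and DDL could look like. Every name here (SchemaBuilderSketch, table, keyColumn, column, toDdl) and the exact DDL text are assumptions made for the example, not the proposed API or SQL dialect.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder (assumption): each call records a column and the result renders as DDL.
final class SchemaBuilderSketch {
    private final String table;
    private final List<String> columns = new ArrayList<>();
    private final List<String> keyColumns = new ArrayList<>();

    private SchemaBuilderSketch(String table) {
        this.table = table;
    }

    static SchemaBuilderSketch table(String name) {
        return new SchemaBuilderSketch(name);
    }

    // Key columns are non-nullable and become part of the primary key.
    SchemaBuilderSketch keyColumn(String name, String type) {
        columns.add(name + " " + type + " NOT NULL");
        keyColumns.add(name);
        return this;
    }

    SchemaBuilderSketch column(String name, String type) {
        columns.add(name + " " + type);
        return this;
    }

    // Renders the equivalent DDL statement for the calls made so far.
    String toDdl() {
        return "CREATE TABLE " + table + " (" + String.join(", ", columns)
            + ", PRIMARY KEY (" + String.join(", ", keyColumns) + "))";
    }

    public static void main(String[] args) {
        System.out.println(table("Person").keyColumn("id", "INT64").column("name", "STRING").toDdl());
    }
}
```

Keeping the builder a thin layer over statement generation is one way to guarantee that the two interfaces cannot drift apart.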

Additionally, we may introduce an API that infers the schema from a key-value pair using class fields and annotations. The inference happens on the node that invokes the table modification operation.
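A rough sketch of such inference using plain Java reflection. The @ColumnName annotation, the SchemaInferenceSketch class, and the mapping from Java types to column type names are all assumptions made for illustration; the sketch also assumes that both key and value are POJO classes.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

final class SchemaInferenceSketch {
    // Maps a Java field type to a native column type name; anything else is rejected.
    static String columnType(Class<?> c) {
        if (c == byte.class || c == Byte.class) return "Int8";
        if (c == short.class || c == Short.class) return "Int16";
        if (c == int.class || c == Integer.class) return "Int32";
        if (c == long.class || c == Long.class) return "Int64";
        if (c == float.class || c == Float.class) return "Float";
        if (c == double.class || c == Double.class) return "Double";
        if (c == String.class) return "String";
        if (c == UUID.class) return "UUID";
        if (c == byte[].class) return "BLOB";
        throw new IllegalArgumentException("Unsupported column type: " + c.getName());
    }

    // Collects columns from the key class and the value class, sorted by column name.
    static Map<String, String> infer(Class<?> keyClass, Class<?> valueClass) {
        Map<String, String> columns = new TreeMap<>();
        for (Class<?> cls : new Class<?>[] {keyClass, valueClass}) {
            for (Field f : cls.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
                ColumnName ann = f.getAnnotation(ColumnName.class);
                String name = ann != null ? ann.value() : f.getName();
                if (columns.put(name, columnType(f.getType())) != null)
                    throw new IllegalArgumentException("Duplicate column: " + name);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(infer(PersonKey.class, PersonValue.class));
    }
}

// Hypothetical annotation (assumption): overrides the inferred column name.
@Retention(RetentionPolicy.RUNTIME)
@interface ColumnName {
    String value();
}

// Example key and value classes.
final class PersonKey {
    long id;
}

final class PersonValue {
    @ColumnName("full_name")
    String name;
    UUID token;
    double balance;
}
```

A field of an unsupported type, such as a nested POJO or a collection, fails inference with an exception instead of silently falling back to a serialized form.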

Data restrictions

The schema-first approach imposes certain natural requirements that are stricter than those of the binary object serialization format:

  • The column type must be one of a predefined set of available 'primitives' (including Strings, UUIDs, date & time values)
  • Arbitrary nested objects and collections are not allowed as column values. Nested POJOs should either be inlined into a schema or stored as BLOBs
  • Date & time values should be compressed in a way that preserves their natural order, and decompression should be a trivial operation (such as applying a bitmask).

...

Type         Size       Description
Bitmask(n)   n/8 bytes  A fixed-length bitmask of n bits
Int8         1 byte     1-byte signed integer
Uint8        1 byte     1-byte unsigned integer
Int16        2 bytes    2-byte signed integer
Uint16       2 bytes    2-byte unsigned integer
Int32        4 bytes    4-byte signed integer
Uint32       4 bytes    4-byte unsigned integer
Int64        8 bytes    8-byte signed integer
Uint64       8 bytes    8-byte unsigned integer
Float        4 bytes    4-byte floating-point number
Double       8 bytes    8-byte floating-point number
Number([n])  Variable   Variable-length number (optionally bound by n bytes in size)
Decimal      Variable   Variable-length floating-point number
UUID         16 bytes   UUID
String       Variable   A string encoded with a given Charset
Date         3 bytes    A timezone-free date encoded as a year (15 bits), month (4 bits), day (5 bits)
Time         4 bytes    A timezone-free time encoded as padding (5 bits), hour (5 bits), minute (6 bits), second (6 bits), millisecond (10 bits)
Datetime     7 bytes    A timezone-free datetime encoded as (date, time)
Instant      8 bytes    Number of milliseconds since Jan 1, 1970 00:00:00.000 (with no timezone)
BLOB         Variable   Variable-size byte array
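The Date and Time layouts can be sketched with plain shifts and masks. The sketch below assumes that the fields are packed from the most significant bit down in the order listed in the table, that the year is non-negative, and that bytes are written big-endian; none of these details is fixed by the text above. Under those assumptions, comparing encoded values compares the original dates and times, and decoding is just a shift plus a bitmask. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalTime;

// Illustrative codec (assumption): bit positions follow the field order in the type table.
final class DateTimeCodecSketch {
    // Date: year (15 bits) | month (4 bits) | day (5 bits) = 24 bits.
    static int encodeDate(LocalDate d) {
        if (d.getYear() < 0 || d.getYear() > 0x7FFF)
            throw new IllegalArgumentException("Year does not fit in 15 bits: " + d.getYear());
        return (d.getYear() << 9) | (d.getMonthValue() << 5) | d.getDayOfMonth();
    }

    static LocalDate decodeDate(int v) {
        return LocalDate.of((v >>> 9) & 0x7FFF, (v >>> 5) & 0xF, v & 0x1F);
    }

    // Time: padding (5 bits) | hour (5) | minute (6) | second (6) | millisecond (10) = 32 bits.
    // Sub-millisecond precision is dropped.
    static int encodeTime(LocalTime t) {
        int millis = t.getNano() / 1_000_000;
        return (t.getHour() << 22) | (t.getMinute() << 16) | (t.getSecond() << 10) | millis;
    }

    static LocalTime decodeTime(int v) {
        return LocalTime.of((v >>> 22) & 0x1F, (v >>> 16) & 0x3F, (v >>> 10) & 0x3F,
            (v & 0x3FF) * 1_000_000);
    }

    // Big-endian 3-byte form of an encoded date, so unsigned byte-wise comparison keeps the order.
    static byte[] dateToBytes(int v) {
        return new byte[] {(byte) (v >>> 16), (byte) (v >>> 8), (byte) v};
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2021, 3, 15);
        LocalTime time = LocalTime.of(23, 59, 58, 123_000_000);
        System.out.println(decodeDate(encodeDate(date)) + " " + decodeTime(encodeTime(time)));
    }
}
```

A Datetime value would then be the 3-byte date followed by the 4-byte time, which keeps the combined 7-byte value ordered as well.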

...

Given the schema evolution history, migrating a tuple from version N-k to version N is a straightforward operation. We identify the fields that were dropped and the fields that were added (taking default field values into account) during the last k schema operations, and update the tuple accordingly. Afterward, the updated tuple is written in the layout of schema version N. The tuple upgrade may happen on read, with an optional write-back, or on the next update. Additionally, tuples can be upgraded in the background.
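The migration step can be sketched as a replay of the schema operations between the two versions. The sketch assumes that the history is a list of single add-column or drop-column operations, one per version step, and models a tuple as a map from column name to value; the names TupleMigrationSketch, SchemaOp and upgrade are hypothetical. Writing the result in the version N binary layout is out of scope here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TupleMigrationSketch {
    // Upgrades a tuple from schema version `from` to version `to`.
    // history.get(v - 1) is the operation that turned schema version v into version v + 1.
    static Map<String, Object> upgrade(Map<String, Object> tuple, int from, int to,
                                       List<SchemaOp> history) {
        if (from < 1 || from > to || to > history.size() + 1)
            throw new IllegalArgumentException("Bad version range: " + from + " -> " + to);
        Map<String, Object> result = new LinkedHashMap<>(tuple);
        for (int v = from; v < to; v++) {
            SchemaOp op = history.get(v - 1);
            if (op.added)
                result.put(op.column, op.defaultValue); // added column takes its default
            else
                result.remove(op.column);               // dropped column is discarded
        }
        return result;
    }

    public static void main(String[] args) {
        List<SchemaOp> history = List.of(
            SchemaOp.add("email", ""), SchemaOp.drop("age"), SchemaOp.add("age", 0));
        Map<String, Object> v1 = Map.of("id", 1L, "age", 30);
        System.out.println(upgrade(v1, 1, 4, history));
    }
}

// One schema change: either an added column with a default value or a dropped column.
final class SchemaOp {
    final boolean added;
    final String column;
    final Object defaultValue;

    private SchemaOp(boolean added, String column, Object defaultValue) {
        this.added = added;
        this.column = column;
        this.defaultValue = defaultValue;
    }

    static SchemaOp add(String column, Object defaultValue) {
        return new SchemaOp(true, column, defaultValue);
    }

    static SchemaOp drop(String column) {
        return new SchemaOp(false, column, null);
    }
}
```

Replaying the operations in order matters when a column is dropped and later re-added under the same name: the old value must not survive, and the re-added column starts from its default.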

...