...

BinaryObject API should be reworked, as it will not represent actual serialized objects anymore. It should be replaced with something like BinaryRecord or DataRecord representing a record in a cache or table. Similar to the current binary objects, records will provide access to individual fields. A record can also be deserialized into a class with any subset of fields represented in the record.
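
Purely as an illustration of the direction described above, a record-oriented replacement could look like the following sketch; the BinaryRecord name and all method signatures are hypothetical, not a committed API:

// Hypothetical record API sketch; names and signatures are illustrative only.
public interface BinaryRecord {
    /** Version of the schema this record was serialized with. */
    int schemaVersion();

    /** Typed access to an individual field by column name. */
    <T> T field(String columnName);

    /** Deserializes the record into a class with any subset of the fields represented in the record. */
    <T> T deserialize(Class<T> recordClass);
}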


Introducing a versioned schema makes it possible to upgrade rows to the latest version on the fly and even to update a schema automatically in some simple cases, e.g. adding a new column.
So, a user may choose between two modes: Strict and Live, for manual schema management and dynamic schema expansion respectively.

Schema Definition API

There are several ways a schema can be defined. The initial entry point to the schema definition is the SchemaBuilder Java API:

TBD (see SchemaBuilders class for details)
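
While the builder API itself is still TBD, a schema definition via SchemaBuilders could look roughly like the sketch below; the exact method and type names (tableBuilder, column, ColumnType, withPrimaryKey, SchemaTable) are assumptions for illustration only:

// Illustrative sketch only: the builder API is TBD, all names are assumptions.
SchemaTable table = SchemaBuilders.tableBuilder("PUBLIC", "Person")
    .columns(
        SchemaBuilders.column("id", ColumnType.INT64).build(),
        SchemaBuilders.column("name", ColumnType.string()).asNullable().build(),
        SchemaBuilders.column("salary", ColumnType.DOUBLE).withDefaultValue(0.0d).build())
    .withPrimaryKey("id")
    .build();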

The schema builder calls are transparently mapped to DDL statements so that all operations possible via a builder are also possible via DDL and vice versa.

Additionally, we may introduce an API that will infer the schema from a key-value pair using class fields and annotations. The inference happens at the call site of the node invoking the table modification operation.
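
For example, the inferred mapping could be driven by annotations along the lines of the following sketch; the @Id and @Column annotations are hypothetical and only illustrate the idea:

// Hypothetical annotations; the inference API is not defined yet.
class Person {
    @Id
    long id;                 // inferred as a key column

    @Column(nullable = true)
    String name;             // inferred as a nullable value column

    transient int cachedAge; // not part of the inferred schema
}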

Table schema should be automatically exposed to the table configuration subtree so that simple schema changes are available via the Ignite CLI and the schema can be defined during table creation via the Ignite CLI.

Data restrictions

The Schema-first approach imposes certain natural requirements which are stricter than the binary object serialization format:

  • The column type must be one of a predefined set of available 'primitives' (including Strings, UUIDs, date & time values)
  • Arbitrary nested objects and collections are not allowed as column values. Nested POJOs should either be inlined into a schema or stored as BLOBs
  • Date & time values should be compressed while preserving natural order, and decompression should be a trivial operation (like applying a bitmask).

The suggested set of supported built-in data types is listed in the table below:

Type | Size | Description
Bitmask(n) | n/8 bytes | A fixed-length bitmask of n bits
Int8 | 1 byte | 1-byte signed integer
Uint8 | 1 byte | 1-byte unsigned integer
Int16 | 2 bytes | 2-byte signed integer
Uint16 | 2 bytes | 2-byte unsigned integer
Int32 | 4 bytes | 4-byte signed integer
Uint32 | 4 bytes | 4-byte unsigned integer
Int64 | 8 bytes | 8-byte signed integer
Uint64 | 8 bytes | 8-byte unsigned integer
Float | 4 bytes | 4-byte floating-point number
Double | 8 bytes | 8-byte floating-point number
Number([n]) | Variable | Variable-length number (optionally bound by n bytes in size)
Decimal | Variable | Variable-length floating-point number
UUID | 16 bytes | UUID
String | Variable | A string encoded with a given Charset
Date | 3 bytes | A timezone-free date encoded as a year (15 bits), month (4 bits), day (5 bits)
Time | 4 bytes | A timezone-free time encoded as padding (5 bits), hour (5 bits), minute (6 bits), second (6 bits), millisecond (10 bits)
Datetime | 7 bytes | A timezone-free datetime encoded as (date, time)
Timestamp | 8 bytes | Number of milliseconds since Jan 1, 1970 00:00:00.000 (with no timezone)
Binary | Variable | Variable-size byte array
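
To illustrate the order-preserving encoding of the Date type from the table above, a minimal packing sketch could look as follows; placing the year in the most significant bits keeps the natural date order under plain integer comparison (method names are illustrative):

// Packs a date into 24 bits (3 bytes): year (15 bits), month (4 bits), day (5 bits).
static int packDate(int year, int month, int day) {
    return (year << 9) | (month << 5) | day;
}

// Decompression is a trivial bitmask application, as required above.
static int year(int packed)  { return packed >>> 9; }
static int month(int packed) { return (packed >>> 5) & 0xF; }
static int day(int packed)   { return packed & 0x1F; }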

...

Field | Size
Schema version | 2 bytes
Flags | 2 bytes
Key columns hash | 4 bytes
Key chunk:
Key chunk size | 4 bytes
Null-map | number of columns / 8
Variable-length columns offsets table size | 2 bytes
Variable-length columns offsets table | Variable (number of non-null varlen columns * 4)
Fix-sized columns values | Variable
Variable-length columns values | Variable
Value chunk:
Value chunk size | 4 bytes
Null-map | number of columns / 8
Variable-length columns offsets table size | 2 bytes
Variable-length columns offsets table | Variable (number of non-null varlen columns * 4)
Fix-sized columns values | Variable
Variable-length columns values | Variable
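
A minimal sketch of reading the fixed part of this header is shown below; the little-endian byte order is an assumption, and the field offsets simply follow the table above:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

static void readRowHeader(byte[] rowBytes) {
    ByteBuffer row = ByteBuffer.wrap(rowBytes).order(ByteOrder.LITTLE_ENDIAN); // byte order is an assumption
    short schemaVersion = row.getShort(0); // Schema version: 2 bytes
    short flags         = row.getShort(2); // Flags: 2 bytes
    int keyColumnsHash  = row.getInt(4);   // Key columns hash: 4 bytes
    int keyChunkSize    = row.getInt(8);   // First field of the key chunk
}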

For small rows, the metadata sizes may introduce a very noticeable overhead, so it looks reasonable to write them in a more compact way using different techniques:

  • VarInt - a variable-size integer encoding for sizes (see the sketch after this list)
  • different VarTable formats with byte/short/int offsets
  • skip writing the VarTable and/or Null-map if possible.
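
As a sketch of the first technique, a LEB128-style VarInt stores 7 payload bits per byte and uses the high bit as a continuation marker, so small sizes take a single byte:

// Writes 'value' as a VarInt at 'pos' and returns the new write position.
static int writeVarInt(byte[] buf, int pos, int value) {
    while ((value & ~0x7F) != 0) {
        buf[pos++] = (byte) ((value & 0x7F) | 0x80); // more bytes follow
        value >>>= 7;
    }
    buf[pos++] = (byte) value; // final byte, high bit clear
    return pos;
}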

The flags field can be used to detect the format. 

IMPORTANT: with multiple formats, it MUST be guaranteed that the key (as well as the value) chunk is always written in exactly one way, to allow comparing chunks of rows of the same version as plain byte arrays.

The flags field is a bitmask with each bit treated as a flag, with the following flags available (from flag 0 being the LSB to flag 15 being the MSB; see the sketch after this list):

  • Flag 0: no value. If the flag is set, the value chunk is omitted, e.g. the row represents a tombstone
  • Flag 1: skip key Null-map. If the flag is set, all column values in the key chunk are non-null, so the Null-map for the key chunk is omitted
  • Flag 2: skip value Null-map. If the flag is set, all column values in the value chunk are non-null, so the Null-map for the value chunk is omitted
  • Flag 3: skip key VarTable. If the flag is set, all column values in the key chunk are either of a fix-sized type or null, so the VarTable for the key chunk is omitted.
  • Flag 4: skip value VarTable. If the flag is set, all column values in the value chunk are either of a fix-sized type or null, so the VarTable for the value chunk is omitted.
  • Flags 5-15: Reserved for future use.
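
A sketch of the corresponding flag constants and a typical check (constant names are illustrative):

static final int NO_VALUE            = 1 << 0; // row is a tombstone
static final int SKIP_KEY_NULL_MAP   = 1 << 1;
static final int SKIP_VALUE_NULL_MAP = 1 << 2;
static final int SKIP_KEY_VARTABLE   = 1 << 3;
static final int SKIP_VALUE_VARTABLE = 1 << 4;

static boolean isTombstone(short flags) {
    return (flags & NO_VALUE) != 0;
}
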
Hash calculation and key comparison

A row hash can be calculated from the affinity field values while marshaling to a byte array. Because the field order is defined by the schema, a key hash can be calculated consistently regardless of the column order.

Keys can be compared as byte[] for compatible schemas (those that have the same key column set); otherwise, the older row should be upgraded first.
If the same key can be serialized in more than one way (e.g. if some kind of compression is supported and compressed rows are marked with a flag), it is possible to compare keys column-by-column with respect to the schema.
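
A sketch of the fast-path lookup this enables, using the inlined 4-byte hash as a cheap negative check before the byte comparison (an ordered comparison of compatible keys could use Arrays.compareUnsigned the same way):

import java.util.Arrays;

static boolean keysEqual(int leftHash, byte[] leftKeyChunk, int rightHash, byte[] rightKeyChunk) {
    // A hash mismatch rules the pair out without touching the chunks.
    return leftHash == rightHash && Arrays.equals(leftKeyChunk, rightKeyChunk);
}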

...

Given the schema evolution history, a row migration from version N-k to version N is a straightforward operation. We identify fields that were dropped during the last k schema operations and fields that were added (taking into account default field values) and update the row based on the field modifications. Afterward, the updated row is written in the schema version N layout format. The row upgrade may happen on read, with an optional writeback, or on the next update. Additionally, a row upgrade in the background is possible.
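
A sketch of this upgrade procedure is below; Row, Column and SchemaDescriptor, together with the droppedColumns/addedColumns diff accessors, are hypothetical names used only to illustrate the replay of the schema history:

import java.util.List;
import java.util.Map;

static Row upgrade(Row row, List<SchemaDescriptor> history) {
    Map<String, Object> fields = row.fields(); // field name -> value
    int latest = history.size() - 1;

    // Replay the k schema operations between the row's version and version N.
    for (int v = row.schemaVersion() + 1; v <= latest; v++) {
        SchemaDescriptor schema = history.get(v);
        for (String dropped : schema.droppedColumns())
            fields.remove(dropped);
        for (Column added : schema.addedColumns())
            fields.putIfAbsent(added.name(), added.defaultValue());
    }

    return Row.write(history.get(latest), fields); // serialized in the version N layout
}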

Since the row key hashcode is inlined into the row data for quick key lookups, we require that the set of key columns does not change during the schema evolution. In the future, we may remove this restriction, but this will require careful hashcode calculation adjustments, since the hash code value should not change after adding a new column with a default value. Removing a column from the key columns does not seem possible, since it may produce duplicates, and checking for duplicates may require a full scan.

...

If one tries to serialize an object with a 'short' value out of the Uint8 range, it will end up with an exception (ColumnValueIsOutOfRangeException).
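
A sketch of the range check such a column could perform (the exception class name is taken from the sentence above; the helper itself is illustrative):

static byte toUint8(short value) {
    if (value < 0 || value > 255)
        throw new ColumnValueIsOutOfRangeException("Uint8 value out of range: " + value);
    return (byte) value; // stored as a single byte
}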

Dynamic schema expansion (Live-schema)

One of the important benefits of binary objects was the ability to store objects with different sets of fields in a single cache. We can accommodate very similar behavior in the schema-first approach.

When a tuple is inserted into a table, we attempt to 'fit' the tuple fields to the schema columns. If the tuple has some extra fields which are not present in the current schema, the schema is automatically updated to store the extra fields present in the tuple.
This works the same way for language-native objects: e.g. Java objects, or objects in terms of other languages for which a mapping implementation exists.

On the other hand, if a tuple has fewer fields than the current schema, the schema is not updated automatically (such a scenario usually means that an update is executed from an outdated client which has not yet received the proper object class version). In other words, columns are never dropped during automatic schema evolution; a column can only be dropped by an explicit user command.
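
The expansion step could look like the following sketch; Tuple, SchemaDescriptor and Column are hypothetical names, and inferType stands for whatever type inference the implementation provides:

import java.util.ArrayList;
import java.util.List;

static SchemaDescriptor fit(Tuple tuple, SchemaDescriptor schema) {
    List<Column> extra = new ArrayList<>();

    for (String field : tuple.fieldNames()) {
        if (!schema.hasColumn(field))
            extra.add(new Column(field, inferType(tuple.value(field)), true /* nullable */));
    }

    // Columns are only ever added; dropping one requires an explicit user command.
    return extra.isEmpty() ? schema : schema.addColumns(extra); // bumps the schema version
}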

...