...
Given a set of user-defined columns, the set is rearranged so that fixed-size columns go first. This sorted set of columns is used to form a row. The row layout is as follows:
Field | Size |
---|---|
Schema version | 2 bytes |
Flags | 2 bytes |
Key columns hash | 4 bytes |
Key columns chunk: | |
Full chunk size | 4 bytes |
Variable-length columns offsets table size | 2 bytes |
Variable-length columns offsets table | Variable (number of non-null, non-default variable-length columns * 4 bytes) |
Null-defaults map | ⌈number of columns / 8⌉ bytes |
Fixed-size column values | Variable |
Variable-length column values | Variable |
Value columns chunk: | |
Full chunk size | 4 bytes |
Variable-length columns offsets table size | 2 bytes |
Variable-length columns offsets table | Variable (number of non-null, non-default variable-length columns * 4 bytes) |
Null-defaults map | ⌈number of columns / 8⌉ bytes |
Fixed-size column values | Variable |
Variable-length column values | Variable |
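The per-chunk size formulas above can be illustrated with a small sketch (the `RowLayoutSizes` class and its method names are illustrative, not part of the design):

```java
// Hypothetical sketch of the per-chunk size formulas from the table above.
public class RowLayoutSizes {
    /** Null-defaults map occupies ceil(columns / 8) bytes. */
    static int nullMapSize(int columns) {
        return (columns + 7) / 8;
    }

    /** Offsets table: 4 bytes per non-null, non-default variable-length column. */
    static int varlenOffsetsTableSize(int nonDefaultVarlenColumns) {
        return nonDefaultVarlenColumns * 4;
    }

    public static void main(String[] args) {
        System.out.println(nullMapSize(10));           // 2 bytes
        System.out.println(varlenOffsetsTableSize(3)); // 12 bytes
    }
}
```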
The flags field is a bitmask, with each bit treated as a flag; the following flags are available (from flag 0 being the LSB to flag 7 being the MSB):
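Access to individual flags is the usual mask-and-shift; a minimal sketch (the `RowFlags` class and the flag index used here are illustrative, not the actual flag assignments):

```java
// Hypothetical sketch of per-bit access to the flags field (flag 0 = LSB).
public class RowFlags {
    static boolean isSet(short flags, int bit) {
        return (flags & (1 << bit)) != 0;
    }

    static short withFlag(short flags, int bit) {
        return (short) (flags | (1 << bit));
    }

    public static void main(String[] args) {
        short flags = 0;
        flags = withFlag(flags, 1);
        System.out.println(isSet(flags, 1)); // true
        System.out.println(isSet(flags, 0)); // false
    }
}
```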
Unlike the Ignite 2.x approach, where the binary object schema ID is defined by the set of fields present in the binary object, the schema-first approach assigns a monotonically growing identifier to each version of the cache schema. The ordering guarantees should be provided by the underlying metadata storage layer (for example, the current distributed metastorage implementation or consensus-based metadata storage). The schema identifier should be stored together with the data rows (but not necessarily with each row individually: we can store the schema ID along with a page or a larger chunk of data). The history of schema versions must be stored for a long enough period of time to allow upgrading all the existing data stored in a given cache.
Given the schema evolution history, a row migration from version N-k to version N is a straightforward operation. We identify the fields that were dropped during the last k schema operations and the fields that were added (taking into account default field values) and update the row based on these field modifications. Afterward, the updated row is written in the schema version N layout format. The row upgrade may happen on read with an optional write-back, or on the next update. Additionally, a background row upgrade is possible.
Since the row key hash code is inlined into the row data for quick key lookups, we require that the set of key columns does not change during schema evolution. In the future, we may remove this restriction, but this will require careful hash code calculation adjustments, since the hash code value must not change after adding a new column with a default value. Removing a column from the key columns does not seem possible, since it may produce duplicates, and checking for duplicates may require a full scan.
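For instance, a lookup can use the inlined hash as a cheap pre-filter before comparing the key columns themselves (an illustrative sketch, not the actual storage code):

```java
// Hypothetical sketch: reject non-matching rows by the inlined key hash
// before deserializing and comparing the key columns.
public class KeyLookupSketch {
    static boolean candidate(int inlinedKeyHash, Object searchKey) {
        return inlinedKeyHash == searchKey.hashCode();
    }

    public static void main(String[] args) {
        String storedKey = "John";
        // A hash match only makes the row a candidate; a full key comparison follows.
        System.out.println(candidate(storedKey.hashCode(), "John")); // true
    }
}
```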
...
With this history, upgrading a row (1, "John", "Doe") of version 1 to version 4 means erasing columns lastname and taxid and adding column residence with default "GB" and column lastname (the column is returned back) with default "N/A", resulting in the row (1, "John", "GB", "N/A").
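The upgrade from this example can be sketched as a drop-then-add over a column map (an illustrative sketch; the real code operates on binary rows, and the class and method names here are assumptions):

```java
import java.util.*;

// Hypothetical sketch of the version 1 -> 4 upgrade from the example:
// drop the columns removed in between, add the new ones with their defaults.
public class RowUpgradeSketch {
    static Map<String, Object> upgrade(Map<String, Object> row,
                                       Set<String> droppedColumns,
                                       Map<String, Object> addedDefaults) {
        Map<String, Object> upgraded = new LinkedHashMap<>(row);
        upgraded.keySet().removeAll(droppedColumns);
        upgraded.putAll(addedDefaults); // added columns get their default values
        return upgraded;
    }

    public static void main(String[] args) {
        Map<String, Object> v1 = new LinkedHashMap<>();
        v1.put("id", 1);
        v1.put("name", "John");
        v1.put("lastname", "Doe");

        Map<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("residence", "GB");
        defaults.put("lastname", "N/A"); // the column is returned back

        // Drops lastname and taxid, then adds residence and lastname.
        System.out.println(upgrade(v1, Set.of("lastname", "taxid"), defaults));
        // {id=1, name=John, residence=GB, lastname=N/A}
    }
}
```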
...
It is clear that, given a fixed schema, we can generate an infinite number of classes that match the columns of this schema. This observation can be used to simplify ORM for end users. For the APIs that return Java objects, the mapping from schema columns to object fields can be constructed dynamically, allowing a single row to be deserialized into instances of different classes.
For example, let's say we have a schema PERSON (id INT, name VARCHAR(32), lastname VARCHAR(32), residence VARCHAR(2), taxid INT). Each row of this schema can be deserialized into the following classes:
...
Given the set of fields in the target class, Ignite may optimize the amount of data sent over the network by skipping fields that would be ignored during deserialization.
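One way to derive the needed column set is from the target class's declared fields via reflection (a sketch; the class and method names are assumptions, and the actual client protocol is not specified here):

```java
import java.lang.reflect.Field;
import java.util.*;

// Hypothetical sketch: a client could request only the columns that map
// to fields of the target class, letting the server skip the rest.
public class ProjectionSketch {
    static Set<String> requestedColumns(Class<?> target) {
        Set<String> columns = new TreeSet<>();
        for (Field field : target.getDeclaredFields())
            columns.add(field.getName());
        return columns;
    }

    // A truncated class matching a subset of the PERSON schema columns.
    static class TruncatedPerson {
        int id;
        String name;
    }

    public static void main(String[] args) {
        System.out.println(requestedColumns(TruncatedPerson.class)); // [id, name]
    }
}
```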
An update operation with an object of a truncated class is also possible, but the missing fields will be treated as "not set", as if the update were done via an SQL INSERT statement with some of the PERSON table fields omitted. The missing field values will be implicitly set to the DEFAULT column values according to the row schema version.
```java
table.insert(person);
```
It may be impossible to insert an object/row with a missing field if the field is declared with a NOT NULL constraint and no (non-null) DEFAULT value is specified.
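The default-substitution rule, including the NOT NULL restriction, can be sketched like this (illustrative only; `ColumnDef` and the validation logic are assumptions, not the actual API):

```java
import java.util.*;

// Hypothetical sketch: a missing field gets the column DEFAULT; a NOT NULL
// column with no default and no supplied value makes the insert fail.
public class InsertDefaultsSketch {
    record ColumnDef(String name, boolean nullable, Object defaultValue) {}

    static Map<String, Object> complete(Map<String, Object> row, List<ColumnDef> schema) {
        Map<String, Object> full = new LinkedHashMap<>();
        for (ColumnDef col : schema) {
            Object value = row.getOrDefault(col.name(), col.defaultValue());
            if (value == null && !col.nullable())
                throw new IllegalArgumentException(col.name() + " is NOT NULL and has no DEFAULT");
            full.put(col.name(), value);
        }
        return full;
    }

    public static void main(String[] args) {
        List<ColumnDef> schema = List.of(
            new ColumnDef("id", false, null),        // NOT NULL, no default
            new ColumnDef("residence", true, "GB")); // nullable, default "GB"

        System.out.println(complete(Map.of("id", 1), schema)); // {id=1, residence=GB}
    }
}
```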
Ignite will provide out-of-the-box mapping from standard platform types (Java, C#, C++) to the built-in primitives. A user will be able to alter this mapping using some external mechanism (e.g. annotations to map long values to Number). The standard mapping is listed in the table below:
...