
...

The set of user-defined columns is reordered so that fixed-size columns go first. This sorted set of columns is used to form a tuple. Tuple layout is as follows:

Field	Size
Schema version	2 bytes
Flags	1 byte
Key columns hash	4 bytes
Key columns:
Key columns full size	2 (3?) bytes
Key columns varsize columns offsets table size	2 bytes
Key columns varsize columns offsets table	Variable (number of non-null non-default varsize columns * 2(3?))
Key columns null-defaults map	⌈number of columns / 8⌉
Key columns fixed size values	Variable
Key columns variable size values	Variable
Value columns:
Value columns full size	2 (3?) bytes
Value columns varsize columns offsets table size	2 bytes
Value columns varsize columns offsets table	Variable (number of non-null non-default varsize columns * 2(3?))
Value columns null-defaults map	⌈number of columns / 8⌉
Value columns fixed size values	Variable
Value columns variable size values	Variable
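
The column ordering and the fixed-size part of the header (schema version, flags, key columns hash) can be sketched as follows. This is only an illustration under several assumptions that the layout above does not state: little-endian byte order, an unsigned 4-byte hash, and preservation of the declared column order within the fixed-size and varsize groups. The names `Column`, `layout_order`, `pack_header` and `null_map_size` are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    name: str
    fixed_size: Optional[int]  # size in bytes, or None for a varsize column


def layout_order(columns):
    # Fixed-size columns go first. sorted() is stable, so the declared order
    # is kept inside each group (an assumption, not stated in the layout).
    return sorted(columns, key=lambda c: c.fixed_size is None)


# Schema version (2 bytes), flags (1 byte), key columns hash (4 bytes).
# "<" selects little-endian without padding; the byte order is an assumption.
HEADER = struct.Struct("<HBI")


def pack_header(schema_version, flags, key_hash):
    # Mask the hash so that a negative 32-bit hash code fits the unsigned field.
    return HEADER.pack(schema_version, flags, key_hash & 0xFFFFFFFF)


def null_map_size(column_count):
    # One bit per column, rounded up to whole bytes: ceil(column_count / 8).
    return (column_count + 7) // 8


columns = [
    Column("name", None),
    Column("id", 8),
    Column("bio", None),
    Column("age", 4),
]
ordered_names = [c.name for c in layout_order(columns)]
header = pack_header(schema_version=3, flags=0, key_hash=-17)
```

With these assumptions the fixed part of the header always takes 7 bytes, and a chunk with 9 columns needs a 2-byte null-defaults map.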

The flags field is a bitmask in which each bit is a separate flag. The following flags are defined (flag 0 is the LSB, flag 7 is the MSB):

Flag 0: tombstone. If the flag is set, the value chunk is omitted, and the tuple represents a tombstone
Flag 1: skip key nullmap. If the flag is set, all values in the key chunk are non-null and non-default, so that the null map for the key chunk is omitted
Flag 2: skip value nullmap. If the flag is set, all values in the value chunk are non-null and non-default, so that the null map for the value chunk is omitted
Flags 3-7: Reserved for future use
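
A minimal sketch of decoding the flags byte is shown below. The names `TupleFlags`, `decode_flags` and `has_value_chunk` are hypothetical, and rejecting a byte with reserved bits set is an assumption: the description above only says that flags 3-7 are reserved.

```python
from enum import IntFlag


class TupleFlags(IntFlag):
    TOMBSTONE = 1 << 0           # flag 0 (LSB): the value chunk is omitted
    SKIP_KEY_NULLMAP = 1 << 1    # flag 1: the key chunk has no null map
    SKIP_VALUE_NULLMAP = 1 << 2  # flag 2: the value chunk has no null map


RESERVED_MASK = 0b11111000  # flags 3-7


def decode_flags(flags_byte):
    # Assumption: a reader treats set reserved bits as an error.
    if flags_byte & RESERVED_MASK:
        raise ValueError("reserved flag bits 3-7 must be zero")
    return TupleFlags(flags_byte)


def has_value_chunk(flags):
    # A tombstone carries only the key chunk.
    return TupleFlags.TOMBSTONE not in flags


flags = decode_flags(0b101)  # tombstone + skip value nullmap
```

A reader would check `has_value_chunk(flags)` before trying to parse the value chunk, and the two skip flags before looking for the corresponding null-defaults map.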

Schema evolution

Unlike the Ignite 2.x approach, where the binary object schema ID is defined by the set of fields that are present in a binary object, the schema-first approach assigns a monotonically growing identifier to each version of the cache schema. The ordering guarantees should be provided by the underlying metadata storage layer (for example, the current distributed metastorage implementation or consensus-based metadata storage). The schema identifier should be stored together with the data tuples (but not necessarily with each tuple individually: we can store the schema ID along with a page or larger chunks of data). The history of schema versions must be stored for a long enough period of time to allow upgrading all existing data stored in a given cache.

Given the schema evolution history, migrating a tuple from version N-k to version N is a straightforward operation. We identify the fields that were dropped and the fields that were added (taking into account default field values) during the last k schema operations and update the tuple accordingly. Afterward, the updated tuple is written in the schema version N layout format. The tuple upgrade may happen on read (with optional writeback) or on the next update. Additionally, tuples can be upgraded in the background.
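
The upgrade step can be illustrated with a small sketch. It models a tuple as a dictionary of column values rather than the binary layout, and it replays the recorded schema operations one by one instead of first computing the net set of added and dropped fields. The history format and the `upgrade` function are hypothetical.

```python
def upgrade(row, from_version, to_version, history):
    """Upgrade a row written with from_version to the to_version schema.

    history maps a schema version to the list of operations that produced it:
    ("add", column_name, default_value) or ("drop", column_name).
    """
    upgraded = dict(row)
    # Version identifiers only need to grow monotonically, so pick the
    # versions in the (from_version, to_version] range and replay them in order.
    pending = sorted(v for v in history if from_version < v <= to_version)
    for version in pending:
        for op in history[version]:
            if op[0] == "add":
                _, name, default = op
                upgraded[name] = default  # a new column starts at its default
            elif op[0] == "drop":
                upgraded.pop(op[1], None)
            else:
                raise ValueError(f"unknown schema operation: {op!r}")
    return upgraded


history = {
    2: [("add", "city", "n/a")],
    3: [("drop", "age")],
    4: [("add", "age", 0)],  # re-added column: the old value must not survive
}
old_row = {"id": 1, "age": 30}  # written with schema version 1
new_row = upgrade(old_row, from_version=1, to_version=4, history=history)
```

Replaying the operations in order handles a column that is dropped and later added again under the same name: the upgraded row gets the default value instead of the stale one.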

Since the tuple key hashcode is inlined into the tuple data for quick key lookups, we require that the set of key columns does not change during schema evolution. In the future, we may remove this restriction, but this will require careful hashcode calculation adjustments, since the hash code value should not change after adding a new column with a default value. Removing a column from the key columns does not seem possible since it may produce duplicates, and checking for duplicates may require a full scan.

For example, consider the following sequence of schema modifications expressed in SQL-like terms:

...
