Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

BinaryObject API should be reworked, as it will not represent actual serialized objects anymore. It should be replaced with something like BinaryRecord or DataRecord representing a record in a cache or table. Similarly Similar to the current binary objects, records will provide access to individual fields. A record can also be deserialized into a class with any subset of fields represented in the record.

...

There are several ways a schema can be defined. The initial entry point to the schema definition is SchemaBuilder java API:

TBD (see SchemaBuilders class for details)

The schema builder calls are transparently mapped to DDL statements so that all operations possible via a builder are also possible via DDL and vice versa.Additionally, we may introduce an API that will infer the schema from a key-value pair using class fields and annotations. The inference happens on the calling site of the node invoking the table modification operation.

Table schema should be automatically exposed to the tabel configuration subtree so that simple schema changes are available via ignite CLI and the schema can be defined during the table creation via ignite CLI.

Data restrictions

The Schema-first approach imposes certain natural requirements which are more strict than binary object serialization format:

...

TypeSizeDescription
Bitmask(n)n/8 bytesA fixed-length bitmask of n bits
Int81 byte1-byte signed integer
Uint81 byte1-byte unsigned integer
Int162 bytes2-byte signed integer
Uint162 bytes2-byte unsigned integer
Int324 bytes4-byte signed integer
Uint324 bytes4-byte unsigned integer
Int648 bytes8-byte signed integer
Uint648 bytes8-byte unsigned integer
Float4 bytes4-byte floating-point number
Double8 bytes8-byte floating-point number
Number([n])VariableVariable-length number (optionally bound by n bytes in size)
DecimalVariableVariable-length floating-point number
UUID16 bytesUUID
StringVariableA string encoded with a given Charset
Date3 bytesA timezone-free date encoded as a year (15 1 sign bit + 14 bits), month (4 bits), day (5 bits)
Time4 5 bytesA timezone-free time encoded as padding (5 3 bits), hour (5 bits), minute (6 bits), second (6 bits), millisecond microseconds (10 20 bits)
Datetime7 8 bytesA timezone-free datetime encoded as (date, time)
Timestamp8 10 bytesNumber of milliseconds microseconds since Jan 1, 1970 00:00:00.000 000000 (with no timezone)
BinaryVariableVariable-size byte array

...

Given a set of user-defined columns, this set is then rearranged so that fixed-sized columns go first. This sorted set of columns is used to form a row. Row layout is as follows:

Field

Size

Comments
Schema version2 bytes
.

short number. The possible values are:

  • positive - regular row: key  and value chunks are present;
  • 0 - no value. If the flag is set, the value chunk is omitted, e.g. the row represents a tombstone or key-row to lookup by the key;
  • negative - invalid schema version.
Flags2 byte
Key columns hash4 bytes
Key chunk:

Key chunk size
4 bytes
Null-mapnumber of columns / 8

Flags1 byte
Variable-length columns offsets table size0-2 bytes
  • Vartable is skipped (zero size) when the chunk contains one varlen column or doesn't contain varlen column.
  • 1 byte size for table with TINY format (see table below)
  • 2 bytes for table with MEDIUM and LARGE format (see table below)
Variable-length columns offsets tableVariable (number of non-null varlen columns *
4
<format_size>)<format_size> - depends on the Flags field. See the table below
Fix-sized columns valuesVariable
Variable-length columns valuesVariable
Value chunk:

Value chunk size4 bytes
Flags1 byte
Null-map

(number of columns / 8

) or 0 bytes

Zero size if and only if schema has no nullable columns
Variable-length  columns offsets table size2 or 0 bytes
  • Vartable is skipped (zero size) when the chunk contains one varlen column or doesn't contain varlen column.
  • 1 byte size for table with TINY format (see table below)
  • 2 bytes for table with MEDIUM and LARGE format (see table below)
Variable-length  columns offsets tableVariable (number of non-null varlen columns *
4)
<format_size>)<format_size> - depends on the Flags field. See the table below
Fix-sized columns valuesVariable
Variable-length columns valuesVariable

The flags field is a bitmask with each bit treated as a flag, with the following flags available (from flag 0 being the LSB to flag 7 being MSB):

  • Flag 0: no value. If the flag is set, the value chunk is omitted, e.g. the row represents a tombstone
  • Flag 1: skip key nullmap. If the flag is set, all column values in the key chunk are non-null, so that the null map for the key chunk is omitted
  • Flag 2: skip value nullmap. If the flag is set, all column values in the value chunk are non-null, so that the null map for the value chunk is omitted
  • Flag 3: skip key varlen table. If flag is set, all column values in the key chunk either of fix-sized type or null, so that the varlen table for key chunk is omitted.
  • Flag 4: skip value varlen table. If flag is set, all column values in the value chunk either of fix-sized type or null, so that the varlen table for value chunk is omitted.
  • Flags 5-15: Reserved for future use
Schema evolution

Unlike Ignite 2.x approach, where binary object schema ID is defined by a set of fields that are present in a binary object, for the schema-first approach we assign a monotonically growing identifier to each version of the cache schema. The ordering guarantees should be provided by the underlying metadata storage layer (for example, the current distributed metastorage implementation or consensus-based metadata storage). The schema identifier should be stored together with the data rows (but not necessarily with each row individually: we can store schema ID along with a page or larger chunks of data). The history of schema versions must be stored for a long enough period of time to allow upgrade all existing data stored in a given cache.

Given schema evolution history, a row migration from version N-k to version N is a straightforward operation. We identify fields that were dropped during the last k schema operations and fields that were added (taking into account default field values) and update the row based on the field modifications. Afterward, the updated row is written in the schema version N layout format. The row upgrade may happen on read with an optional writeback or on next update. Additionally, row upgrade in background is possible.

Since the row key hashcode is inlined to the row data for quick key lookups, we require that the set of key columns do not change during the schema evolution. In the future, we may remove this restriction, but this will require careful hashcode calculation adjustments since the hash code value should not change after adding a new column with default value. Removing a column from the key columns does not seem possible since it may produce duplicates, and checking for duplicates may require a full scan.

Additionally to adding and removing columns, it will be possible to allow column type migrations when type change is non-ambiguous (a type upcast, e.g. Int8 → Int16,  or by means of a certain expression, e,g, Int8 → String using CAST expression). Type conversions that narrow the column range (e.g. Int16 → Int8) must only be allowed using explicit expressions that will allow Ignite to validate that no RangeOutOfBoundsException is possible during the conversion.

For example, consider the following sequence of schema modifications expressed in SQL-like terms:

Code Block
languagesql
CREATE TABLE Person (id INT, name VARCHAR(32), lastname VARCHAR(32), taxid int);
ALTER TABLE Person ADD COLUMN residence VARCHAR(2) DEFAULT "GB";
ALTER TABLE Person DROP COLUMN lastname, taxid;
ALTER TABLE Person ADD COLUMN lastname DEFAULT "N/A";

This sequence of modifications will result in the following schema history

...

With this history, upgrading a row (1, "John", "Doe") of version 1 to version 4 means erasing columns lastname and taxid and adding columns residence with default "GB" and lastname (the column is returned back) with default "N/A" resulting in row (1, "John", "GB", "N/A").

Class-agnostic schema mapping

It's clear that given a fixed schema, we can generate an infinite number of classes that match the column of this schema. This observation can be used to simplify ORM for the end-users. For the APIs which return Java objects, the mapping from schema columns to the object fields can be constructed dynamically, allowing to deserialize a single row into instances of different classes.

For example, let's  say we have a schema PERSON (id INT, name VARCHAR (32), lastname VARCHAR (32), residence VARCHAR (2), taxid INT). Each row of this schema can be deserialized into the following classes:

Code Block
languagejava
class Person {
    int id;
    String name;
    String lastName;
}
Code Block
languagejava
class RichPerson {
    int id;
    String name;
    String lastName;
    String residence;
    int taxId;
}

For each table, a user may specify a default Java class binding, and for each individual operation a user may provide a target class for deserialization:

Code Block
languagejava
Person p = table.get(key, Person.class);

Given the set of fields in the target class, Ignite may optimize the amount of data sent over the network by skipping fields that would be ignored during deserialization.

Update operation with object of truncated class is also possible, but missed fields will be treated as "not-set" as if it is done via SQL INSERT statement with some PERSON table fields missed. Missed field values will be implicitly set to DEFAULT column value regarding the row schema version.

Code Block
languagejava
table.insert(Person);

It may be impossible to insert an object/row with missed field if field is declared with NOT-NULL constraint and without DEFAULT (non-null) value specified.

Type mapping

Ignite will provide out-of-box mapping from standard platform types (Java, C#, C++) to built-in primitives. A user will be able to alter this mapping using some external mechanism (e.g. annotations to map long values to Number). Standard mapping is listed in the table below:

...

Java has no native support for unsigned types. We still can introduce 'unsigned' flag to schema type or separate binary type-codes, and allow to map to the closest types of wider range. E.g. map Uint8 → short and recheck constraints during serialization.

If one will try to serialize object with 'short' value out of Uint8 range then it end up with exception (ColumnValueIsOutOfRangeException).

Dynamic schema expansion (flexible schemas)

One of the important benefits of binary objects was the ability to store objects with different sets of fields in a single cache. We can accommodate for a very similar behavior in the schema-first approach.

When an object is inserted into a table, we attempt to 'fit' object fields to the schema columns. If a Java object has some extra fields which are not present in the current schema, the schema is automatically updated to store additional extra fields that are present in the object.

...


For the small rows, the metadata sizes may introduce a very noticeable overhead, so it looks reasonable to write them in a more compact way using different techniques.

  • VarInt - variable size integer for sizes
  • different VarTable formats with byte/short/int offsets
  • skip writing VarTable and/or Null-map if possible.

The flags field is used to detect the format. We propose 3 formats for a vartable: tiny, medium, and large with offset fields sizes of byte, short, and int respectively.
Vartable length field is the size of byte for tiny format and the size of short for others.
Vertable length is calculated as: <count_of _not_null_varlen_fields> - 1. The offset for the first varlen field is not stored at the table. It is calculated as the begin of the varlen values block.

IMPORTANT: having multiple formats MUST guarantee the key (as well as value) chunk will be always written in a single possible way to allow comparing chunks of rows of the same version as just byte arrays.

The flags field is a bitmask with each bit treated as a flag, with the following flags available (from flag 0 being the LSB to flag 7 being MSB):

Flags BitsDescription
0, 1

VarTable formats:

  • (0, 0) - SKIPPED.  VarTable for chunk is omitted  (all column values in the chunk either of fix-sized type or null);
  • (0, 1) - TINY format (1 byte for offset), format_size = 1;
  • (1, 0) - MEDIUM format (2 bytes for offset), format_size = 2;
  • (1, 1) - LARGE format (4 bytes for offset), format_size = 4
2-7Reserverd


Risks and Assumptions

n/a

Tickets

...