...
The BinaryObject API should be reworked, as it will no longer represent actual serialized objects. It should be replaced with something like BinaryRecord or DataRecord representing a record in a cache or table. Similar to the current binary objects, records will provide access to individual fields. A record can also be deserialized into a class with any subset of fields represented in the record.
There are several ways a schema can be defined. The initial entry point to the schema definition is the SchemaBuilder Java API:

TBD (see SchemaBuilders class for details)
The schema builder calls are transparently mapped to DDL statements so that all operations possible via a builder are also possible via DDL and vice versa.
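Since the exact builder API is still TBD, the following is only a hypothetical sketch of how a fluent schema builder could render its state as an equivalent DDL statement; all names here (TableSchemaBuilder, column, toDdl) are illustrative, not the actual SchemaBuilders API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a fluent builder whose state maps 1:1 to a DDL
// statement. The real SchemaBuilders API will differ in detail.
public class TableSchemaBuilder {
    private final String name;
    private final List<String> columns = new ArrayList<>();
    private String pk;

    private TableSchemaBuilder(String name) { this.name = name; }

    public static TableSchemaBuilder table(String name) {
        return new TableSchemaBuilder(name);
    }

    public TableSchemaBuilder column(String col, String type) {
        columns.add(col + " " + type);
        return this;
    }

    public TableSchemaBuilder primaryKey(String col) {
        this.pk = col;
        return this;
    }

    /** Renders the builder state as the equivalent DDL statement. */
    public String toDdl() {
        return "CREATE TABLE " + name + " (" + String.join(", ", columns)
            + ", PRIMARY KEY (" + pk + "))";
    }
}
```

For example, `TableSchemaBuilder.table("Person").column("id", "INT").column("name", "VARCHAR(32)").primaryKey("id").toDdl()` yields the same statement a user could issue directly via DDL.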
The schema-first approach imposes certain natural requirements which are stricter than those of the binary object serialization format:
The suggested set of supported built-in data types is listed in the table below:
Type | Size | Description |
---|---|---|
Bitmask(n) | ⌈n/8⌉ bytes | A fixed-length bitmask of n bits |
Int8 | 1 byte | 1-byte signed integer |
Uint8 | 1 byte | 1-byte unsigned integer |
Int16 | 2 bytes | 2-byte signed integer |
Uint16 | 2 bytes | 2-byte unsigned integer |
Int32 | 4 bytes | 4-byte signed integer |
Uint32 | 4 bytes | 4-byte unsigned integer |
Int64 | 8 bytes | 8-byte signed integer |
Uint64 | 8 bytes | 8-byte unsigned integer |
Float | 4 bytes | 4-byte floating-point number |
Double | 8 bytes | 8-byte floating-point number |
Number([n]) | Variable | Variable-length number (optionally bound by n bytes in size) |
Decimal | Variable | Variable-length floating-point number |
UUID | 16 bytes | UUID |
String | Variable | A string encoded with a given Charset |
Date | 3 bytes | A timezone-free date encoded as a year (1 sign bit + 14 bits), month (4 bits), day (5 bits) |
Time | 5 bytes | A timezone-free time encoded as padding (3 bits), hour (5 bits), minute (6 bits), second (6 bits), microseconds (20 bits) |
Datetime | 8 bytes | A timezone-free datetime encoded as (date, time) |
Timestamp | 10 bytes | Number of microseconds since Jan 1, 1970 00:00:00.000000 (with no timezone) |
Binary | Variable | Variable-size byte array |
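As an illustration, the 3-byte Date encoding above can be sketched as follows; the exact bit order (sign, year, month, day from the most significant bit down) is an assumption made for this example, not the actual Ignite implementation:

```java
// Illustrative sketch of the 3-byte Date encoding:
// sign (1 bit) | year (14 bits) | month (4 bits) | day (5 bits) = 24 bits.
// The bit layout shown here is an assumption for illustration.
public class DateCodec {
    /** Packs year/month/day into 24 bits. */
    public static int encode(int year, int month, int day) {
        int sign = year < 0 ? 1 : 0;
        int absYear = Math.abs(year) & 0x3FFF; // 14 bits
        return (sign << 23) | (absYear << 9) | ((month & 0xF) << 5) | (day & 0x1F);
    }

    public static int year(int packed) {
        int y = (packed >> 9) & 0x3FFF;
        return ((packed >> 23) & 1) == 1 ? -y : y;
    }

    public static int month(int packed) {
        return (packed >> 5) & 0xF;
    }

    public static int day(int packed) {
        return packed & 0x1F;
    }
}
```

The packed value always fits in 3 bytes, and the 14-bit year field gives a range of roughly ±16383 years.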
...
Given a set of user-defined columns, this set is then rearranged so that fixed-sized columns go first. This sorted set of columns is used to form a row. Row layout is as follows:
Field | Size | Comments |
---|---|---|
Schema version | 2 bytes | Short number. The possible values are: |
Key columns hash | 4 bytes | |
Key chunk: | | |
Key chunk size | 4 bytes | |
Flags | 1 byte | |
Variable-length columns offsets table size | 0-2 bytes | |
Variable-length columns offsets table | Variable (number of non-null varlen columns * <format_size>) | <format_size> depends on the Flags field. See the table below |
Fix-sized columns values | Variable | |
Variable-length columns values | Variable | |
Value chunk: | | |
Value chunk size | 4 bytes | |
Flags | 1 byte | |
Null-map | (number of columns / 8) or 0 bytes | Zero size if and only if the schema has no nullable columns |
Variable-length columns offsets table size | 2 or 0 bytes | |
Variable-length columns offsets table | Variable (number of non-null varlen columns * <format_size>) | <format_size> depends on the Flags field. See the table below |
Fix-sized columns values | Variable | |
Variable-length columns values | Variable | |
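As an illustration of the null-map sizing rule in the table, the following sketch assumes the size is the column count in bits rounded up to whole bytes (the rounding is an interpretation of the "(number of columns / 8)" entry):

```java
// Sketch of the null-map sizing rule: one bit per column, rounded up to
// whole bytes, and omitted entirely when the schema has no nullable columns.
// The round-up interpretation is an assumption for illustration.
public class NullMap {
    public static int sizeInBytes(int columnCount, boolean hasNullableColumns) {
        if (!hasNullableColumns)
            return 0; // zero size iff no nullable columns in the schema

        return (columnCount + 7) / 8; // ceil(columnCount / 8)
    }
}
```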
Unlike the Ignite 2.x approach, where the binary object schema ID is defined by the set of fields present in a binary object, the schema-first approach assigns a monotonically growing identifier to each version of the cache schema. The ordering guarantees should be provided by the underlying metadata storage layer (for example, the current distributed metastorage implementation or consensus-based metadata storage). The schema identifier should be stored together with the data tuples (but not necessarily with each tuple individually: we can store the schema ID along with a page or larger chunks of data). The history of schema versions must be stored for a long enough period of time to allow upgrading all existing data stored in a given cache.
Given schema evolution history, a tuple migration from version N-k to version N is a straightforward operation. We identify fields that were dropped during the last k schema operations and fields that were added (taking into account default field values) and update the tuple based on the field modifications. Afterwards, the updated tuple is written in the schema version N layout format. The tuple upgrade may happen on read with optional writeback or on next update. Additionally, tuple upgrade in background is possible.
For example, consider the following sequence of schema modifications expressed in SQL-like terms:
```sql
CREATE TABLE Person (id INT, name VARCHAR(32), lastname VARCHAR(32), taxid INT);
ALTER TABLE Person ADD COLUMN residence VARCHAR(2) DEFAULT "GB";
ALTER TABLE Person DROP COLUMN lastname, taxid;
ALTER TABLE Person ADD COLUMN lastname DEFAULT "N/A";
```
This sequence of modifications will result in the following schema history:
...
With this history, upgrading a tuple (1, "John", "Doe") of version 1 to version 4 means erasing columns lastname and taxid and adding column residence with default "GB" and column lastname (the column is returned back) with default "N/A", resulting in the tuple (1, "John", "GB", "N/A").
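This upgrade walk can be sketched as follows, using an ordered map as a stand-in for a tuple; the helper name and the representation are illustrative, not the actual upgrade machinery:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: apply the schema-history diff (v1 -> v4 from the SQL
// example above) to a tuple represented as an ordered column->value map.
public class TupleUpgrade {
    public static Map<String, Object> upgrade(Map<String, Object> tuple) {
        Map<String, Object> v = new LinkedHashMap<>(tuple);

        v.put("residence", "GB");   // v2: ADD COLUMN residence DEFAULT "GB"
        v.remove("lastname");       // v3: DROP COLUMN lastname, taxid
        v.remove("taxid");
        v.put("lastname", "N/A");   // v4: ADD COLUMN lastname DEFAULT "N/A"

        return v;
    }
}
```

Running this on the tuple (1, "John", "Doe") from the example yields (1, "John", "GB", "N/A"), matching the walkthrough above.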
It's clear that given a fixed schema, we can generate an infinite number of classes that match the columns of this schema. This observation can be used to simplify ORM for end users. For APIs which return Java objects, the mapping from schema columns to object fields can be constructed dynamically, allowing a single tuple to be deserialized into instances of different classes.
For example, let's say we have a schema PERSON (id INT, name VARCHAR(32), lastname VARCHAR(32), residence VARCHAR(2), taxid INT). Each tuple of this schema can be deserialized into the following classes:
```java
class Person {
    int id;
    String name;
    String lastName;
}
```

```java
class RichPerson {
    int id;
    String name;
    String lastName;
    String residence;
    int taxId;
}
```
For each table, a user may specify a default Java class binding, and for each individual operation a user may provide a target class for deserialization:
```java
Person p = table.get(key, Person.class);
```
Given the set of fields in the target class, Ignite may optimize the amount of data sent over the network by skipping fields that would be ignored during deserialization.
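A sketch of this optimization: the set of columns worth transferring can be computed by intersecting the schema columns with the declared fields of the target class. The helper below is illustrative, not the actual Ignite mechanism:

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: keep only the schema columns that have a matching
// field (case-insensitive) in the deserialization target class.
public class ColumnProjection {
    public static List<String> columnsToSend(List<String> schemaColumns, Class<?> target) {
        List<String> result = new ArrayList<>();

        for (String col : schemaColumns) {
            for (Field f : target.getDeclaredFields()) {
                if (f.getName().equalsIgnoreCase(col))
                    result.add(col);
            }
        }

        return result;
    }

    // Matches the Person class from the example above.
    static class Person {
        int id;
        String name;
        String lastName;
    }
}
```

For the PERSON schema above and the Person target class, only id, name, and lastname would need to be sent; residence and taxid can be skipped.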
Ignite will provide out-of-the-box mapping from standard platform types (Java, C#, C++) to the built-in primitives. A user will be able to alter this mapping using some external mechanism (e.g., annotations to map long values to Number). The standard mapping is listed in the table below:
...
For small rows, the metadata sizes may introduce a very noticeable overhead, so it looks reasonable to write them in a more compact way using different techniques.

The flags field is used to detect the format. We propose 3 formats for a vartable: tiny, medium, and large, with offset field sizes of byte, short, and int respectively.

The vartable length field is one byte for the tiny format and two bytes (short) for the others.

The vartable length is calculated as <count_of_not_null_varlen_fields> - 1. The offset of the first varlen field is not stored in the table; it is implicitly the beginning of the varlen values block.
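The sizing rules above can be sketched as follows; choosing the format from the maximum offset that must be representable is an assumption made for this illustration:

```java
// Illustrative sketch of vartable sizing: offset entries of 1, 2, or 4 bytes
// (tiny/medium/large), a length field of 1 byte for tiny and 2 bytes
// otherwise, and one entry fewer than the number of non-null varlen fields
// (the first offset is implicit).
public class VarTable {
    /** Offset entry size chosen from the maximum offset to be representable (assumption). */
    public static int offsetSize(int maxOffset) {
        if (maxOffset <= 0xFF)
            return 1;   // tiny: byte offsets
        if (maxOffset <= 0xFFFF)
            return 2;   // medium: short offsets
        return 4;       // large: int offsets
    }

    /** Total vartable size in bytes for the given number of non-null varlen fields. */
    public static int tableSize(int nonNullVarlenFields, int offsetSize) {
        if (nonNullVarlenFields == 0)
            return 0;

        int lengthField = offsetSize == 1 ? 1 : 2;
        return lengthField + (nonNullVarlenFields - 1) * offsetSize;
    }
}
```

For example, three non-null varlen fields in the tiny format need only 3 bytes of vartable (a 1-byte length plus two 1-byte offsets).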
IMPORTANT: having multiple formats must still guarantee that the key (as well as the value) chunk is always written in exactly one possible way, so that chunks of rows of the same schema version can be compared as plain byte arrays.
The flags field is a bitmask with each bit treated as a flag, with the following flags available (from flag 0 being the LSB to flag 7 being the MSB):

Flags Bits | Description |
---|---|
0, 1 | VarTable format: tiny, medium, or large |
2-7 | Reserved |
...
One of the important benefits of binary objects was the ability to store objects with different sets of fields in a single cache. We can accommodate very similar behavior in the schema-first approach.

When an object is inserted into a table, we attempt to 'fit' the object fields to the schema columns. If the Java object has some extra fields which are not present in the current schema, the schema is automatically updated to store the extra fields present in the object.

On the other hand, if an object has fewer fields than the current schema, the schema is not updated automatically (such a scenario usually means that the update is executed from an outdated client which has not yet received the proper object class version). In other words, columns are never dropped during automatic schema evolution; a column can only be dropped by an explicit user command.
...