...

The field location information is represented as a table of field offsets. This table makes it possible to determine both the offset and the length of each field.

To represent NULL values a tuple may contain a nullmap. The nullmap is skipped if there are no actual NULL values in a given tuple. To be more precise, the layout of a tuple looks like this:

Header;
Nullmap (optional);
Offset table;
Value area.

Header

The header consists of a single byte of flags.

Bits 0-1: Size class of the variable-length area.

Bit 2: Set if the nullmap is present. In the other compared revision: flag indicating that the size class is not optimal.

Size class encodes the number of bytes used for entries in the offset table. It is encoded like this:

...

NOTE: The last option works fine for C++ and possibly other languages, but it would be hard to support in Java, so it might be excluded.

Nullmap

If a schema has nullable columns then the corresponding tuple may contain a nullmap. The nullmap is skipped if there are no actual NULL values in a given tuple instance.

The nullmap is a bitset that contains N bits and occupies (N + 7) / 8 bytes, where `N` is the number of columns in the schema.

If a bit in the bitset is set to `1`, the respective field is NULL.
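As an illustration, a nullmap lookup could be sketched as follows. The bit order inside each byte (least significant bit first) and the helper names `nullmap_size` and `is_null` are assumptions of this sketch; the text above only fixes the size of the bitset and the meaning of a set bit.

```python
def nullmap_size(num_columns: int) -> int:
    """Bytes occupied by a nullmap for a schema with num_columns columns."""
    return (num_columns + 7) // 8


def is_null(nullmap: bytes, index: int) -> bool:
    """True if the bit for field `index` is set.

    Assumption: bits are numbered LSB-first within each byte.
    """
    return ((nullmap[index // 8] >> (index % 8)) & 1) == 1


# Hypothetical nullmap for a 10-column schema: fields 2 and 8 are NULL.
nullmap = bytes([0b00000100, 0b00000001])
null_fields = [i for i in range(10) if is_null(nullmap, i)]
```

With a different bit numbering only the shift expression in `is_null` would change.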

Offset Table

The number of elements in the table is equal to the number of columns in the schema.

The size of table elements might be 1, 2, 4, or 8 bytes depending on the header flags.

Each element in the table specifies where the corresponding field ends. In other words, for each given field the table stores the offset where the next field starts. The last element in the table specifies the end of the last field and, at the same time, the end of the whole value area and consequently the end of the entire tuple.
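The lookup described above could be sketched like this. It assumes that offsets are counted from the start of the value area and that table entries are unsigned little-endian integers; the function name `field_bounds` is hypothetical.

```python
def field_bounds(offset_table: bytes, entry_size: int, index: int) -> tuple[int, int]:
    """Return (start, end) of field `index`, relative to the value area start."""

    def entry(i: int) -> int:
        raw = offset_table[i * entry_size:(i + 1) * entry_size]
        return int.from_bytes(raw, "little")  # assumed unsigned little-endian

    # A field starts where the previous one ends; the first field starts at 0.
    start = entry(index - 1) if index > 0 else 0
    end = entry(index)
    return start, end


# Hypothetical table with 1-byte entries for fields of lengths 4, 0 and 2.
table = bytes([4, 4, 6])
bounds = [field_bounds(table, 1, i) for i in range(3)]
```

The length of a field is simply `end - start`, so no separate length is stored anywhere.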

...

If a value is equal to NULL then it is absent from the value area. This means that the corresponding entry in the offset table is equal to the previous entry. At the same time the corresponding bit in the nullmap is set.

For some variable-length types we can encounter a value with zero length. Quite naturally, a zero-length variable-length value translates to a zero-length field in a tuple. This approach is extended to fixed-size types by introducing a notion of default values. We define specific default values for different types (specified in the table below). If a given value is equal to the corresponding default value, it translates to a zero-length field in a tuple.

To sum things up, when a zero-length field is met (by looking in the offset table) we have the following cases:

...

To distinguish a zero-length variable-length value from a null value we use a special magic byte that denotes an empty value. That is, an empty value is encoded as the single-byte sequence 0x80. In turn, if the original variable-length value starts with the byte 0x80, then this byte is doubled.

The Number and Decimal types are never empty: at least one significant byte is always present. The variable-length types that use the magic byte for encoding are as follows:

String;
Binary;
Bitmask
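A minimal sketch of the magic-byte rule for these types follows. The names `encode_varlen` and `decode_varlen` are hypothetical, and treating a zero-length field as NULL is an assumption drawn from the wording above rather than a confirmed part of the format.

```python
EMPTY_MARKER = 0x80  # the magic byte named in the text


def encode_varlen(value: bytes) -> bytes:
    """Encode a non-null String/Binary/Bitmask payload for the value area."""
    if len(value) == 0:
        return bytes([EMPTY_MARKER])          # empty value -> single 0x80
    if value[0] == EMPTY_MARKER:
        return bytes([EMPTY_MARKER]) + value  # leading 0x80 is doubled
    return value


def decode_varlen(field: bytes):
    """Decode a field; a zero-length field is treated as NULL (None)."""
    if len(field) == 0:
        return None
    if field[0] == EMPTY_MARKER:
        # Drops the marker: yields b"" for an empty value,
        # or the original payload that started with 0x80.
        return field[1:]
    return field
```

Because every non-null value occupies at least one byte under this rule, a zero-length field is left free to stand for NULL.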

...

.

The list of supported data types is as follows:

Type	Field Size	Default Value	Description
Int8	1	`0`	1-byte signed integer
Int16	1, 2	`0`	2-byte signed integer, but may occupy less space due to the compression mechanism described below
Int32	1, 2, 4	`0`	4-byte signed integer, but may occupy less space due to the compression mechanism described below
Int64	1, 2, 4, 8	`0`	8-byte signed integer, but may occupy less space due to the compression mechanism described below
Float	4	`0.0`	4-byte floating-point number
Double	4, 8	`0.0`	8-byte floating-point number, but may occupy 4 bytes if it fits into a float without loss of precision
Number	variable	`0`	Variable-length integer
Decimal	variable	`0`	Variable-length fixed-point number, the scale is determined by the schema
UUID	16	`00000000-0000-0000-0000-000000000000`	UUID
String	variable	empty string	A UTF-8 encoded string
Binary	variable	empty binary	Variable-length arbitrary binary data
Bitmask	variable	empty bit-string	Variable-length binary data representing a bit-string
Date	3	Jan 1, 1 BC (1 BC is immediately before 1 AD in the Gregorian calendar)	A timezone-free date (year, month, day)
Time	4, 5, 6	00:00:00.000000	A timezone-free time (hour, minute, second, microseconds)
DateTime	7, 8, 9	Jan 1, 1 BC, 00:00:00.000000	A timezone-free datetime encoded as (date, time)
Timestamp	8, 12	Jan 1, 1970, 00:00:00.000000	Number of microseconds since Jan 1, 1970 00:00:00.000000 (with no timezone)
Duration	8, 12	`0` (PT0S)	See below
Period	3, 6, 12	`0` (P0D)	See below
Boolean	1	`false`	A boolean value (either `true` or `false`)

Integer Representation

All integer values are stored in the little-endian byte order.
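To illustrate the little-endian layout together with the variable field sizes listed in the type table, here is a hedged sketch. It assumes that the compression mechanism simply picks the narrowest width of 1, 2, 4 or 8 bytes that can hold the signed value; the helper names `encode_int` and `decode_int` are hypothetical.

```python
def encode_int(value: int, max_size: int) -> bytes:
    """Encode a signed integer little-endian in the narrowest width of
    1, 2, 4 or 8 bytes (capped by max_size) that can represent it."""
    for size in (1, 2, 4, 8):
        if size > max_size:
            break
        bits = size * 8
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return value.to_bytes(size, "little", signed=True)
    raise OverflowError(f"{value} does not fit into {max_size} bytes")


def decode_int(field: bytes) -> int:
    """Decode a little-endian signed integer of any supported width."""
    return int.from_bytes(field, "little", signed=True)
```

Since the width of each field is recoverable from the offset table, the decoder needs no extra length marker to tell a 1-byte Int32 from a 4-byte one.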

...

Boolean Representation

A single byte containing 1 for the value of true and 0 for the value of false. In the variant that uses default values, the value of false does not need any representation, as it is stored as a default zero-size field.

Corollary

If the number of fields is N and t is an array that stores a binary tuple, we can find the answers to the following questions:

Does the tuple contain a nullmap?

hasNullmap = t[0] & 0b100;

How many bytes are occupied by the nullmap?

...

How many bytes are occupied by one offset table entry?

...

What is the offset of the value area?

valueBaseOffset = 1 + nullmapBytes + offsetTableBytes;

What is the whole tuple size?

...
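The questions above can be tied together in one hedged sketch. It assumes that size classes 0 to 3 map to entry sizes of 1, 2, 4 and 8 bytes, that offset table entries are unsigned little-endian, and that offsets are relative to the start of the value area; the function name `tuple_layout` and the sample tuple are hypothetical.

```python
def tuple_layout(t: bytes, num_fields: int) -> dict:
    """Derive the layout of binary tuple `t` holding `num_fields` fields."""
    flags = t[0]
    has_nullmap = (flags & 0b100) != 0
    nullmap_bytes = (num_fields + 7) // 8 if has_nullmap else 0
    entry_size = 1 << (flags & 0b11)  # assumption: classes 0..3 -> 1, 2, 4, 8
    offset_table_bytes = num_fields * entry_size
    value_base_offset = 1 + nullmap_bytes + offset_table_bytes

    # The last offset table entry marks the end of the value area.
    last_entry = t[value_base_offset - entry_size:value_base_offset] if num_fields else b""
    value_area_bytes = int.from_bytes(last_entry, "little")

    return {
        "has_nullmap": has_nullmap,
        "nullmap_bytes": nullmap_bytes,
        "entry_size": entry_size,
        "offset_table_bytes": offset_table_bytes,
        "value_base_offset": value_base_offset,
        "tuple_size": value_base_offset + value_area_bytes,
    }


# Hypothetical 3-field tuple: header, nullmap (field 1 is NULL),
# three 1-byte offsets, then the value area b"\x07hi".
sample = bytes([0b100, 0b010, 1, 1, 3, 7]) + b"hi"
layout = tuple_layout(sample, 3)
```

For a well-formed tuple the computed `tuple_size` should equal the length of the buffer, which makes a cheap sanity check when reading.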

To build a tuple using the minimum possible space, two things must be known:

...

what is the total length of all non-null values

...

After that we can figure out the minimum possible size of the offset table entries.

Thus, generally speaking, building a binary tuple is a two-pass procedure. Sometimes it might be possible to turn this into a single pass (almost) by over-provisioning the allocated storage for the worst case and then fixing it up at the end.
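A two-pass build could be sketched as follows. This is only an illustration under several assumptions: the first pass also checks whether any value is NULL, size classes 0 to 3 map to 1, 2, 4 and 8-byte entries, nullmap bits are LSB-first, and offsets are unsigned little-endian values relative to the value area. The function name `build_tuple` is hypothetical, and fields are passed in already encoded.

```python
def build_tuple(fields: list) -> bytes:
    """Build a binary tuple from already-encoded values (bytes) or None for NULL."""
    # Pass 1: find out whether NULLs are present and the total value length.
    has_nulls = any(f is None for f in fields)
    total = sum(len(f) for f in fields if f is not None)

    # The largest offset equals the total length; pick the smallest entry size
    # that can hold it.
    size_class = 0
    while total >= 1 << (8 * (1 << size_class)):
        size_class += 1
    entry_size = 1 << size_class

    # Pass 2: emit header, optional nullmap, offset table, value area.
    out = bytearray([size_class | (0b100 if has_nulls else 0)])
    if has_nulls:
        nullmap = bytearray((len(fields) + 7) // 8)
        for i, f in enumerate(fields):
            if f is None:
                nullmap[i // 8] |= 1 << (i % 8)
        out += nullmap

    values = bytearray()
    for f in fields:
        if f is not None:
            values += f
        # Each entry stores the end of its field; a NULL repeats the previous end.
        out += len(values).to_bytes(entry_size, "little")
    return bytes(out + values)
```

A single-pass variant would reserve the widest entries up front and shrink the table once the final length is known, at the cost of moving the value area.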

...

Versions Compared

Old Version 46

New Version Current
