ID	IEP-92
Author	Aleksey Demakov
Sponsor
Created
Status	DRAFT

Motivation

Considering e.g. a storage engine internal operation, its interaction with the SQL engine, or a database system as a whole we can see that row-like tuples are used in a variety of cases.

We already have IEP-54[1] that defines a representation of table rows suitable for a key-value storage engine. It clearly separates key and value columns by keeping them in two distinct chunks. The separation is strictly defined by the table schema. The primary key columns go to the IEP-54[1] key chunk and other columns go to the value chunk.

However this approach does not cover all the cases when a database system deals with row-like data. For instance, execution of an SQL query produces a result set that may consist of rows containing columns from different tables, SQL-expressions, aggregate functions, etc. For such an arbitrary set of columns the distinction between key and value columns makes no sense at all.

When a storage engine deals with secondary indexes it picks some columns from a row and forms a tuple where the distinction between primary-key and non-primary-key columns does not play a role.

For a parametrized SQL query the query parameters also might need to be packed together into single tuple to be fed to the database.

This proposal introduces a new binary format that might be suitable for all of these cases and more.

The design principles behind this format are as follows:

Keep it simple. Easy to build and parse.
Provide O(1) random access to every value.
Make it reasonably compact.

Description

We consider only the case when the tuple content is completely defined by a corresponding schema. Schema relieves tuples of any metadata and thus provides space savings. The downside is that the tuple data can only be properly interpreted if the corresponding schema is available. For our use cases this is not a problem as the schema can always be determined from the context.

However for fast random access to individual tuple fields the general tuple structure depends only on the schema size. No other metadata is needed to find the required field. But one needs to learn the field data type from the schema when it is required to interpret the field as a value.

Tuple Layout

A binary tuple is kept as a sequence of fields contiguously going in memory one after another in the same order as they go in the tuple schema. The number of fields is defined by the schema too. Each field is a sequence of bytes that can be interpreted only after learning the field type in the schema.

This field location information is represented as a table of field offsets. This table allows to find both the location and the length of each field.

To represent NULL values a tuple may contain a nullmap. Nullmap is skipped if there are no actual NULL values in a given tuple.

To be more precise the layout of a tuple looks like this:

Header;
Nullmap;
Offset table;
Value area.

Reference Links

1. IEP-54: Schema-first Approach

Page tree

Motivation

Description

Tuple Layout

Reference Links

Tickets

Page tree

IEP-92: Binary Tuple Format

Motivation

Description

Tuple Layout

Reference Links

Tickets