AsterixDB Object Serialization Reference

Type Tag Reference

The list below gives the 1-byte type tag for each object type. When the type of an object is already known (for example, when it is a field in the closed part of a Record schema), the type tag is omitted from the object's serialization.

1 INT8
2 INT16
3 INT32
4 INT64
11 FLOAT
12 DOUBLE
13 STRING
14 NULL
15 BOOLEAN
16 DATETIME
17 DATE
18 TIME
19 DURATION
20 POINT
22 ORDEREDLIST
23 UNORDEREDLIST
24 RECORD
29 ANY
30 LINE
31 POLYGON
32 CIRCLE
33 RECTANGLE
34 INTERVAL
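
For readers who want to work with these tags programmatically, the list above can be captured as an enumeration. This is a minimal sketch: the names `TypeTag` and `tag_name` are illustrative and are not taken from the AsterixDB code base.

```python
from enum import IntEnum


class TypeTag(IntEnum):
    """1-byte type tags, copied from the tag list above."""
    INT8 = 1
    INT16 = 2
    INT32 = 3
    INT64 = 4
    FLOAT = 11
    DOUBLE = 12
    STRING = 13
    NULL = 14
    BOOLEAN = 15
    DATETIME = 16
    DATE = 17
    TIME = 18
    DURATION = 19
    POINT = 20
    ORDEREDLIST = 22
    UNORDEREDLIST = 23
    RECORD = 24
    ANY = 29
    LINE = 30
    POLYGON = 31
    CIRCLE = 32
    RECTANGLE = 33
    INTERVAL = 34


def tag_name(byte):
    """Return the type name for a tag byte; raises ValueError for unlisted tags."""
    return TypeTag(byte).name
```

Calling `tag_name(15)` yields "BOOLEAN"; a byte that is not in the list raises `ValueError`.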

BOOLEAN

1 byte - BOOLEAN tag, 15 (excluded when type is known)

1 byte - Value

Example: Byte Array: [15, 1]

15, - (BOOLEAN tag)

1 - (value is TRUE)

INT8

1 byte - Int8 tag, 1 (excluded when type is known)

1 byte - Value

Example: Byte Array: [1, 4]

1, (Int8 tag)

4 (value is 4)

INT16

1 byte - Int16 tag, 2 (excluded when type is known)

2 bytes - Value

Example: Byte Array: [2, 0, 8]

2, (Int16 tag)

0, 8 (value is 8)

INT32

1 byte - Int32 tag, 3 (excluded when type is known)

4 bytes - Value

Example: Byte Array: [3, 0, 0, 0, 23]

3, (Int32 tag)

0, 0, 0, 23 (value is 23)

INT64

1 byte - Int64 tag, 4 (excluded when type is known)

8 bytes - Value

Example: Byte Array: [4, 0, 0, 0, 0, 0, 0, 0, 42]

4, (Int64 tag)

0, 0, 0, 0, 0, 0, 0, 42 (value is 42)
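
The integer layouts above can be reproduced with Python's `struct` module. This is a sketch under two assumptions read off the examples: values are big-endian (23 is written as 0, 0, 0, 23) and signed. The helper names `encode_int` and `decode_tagged_int` are hypothetical.

```python
import struct

# Tag -> struct format for the four integer types (big-endian, signed).
INT_FORMATS = {1: ">b", 2: ">h", 3: ">i", 4: ">q"}


def encode_int(tag, value, tagged=True):
    """Serialize an integer; the tag byte is dropped when the type is known."""
    body = struct.pack(INT_FORMATS[tag], value)
    return (bytes([tag]) + body) if tagged else body


def decode_tagged_int(data):
    """Read the tag byte, then the fixed-width value that follows it."""
    tag = data[0]
    (value,) = struct.unpack_from(INT_FORMATS[tag], data, 1)
    return tag, value
```

For instance, `list(encode_int(3, 23))` gives `[3, 0, 0, 0, 23]`, matching the INT32 example, and `encode_int(3, 23, tagged=False)` gives only the four value bytes.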

FLOAT

1 byte - FLOAT tag, 11 (excluded when type is known)

4 bytes - Value, using IEEE 754 floating-point "single format" bit layout.

DOUBLE

1 byte - DOUBLE tag, 12 (excluded when type is known)

8 bytes - Value, using IEEE 754 floating-point "double format" bit layout.
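
The FLOAT and DOUBLE layouts can be sketched the same way. The text above fixes the IEEE 754 bit layout but not the byte order; big-endian is assumed here for consistency with the integer examples, and the function names are illustrative.

```python
import struct


def encode_float(value, tagged=True):
    """FLOAT: optional tag 11, then 4 bytes in IEEE 754 single format."""
    body = struct.pack(">f", value)  # assumption: big-endian byte order
    return (bytes([11]) + body) if tagged else body


def encode_double(value, tagged=True):
    """DOUBLE: optional tag 12, then 8 bytes in IEEE 754 double format."""
    body = struct.pack(">d", value)  # assumption: big-endian byte order
    return (bytes([12]) + body) if tagged else body
```

Under that assumption, `list(encode_float(1.0))` is `[11, 63, 128, 0, 0]`, since 1.0 in single format is 0x3F800000.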

Location Serializations

POINT -

1 byte - type tag (20, excluded when known)

8 bytes - nontagged DOUBLE serialization

8 bytes - nontagged DOUBLE serialization

LINE -

1 byte - type tag (30, excluded when known)

16 bytes - nontagged POINT serialization

16 bytes - nontagged POINT serialization

RECTANGLE -

1 byte - type tag (33, excluded when known)

16 bytes - nontagged POINT serialization

16 bytes - nontagged POINT serialization

CIRCLE -

1 byte - type tag (32, excluded when known)

16 bytes - nontagged POINT serialization

8 bytes - nontagged DOUBLE serialization

POLYGON -

1 byte - type tag (31, excluded when known)

2 bytes - NUMBER of points, as an int16

remaining bytes - A series of nontagged POINT serializations
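
As an illustration of how the location types compose, the sketch below builds a POINT from two nontagged doubles and a POLYGON from an int16 count followed by nontagged points. Big-endian byte order and the coordinate order (x first, then y) are assumptions; the function names are hypothetical.

```python
import struct

POINT_TAG, POLYGON_TAG = 20, 31


def encode_point(x, y, tagged=True):
    """POINT: optional tag, then two nontagged DOUBLEs (16 bytes)."""
    body = struct.pack(">dd", x, y)
    return (bytes([POINT_TAG]) + body) if tagged else body


def encode_polygon(points, tagged=True):
    """POLYGON: optional tag, int16 point count, then nontagged POINTs."""
    body = struct.pack(">h", len(points))
    body += b"".join(encode_point(x, y, tagged=False) for x, y in points)
    return (bytes([POLYGON_TAG]) + body) if tagged else body


def decode_polygon(data, tagged=True):
    """Inverse of encode_polygon; returns a list of (x, y) tuples."""
    pos = 1 if tagged else 0
    (count,) = struct.unpack_from(">h", data, pos)
    pos += 2
    return [struct.unpack_from(">dd", data, pos + 16 * i) for i in range(count)]
```

A tagged triangle therefore occupies 1 + 2 + 3 * 16 = 51 bytes. LINE, RECTANGLE, and CIRCLE follow the same pattern with their own tags and fixed bodies.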

STRING

1 byte - STRING tag, 13 (excluded when type is known)

1-5 bytes - Variable-length encoding of the LENGTH of the string.
- Each byte stores seven bits of the number. The first (most significant) bit of each byte indicates whether more bytes follow. If the first bit is set, shift the current value left by seven bits and continue with the next byte, until a byte whose first bit is unset is reached.

  * If the number is < 128, it is stored in a single byte whose value is the number itself.
  * The number 255 (0xff) is encoded as [0x81, 0x7f]. To decode it, read 0x81: the current value is (0x81 & 0x7f) = 0x01, and the set first bit signals that more bytes follow. The next byte, 0x7f, has its first bit unset, so it is the final byte. The result is (0x01 << 7) + 0x7f = 255.

LENGTH bytes - Value of the string, UTF-8 encoded

Example: Byte Array: [13, 10, 109, 101, 115, 115, 97, 103, 101, 45, 105, 100]

13 (STRING TAG)
10 (LENGTH is ten)
109, 101, 115, 115, 97, 103, 101, 45, 105, 100 (String value, "message-id")
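
The length encoding and the string layout described above can be sketched as follows. The function names are illustrative, and Python's standard UTF-8 codec is assumed to match the encoding used for the string bytes.

```python
def encode_length(n):
    """Encode n in 7-bit groups, most significant group first.

    Every byte except the last has its high bit set to signal continuation.
    """
    if n < 0:
        raise ValueError("length must be non-negative")
    groups = [n & 0x7F]              # final byte: high bit clear
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)  # earlier bytes: high bit set
        n >>= 7
    return bytes(reversed(groups))


def decode_length(data, pos=0):
    """Return (value, next_position) for a length starting at data[pos]."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def encode_string(text, tagged=True):
    """STRING: optional tag 13, variable-length byte count, UTF-8 bytes."""
    payload = text.encode("utf-8")
    body = encode_length(len(payload)) + payload
    return (bytes([13]) + body) if tagged else body
```

With these helpers, `encode_length(255)` produces the two bytes 0x81, 0x7f from the walkthrough above, and `encode_string("message-id")` reproduces the example byte array.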

TIME Serializations:

DATE

1 byte - DATE tag, 17 (excluded when type is known)
4 bytes - INT32 representation of the number of days since 1970-01-01

TIME

1 byte - TIME tag, 18 (excluded when type is known)
4 bytes - INT32 representation of the number of milliseconds elapsed since the beginning of the day

DATETIME

1 byte - DATETIME tag, 16 (excluded when type is known)
8 bytes - INT64 representation of the number of milliseconds elapsed since 1970-01-01T00:00:00.000Z (also called chronon time)

DURATION

1 byte - DURATION tag, 19 (excluded when type is known)
4 bytes - INT32 representation of the number of months from the year and date fields
8 bytes - INT64 representation of the number of milliseconds from all other fields
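
The four layouts above can be sketched with `struct` and `datetime`. Big-endian byte order is assumed, as in the integer examples; the function names are hypothetical, and `encode_datetime` expects a timezone-aware value.

```python
import struct
from datetime import date, datetime, time, timedelta, timezone

EPOCH_DAY = date(1970, 1, 1)
EPOCH_INSTANT = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_date(d):
    """DATE: tag 17, INT32 days since 1970-01-01."""
    return bytes([17]) + struct.pack(">i", (d - EPOCH_DAY).days)


def encode_time(t):
    """TIME: tag 18, INT32 milliseconds since the beginning of the day."""
    millis = ((t.hour * 60 + t.minute) * 60 + t.second) * 1000 + t.microsecond // 1000
    return bytes([18]) + struct.pack(">i", millis)


def encode_datetime(dt):
    """DATETIME: tag 16, INT64 milliseconds since the epoch (dt must be tz-aware)."""
    millis = (dt - EPOCH_INSTANT) // timedelta(milliseconds=1)
    return bytes([16]) + struct.pack(">q", millis)


def encode_duration(months, millis):
    """DURATION: tag 19, INT32 months, then INT64 milliseconds."""
    return bytes([19]) + struct.pack(">iq", months, millis)
```

For example, a duration of one year and two months plus five milliseconds would be passed as `encode_duration(14, 5)` and occupies 13 bytes including the tag.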

INTERVAL

1 byte - INTERVAL tag, 34 (excluded when type is known)
1 byte - Tag of the type of interval (TIME, DATETIME, DATE)
Followed by two fields (for TIME, DATE)
- 4 bytes - INT32 representation of the start
- 4 bytes - INT32 representation of the end
or (for DATETIME)
- 8 bytes - INT64 representation of the start
- 8 bytes - INT64 representation of the end
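
A sketch of the INTERVAL layout, assuming big-endian byte order and that the start and end are already expressed in the unit of the chosen type (days, milliseconds of day, or milliseconds since the epoch). The name `encode_interval` is illustrative.

```python
import struct

DATETIME_TAG, DATE_TAG, TIME_TAG, INTERVAL_TAG = 16, 17, 18, 34


def encode_interval(point_tag, start, end, tagged=True):
    """INTERVAL: optional tag 34, tag of the endpoint type, then start and end."""
    if point_tag == DATETIME_TAG:
        fmt = ">qq"   # two INT64 values
    elif point_tag in (DATE_TAG, TIME_TAG):
        fmt = ">ii"   # two INT32 values
    else:
        raise ValueError("interval endpoints must be TIME, DATE, or DATETIME")
    body = bytes([point_tag]) + struct.pack(fmt, start, end)
    return (bytes([INTERVAL_TAG]) + body) if tagged else body
```

A tagged DATE or TIME interval is 10 bytes long and a tagged DATETIME interval is 18 bytes long.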

ORDEREDLIST (Also applies to UNORDEREDLIST)

1 byte - Orderedlist tag, 22 (excluded when type is known)
1 byte - Type of items on the list
4 bytes - Total number of bytes
4 bytes - Number of items

if (type of list items is string, record, or list (length is not constant))
4 bytes per item (offsets)

for each item
Bytes of the item (In the case of a list of ANY, the items will include type tags)

Example: Nested List: [["message-id"]]

Byte Array: [22, 22, 0, 0, 0, 39, 0, 0, 0, 1, 0, 0, 0, 14, 13, 0, 0, 0, 26, 0, 0, 0, 1, 0, 0, 0, 14, 0, 10, 109, 101, 115, 115, 97, 103, 101, 45, 105, 100]

22 (list tag)

22 (this is a list of lists)

0, 0, 0, 39 (39 bytes)

0, 0, 0, 1 (1 item)

0, 0, 0, 14 (offset for first item (a list) is 14)

13 (type of inner list is string)
0, 0, 0, 26 (size of inner list is 26)
0, 0, 0, 1 (there is one item on inner list)
0, 0, 0, 14 (the offset for this item is 14)
0, 10, 109, 101, 115, 115, 97, 103, 101, 45, 105, 100 (item is a string of length 10, "message-id")
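
The nested-list example can be walked programmatically. The sketch below rests on observations from the example bytes rather than stated rules: sizes and offsets are big-endian; offsets are measured from the position where the list's tag byte sits, or would sit when the tag is omitted; and the string in this example carries a 2-byte length prefix. The name `read_list_header` is hypothetical.

```python
import struct

# Item types with an offset table. STRING, lists, and RECORD follow the rule
# above; ANY (29) is included because the record example in this document
# shows an offset for a list of ANY.
VARIABLE_LENGTH_TAGS = {13, 22, 23, 24, 29}


def read_list_header(data, start, tagged=True):
    """Return (item_tag, total_bytes, item_count, absolute_item_offsets)."""
    # Assumption from the example: offsets count from the (possibly absent) tag.
    base = start if tagged else start - 1
    pos = start + 1 if tagged else start
    item_tag = data[pos]
    total_bytes, count = struct.unpack_from(">ii", data, pos + 1)
    offsets = []
    if item_tag in VARIABLE_LENGTH_TAGS:
        raw = struct.unpack_from(">%di" % count, data, pos + 9)
        offsets = [base + o for o in raw]
    return item_tag, total_bytes, count, offsets


EXAMPLE = bytes([22, 22, 0, 0, 0, 39, 0, 0, 0, 1, 0, 0, 0, 14,
                 13, 0, 0, 0, 26, 0, 0, 0, 1, 0, 0, 0, 14,
                 0, 10, 109, 101, 115, 115, 97, 103, 101, 45, 105, 100])

outer = read_list_header(EXAMPLE, 0)
inner = read_list_header(EXAMPLE, outer[3][0], tagged=False)
item_pos = inner[3][0]
(str_len,) = struct.unpack_from(">H", EXAMPLE, item_pos)
text = EXAMPLE[item_pos + 2:item_pos + 2 + str_len].decode("utf-8")
```

Here `outer` describes a list of lists with one item at offset 14, `inner` describes a list of strings, and `text` recovers "message-id".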

RECORD

1 byte - Record tag, 24 (excluded when type is known)
4 bytes - Total number of bytes

if (recordType is not closed)
1 byte - Boolean isExpanded

if (isExpanded)
4 bytes - Offset to open part

4 bytes - Number of closed fields

if (recordType hasNullableFields)
ceil(numberOfFields / 8) bytes - Null bitmap (1 bit per field, "1" means field is Null for this record)

for each closed field for this record
4 bytes - Closed field offset

for each closed field that is not Null for this record
Bytes of the field (type is known from recordtype, so the bytes will not have a type tag)

if (isExpanded)
4 bytes - Number of open fields

for each open field, pairs sorted by hashcode ascending
4 bytes - Hash code
4 bytes - Offset

for each open field
Bytes of the field name (String, no type tag)
Bytes of the field (with type tag)

Example

DDL:

                drop dataverse test if exists;
                create dataverse test;

                use dataverse test;

                create type FacebookMessageType as closed {
                        message-id: int32
                }

                create dataset FacebookMessages(FacebookMessageType)
                primary key message-id;

                for $index in dataset Metadata.Index
                where $index.IndexName = "FacebookMessages"
                return $index;

Schema for the record:

                 open {
                  DataverseName: STRING,
                  DatasetName: STRING,
                  IndexName: STRING,
                  IndexStructure: STRING,
                  SearchKey: [ [ STRING ] ],
                  IsPrimary: BOOLEAN,
                  Timestamp: STRING,
                  PendingOp: INT32
                }

Byte Array: [24, 0, 0, 0, -40, 1, 0, 0, 0, -88, 0, 0, 0, 8, 0, 0, 0, 46, 0, 0, 0, 52, 0, 0, 0, 70, 0, 0, 0, 88, 0, 0, 0, 95, 0, 0, 0, -123, 0, 0, 0, -122, 0, 0, 0, -92, 0, 4, 116, 101, 115, 116, 0, 16, 70, 97, 99, 101, 98, 111, 111, 107, 77, 101, 115, 115, 97, 103, 101, 115, 0, 16, 70, 97, 99, 101, 98, 111, 111, 107, 77, 101, 115, 115, 97, 103, 101, 115, 0, 5, 66, 84, 82, 69, 69, 22, 0, 0, 0, 39, 0, 0, 0, 1, 0, 0, 0, 14, 13, 0, 0, 0, 26, 0, 0, 0, 1, 0, 0, 0, 14, 0, 10, 109, 101, 115, 115, 97, 103, 101, 45, 105, 100, 1, 0, 28, 84, 117, 101, 32, 79, 99, 116, 32, 48, 55, 32, 49, 48, 58, 50, 50, 58, 49, 54, 32, 80, 68, 84, 32, 50, 48, 49, 52, 0, 0, 0, 1, 0, 0, 0, 1, 77, 124, -113, 81, 0, 0, 0, -76, 0, 13, 83, 101, 97, 114, 99, 104, 75, 101, 121, 84, 121, 112, 101, 22, 29, 0, 0, 0, 21, 0, 0, 0, 1, 0, 0, 0, 14, 13, 0, 4, 110, 117, 108, 108]

24, (Record)

0, 0, 0, -40, (Number of bytes: 216. Bytes are printed as signed values, so -40 is 0xD8)

1, (IsExpanded === true)

0, 0, 0, -88, (open offset: 168)

0, 0, 0, 8, (8 closed fields)

0, 0, 0, 46, 0, 0, 0, 52, 0, 0, 0, 70, 0, 0, 0, 88, 0, 0, 0, 95, 0, 0, 0, -123, 0, 0, 0, -122, 0, 0, 0, -92, (4 bytes per closed field, offsets)

0, 4, 116, 101, 115, 116, (string "test")

0, 16, 70, 97, 99, 101, 98, 111, 111, 107, 77, 101, 115, 115, 97, 103, 101, 115, (string "FacebookMessages"; this sequence appears twice in the byte array, once for DatasetName and once for IndexName)

0, 5, 66, 84, 82, 69, 69, (string "BTREE")

22, 0, 0, 0, 39, 0, 0, 0, 1, 0, 0, 0, 14, (this shows that it is a list of size 1, type is orderedlist)
13, 0, 0, 0, 26, 0, 0, 0, 1, 0, 0, 0, 14, (the nested list is a list of size 1, type is string)
0, 10, 109, 101, 115, 115, 97, 103, 101, 45, 105, 100, (the nested string is "message-id")

1, (Boolean true)

0, 28, 84, 117, 101, 32, 79, 99, 116, 32, 48, 55, 32, 49, 48, 58, 50, 50, 58, 49, 54, 32, 80, 68, 84, 32, 50, 48, 49, 52, (string timestamp)

0, 0, 0, 1, (PendingOp == 1)

0, 0, 0, 1, (1 open Field)

77, 124, -113, 81, (Hash code for the field name)

0, 0, 0, -76, (Offset for the field)

0, 13, 83, 101, 97, 114, 99, 104, 75, 101, 121, 84, 121, 112, 101, (name of open field (length 13), string "SearchKeyType")

22, 29, 0, 0, 0, 21, 0, 0, 0, 1, 0, 0, 0, 14, (value of open field, ordered list of size 1, of type ANY)
13, 0, 4, 110, 117, 108, 108 (string "null")
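
The header of the record above can be decoded with a short sketch. It assumes a tagged record of an open type without nullable fields (so the isExpanded byte is present and there is no null bitmap) and big-endian integers. The example bytes are printed as signed values, so they are masked with 0xFF first. The name `read_open_record_header` is hypothetical.

```python
import struct


def read_open_record_header(data):
    """Parse tag, size, isExpanded, open offset, and the closed-field offsets."""
    tag = data[0]
    (total_bytes,) = struct.unpack_from(">i", data, 1)
    pos = 5
    is_expanded = data[pos] == 1
    pos += 1
    open_offset = None
    if is_expanded:
        (open_offset,) = struct.unpack_from(">i", data, pos)
        pos += 4
    (closed_count,) = struct.unpack_from(">i", data, pos)
    pos += 4
    closed_offsets = list(struct.unpack_from(">%di" % closed_count, data, pos))
    return {
        "tag": tag,
        "total_bytes": total_bytes,
        "is_expanded": is_expanded,
        "open_offset": open_offset,
        "closed_offsets": closed_offsets,
    }


# First 46 bytes of the example, as printed (signed).
SIGNED_PREFIX = [24, 0, 0, 0, -40, 1, 0, 0, 0, -88, 0, 0, 0, 8,
                 0, 0, 0, 46, 0, 0, 0, 52, 0, 0, 0, 70, 0, 0, 0, 88,
                 0, 0, 0, 95, 0, 0, 0, -123, 0, 0, 0, -122, 0, 0, 0, -92]
HEADER = bytes(b & 0xFF for b in SIGNED_PREFIX)
header = read_open_record_header(HEADER)
```

The first closed-field offset, 46, equals the length of this header, so the first field value ("test") starts immediately after the offset table.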

More Complicated Example (Highly Nested and Open record)

DDL:

                drop dataverse test if exists;
                create dataverse test;

                use dataverse test;

                create type S as closed{
                        id: int32,
                        Species: string
                }
                create type GS as closed{
                        id: int32,
                        Genus: string,
                        lower: S
                }
                create type FGS as open{
                        id: int32,
                        Family: string
                }
                create type OFGS as closed{
                        id: int32,
                        Order: string,
                        lower: FGS
                }

                create dataset Ss(S)
                primary key id;
                create dataset GSs(GS)
                primary key id;
                create dataset FGSs(FGS)
                primary key id;
                create dataset OFGSs(OFGS)
                primary key id;


                insert into dataset Ss(
                        {"id":1,"Species":"Gulo"}
                );
                insert into dataset GSs(
                        for $S in dataset Ss
                        where $S.Species = "Gulo"
                        return {"id":1,"Genus":"Gulo","lower":$S}
                );
                insert into dataset FGSs(
                        for $S in dataset GSs
                        where $S.lower.Species = "Gulo"
                        return {"id":1,"Family":"Mustelinae","lower":$S}
                );
                insert into dataset OFGSs(
                        for $S in dataset FGSs
                        where $S.lower.lower.Species = "Gulo"
                        return {"id":1,"Order":"Carnivora","lower":$S}
                );

                for $test in dataset OFGSs
                return $test

Schema for the record:

                 closed {
                        id: INT32,
                        Order: STRING,
                        lower: open {
                                id: INT32,
                                Family: STRING
                        }
                }

Byte Array: [24, 0, 0, 0, -41, 0, 0, 0, 3, 0, 0, 0, 21, 0, 0, 0, 25, 0, 0, 0, 36, 0, 0, 0, 1, 0, 9, 67, 97, 114, 110, 105, 118, 111, 114, 97, 0, 0, 0, -76, 1, 0, 0, 0, 38, 0, 0, 0, 2, 0, 0, 0, 22, 0, 0, 0, 26, 0, 0, 0, 1, 0, 10, 77, 117, 115, 116, 101, 108, 105, 110, 97, 101, 0, 0, 0, 1, 6, 38, 43, 1, 0, 0, 0, 50, 0, 5, 108, 111, 119, 101, 114, 24, 0, 0, 0, 123, 1, 0, 0, 0, 10, 0, 0, 0, 3, 0, 0, 13, 27, 0, 0, 0, 38, 4, 24, 25, -50, 0, 0, 0, 47, 6, 38, 43, 1, 0, 0, 0, 61, 0, 2, 105, 100, 3, 0, 0, 0, 1, 0, 5, 71, 101, 110, 117, 115, 13, 0, 4, 71, 117, 108, 111, 0, 5, 108, 111, 119, 101, 114, 24, 0, 0, 0, 55, 1, 0, 0, 0, 10, 0, 0, 0, 2, -21, -127, -39, 28, 0, 0, 0, 39, 0, 0, 13, 27, 0, 0, 0, 30, 0, 2, 105, 100, 3, 0, 0, 0, 1, 0, 7, 83, 112, 101, 99, 105, 101, 115, 13, 0, 4, 71, 117, 108, 111]

24, (Record)

0, 0, 0, -41, (size: 215; bytes are printed as signed values, so -41 is 0xD7)

0, 0, 0, 3, (3 closed fields)

0, 0, 0, 21, 0, 0, 0, 25, 0, 0, 0, 36, (closed offsets)

0, 0, 0, 1, (id = 1)

0, 9, 67, 97, 114, 110, 105, 118, 111, 114, 97, (Order = "Carnivora")

0, 0, 0, -76, (number of bytes for lower: 180)

1, (lower is expanded)

0, 0, 0, 38, (open offset)

0, 0, 0, 2, (two closed fields)

0, 0, 0, 22, 0, 0, 0, 26, (closed offsets)

0, 0, 0, 1, (lower.id = 1)

0, 10, 77, 117, 115, 116, 101, 108, 105, 110, 97, 101, (lower.Family = "Mustelinae")

0, 0, 0, 1, (1 open field)

6, 38, 43, 1, 0, 0, 0, 50, (hash and offset of open field)

0, 5, 108, 111, 119, 101, 114, (name of open field = "lower")

24, (open field is a record)

0, 0, 0, 123, (size of open field)

1, (lower.lower is expanded)

0, 0, 0, 10, (open part offset)

0, 0, 0, 3, (3 open fields)

0, 0, 13, 27, 0, 0, 0, 38, 4, 24, 25, -50, 0, 0, 0, 47, 6, 38, 43, 1, 0, 0, 0, 61, (Hashes and offsets for open fields)

0, 2, 105, 100, (fieldName = "id")

3, (type is int32)

0, 0, 0, 1, (id is 1)

0, 5, 71, 101, 110, 117, 115, (fieldname = "Genus")

13, (type is string)

0, 4, 71, 117, 108, 111, (fieldValue = "Gulo")

0, 5, 108, 111, 119, 101, 114, (name of lower.lower.lower = "lower")

24, (lower.lower.lower is a record)

0, 0, 0, 55, (size of record)

1, (is expanded)

0, 0, 0, 10, (open offset)

0, 0, 0, 2, (2 open fields)

-21, -127, -39, 28, 0, 0, 0, 39, 0, 0, 13, 27, 0, 0, 0, 30, (open hashes and offsets)