Type Tag Reference
The following reference lists the 1-byte type tags of the various objects. Note that when the type of an object is known (for example, when it is part of the closed schema for a Record), the type tag is omitted from the object's serialization.
- 1 INT8
- 2 INT16
- 3 INT32
- 4 INT64
- 11 FLOAT
- 12 DOUBLE
- 13 STRING
- 14 NULL
- 15 BOOLEAN
- 16 DATETIME
- 17 DATE
- 18 TIME
- 19 DURATION
- 20 POINT
- 22 ORDEREDLIST
- 23 UNORDEREDLIST
- 24 RECORD
- 29 ANY
- 30 LINE
- 31 POLYGON
- 32 CIRCLE
- 33 RECTANGLE
- 34 INTERVAL
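For readers writing a decoder, the tag list above can be captured as an enumeration. The following Python sketch is illustrative only: the name `TypeTag` is an assumption made for this example and is not part of the format or of any existing library.

```python
from enum import IntEnum


class TypeTag(IntEnum):
    """One-byte type tags, copied from the reference list above."""
    INT8 = 1
    INT16 = 2
    INT32 = 3
    INT64 = 4
    FLOAT = 11
    DOUBLE = 12
    STRING = 13
    NULL = 14
    BOOLEAN = 15
    DATETIME = 16
    DATE = 17
    TIME = 18
    DURATION = 19
    POINT = 20
    ORDEREDLIST = 22
    UNORDEREDLIST = 23
    RECORD = 24
    ANY = 29
    LINE = 30
    POLYGON = 31
    CIRCLE = 32
    RECTANGLE = 33
    INTERVAL = 34


# Look up the name of a tag read from the first byte of a tagged value.
print(TypeTag(13).name)  # STRING
```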
BOOLEAN
- 1 byte - BOOLEAN tag, 15 (excluded when type is known)
- 1 byte - Value
Example: Byte Array: [15, 1]
- 15, (BOOLEAN tag)
- 1 - (value is TRUE)
INT8
- 1 byte - Int8 tag, 1 (excluded when type is known)
- 1 byte - Value
Example: Byte Array: [1, 4]
- 1, (Int8 tag)
- 4 (value is 4)
INT16
- 1 byte - Int16 tag, 2 (excluded when type is known)
- 2 bytes - Value
Example: Byte Array: [2, 0, 8]
- 2, (Int16 tag)
- 0, 8 (value is 8)
INT32
- 1 byte - Int32 tag, 3 (excluded when type is known)
- 4 bytes - Value
Example: Byte Array: [3, 0, 0, 0, 23]
- 3, (Int32 tag)
- 0, 0, 0, 23 (value is 23)
INT64
- 1 byte - Int64 tag, 4 (excluded when type is known)
- 8 bytes - Value
Example: Byte Array: [4, 0, 0, 0, 0, 0, 0, 0, 42]
- 4, (Int64 tag)
- 0, 0, 0, 0, 0, 0, 0, 42 (value is 42)
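The integer examples above can be decoded with a few lines of Python. This is a minimal sketch, assuming big-endian, two's-complement values (the examples show the most significant byte first, but the text does not state signedness). The names `INT_FORMATS` and `decode_tagged_int` are hypothetical.

```python
import struct

# struct format codes for the fixed-width integers, keyed by type tag.
# ">" selects big-endian byte order, which matches the examples above
# (for instance, the INT16 bytes [0, 8] decode to 8).
# Assumption: the values are signed.
INT_FORMATS = {1: ">b", 2: ">h", 3: ">i", 4: ">q"}


def decode_tagged_int(data: bytes) -> int:
    """Decode a tagged INT8/INT16/INT32/INT64 value from the start of data."""
    tag = data[0]
    fmt = INT_FORMATS[tag]  # raises KeyError for non-integer tags
    (value,) = struct.unpack_from(fmt, data, 1)
    return value


print(decode_tagged_int(bytes([1, 4])))            # 4
print(decode_tagged_int(bytes([2, 0, 8])))         # 8
print(decode_tagged_int(bytes([3, 0, 0, 0, 23])))  # 23
```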
FLOAT
- 1 byte - FLOAT tag, 11 (excluded when type is known)
- 4 bytes - Value, using IEEE 754 floating-point "single format" bit layout.
DOUBLE
- 1 byte - DOUBLE tag, 12 (excluded when type is known)
- 8 bytes - Value, using IEEE 754 floating-point "double format" bit layout.
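No byte-array example is given for FLOAT and DOUBLE, so the following Python sketch produces one. It assumes the same big-endian byte order that the integer examples show; the function names are hypothetical.

```python
import struct

FLOAT_TAG = 11
DOUBLE_TAG = 12


def encode_float(value: float, tagged: bool = True) -> bytes:
    """Serialize value as an IEEE 754 single, optionally preceded by the tag."""
    body = struct.pack(">f", value)  # assumption: big-endian byte order
    return bytes([FLOAT_TAG]) + body if tagged else body


def encode_double(value: float, tagged: bool = True) -> bytes:
    """Serialize value as an IEEE 754 double, optionally preceded by the tag."""
    body = struct.pack(">d", value)  # assumption: big-endian byte order
    return bytes([DOUBLE_TAG]) + body if tagged else body


print(list(encode_float(1.5)))   # [11, 63, 192, 0, 0]
print(list(encode_double(1.5)))  # [12, 63, 248, 0, 0, 0, 0, 0, 0]
```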
Location Serializations
POINT -
- 1 byte - type tag (20, excluded when known)
- 8 bytes - nontagged DOUBLE serialization
- 8 bytes - nontagged DOUBLE serialization
LINE -
- 1 byte - type tag (30, excluded when known)
- 16 bytes - nontagged POINT serialization
- 16 bytes - nontagged POINT serialization
RECTANGLE -
- 1 byte - type tag (33, excluded when known)
- 16 bytes - nontagged POINT serialization
- 16 bytes - nontagged POINT serialization
CIRCLE -
- 1 byte - type tag (32, excluded when known)
- 16 bytes - nontagged POINT serialization
- 8 bytes - nontagged DOUBLE serialization
POLYGON -
- 1 byte - type tag (31, excluded when known)
- 2 bytes - NUMBER of points, as an int16
- remaining bytes - A series of nontagged POINT serializations
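The location layouts above compose from nontagged DOUBLE and POINT serializations. The following Python sketch decodes the nontagged forms of POINT, CIRCLE, and POLYGON. It assumes big-endian byte order, as in the integer examples, and the function names are hypothetical. The text does not say what each DOUBLE means, so the sketch returns them in serialized order without naming them.

```python
import struct

POINT_SIZE = 16  # two nontagged DOUBLE values of 8 bytes each


def decode_point(data: bytes, offset: int = 0) -> tuple:
    """Return the two DOUBLE values of a nontagged POINT."""
    return struct.unpack_from(">dd", data, offset)


def decode_circle(data: bytes, offset: int = 0) -> tuple:
    """Return (point, double) for a nontagged CIRCLE."""
    point = decode_point(data, offset)
    (value,) = struct.unpack_from(">d", data, offset + POINT_SIZE)
    return point, value


def decode_polygon(data: bytes, offset: int = 0) -> list:
    """Return the list of points of a nontagged POLYGON."""
    (count,) = struct.unpack_from(">h", data, offset)  # int16 point count
    start = offset + 2
    return [decode_point(data, start + POINT_SIZE * i) for i in range(count)]


# Build a three-point polygon by hand and decode it again.
raw = struct.pack(">h", 3) + struct.pack(">6d", 0.0, 0.0, 4.0, 0.0, 4.0, 3.0)
print(decode_polygon(raw))  # [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)]
```

LINE and RECTANGLE follow the same pattern with two consecutive calls to `decode_point`.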
STRING
- 1 byte - STRING tag, 13 (excluded when type is known)
- 1-5 bytes - Variable-length encoding of the LENGTH of the string.
Each byte stores seven bits of the number. The most significant bit of each byte indicates whether more bytes follow: if it is set, shift the current value left by seven bits and continue with the next byte, until a byte whose most significant bit is unset is read.
For example, a number less than 128 is stored in a single byte whose value is the number itself.
The number 255 (0xff) is encoded as [0x81, 0x7f]. To decode it, read 0x81: the current value is (0x81 & 0x7f) = 0x01, and the set most significant bit signals that more bytes follow. The next byte, 0x7f, has its most significant bit unset, so it is the final byte. The result is (0x01 << 7) + 0x7f = 255.
- LENGTH bytes - Value of the string, UTF-8 encoded
Example: Byte Array: [13, 10, 109, 101, 115, 115, 97, 103, 101, 45, 105, 100]
- 13 (STRING TAG)
- 10 (LENGTH is ten)
- 109, 101, 115, 115, 97, 103, 101, 45, 105, 100 (String value, "message-id")
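The variable-length LENGTH encoding described above can be sketched in Python as follows. The function names are hypothetical, and the code is an illustration of the described scheme rather than the serializer's own implementation.

```python
def decode_varlen(data: bytes, offset: int = 0) -> tuple:
    """Return (value, next_offset) for the variable-length number at offset."""
    value = 0
    while True:
        byte = data[offset]
        offset += 1
        value = (value << 7) | (byte & 0x7F)  # append the low seven bits
        if (byte & 0x80) == 0:                # high bit unset: last byte
            return value, offset


def encode_varlen(value: int) -> bytes:
    """Encode a non-negative number, most significant seven-bit group first."""
    groups = [value & 0x7F]  # final byte has its high bit unset
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)  # earlier bytes set the high bit
        value >>= 7
    return bytes(reversed(groups))


def decode_tagged_string(data: bytes) -> str:
    """Decode a tagged STRING laid out as described in this section."""
    if data[0] != 13:
        raise ValueError("not a STRING tag")
    length, start = decode_varlen(data, 1)
    return data[start:start + length].decode("utf-8")


print([hex(b) for b in encode_varlen(255)])  # ['0x81', '0x7f']
print(decode_varlen(bytes([0x81, 0x7F])))    # (255, 2)
example = bytes([13, 10, 109, 101, 115, 115, 97, 103, 101, 45, 105, 100])
print(decode_tagged_string(example))         # message-id
```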
TIME Serializations:
DATE
- 1 byte - DATE tag, 17 (excluded when type is known)
- 4 bytes - INT32 representation of the number of days since 1970-01-01
TIME
- 1 byte - TIME tag, 18 (excluded when type is known)
- 4 bytes - INT32 representation of the number of milliseconds elapsed since the beginning of the day
DATETIME
- 1 byte - DATETIME tag, 16 (excluded when type is known)
- 8 bytes - INT64 representation of the number of milliseconds elapsed since 1970-01-01T00:00:00.000Z (also called chronon time)
DURATION
- 1 byte - DURATION tag, 19 (excluded when type is known)
- 4 bytes - INT32 representation of the number of months from the year and month fields
- 8 bytes - INT64 representation of the number of milliseconds from all other fields
INTERVAL
- 1 byte - INTERVAL tag, 34 (excluded when type is known)
- 1 byte - Tag of the type of interval (TIME, DATETIME, DATE)
- Followed by two fields of that type (for TIME, DATE)
- 4 bytes - INT32 representation of the start
- 4 bytes - INT32 representation of the end
- or (for DATETIME)
- 8 bytes - INT64 representation of the start
- 8 bytes - INT64 representation of the end
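The temporal layouts above map directly onto Python's `datetime` types. The sketch below decodes the nontagged bodies of DATE, TIME, DATETIME, and DURATION. It assumes big-endian byte order, as in the integer examples, and the function names are hypothetical.

```python
import struct
from datetime import date, datetime, time, timedelta, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_DATETIME = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_date(body: bytes) -> date:
    """DATE: INT32 number of days since 1970-01-01."""
    (days,) = struct.unpack(">i", body)
    return EPOCH_DATE + timedelta(days=days)


def decode_time(body: bytes) -> time:
    """TIME: INT32 number of milliseconds since the beginning of the day."""
    (millis,) = struct.unpack(">i", body)
    return (datetime.min + timedelta(milliseconds=millis)).time()


def decode_datetime(body: bytes) -> datetime:
    """DATETIME: INT64 milliseconds since 1970-01-01T00:00:00.000Z."""
    (millis,) = struct.unpack(">q", body)
    return EPOCH_DATETIME + timedelta(milliseconds=millis)


def decode_duration(body: bytes) -> tuple:
    """DURATION: INT32 months followed by INT64 milliseconds."""
    months, millis = struct.unpack(">iq", body)
    return months, millis


print(decode_date(struct.pack(">i", 16350)))    # 2014-10-07
print(decode_time(struct.pack(">i", 3723004)))  # 01:02:03.004000
```

An INTERVAL body can be read the same way: one tag byte selecting the type, then two consecutive start and end values of that type.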
ORDEREDLIST (Also applies to UNORDEREDLIST)
- 1 byte - Orderedlist tag, 22 (excluded when type is known)
- 1 byte - Type of items on the list
- 4 bytes - Total number of bytes
- 4 bytes - Number of items
if (type of list items is string, record, or list (length is not constant))
  - 4 bytes per item (offsets)
for each item
  - Bytes of the item (In the case of a list of ANY, the items will include type tags)
Example: Nested List: [["message-id"]]
Byte Array: [22, 22, 0, 0, 0, 39, 0, 0, 0, 1, 0, 0, 0, 14, 13, 0, 0, 0, 26, 0, 0, 0, 1, 0, 0, 0, 14, 0, 10, 109, 101, 115, 115, 97, 103, 101, 45, 105, 100]
22 (list tag)
22 (this is a list of lists)
0, 0, 0, 39 (39 bytes)
0, 0, 0, 1 (1 item)
0, 0, 0, 14 (offset for first item (a list) is 14)
13 (type of inner list is string)
0, 0, 0, 26 (size of inner list is 26)
0, 0, 0, 1 (there is one item on inner list)
0, 0, 0, 14 (the offset for this item is 14)
0, 10, 109, 101, 115, 115, 97, 103, 101, 45, 105, 100 (item is a string of length 10, "message-id")
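The list header in the example above can be read with the following Python sketch. It parses only the header and the offset table, not the items. Big-endian INT32 fields are assumed from the example bytes, and the names are hypothetical. ANY (29) is treated as a variable-length item type because the record example later in this document shows an offset for a list of ANY.

```python
import struct

# Item tags whose serialized length is not constant, so the list carries
# one 4-byte offset per item: STRING, ORDEREDLIST, UNORDEREDLIST, RECORD.
# Assumption: ANY (29) is included, based on the record example's list of ANY.
VARIABLE_LENGTH_ITEM_TAGS = {13, 22, 23, 24, 29}


def parse_list_header(data: bytes) -> dict:
    """Parse the header of a tagged ORDEREDLIST or UNORDEREDLIST."""
    list_tag, item_tag = data[0], data[1]
    total_bytes, count = struct.unpack_from(">ii", data, 2)
    offsets = []
    if item_tag in VARIABLE_LENGTH_ITEM_TAGS:
        offsets = list(struct.unpack_from(f">{count}i", data, 10))
    return {
        "list_tag": list_tag,
        "item_tag": item_tag,
        "total_bytes": total_bytes,
        "count": count,
        "offsets": offsets,
    }


nested = bytes([22, 22, 0, 0, 0, 39, 0, 0, 0, 1, 0, 0, 0, 14,
                13, 0, 0, 0, 26, 0, 0, 0, 1, 0, 0, 0, 14,
                0, 10, 109, 101, 115, 115, 97, 103, 101, 45, 105, 100])
header = parse_list_header(nested)
print(header["total_bytes"], header["count"], header["offsets"])  # 39 1 [14]
print(nested[header["offsets"][0]])  # 13, the first byte of the inner list
```

In this example the offset is counted from the tag byte of the enclosing list: byte 14 of the array is where the inner list begins.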
Record
- 1 byte - Record tag, 24 (excluded when type is known)
- 4 bytes - Total number of bytes
if (recordType is not closed)
  - 1 byte - Boolean isExpanded
  if (isExpanded)
    - 4 bytes - Offset to open part
- 4 bytes - Number of closed fields
if (recordType hasNullableFields)
  - ceil(numberOfFields / 8) bytes - Nullbitmap (1 bit per field, "1" means field is Null for this record)
for each closed field for this record
  - 4 bytes - Closed field offset
for each closed field that is not Null for this record
  - Bytes of the field (type is known from recordtype, so the bytes will not have a type tag)
if (isExpanded)
  - 4 bytes - Number of open fields
  for each open field, pairs sorted by hashcode ascending
    - 4 bytes - Hash code
    - 4 bytes - Offset
  for each open field
    - Bytes of the field name (String, no type tag)
    - Bytes of the field (with type tag)
Example
DDL:
drop dataverse test if exists;
create dataverse test;
use dataverse test;
create type FacebookMessageType as closed {
message-id: int32
}
create dataset FacebookMessages(FacebookMessageType)
primary key message-id;
for $index in dataset Metadata.Index
where $index.IndexName = "FacebookMessages"
return $index;
Schema for the record:
open {
DataverseName: STRING,
DatasetName: STRING,
IndexName: STRING,
IndexStructure: STRING,
SearchKey: [ [ STRING ] ],
IsPrimary: BOOLEAN,
Timestamp: STRING,
PendingOp: INT32
}
Byte Array: [24, 0, 0, 0, -40, 1, 0, 0, 0, -88, 0, 0, 0, 8, 0, 0, 0, 46, 0, 0, 0, 52, 0, 0, 0, 70, 0, 0, 0, 88, 0, 0, 0, 95, 0, 0, 0, -123, 0, 0, 0, -122, 0, 0, 0, -92, 0, 4, 116, 101, 115, 116, 0, 16, 70, 97, 99, 101, 98, 111, 111, 107, 77, 101, 115, 115, 97, 103, 101, 115, 0, 16, 70, 97, 99, 101, 98, 111, 111, 107, 77, 101, 115, 115, 97, 103, 101, 115, 0, 5, 66, 84, 82, 69, 69, 22, 0, 0, 0, 39, 0, 0, 0, 1, 0, 0, 0, 14, 13, 0, 0, 0, 26, 0, 0, 0, 1, 0, 0, 0, 14, 0, 10, 109, 101, 115, 115, 97, 103, 101, 45, 105, 100, 1, 0, 28, 84, 117, 101, 32, 79, 99, 116, 32, 48, 55, 32, 49, 48, 58, 50, 50, 58, 49, 54, 32, 80, 68, 84, 32, 50, 48, 49, 52, 0, 0, 0, 1, 0, 0, 0, 1, 77, 124, -113, 81, 0, 0, 0, -76, 0, 13, 83, 101, 97, 114, 99, 104, 75, 101, 121, 84, 121, 112, 101, 22, 29, 0, 0, 0, 21, 0, 0, 0, 1, 0, 0, 0, 14, 13, 0, 4, 110, 117, 108, 108]
24, (Record)
0, 0, 0, -40, (Number of bytes)
1, (IsExpanded === true)
0, 0, 0, -88, (open offset)
0, 0, 0, 8, (8 closed fields)
0, 0, 0, 46, 0, 0, 0, 52, 0, 0, 0, 70, 0, 0, 0, 88, 0, 0, 0, 95, 0, 0, 0, -123, 0, 0, 0, -122, 0, 0, 0, -92, (4 bytes per closed field, offsets)
0, 4, 116, 101, 115, 116, (string "test")
0, 16, 70, 97, 99, 101, 98, 111, 111, 107, 77, 101, 115, 115, 97, 103, 101, 115, (string "FacebookMessages")
0, 16, 70, 97, 99, 101, 98, 111, 111, 107, 77, 101, 115, 115, 97, 103, 101, 115, (string "FacebookMessages")
0, 5, 66, 84, 82, 69, 69, (string "BTREE")
22, 0, 0, 0, 39, 0, 0, 0, 1, 0, 0, 0, 14, (this shows that it is a list of size 1, type is orderedlist)
13, 0, 0, 0, 26, 0, 0, 0, 1, 0, 0, 0, 14, (the nested list is a list of size 1, type is string)
0, 10, 109, 101, 115, 115, 97, 103, 101, 45, 105, 100, (the nested string is "message-id")
1, (Boolean true)
0, 28, 84, 117, 101, 32, 79, 99, 116, 32, 48, 55, 32, 49, 48, 58, 50, 50, 58, 49, 54, 32, 80, 68, 84, 32, 50, 48, 49, 52, (string timestamp)
0, 0, 0, 1, (PendingOP == 1)
0, 0, 0, 1, (1 open Field)
77, 124, -113, 81, (Hash code for the field name)
0, 0, 0, -76, (Offset for the field)
0, 13, 83, 101, 97, 114, 99, 104, 75, 101, 121, 84, 121, 112, 101, (name of open field (length 13), string "SearchKeyType")
22, 29, 0, 0, 0, 21, 0, 0, 0, 1, 0, 0, 0, 14, (value of open field, ordered list of size 1, of type ANY)
13, 0, 4, 110, 117, 108, 108 (string "null")
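The byte arrays in these examples are printed as signed bytes, so -40 stands for 216 and -88 for 168. The following Python sketch converts the first 46 bytes of the example to unsigned form and parses the record header. It assumes a record type that is not closed and has no nullable fields, as in this example, so no Nullbitmap is read. All names are hypothetical.

```python
import struct

# First 46 bytes of the example above (tag through the closed field offsets),
# written as signed bytes exactly as they appear in the byte array.
SIGNED_HEADER = [24, 0, 0, 0, -40, 1, 0, 0, 0, -88, 0, 0, 0, 8,
                 0, 0, 0, 46, 0, 0, 0, 52, 0, 0, 0, 70, 0, 0, 0, 88,
                 0, 0, 0, 95, 0, 0, 0, -123, 0, 0, 0, -122, 0, 0, 0, -92]


def to_bytes(signed_values) -> bytes:
    """Convert signed byte values (-128..127) to a bytes object."""
    return bytes(v & 0xFF for v in signed_values)


def parse_open_record_header(data: bytes) -> dict:
    """Parse a tagged record whose type is not closed and has no nullable fields."""
    (total_bytes,) = struct.unpack_from(">i", data, 1)
    is_expanded = data[5] == 1
    pos = 6
    open_offset = None
    if is_expanded:
        (open_offset,) = struct.unpack_from(">i", data, pos)
        pos += 4
    (closed_count,) = struct.unpack_from(">i", data, pos)
    pos += 4
    closed_offsets = list(struct.unpack_from(f">{closed_count}i", data, pos))
    return {
        "tag": data[0],
        "total_bytes": total_bytes,
        "is_expanded": is_expanded,
        "open_offset": open_offset,
        "closed_offsets": closed_offsets,
    }


header = parse_open_record_header(to_bytes(SIGNED_HEADER))
print(header["total_bytes"])     # 216
print(header["open_offset"])     # 168
print(header["closed_offsets"])  # [46, 52, 70, 88, 95, 133, 134, 164]
```

The first closed offset, 46, equals the header length, so the first closed field starts immediately after the offset table.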
More Complicated Example (Highly Nested and Open record)
DDL:
drop dataverse test if exists;
create dataverse test;
use dataverse test;
create type S as closed {
id: int32,
Species: string
}
create type GS as closed {
id: int32,
Genus: string,
lower: S
}
create type FGS as open {
id: int32,
Family: string
}
create type OFGS as closed {
id: int32,
Order: string,
lower: FGS
}
create dataset Ss(S)
primary key id;
create dataset GSs(GS)
primary key id;
create dataset FGSs(FGS)
primary key id;
create dataset OFGSs(OFGS)
primary key id;
insert into dataset Ss(
{"id":1,"Species":"Gulo"}
);
insert into dataset GSs(
for $S in dataset Ss
where $S.Species = "Gulo"
return {"id":1,"Genus":"Gulo","lower":$S}
);
insert into dataset FGSs(
for $S in dataset GSs
where $S.lower.Species = "Gulo"
return {"id":1,"Family":"Mustelinae","lower":$S}
);
insert into dataset OFGSs(
for $S in dataset FGSs
where $S.lower.lower.Species = "Gulo"
return {"id":1,"Order":"Carnivora","lower":$S}
);
for $test in dataset OFGSs
return $test
Schema for the record:
closed {
id: INT32,
Order: STRING,
lower: open {
id: INT32,
Family: STRING
}
}
Byte Array: [24, 0, 0, 0, -41, 0, 0, 0, 3, 0, 0, 0, 21, 0, 0, 0, 25, 0, 0, 0, 36, 0, 0, 0, 1, 0, 9, 67, 97, 114, 110, 105, 118, 111, 114, 97, 0, 0, 0, -76, 1, 0, 0, 0, 38, 0, 0, 0, 2, 0, 0, 0, 22, 0, 0, 0, 26, 0, 0, 0, 1, 0, 10, 77, 117, 115, 116, 101, 108, 105, 110, 97, 101, 0, 0, 0, 1, 6, 38, 43, 1, 0, 0, 0, 50, 0, 5, 108, 111, 119, 101, 114, 24, 0, 0, 0, 123, 1, 0, 0, 0, 10, 0, 0, 0, 3, 0, 0, 13, 27, 0, 0, 0, 38, 4, 24, 25, -50, 0, 0, 0, 47, 6, 38, 43, 1, 0, 0, 0, 61, 0, 2, 105, 100, 3, 0, 0, 0, 1, 0, 5, 71, 101, 110, 117, 115, 13, 0, 4, 71, 117, 108, 111, 0, 5, 108, 111, 119, 101, 114, 24, 0, 0, 0, 55, 1, 0, 0, 0, 10, 0, 0, 0, 2, -21, -127, -39, 28, 0, 0, 0, 39, 0, 0, 13, 27, 0, 0, 0, 30, 0, 2, 105, 100, 3, 0, 0, 0, 1, 0, 7, 83, 112, 101, 99, 105, 101, 115, 13, 0, 4, 71, 117, 108, 111]
24, (Record)
0, 0, 0, -41, (size)
0, 0, 0, 3, (3 closed fields)
0, 0, 0, 21, 0, 0, 0, 25, 0, 0, 0, 36, (closed offsets)
0, 0, 0, 1, (id = 1)
0, 9, 67, 97, 114, 110, 105, 118, 111, 114, 97, (Order = "Carnivora")
0, 0, 0, -76, (number of bytes for lower)
1, (lower is expanded)
0, 0, 0, 38, (open offset)
0, 0, 0, 2, (two closed fields)
0, 0, 0, 22, 0, 0, 0, 26, (closed offsets)
0, 0, 0, 1, (lower.id = 1)
0, 10, 77, 117, 115, 116, 101, 108, 105, 110, 97, 101, (lower.Family = "Mustelinae")
0, 0, 0, 1, (1 open field)
6, 38, 43, 1, 0, 0, 0, 50, (hash and offset of open field)
0, 5, 108, 111, 119, 101, 114, (name of open field = "lower")
24, (open field is a record)
0, 0, 0, 123, (size of open field)
1, (lower.lower is expanded)
0, 0, 0, 10, (open part offset)
0, 0, 0, 3, (3 open fields)
0, 0, 13, 27, 0, 0, 0, 38, 4, 24, 25, -50, 0, 0, 0, 47, 6, 38, 43, 1, 0, 0, 0, 61, (Hashes and offsets for open fields)
0, 2, 105, 100, (fieldName = "id")
3, (type is int32)
0, 0, 0, 1, (id is 1)
0, 5, 71, 101, 110, 117, 115, (fieldname = "Genus")
13, (type is string)
0, 4, 71, 117, 108, 111, (fieldValue = "Gulo")
0, 5, 108, 111, 119, 101, 114, (name of lower.lower.lower = "lower")
24, (lower.lower.lower is a record)
0, 0, 0, 55, (size of record)
1, (is expanded)
0, 0, 0, 10, (open offset)
0, 0, 0, 2, (2 open fields)
-21, -127, -39, 28, 0, 0, 0, 39, 0, 0, 13, 27, 0, 0, 0, 30, (open hashes and offsets)
0, 2, 105, 100, (fieldName = "id")
3, (id is an int32)
0, 0, 0, 1, (id = 1)
0, 7, 83, 112, 101, 99, 105, 101, 115, (fieldName = "Species")
13, (type of lower.lower.lower.Species is string)
0, 4, 71, 117, 108, 111 (lower.lower.lower.Species = "Gulo")
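One observation about the open-field hash codes in this example: for the field names "id", "Genus", "lower", and "Species", the four hash bytes match the value of Java's `String.hashCode` written as a big-endian INT32. The Python sketch below reproduces that computation. Whether the serializer uses this function for every field name is an assumption; this document does not state it. The function name is hypothetical.

```python
import struct


def java_string_hash(text: str) -> int:
    """Compute Java's String.hashCode as a signed 32-bit integer.

    Uses one code point per character, which matches Java's UTF-16
    code units for characters in the Basic Multilingual Plane.
    """
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # wrap like Java int arithmetic
    return h - 0x100000000 if h >= 0x80000000 else h


def hash_as_signed_bytes(text: str) -> list:
    """Return the hash as four signed bytes, as printed in the byte arrays."""
    return list(struct.unpack(">4b", struct.pack(">i", java_string_hash(text))))


print(hash_as_signed_bytes("id"))       # [0, 0, 13, 27]
print(hash_as_signed_bytes("Genus"))    # [4, 24, 25, -50]
print(hash_as_signed_bytes("lower"))    # [6, 38, 43, 1]
print(hash_as_signed_bytes("Species"))  # [-21, -127, -39, 28]
```

Sorting these as signed integers gives "Species" before "id", which is the order of the hash and offset pairs in the innermost record above.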