Type Tag Reference

The following list gives the 1-byte type tag for each object type. Note that when the type of an object is already known (for example, when it is part of the closed schema for a Record), the type tag is omitted from the object's serialization.

  • 1 INT8
  • 2 INT16
  • 3 INT32
  • 4 INT64
  • 11 FLOAT
  • 12 DOUBLE
  • 13 STRING
  • 14 NULL
  • 15 BOOLEAN
  • 16 DATETIME
  • 17 DATE
  • 18 TIME
  • 19 DURATION
  • 20 POINT
  • 22 ORDEREDLIST
  • 23 UNORDEREDLIST
  • 24 RECORD
  • 29 ANY
  • 30 LINE
  • 31 POLYGON
  • 32 CIRCLE
  • 33 RECTANGLE
  • 34 INTERVAL
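
For readers who want to work with these tags in code, the list above can be mirrored as an enumeration. This is only a sketch: the names `TypeTag` and `tag_name` are hypothetical helpers introduced here, not identifiers from the system itself; the numeric values are copied from the list.

```python
from enum import IntEnum


class TypeTag(IntEnum):
    """Type tags copied from the reference list above (hypothetical helper)."""
    INT8 = 1
    INT16 = 2
    INT32 = 3
    INT64 = 4
    FLOAT = 11
    DOUBLE = 12
    STRING = 13
    NULL = 14
    BOOLEAN = 15
    DATETIME = 16
    DATE = 17
    TIME = 18
    DURATION = 19
    POINT = 20
    ORDEREDLIST = 22
    UNORDEREDLIST = 23
    RECORD = 24
    ANY = 29
    LINE = 30
    POLYGON = 31
    CIRCLE = 32
    RECTANGLE = 33
    INTERVAL = 34


def tag_name(tag_byte: int) -> str:
    """Return the type name for a tag byte; raises ValueError for unlisted tags."""
    return TypeTag(tag_byte).name
```

For instance, `tag_name(24)` yields `RECORD`, while a value missing from the list (such as 21) raises `ValueError`.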


BOOLEAN

  • 1 byte - BOOLEAN tag, 15 (excluded when type is known)
  • 1 byte - Value

Example: Byte Array: [15, 1]

  • 15, (BOOLEAN tag)
  • 1 (value is TRUE)


INT8

  • 1 byte - Int8 tag, 1 (excluded when type is known)
  • 1 byte - Value

Example: Byte Array: [1, 4]

  • 1, (Int8 tag)
  • 4 (value is 4)


INT16

  • 1 byte - Int16 tag, 2 (excluded when type is known)
  • 2 bytes - Value

Example: Byte Array: [2, 0, 8]

  • 2, (Int16 tag)
  • 0, 8 (value is 8)


INT32

  • 1 byte - Int32 tag, 3 (excluded when type is known)
  • 4 bytes - Value

Example: Byte Array: [3, 0, 0, 0, 23]

  • 3, (Int32 tag)
  • 0, 0, 0, 23 (value is 23)


INT64

  • 1 byte - Int64 tag, 4 (excluded when type is known)
  • 8 bytes - Value

Example: Byte Array: [4, 0, 0, 0, 0, 0, 0, 0, 42]

  • 4, (Int64 tag)
  • 0, 0, 0, 0, 0, 0, 0, 42 (value is 42)
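
The integer examples above can be decoded with a few lines of Python. This is a minimal sketch under two assumptions: the byte order is big-endian (which is what the examples suggest, since [0, 8] reads as 8), and values are signed two's complement. The function name `decode_tagged_int` is hypothetical.

```python
import struct

# Type tag -> struct format for the fixed-width integers.
# Assumption: big-endian, signed (consistent with the examples above).
INT_FORMATS = {1: ">b", 2: ">h", 3: ">i", 4: ">q"}


def decode_tagged_int(data: bytes) -> int:
    """Decode a tagged INT8/INT16/INT32/INT64 serialization."""
    fmt = INT_FORMATS[data[0]]          # first byte is the type tag
    (value,) = struct.unpack_from(fmt, data, 1)
    return value
```

When the type is known and the tag is omitted, the same format strings apply with the value starting at offset 0 instead of 1.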


FLOAT

  • 1 byte - FLOAT tag, 11 (excluded when type is known)
  • 4 bytes - Value, using IEEE 754 floating-point "single format" bit layout.


DOUBLE

  • 1 byte - DOUBLE tag, 12 (excluded when type is known)
  • 8 bytes - Value, using IEEE 754 floating-point "double format" bit layout.
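
As a rough illustration of the FLOAT and DOUBLE layouts, the sketch below packs Python floats into IEEE 754 single and double formats. The big-endian byte order is an assumption carried over from the integer examples; the text above only specifies the bit layout. `encode_float` and `encode_double` are hypothetical names.

```python
import struct

FLOAT_TAG = 11
DOUBLE_TAG = 12


def encode_float(value: float, tagged: bool = True) -> bytes:
    """4-byte IEEE 754 single; big-endian byte order is an assumption."""
    body = struct.pack(">f", value)
    return bytes([FLOAT_TAG]) + body if tagged else body


def encode_double(value: float, tagged: bool = True) -> bytes:
    """8-byte IEEE 754 double; big-endian byte order is an assumption."""
    body = struct.pack(">d", value)
    return bytes([DOUBLE_TAG]) + body if tagged else body
```

Under these assumptions, `encode_double(1.0)` produces the tag 12 followed by the bytes 0x3F, 0xF0 and six zero bytes.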


Location Serializations

POINT -

  • 1 byte - type tag (20, excluded when known)
  • 8 bytes - nontagged DOUBLE serialization
  • 8 bytes - nontagged DOUBLE serialization

LINE -

  • 1 byte - type tag (30, excluded when known)
  • 16 bytes - nontagged POINT serialization
  • 16 bytes - nontagged POINT serialization

RECTANGLE -

  • 1 byte - type tag (33, excluded when known)
  • 16 bytes - nontagged POINT serialization
  • 16 bytes - nontagged POINT serialization

CIRCLE -

  • 1 byte - type tag (32, excluded when known)
  • 16 bytes - nontagged POINT serialization
  • 8 bytes - nontagged DOUBLE serialization

POLYGON -

  • 1 byte - type tag (31, excluded when known)
  • 2 bytes - NUMBER of points, as an int16
  • remaining bytes - A series of nontagged POINT serializations
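
The location types are all built from nontagged DOUBLE values, so a decoder can be composed from one small helper. The sketch below handles nontagged POINT and POLYGON bodies; it assumes big-endian byte order (as in the integer examples) and makes no claim about which of the two doubles is which coordinate. The function names are hypothetical.

```python
import struct


def decode_point(data: bytes, offset: int = 0) -> tuple:
    """Decode a nontagged POINT: two consecutive 8-byte doubles."""
    return struct.unpack_from(">dd", data, offset)


def decode_polygon(data: bytes, offset: int = 0) -> list:
    """Decode a nontagged POLYGON: int16 point count, then that many POINTs."""
    (count,) = struct.unpack_from(">h", data, offset)
    first = offset + 2                  # points start after the 2-byte count
    return [decode_point(data, first + 16 * i) for i in range(count)]
```

LINE and RECTANGLE could be read as two `decode_point` calls at offsets 0 and 16, and CIRCLE as one point followed by one double.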


STRING

  • 1 byte - STRING tag, 13 (excluded when type is known)
  • 1-5 bytes - Variable-length encoding of the LENGTH of the string.
    • Each byte stores seven bits of the number. The first (most significant) bit of each byte indicates whether more bytes follow. Specifically, if the first bit is set, shift the current value left by seven bits and continue with the next byte, until reaching a byte whose first bit is unset.
    • For example, a number less than 128 is stored in a single byte whose value is the number itself.
    • The number 255 (0xff) is encoded as [0x81, 0x7f]. To decode it, read 0x81: the current value is (0x81 & 0x7f) = 0x01, and the set first bit signals that more bytes follow. The next byte, 0x7f, has its first bit unset, so it is the final byte. The result is (0x01 << 7) + 0x7f = 255.

  • LENGTH bytes - Value of the string, UTF-8 encoded

Example: Byte Array: [13, 10, 109, 101, 115, 115, 97, 103, 101, 45, 105, 100]

  • 13 (STRING TAG)
  • 10 (LENGTH is ten)
  • 109, 101, 115, 115, 97, 103, 101, 45, 105, 100 (String value, "message-id")
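
The variable-length LENGTH encoding described above can be sketched as follows. The function names `encode_length`, `decode_length` and `decode_string` are hypothetical; the logic follows the description (seven payload bits per byte, most significant group first, high bit set on every byte except the last).

```python
def encode_length(n: int) -> bytes:
    """Encode a non-negative length, 7 bits per byte, high bit = 'more follows'."""
    if n < 0:
        raise ValueError("length must be non-negative")
    groups = [n & 0x7F]                 # least significant 7-bit group
    n >>= 7
    while n:
        groups.append(n & 0x7F)
        n >>= 7
    groups.reverse()                    # most significant group goes first
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])


def decode_length(data: bytes, offset: int = 0) -> tuple:
    """Return (length, offset of the first byte after the length)."""
    value = 0
    while True:
        b = data[offset]
        offset += 1
        value = (value << 7) | (b & 0x7F)
        if not b & 0x80:                # high bit unset: this was the last byte
            return value, offset


def decode_string(data: bytes, offset: int = 0) -> str:
    """Decode a nontagged STRING: variable-length LENGTH, then UTF-8 bytes."""
    length, start = decode_length(data, offset)
    return data[start:start + length].decode("utf-8")
```

Applied to the example above with the tag byte skipped (`offset=1`), `decode_string` returns `message-id`.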


TIME Serializations:

DATE

  • 1 byte - DATE tag, 17 (excluded when type is known)
  • 4 bytes - INT32 representation of the number of days since 1970-01-01

TIME

  • 1 byte - TIME tag, 18 (excluded when type is known)
  • 4 bytes - INT32 representation of the number of milliseconds elapsed since the beginning of the day

DATETIME

  • 1 byte - DATETIME tag, 16 (excluded when type is known)
  • 8 bytes - INT64 representation of the number of milliseconds elapsed since 1970-01-01T00:00:00.000Z (also called chronon time)

DURATION

  • 1 byte - DURATION tag, 19 (excluded when type is known)
  • 4 bytes - INT32 representation of the number of months from the year and month fields
  • 8 bytes - INT64 representation of the number of milliseconds from all other fields

INTERVAL

  • 1 byte - INTERVAL tag, 34 (excluded when type is known)
  • 1 byte - Tag of the type of the interval's endpoints (TIME, DATE, or DATETIME)
  • Followed by two typed fields (for TIME, DATE)
    • 4 bytes - INT32 representation of the start
    • 4 bytes - INT32 representation of the end
  • or (for DATETIME)
    • 8 bytes - INT64 representation of the start
    • 8 bytes - INT64 representation of the end
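
To make the epoch-based encodings concrete, the sketch below converts nontagged DATE, TIME and DATETIME bodies into Python `datetime` objects. It assumes big-endian integers (as in the integer examples) and treats DATETIME as UTC, following the `Z` suffix in the description. The function names are hypothetical.

```python
import struct
from datetime import date, datetime, time, timedelta, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_DATETIME = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_date(data: bytes) -> date:
    """Nontagged DATE: INT32 days since 1970-01-01."""
    (days,) = struct.unpack_from(">i", data, 0)
    return EPOCH_DATE + timedelta(days=days)


def decode_time(data: bytes) -> time:
    """Nontagged TIME: INT32 milliseconds since the beginning of the day."""
    (millis,) = struct.unpack_from(">i", data, 0)
    return (datetime.min + timedelta(milliseconds=millis)).time()


def decode_datetime(data: bytes) -> datetime:
    """Nontagged DATETIME: INT64 milliseconds since 1970-01-01T00:00:00.000Z."""
    (millis,) = struct.unpack_from(">q", data, 0)
    return EPOCH_DATETIME + timedelta(milliseconds=millis)
```

DURATION does not map cleanly onto `timedelta` because its month component has no fixed length, so it is left out of this sketch.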


ORDEREDLIST (Also applies to UNORDEREDLIST)

  • 1 byte - Orderedlist tag, 22 (excluded when type is known)
  • 1 byte - Type of items on the list
  • 4 bytes - Total number of bytes
  • 4 bytes - Number of items
if (the item type is string, record, or list, i.e. the item length is not constant)
  • 4 bytes per item (offsets)
for each item
  • Bytes of the item (In the case of a list of ANY, the items will include type tags)

Example: Nested List: [["message-id"]]

Byte Array: [22, 22, 0, 0, 0, 39, 0, 0, 0, 1, 0, 0, 0, 14, 13, 0, 0, 0, 26, 0, 0, 0, 1, 0, 0, 0, 14, 0, 10, 109, 101, 115, 115, 97, 103, 101, 45, 105, 100]

22 (list tag)
22 (this is a list of lists)
0, 0, 0, 39 (39 bytes)
0, 0, 0, 1 (1 item)
0, 0, 0, 14 (offset for first item (a list) is 14)
13 (type of inner list is string)
0, 0, 0, 26 (size of inner list is 26)
0, 0, 0, 1 (there is one item in the inner list)
0, 0, 0, 14 (the offset for this item is 14)
0, 10, 109, 101, 115, 115, 97, 103, 101, 45, 105, 100 (item is a string of length 10, "message-id")
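
The header of the nested-list example above can be read back with a short parser. This is a sketch, not a full list decoder: it reads the item type, total size, item count and (for variable-length item types) the offsets. It assumes big-endian INT32 fields. The set of item types that carry offsets is taken from the description, with ANY added because the record example later in this document shows an offset for a list of ANY. All names are hypothetical.

```python
import struct

# Item types whose lists carry per-item offsets: STRING, ORDEREDLIST,
# UNORDEREDLIST, RECORD, plus ANY (assumption based on the record example).
VARIABLE_LENGTH_ITEM_TAGS = {13, 22, 23, 24, 29}

EXAMPLE = bytes([22, 22, 0, 0, 0, 39, 0, 0, 0, 1, 0, 0, 0, 14,
                 13, 0, 0, 0, 26, 0, 0, 0, 1, 0, 0, 0, 14,
                 0, 10, 109, 101, 115, 115, 97, 103, 101, 45, 105, 100])


def parse_list_header(data: bytes, tagged: bool = True) -> tuple:
    """Return (item_type, total_bytes, item_count, offsets)."""
    pos = 1 if tagged else 0            # skip the list tag when present
    item_type = data[pos]
    total_bytes, count = struct.unpack_from(">ii", data, pos + 1)
    offsets = []
    if item_type in VARIABLE_LENGTH_ITEM_TAGS:
        offsets = list(struct.unpack_from(f">{count}i", data, pos + 9))
    return item_type, total_bytes, count, offsets
```

On the example, the outer header parses as item type 22, 39 bytes, 1 item, offset 14, and the inner list (the nontagged bytes starting at offset 14) as item type 13, 26 bytes, 1 item, offset 14. In this example the outer offset is measured from the start of the tagged list.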


RECORD

  • 1 byte - Record tag, 24 (excluded when type is known)
  • 4 bytes - Total number of bytes
if (recordType is not closed)
  • 1 byte - Boolean isExpanded
if (isExpanded)
  • 4 bytes - Offset to open part
  • 4 bytes - Number of closed fields
if (recordType hasNullableFields)
  • ceil (numberOfFields / 8) bytes - Nullbitmap (1 bit per field, "1" means field is Null for this record)
for each closed field for this record
  • 4 bytes - Closed field offset
for each closed field that is not Null for this record
  • Bytes of the field (type is known from recordtype, so the bytes will not have a type tag)
if (isExpanded)
  • 4 bytes - Number of open fields
for each open field, pairs sorted by hashcode ascending
  • 4 bytes - Hash code
  • 4 bytes - Offset
for each open field
  • Bytes of the field name (String, no type tag)
  • Bytes of the field (with type tag)

Example

DDL:
    drop dataverse test if exists;
    create dataverse test;
    use dataverse test;

    create type FacebookMessageType as closed {
        message-id: int32
    }

    create dataset FacebookMessages(FacebookMessageType)
    primary key message-id;

    for $index in dataset Metadata.Index
    where $index.IndexName = "FacebookMessages"
    return $index;

Schema for the record:
    open {
        DataverseName: STRING,
        DatasetName: STRING,
        IndexName: STRING,
        IndexStructure: STRING,
        SearchKey: [ [ STRING ] ],
        IsPrimary: BOOLEAN,
        Timestamp: STRING,
        PendingOp: INT32
    }

Byte Array: [24, 0, 0, 0, -40, 1, 0, 0, 0, -88, 0, 0, 0, 8, 0, 0, 0, 46, 0, 0, 0, 52, 0, 0, 0, 70, 0, 0, 0, 88, 0, 0, 0, 95, 0, 0, 0, -123, 0, 0, 0, -122, 0, 0, 0, -92, 0, 4, 116, 101, 115, 116, 0, 16, 70, 97, 99, 101, 98, 111, 111, 107, 77, 101, 115, 115, 97, 103, 101, 115, 0, 16, 70, 97, 99, 101, 98, 111, 111, 107, 77, 101, 115, 115, 97, 103, 101, 115, 0, 5, 66, 84, 82, 69, 69, 22, 0, 0, 0, 39, 0, 0, 0, 1, 0, 0, 0, 14, 13, 0, 0, 0, 26, 0, 0, 0, 1, 0, 0, 0, 14, 0, 10, 109, 101, 115, 115, 97, 103, 101, 45, 105, 100, 1, 0, 28, 84, 117, 101, 32, 79, 99, 116, 32, 48, 55, 32, 49, 48, 58, 50, 50, 58, 49, 54, 32, 80, 68, 84, 32, 50, 48, 49, 52, 0, 0, 0, 1, 0, 0, 0, 1, 77, 124, -113, 81, 0, 0, 0, -76, 0, 13, 83, 101, 97, 114, 99, 104, 75, 101, 121, 84, 121, 112, 101, 22, 29, 0, 0, 0, 21, 0, 0, 0, 1, 0, 0, 0, 14, 13, 0, 4, 110, 117, 108, 108]

24, (Record)

0, 0, 0, -40, (Number of bytes: 216. Bytes are printed as signed values, so -40 is 0xD8)

1, (IsExpanded === true)

0, 0, 0, -88, (open offset)

0, 0, 0, 8, (8 closed fields)

0, 0, 0, 46, 0, 0, 0, 52, 0, 0, 0, 70, 0, 0, 0, 88, 0, 0, 0, 95, 0, 0, 0, -123, 0, 0, 0, -122, 0, 0, 0, -92, (4 bytes per closed field, offsets)

0, 4, 116, 101, 115, 116, (string "test")


0, 16, 70, 97, 99, 101, 98, 111, 111, 107, 77, 101, 115, 115, 97, 103, 101, 115, (string "FacebookMessages")
0, 16, 70, 97, 99, 101, 98, 111, 111, 107, 77, 101, 115, 115, 97, 103, 101, 115, (string "FacebookMessages")
0, 5, 66, 84, 82, 69, 69, (string "BTREE")
22, 0, 0, 0, 39, 0, 0, 0, 1, 0, 0, 0, 14, (SearchKey is a list of size 1 whose item type is orderedlist; the list tag is omitted because the field type is known)
13, 0, 0, 0, 26, 0, 0, 0, 1, 0, 0, 0, 14, (the nested list is a list of size 1 whose item type is string)
0, 10, 109, 101, 115, 115, 97, 103, 101, 45, 105, 100, (the nested string is "message-id")
1, (Boolean true)
0, 28, 84, 117, 101, 32, 79, 99, 116, 32, 48, 55, 32, 49, 48, 58, 50, 50, 58, 49, 54, 32, 80, 68, 84, 32, 50, 48, 49, 52, (string timestamp)
0, 0, 0, 1, (PendingOp == 1)
0, 0, 0, 1, (1 open field)
77, 124, -113, 81, (Hash code for the field name)
0, 0, 0, -76, (Offset for the field)
0, 13, 83, 101, 97, 114, 99, 104, 75, 101, 121, 84, 121, 112, 101, (name of open field, length 13: string "SearchKeyType")
22, 29, 0, 0, 0, 21, 0, 0, 0, 1, 0, 0, 0, 14, (value of open field: an ordered list of size 1 with item type ANY)
13, 0, 4, 110, 117, 108, 108 (the item, a tagged string "null")
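
The header of this example can be checked mechanically. The sketch below parses only the fixed header of a tagged record of an open type without nullable fields (so there is an isExpanded byte and no null bitmap), which matches this example but not every record. It assumes big-endian INT32 fields; `parse_open_record_header` is a hypothetical name. Because the byte array is printed with signed bytes, each value is first masked with 0xFF.

```python
import struct

# First 46 bytes of the example above, converted from signed to unsigned.
HEADER = bytes(b & 0xFF for b in [
    24, 0, 0, 0, -40, 1, 0, 0, 0, -88, 0, 0, 0, 8,
    0, 0, 0, 46, 0, 0, 0, 52, 0, 0, 0, 70, 0, 0, 0, 88,
    0, 0, 0, 95, 0, 0, 0, -123, 0, 0, 0, -122, 0, 0, 0, -92])


def parse_open_record_header(data: bytes) -> tuple:
    """Return (total_bytes, is_expanded, open_offset, closed_field_offsets).

    Assumes: record tag present, open record type, no null bitmap.
    """
    if data[0] != 24:
        raise ValueError("not a tagged RECORD")
    (total_bytes,) = struct.unpack_from(">i", data, 1)
    is_expanded = data[5] == 1
    pos = 6
    open_offset = None
    if is_expanded:                     # offset to open part only if expanded
        (open_offset,) = struct.unpack_from(">i", data, pos)
        pos += 4
    (closed_count,) = struct.unpack_from(">i", data, pos)
    pos += 4
    offsets = list(struct.unpack_from(f">{closed_count}i", data, pos))
    return total_bytes, is_expanded, open_offset, offsets
```

For this example the result is a total of 216 bytes, an open part at offset 168, and closed field offsets 46, 52, 70, 88, 95, 133, 134 and 164. The first offset, 46, equals the header length, which suggests that offsets are measured from the record tag.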

More Complicated Example (Highly Nested and Open record)

DDL:
    drop dataverse test if exists;
    create dataverse test;
    use dataverse test;

    create type S as closed{
        id: int32,
        Species: string
    }
    create type GS as closed{
        id: int32,
        Genus: string,
        lower: S
    }
    create type FGS as open{
        id: int32,
        Family: string
    }
    create type OFGS as closed{
        id: int32,
        Order: string,
        lower: FGS
    }

    create dataset Ss(S)
    primary key id;
    create dataset GSs(GS)
    primary key id;
    create dataset FGSs(FGS)
    primary key id;
    create dataset OFGSs(OFGS)
    primary key id;

    insert into dataset Ss(
        {"id":1,"Species":"Gulo"}
    );
    insert into dataset GSs(
        for $S in dataset Ss
        where $S.Species = "Gulo"
        return {"id":1,"Genus":"Gulo","lower":$S}
    );
    insert into dataset FGSs(
        for $S in dataset GSs
        where $S.lower.Species = "Gulo"
        return {"id":1,"Family":"Mustelinae","lower":$S}
    );
    insert into dataset OFGSs(
        for $S in dataset FGSs
        where $S.lower.lower.Species = "Gulo"
        return {"id":1,"Order":"Carnivora","lower":$S}
    );

    for $test in dataset OFGSs
    return $test

Schema for the record:
    closed {
        id: INT32,
        Order: STRING,
        lower: open {
            id: INT32,
            Family: STRING
        }
    }

Byte Array: [24, 0, 0, 0, -41, 0, 0, 0, 3, 0, 0, 0, 21, 0, 0, 0, 25, 0, 0, 0, 36, 0, 0, 0, 1, 0, 9, 67, 97, 114, 110, 105, 118, 111, 114, 97, 0, 0, 0, -76, 1, 0, 0, 0, 38, 0, 0, 0, 2, 0, 0, 0, 22, 0, 0, 0, 26, 0, 0, 0, 1, 0, 10, 77, 117, 115, 116, 101, 108, 105, 110, 97, 101, 0, 0, 0, 1, 6, 38, 43, 1, 0, 0, 0, 50, 0, 5, 108, 111, 119, 101, 114, 24, 0, 0, 0, 123, 1, 0, 0, 0, 10, 0, 0, 0, 3, 0, 0, 13, 27, 0, 0, 0, 38, 4, 24, 25, -50, 0, 0, 0, 47, 6, 38, 43, 1, 0, 0, 0, 61, 0, 2, 105, 100, 3, 0, 0, 0, 1, 0, 5, 71, 101, 110, 117, 115, 13, 0, 4, 71, 117, 108, 111, 0, 5, 108, 111, 119, 101, 114, 24, 0, 0, 0, 55, 1, 0, 0, 0, 10, 0, 0, 0, 2, -21, -127, -39, 28, 0, 0, 0, 39, 0, 0, 13, 27, 0, 0, 0, 30, 0, 2, 105, 100, 3, 0, 0, 0, 1, 0, 7, 83, 112, 101, 99, 105, 101, 115, 13, 0, 4, 71, 117, 108, 111]

24, (Record)

0, 0, 0, -41, (size)

0, 0, 0, 3, (3 closed fields)

0, 0, 0, 21, 0, 0, 0, 25, 0, 0, 0, 36, (closed offsets)


0, 0, 0, 1, (id = 1)


0, 9, 67, 97, 114, 110, 105, 118, 111, 114, 97, (order = "Carnivora")
0, 0, 0, -76, (number of bytes for lower)
1, (lower is expanded)
0, 0, 0, 38, (open offset)
0, 0, 0, 2, (two closed fields)
0, 0, 0, 22, 0, 0, 0, 26, (closed offsets)


0, 0, 0, 1, (lower.id = 1)


0, 10, 77, 117, 115, 116, 101, 108, 105, 110, 97, 101, (lower.Family = "Mustelinae")
0, 0, 0, 1, (1 open field)
6, 38, 43, 1, 0, 0, 0, 50, (hash and offset of open field)
0, 5, 108, 111, 119, 101, 114, (name of open field = "lower")
24, (open field is a record)
0, 0, 0, 123, (size of open field)
1, (lower.lower is expanded)
0, 0, 0, 10, (open part offset)
0, 0, 0, 3, (3 open fields)
0, 0, 13, 27, 0, 0, 0, 38, 4, 24, 25, -50, 0, 0, 0, 47, 6, 38, 43, 1, 0, 0, 0, 61, (Hashes and offsets for open fields)


0, 2, 105, 100, (fieldName = "id")


3, (type is int32)
0, 0, 0, 1, (id is 1)
0, 5, 71, 101, 110, 117, 115, (fieldname = "Genus")
13, (type is string)
0, 4, 71, 117, 108, 111, (fieldValue = "Gulo")
0, 5, 108, 111, 119, 101, 114, (name of lower.lower.lower = "lower")
24, (lower.lower.lower is a record)
0, 0, 0, 55, (size of record)
1, (is expanded)
0, 0, 0, 10, (open offset)
0, 0, 0, 2, (2 open fields)
-21, -127, -39, 28, 0, 0, 0, 39, 0, 0, 13, 27, 0, 0, 0, 30, (open hashes and offsets)


0, 2, 105, 100, (fieldName = "id")


3, (id is an int32)
0, 0, 0, 1, (id = 1)
0, 7, 83, 112, 101, 99, 105, 101, 115, (fieldName = "Species")
13, (type of lower.lower.lower.Species is string)
0, 4, 71, 117, 108, 111 (lower.lower.lower.Species = "Gulo")
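
Because the (hash code, offset) pairs of the open part are sorted by hash code, a reader can locate a field with a binary search. The sketch below parses the two pairs of the innermost record above and looks one up. It assumes hash codes are compared as signed big-endian INT32 values, which is what the ordering in this example suggests (the first hash is negative when read as signed). The hash function itself is not specified here, so the caller supplies the hash code. All names are hypothetical.

```python
import struct
from bisect import bisect_left

# The "open hashes and offsets" bytes of the innermost record, made unsigned.
INDEX_BYTES = bytes(b & 0xFF for b in [
    -21, -127, -39, 28, 0, 0, 0, 39,
    0, 0, 13, 27, 0, 0, 0, 30])


def parse_open_index(data: bytes, count: int, offset: int = 0) -> list:
    """Read `count` (hash, offset) pairs of signed big-endian INT32 values."""
    return [struct.unpack_from(">ii", data, offset + 8 * i) for i in range(count)]


def find_offset(index: list, hash_code: int):
    """Binary-search the sorted index; return the field offset or None."""
    hashes = [h for h, _ in index]
    i = bisect_left(hashes, hash_code)
    if i < len(index) and hashes[i] == hash_code:
        return index[i][1]
    return None
```

Since different field names may share a hash code, a real reader would presumably also compare the field name stored at the returned offset before accepting the match.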