Page History

...

draw.io Diagram

border	true

diagramName	Schema_object
simpleViewer	false
width
links	auto
tbstyle	top
lbox	true
diagramWidth	221212
revision	13

We can see that the root has two fields: id and name and their types are Integer and Object, respectively. The object name has two fields first and last, where both fields are of type String. Below each scalar value in the same, there's a number which represent a ColumnIndex. The following tables show the column representation of those scalar values.

Definition Level	Value
1	0
1	1
1	2

id column (ColumnIndex = 0) – max definition level = 1

Definition Level	Value
0	MISSING
2	Ann
2	John

name.first column (ColumnIndex = 1) – max definition level = 2

Definition Level	Value
0	MISSING
1	MISSING
2	Smith

name.last column (ColumnIndex = 2) – max definition level = 2

Each column stores the values of one "attribute" or a field value (e.g., name.first) for the three records. the Definition Level here shows us whether a record has NULL, MISSING or a value for a specific field. NULL and MISSING could occur in different nesting levels. The definition level value tells us which of the nested values were present (i.e., not NULL or MISSING). The maximum definition level is the number of levels in a path. For example, the path for the field name.first is root(0) → name (1) → first (2). Hence, the maximum definition level is 2. When the definition level of a value equals to the maximum definition level, we the scalar value is present and it is not NULL or MISSING. To illustrate, let us see the values for the field name.last. We can see that the definition levels where 0, 1, and 2 for the three records respectively. The definition level 0 means that the object name is MISSING in the first record (see the JSON records above), the definition level 1 means the object name was present but not the field last, and the last definition level 2 means the string value name.last is present and the value is "Smith".

Arrays and Multisets

Arrays and multisets can have what the Dremel papers calls repeated values. That means an attribute in a record could have more than one value. Therefore, we need to determine the start and the end of each repeated values. Arrays and multisets in ADM and JSON could be nested (e.g., arrays of arrays/ arrays of objects where each field can have nested arrays). It is important to determine which value belongs to which record, and it is order (or index). Let's see an example:

Code Block

title	JSON Example

{"id": 0, "numbers": [1, 2]}
{"id": 1, "numbers": [4, 5]}
{"id": 2}
{"id": 3,"numbers": [6, 7, 8, 9]}

the schema,

draw.io Diagram

border	true

diagramName	Array1
simpleViewer	false
width
links	auto
tbstyle	top
lbox	true
diagramWidth	186
revision	3

and numbers' column values

Definition Level	Value
2	1
2	2
0	<DELIMITER>
2	4
2	5
0	<DELIMITER>
0	MISSING
2	6
2	7
2	8
2	9
0	<DELIMITER>

numbers column (ColumnIndex = 1) – max definition level = 2 – max-delimiter = 0

In this example, we only have one columns as there is only one scalar value. However, the scalar value belongs to an array. Thus, multiple values could belong to a single record (e.g., the first record has three values). The maximum definition level is 2 root (0) → numbers (1) → int (2). We also have max-delimiter which is 0. We see that the first three values are delimited by a "<DELIMITER>", which has a definition level 0 (or our max-delimiter value). The delimiter role is to determine the end of a repeated value. Thus, we know that the first three values [1, 2] belong to the first record. The next two \[4, 5]belong the second record. However, after the second delimiter, we see that we have a missing value with a definition level 0. A delimiter can only occur only if the array is present (i.e., there's a value with a definition level > 0). Hence, we know that the definition level 0 for the value that proceed the second delimiter is because the array itself is MISSING.

Let's take another (more interesting) example

Code Block

title	JSON Example

{"id": 0, "numbers": [[1, 2], [4, 5, 6]]}
{"id": 0, "numbers": [[7, 8]]}
{"id": 0, "numbers": [[10]]}

draw.io Diagram

border	true

diagramName	Array2
simpleViewer	false
width
links	auto
tbstyle	top
lbox	true
diagramWidth	171
revision	1

Definition Level	Value
3	1
3	2
1	<DELIMITER>
3	4
3	5
3	6
1	<DELIMITER>
0	<DELIMITER>
3	7
3	8
1	<DELIMITER>
0	<DELIMITER>
3	10
1	<DELIMITER>
0	<DELIMITER>

numbers column (ColumnIndex = 1) – max definition level = 3 – max-delimiter = 1

In this example, we have two nested arrays. Thus, the max delimiter is 1, which tells us that the definition levels 0 and 1 can be as delimiters. So, after [1, 2] we see that we have a delimiter with a definition level 1, which indicates the end of the inner array. Then, we have [4, 5 ,6] which also delimited by the definition level 1 as it is the second inner array. After the second delimiter, we have another delimiter with a definition level 0, which tells us that the outer array is finished at this point and the following value belong to the second record. Following this, we see that each array (inner or outer) is delimited once. The following values also have the same pattern (i.e., values followed by delimiters)

Page tree

Versions Compared

Old Version 2

New Version 3

Key

Arrays and Multisets