Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

draw.io Diagram
bordertrue
diagramNameSchema_object
simpleViewerfalse
width
linksauto
tbstyletop
lboxtrue
diagramWidth221212
revision13

We can see that the root has two fields: id and name and their types are Integer and Object, respectively. The object name has two fields first and last, where both fields are of type String. Below each scalar value in the same, there's a number which represent a ColumnIndex. The following tables show the column representation of those scalar values.

Definition LevelValue
10
11
12

id column (ColumnIndex = 0) – max definition level = 1


Definition LevelValue
0MISSING
2Ann
2John

name.first column (ColumnIndex = 1) – max definition level = 2


Definition LevelValue
0MISSING
1MISSING
2Smith

name.last column (ColumnIndex = 2) – max definition level = 2

Each column stores the values of one "attribute" or a field value (e.g., name.first) for the three records. the Definition Level here shows us whether a record has NULL, MISSING or a value for a specific field. NULL and MISSING could occur in different nesting levels. The definition level value tells us which of the nested values were present (i.e., not NULL or MISSING). The maximum definition level is the number of levels in a path. For example, the path for the field name.first is root(0) → name (1) → first (2). Hence, the maximum definition level is 2. When the definition level of a value equals to the maximum definition level, we the scalar value is present and it is not NULL or MISSING. To illustrate, let us see the values for the field name.last. We can see that the definition levels where 0, 1, and 2 for the three records respectively. The definition level 0 means that the object name is MISSING in the first record (see the JSON records above), the definition level 1 means the object name was present but not the field last, and the last definition level 2 means the string value name.last is present and the value is "Smith". 

Arrays and Multisets

Arrays and multisets can have what the Dremel papers calls repeated values. That means an attribute in a record could have more than one value. Therefore, we need to determine the start and the end of each repeated values. Arrays and multisets in ADM and JSON could be nested (e.g., arrays of arrays/ arrays of objects where each field can have nested arrays). It is important to determine which value belongs to which record, and it is order (or index). Let's see an example:

Code Block
titleJSON Example
{"id": 0, "numbers": [1, 2]}
{"id": 1, "numbers": [4, 5]}
{"id": 2}
{"id": 3,"numbers": [6, 7, 8, 9]}

the schema, 

draw.io Diagram
bordertrue
diagramNameArray1
simpleViewerfalse
width
linksauto
tbstyletop
lboxtrue
diagramWidth186
revision3

and numbers' column values

Definition LevelValue
21
22
0<DELIMITER>
24
25
0<DELIMITER>
0MISSING
26
27
28
29
0<DELIMITER>

numbers column (ColumnIndex = 1) – max definition level = 2 – max-delimiter = 0

In this example, we only have one columns as there is only one scalar value. However, the scalar value belongs to an array. Thus, multiple values could belong to a single record (e.g., the first record has three values). The maximum definition level is 2 root (0) → numbers (1) → int (2). We also have max-delimiter which is 0. We see that the first three values are delimited by a "<DELIMITER>", which has a definition level 0 (or our max-delimiter value). The delimiter role is to determine the end of a repeated value. Thus, we know that the first three values [1, 2] belong to the first record. The next two \[4, 5]belong the second record. However, after the second delimiter, we see that we have a missing value with a definition level 0. A delimiter can only occur only if the array is present (i.e., there's a value with a definition level > 0). Hence, we know that the definition level 0 for the value that proceed the second delimiter is because the array itself is MISSING. 


Let's take another (more interesting) example

Code Block
titleJSON Example
{"id": 0, "numbers": [[1, 2], [4, 5, 6]]}
{"id": 0, "numbers": [[7, 8]]}
{"id": 0, "numbers": [[10]]}

draw.io Diagram
bordertrue
diagramNameArray2
simpleViewerfalse
width
linksauto
tbstyletop
lboxtrue
diagramWidth171
revision1

Definition LevelValue
31
32
1<DELIMITER>
34
35
36
1<DELIMITER>
0<DELIMITER>
37
38
1<DELIMITER>
0<DELIMITER>
310
1<DELIMITER>
0<DELIMITER>

numbers column (ColumnIndex = 1) – max definition level = 3 – max-delimiter = 1

In this example, we have two nested arrays. Thus, the max delimiter is 1, which tells us that the definition levels 0 and 1 can be as delimiters. So, after [1, 2]  we see that we have a delimiter with a definition level 1, which indicates the end of the inner array. Then, we have [4, 5 ,6] which also delimited by the definition level 1 as it is the second inner array. After the second delimiter, we have another delimiter with a definition level 0, which tells us that the outer array is finished at this point and the following value belong to the second record. Following this, we see that each array (inner or outer) is delimited once. The following values also have the same pattern (i.e., values followed by delimiters)