Proposers

@<proposer1 JIRA username>
@<proposer2 JIRA username>
...

Approvers

@<approver1 JIRA username> : [APPROVED/REQUESTED_INFO/REJECTED]
@<approver2 JIRA username> : [APPROVED/REQUESTED_INFO/REJECTED]
...

Status

Current state:

	Current State
UNDER DISCUSSION
IN PROGRESS
ABANDONED
COMPLETED
INACTIVE

Discussion thread: here

JIRA: here

Released: <Hudi Version>

Abstract

Z-order is a technique that allows you to map multidimensional data to a single dimension. We can use this feature to improve query performance.

Background

Z-order is not currently supported by open source engines and open source data Lake components. It makes sense to introduce this feature to Hudi

Introducing Z-order

Z-order is a technique that allows you to map multidimensional data to a single dimension.

Refer to Wik: imagine that you have a collection of (X, Y) coordinate pairs laid out on a 2-dimensional plane. Using Z-ordering, you could arrange those 2D pairs on a 1-dimensional line. Importantly, values that were close together in the 2D plane would still be close to each other on the line. The figure below shows the Z-values for the two dimensional case with integer coordinates 0 ≤ x ≤ 7, 0 ≤ y ≤ 7 (shown both in decimal and binary). Interleaving the binary coordinate values yields binary z-values as shown. Connecting the z-values in their numerical order produces the recursively Z-shaped curve. Two-dimensional Z-values are also called as quadkey ones

It can be seen that if we sort the data according to the order of z-values and divide it into four files on average, no matter we use X or Y field filtering for point query in the query, we can skip half of the irrelevant files. If the amount of data is larger, the effect will be better. That is to say, the file based on z-order partition storage, It can have better data skipping effect on multiple fields. Fortunately, Z-order is not limited to 2-dimensional space—it can be abstracted to work in any number of dimensions.

Implementation

At a high level there are 3 components to implement z-order support: z-value generation, statistical info preservation, query engine integration. Next 3 subsections discuss this in detail.

Z-value generation

The core mechanism underlying Z-ordering is the ability to translate a tuple of several field values into a single numbe (call it z-value). Refer to Wik, we simply interleave the bits of each field in the record

For example, imagine that we’re trying to calculate the Z-address of two fields value which are unsigned (X=97, Y=214), first let's represent x,y as bits

X value: 01100001
Y value: 11010110

second, Starting with the leftmost bit of Y, we interleave the bits of X and Y, then we can get z-value as follow

Z value: 1011011000101001

As mentioned above, for multiple unsigned int incremental value, Z-value is generated by crossing and merging bits. Since the bits of each dimension value cross in the final Z-value, the sorting based on Z-value naturally forms a z-order curve which has good aggregation effect for X and Y fields. However, in the actual use scenario, the following problems need to be solved to generate Z-value based on this rule

1.There are many kinds of actual data types. How to deal with the data of non signed int type.

answer: we provide two ways to solve this problems

According to different types of data, find out differnt conversion method to convert those datas to bits. The conversion method must ensure the same lexicographically order before and after the conversion\

unsigned int converter:

just convert it to bits, the convert result has same lexicographically order before and after the conversion

Int converter：

we cannot convert it to bits directly, since in java the first bit of a negative number represents a sign。

Decimal value	Two’s complement signed integer
0	0000 0000
1	0000 0001
2	0000 0010
126	0111 1110
127	0111 1111
-128	1000 0000
-127	1000 0001
-126	1000 0010
-2	1111 1110
-1	1111 1111

Float/Double/Long/ converter:

Decimal converter:

Date converter:

TimeStamp converter:

String converter:

Boundary-based Interleaved Index

2.The bits number of each dimension value are handled differently, such as how a short type and an int type cross merge bits. How to deal with those values

answer: just align all fields value to 64 bits

3.z-value is produced by interleaving bits of each diemesion, If the z-order value bits exceed 64 (that is, a bit of long type), how to store and express the Z-value values in hudi and compare them.

answer: Using Array[Byte] to stored the z-value (we limit the length of this array to 1024, just like Amazon DynamoDB); notice that now hbase rowkey is sorted by Array[Byte] internal we can copy those code directly as compartor for z-value.

Rollout/Adoption Plan

<What impact (if any) will there be on existing users?>
<If we are changing behavior how will we phase out the older behavior?>
<If we need special migration tools, describe them here.>
<When will we remove the existing behavior?>

Test Plan

<Describe in few sentences how the RFC will be tested. How will we know that the implementation works as expected? How will we know nothing broke?>

Space shortcuts

Page tree

RFC - 28 : Support Z-order curve

Proposers

Approvers

Status

Abstract

Background

Introducing Z-order

Implementation

Z-value generation

Rollout/Adoption Plan

Test Plan

Space shortcuts

Page tree

复制从 RFC - <RFC NUMBER>

RFC - 28 : Support Z-order curve

Proposers

Approvers

Status

Abstract

Background

Introducing Z-order

Implementation

Z-value generation

Rollout/Adoption Plan

Test Plan