This optimization intersects multiple secondary indexes when the select conditions involve them. Previously, in the IntroduceSelectAccessMethodRule, we would pick the first applicable index to contribute to the access path when multiple indexes were available. Due to the lack of statistical information, the first one may not be the best choice. Moreover, even if we chose the index with the lowest selectivity, it still might not be the best solution, because we can further reduce the selectivity by intersecting it with the other secondary indexes. Introducing the intersection into the plan avoids the worst path. Furthermore, if statistical information becomes available later, we can keep improving the decision of whether or not to introduce the intersection.

Optimization Rule

The logical changes are in the IntroduceSelectAccessMethodRule. After analyzing the interesting functions and indexes, we pair them up as a one-to-one mapping (e.g., BTreeAccessMethod -> BTree index on Salary).

If the primary index appears in the mapping, we simply use it as the access path. A primary index lookup in a selection is rare; when it does occur, it is very likely to be a highly selective path. In addition, since the primary index is clustered, a primary index search is the fastest option.

If multiple secondary indexes are selected, we let each of them contribute a secondary-index-to-primary-index path. A new interface method, "createSecondaryToPrimaryPlan()", is added to IAccessMethod for this purpose. (The implementation is required to be functional, i.e., free of side effects on shared state; otherwise, different access methods may introduce conflicting states.) Then we use an Intersect logical operator as a join point to intersect the primary keys coming from the different secondary index paths.
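The following is a rough, hypothetical sketch of the shape of this rewriting. The types and names (PlanNode, AccessMethod, SelectAccessPathBuilder, buildIntersectedAccessPath) are simplified placeholders for illustration only; the real rule operates on Algebricks ILogicalOperator trees and the actual createSecondaryToPrimaryPlan() signature differs.

// Hypothetical, simplified sketch of the rewriting applied when several secondary
// indexes are applicable. All types below are placeholders, not the AsterixDB API.
import java.util.ArrayList;
import java.util.List;

interface PlanNode {}                                                  // stand-in for a logical sub-plan
record IntersectNode(List<PlanNode> inputs) implements PlanNode {}     // INTERSECT on the primary keys
record PrimaryIndexLookupNode(PlanNode primaryKeyInput) implements PlanNode {}

interface AccessMethod {
    // Analogue of createSecondaryToPrimaryPlan(): builds a sub-plan producing the primary
    // keys that satisfy this access method's condition, without mutating any shared state.
    PlanNode createSecondaryToPrimaryPlan();
}

final class SelectAccessPathBuilder {
    // One secondary-to-primary branch per chosen index, intersected on the primary key,
    // feeding a single primary index lookup.
    static PlanNode buildIntersectedAccessPath(List<AccessMethod> chosenMethods) {
        List<PlanNode> branches = new ArrayList<>();
        for (AccessMethod am : chosenMethods) {
            branches.add(am.createSecondaryToPrimaryPlan());
        }
        return new PrimaryIndexLookupNode(new IntersectNode(branches));
    }
}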

The following example intersects the BTree, RTree, and n-gram inverted indexes on the primary key before going to the primary index lookup.

 

drop dataverse test if exists;
create dataverse test;
use dataverse test;

create type tTweet as closed {
id: int32,
location: point,
message: string,
create_at: datetime,
misc: string
}

create dataset dsTweet(tTweet) primary key id;

create index ngram_index on dsTweet(message) type ngram(3);
create index time_index on dsTweet(create_at) type btree;
create index location_index on dsTweet(location) type rtree;

write output to nc1:"rttest/btree-rtree-ngram-intersect.adm";

let $region := create-rectangle(create-point(-128.43007812500002,20.298506037222175), create-point(-64.26992187500002,54.56902589732035))
let $ts_start := datetime("2015-11-11T00:00:00Z")
let $ts_end := datetime("2015-12-18T23:59:59Z")
let $keyword := "hello"
for $t in dataset dsTweet
where $t.create_at >= $ts_start and $t.create_at < $ts_end
and spatial-intersect($t.location, $region)
and contains($t.message, $keyword)
return $t

The corresponding plan is generated as follows:

 

-- DISTRIBUTE_RESULT  |PARTITIONED|
  -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
    -- STREAM_PROJECT |PARTITIONED|
      -- STREAM_SELECT |PARTITIONED|
        -- ASSIGN |PARTITIONED|
          -- STREAM_PROJECT |PARTITIONED|
            -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
              -- BTREE_SEARCH |PARTITIONED|
                -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                  -- INTERSECT |PARTITIONED|
                    -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                      -- STABLE_SORT [$$29(ASC)] |PARTITIONED|
                        -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                          -- STREAM_PROJECT |PARTITIONED|
                            -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                              -- BTREE_SEARCH |PARTITIONED|
                                -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                                  -- ASSIGN |PARTITIONED|
                                    -- EMPTY_TUPLE_SOURCE |PARTITIONED|
                    -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                      -- STABLE_SORT [$$31(ASC)] |PARTITIONED|
                        -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                          -- STREAM_PROJECT |PARTITIONED|
                            -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                              -- LENGTH_PARTITIONED_INVERTED_INDEX_SEARCH |PARTITIONED|
                                -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                                  -- ASSIGN |PARTITIONED|
                                    -- EMPTY_TUPLE_SOURCE |PARTITIONED|
                    -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                      -- STABLE_SORT [$$40(ASC)] |PARTITIONED|
                        -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                          -- STREAM_PROJECT |PARTITIONED|
                            -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                              -- RTREE_SEARCH |PARTITIONED|
                                -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                                  -- ASSIGN |PARTITIONED|
                                    -- EMPTY_TUPLE_SOURCE |PARTITIONED|

 

Hyracks Operator Implementation

Physically, each EMPTY_TUPLE_SOURCE (ETS) node introduces a thread. Thus, the intersect operator must synchronize the upstream input threads in order to generate the correct result. To keep the operation pipelined, the intersection is implemented in a sort-merge manner; therefore, each input is required to be sorted. The synchronization is handled by the thread of input No. 0, i.e., only thread 0 calls the writer's open/nextFrame/close functions. If arbitrary threads were allowed to push frames downstream, the downstream operator would be confused, especially when synchronizing its locks. The core logical intersection function is as follows:

  1. do
    1. find the input whose current record has the maximum key (call it max)
    2. for each input i
      1. while its record < max, keep popping records from input i
      2. if its record == max, then match++ and continue to the next input
      3. if its record > max, break (it becomes the new max in the next round)
    3. if match == inputArity
      1. output the max record and advance every input past it
  2. while no input is closed.

Once any of the inputs is fully consumed, the operator is closed.
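To make the merge loop concrete, below is a minimal, single-threaded sketch of the same logic over sorted streams of integer primary keys. This is only an illustration: the real Hyracks operator works frame by frame on its inputs and lets input 0's thread drive the writer, and the names used here (SortMergeIntersect, intersect) are made up for the example.

import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntConsumer;

// Minimal single-threaded sketch of the sort-merge intersection over k sorted inputs.
// Each input is modeled as a sorted iterator of primary keys; the "writer" receives
// every key that appears in all inputs.
final class SortMergeIntersect {

    static void intersect(List<Iterator<Integer>> inputs, IntConsumer writer) {
        int n = inputs.size();
        Integer[] current = new Integer[n];
        // Prime every input; if any input is empty, the intersection is empty.
        for (int i = 0; i < n; i++) {
            if (!inputs.get(i).hasNext()) {
                return;
            }
            current[i] = inputs.get(i).next();
        }
        while (true) {
            // 1. Find the maximum key among the current records.
            int max = current[0];
            for (int i = 1; i < n; i++) {
                max = Math.max(max, current[i]);
            }
            // 2. Advance every input whose record is below the maximum.
            int match = 0;
            for (int i = 0; i < n; i++) {
                while (current[i] < max) {
                    if (!inputs.get(i).hasNext()) {
                        return;                     // an input is exhausted: close the operator
                    }
                    current[i] = inputs.get(i).next();
                }
                if (current[i] == max) {
                    match++;                        // this input agrees with the current maximum
                }
                // if current[i] > max, it becomes the maximum of the next round
            }
            // 3. If all inputs agree, emit the key and advance all inputs past it.
            if (match == n) {
                writer.accept(max);
                for (int i = 0; i < n; i++) {
                    if (!inputs.get(i).hasNext()) {
                        return;
                    }
                    current[i] = inputs.get(i).next();
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Iterator<Integer>> ins = List.of(
                Arrays.asList(1, 3, 5, 7, 9).iterator(),
                Arrays.asList(3, 4, 5, 9, 10).iterator(),
                Arrays.asList(2, 3, 5, 8, 9).iterator());
        intersect(ins, k -> System.out.println(k));   // prints 3, 5 and 9, one per line
    }
}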

Experimental Evaluation

We ran two groups of experiments. The first used a small dataset to test the intersection performance when everything is cached in memory. The second used a large dataset to test the scenario where most of the data has to be read from disk.

Dataset: two real Twitter datasets. The smaller one contains one million records (800 MB); the larger one contains ten million records (8.2 GB).

Machine: 4 CPUs, 4 GB memory, one disk.

AsterixDB instance: 1 CC, 2 NCs, 1 partition per NC.

The AQL query is as follows:

use dataverse twitter; 
let $ts_start := datetime("2015-11-23T17:$min_start:00.000Z") 
let $ts_end := datetime("2015-11-23T17:$min_end:03.000Z") 
let $ms_start := date("2010-$month_start-01") 
let $ms_end := date("2010-$month_end-28") 
let $result := for $t in dataset ds_tweets 
               where $t.user.create_at >= $ms_start and $t.user.create_at < $ms_end 
               and $t.create_at >= $ts_start and $t.create_at < $ts_end 
               and $t.place = "Unite State"  
               return $t
return count($result)

The query selects on both tweet.create_at and user.create_at.

Each test is run under three configurations:

  • 1st: with the BTree index on User.create_at only,
  • 2nd: with the BTree index on Tweet.create_at only,
  • 3rd: with both indexes present; consequently, the intersection is introduced.

The complete results are shared in a Google sheet.

In-memory case:

Each query is run ten times. We record the average time of the last five runs. The time unit is milliseconds.

Table 1. Fix the User.create_at range ($month_start = 01, $month_end = 02) and increase the Tweet.create_at selectivity

| result | month  | minutes | Scan | UserCreateAt Index | TweetCreateAt Index | Intersection | SpeedUp |
| 431    | 01--02 | 00-09   | 929  | 142                | 132                 | 52           | 2.54    |
| 928    | 01--02 | 00-19   | 929  | 143                | 226                 | 56           | 2.55    |
| 1458   | 01--02 | 00-29   | 934  | 144                | 328                 | 73           | 1.97    |
| 1997   | 01--02 | 00-39   | 932  | 143                | 427                 | 80           | 1.79    |
| 2504   | 01--02 | 00-49   | 930  | 142                | 528                 | 92           | 1.54    |
| 2989   | 01--02 | 00-59   | 933  | 141                | 631                 | 109          | 1.29    |

Table 2. Fix the Tweet.create_at range ($min_start = 00, $min_end = 09) and increase the User.create_at selectivity

| result | month  | minutes | Scan | UserCreateAt Index | TweetCreateAt Index | Intersection | SpeedUp |
| 431    | 01--02 | 00-09   | 929  | 142                | 132                 | 52           | 2.54    |
| 670    | 01--03 | 00-09   | 932  | 189                | 129                 | 48           | 2.69    |
| 929    | 01--04 | 00-09   | 928  | 240                | 124                 | 61           | 2.03    |
| 1140   | 01--05 | 00-09   | 931  | 291                | 128                 | 69           | 1.86    |
| 1471   | 01--06 | 00-09   | 933  | 367                | 126                 | 65           | 1.94    |
| 1859   | 01--07 | 00-09   | 932  | 449                | 125                 | 84           | 1.49    |
| 2166   | 01--08 | 00-09   | 931  | 525                | 126                 | 87           | 1.45    |
| 2438   | 01--09 | 00-09   | 932  | 580                | 127                 | 94           | 1.35    |
| 2682   | 01--10 | 00-09   | 939  | 648                | 127                 | 104          | 1.22    |
| 3011   | 01--11 | 00-09   | 934  | 710                | 125                 | 110          | 1.14    |
| 3346   | 01--12 | 00-09   | 933  | 781                | 127                 | 120          | 1.06    |

Table 3. Increase the selectivity on both Tweet.create_at and User.create_at

| result | month  | minutes | Scan | UserCreateAt Index | TweetCreateAt Index | Intersection | SpeedUp |
| 431    | 01--02 | 00-09   | 929  | 142                | 132                 | 52           | 2.54    |
| 1429   | 01--03 | 00-19   | 933  | 190                | 228                 | 67           | 2.84    |
| 1945   | 01--04 | 00-19   | 933  | 239                | 226                 | 67           | 3.37    |
| 3686   | 01--05 | 00-29   | 934  | 294                | 322                 | 83           | 3.54    |
| 4816   | 01--06 | 00-29   | 936  | 370                | 324                 | 102          | 3.18    |
| 8320   | 01--07 | 00-39   | 931  | 453                | 425                 | 123          | 3.46    |
| 12202  | 01--08 | 00-49   | 932  | 522                | 529                 | 146          | 3.58    |
| 13791  | 01--09 | 00-49   | 934  | 582                | 527                 | 157          | 3.36    |
| 18489  | 01--10 | 00-59   | 937  | 644                | 630                 | 191          | 3.30    |

We can see that intersection is the best choice under the above settings. The speedup over the fastest single index path is up to 3.5 times. If the selectivities of the two indexes differ greatly, the benefit of intersection may not be that large. If the two indexes have similar selectivity, the intersection can be two to three times faster.

On-disk case

The test dataset is switched to the 8.2 GB dataset. In order to flush the cache, we load the same data into another dataset, ds_copy. Before each run of the selection query, we scan this 8.2 GB ds_copy once to invalidate the cached pages. Due to the slowness of the on-disk case, we warm up the query only once and record the average time of the next three runs.

Table 4. Fix the User.create_at condition to one month and increase the Tweet.create_at range.

| result | month  | hour   | Scan   | UserCreateAt Index | TweetCreateAt Index | Intersection | Overhead |
| 1166   | 01--02 | 00--01 | 113423 | 112469             | 1397                | 1615         | 15.60%   |
| 1953   | 01--02 | 00--02 |        | 111843             | 1628                | 1915         | 17.63%   |
| 2438   | 01--02 | 00--03 |        | 112307             | 1802                | 2143         | 18.92%   |
| 2809   | 01--02 | 00--04 |        | 110893             | 2105                | 2453         | 16.53%   |
| 3347   | 01--02 | 00--05 |        | 119531             | 3184                | 2503         | -21.39%  |
| 4381   | 01--02 | 00--06 |        | 110912             | 2552                | 3060         | 19.91%   |
| 5802   | 01--02 | 00--07 |        | 111488             | 3126                | 3935         | 25.88%   |
| 7594   | 01--02 | 00--08 |        | 111484             | 4148                | 4992         | 20.35%   |
| 9846   | 01--02 | 00--09 |        | 111499             | 5299                | 6243         | 17.81%   |

Table 5. Fix the Tweet.create_at condition to one hour and increase the User.create_at range.

| result | month  | hour   | Scan   | UserCreateAt Index | TweetCreateAt Index | Intersection | Overhead |
| 1166   | 01--02 | 00--01 | 113423 | 112469             | 1397                | 1615         | 15.60%   |
| 1751   | 01--03 | 00--01 |        | 110207             | 1375                | 1808         | 31.49%   |
| 2262   | 01--04 | 00--01 |        | 110965             | 1462                | 1738         | 18.88%   |
| 2850   | 01--05 | 00--01 |        | 112291             | 1371                | 1835         | 33.84%   |
| 3753   | 01--06 | 00--01 |        | 111587             | 1289                | 1890         | 46.63%   |
| 4679   | 01--07 | 00--01 |        | 111769             | 1320                | 2029         | 53.71%   |
| 5544   | 01--08 | 00--01 |        | 112209             | 1340                | 2076         | 54.93%   |
| 6250   | 01--09 | 00--01 |        | 113029             | 1370                | 2270         | 65.69%   |
| 6955   | 01--10 | 00--01 |        | 112535             | 1310                | 2329         | 77.79%   |

Though the two access methods have very different execution times, the intersection tends to keep pace with the faster one. The overhead of the intersection compared to the fastest path ranges from 15% to 78%, while its speedup compared to the slowest path is about 5~10 times.

Why is the Tweet.create_at access path so fast?

The answer is that the order of the primary key (Tweet.id) is consistent with the order of Tweet.create_at. We speculate that Tweet.id was generated from Tweet.create_at. Thus, this secondary index is effectively clustered, like the primary index, and as a consequence the IO to fetch the qualifying records is clustered as well. In the general case, a secondary index lookup should be as slow as the User.create_at access path.

Why is the intersection slower than the Tweet.create_at index access path?

Because we only have one disk. First, the Tweet.create_at path has to wait for the User.create_at path to finish a frame before the intersection can proceed, so the two index searches compete for disk reads. Second, although the intersection itself can finish as soon as one of its inputs is done, we cannot stop the other index scan under our push model. Hence, the primary index search also competes for the disk with the two index searches.

Intersect Unclustered Secondary Indexes

As shown in the previous results, the index on Tweet.create_at is effectively a clustered secondary index, which is a special case. To test a more general case, we create an RTree index on Tweet.place.bounding_box, which is a rectangular area. We then query with a circular region around LA County; by increasing the radius, we increase the selectivity of that RTree. The query is as follows:

use dataverse twitter; 
let $ms_start := date("2010-$month_start-01") 
let $ms_end := date("2010-$month_end-28") 
let $region := create-circle(create-point(-118.125,33.939), $radius)
let $result := for $t in dataset ds_tweets 
               where $t.user.create_at >= $ms_start and $t.user.create_at < $ms_end 
               and spatial-intersect($t.place.bounding_box, $region)
               and $t.place = "Unite State" 
               return $t
return count($result)

Table 6. Fix the User.create_at condition to one month and increase the $radius.

| result | month  | radius | Scan | UserCreateAt Index | RTree Index | Intersection | SpeedUp     |
| 1390   | 01--02 | 0.01   |      | 111087             | 106159      | 9293         | 11.4235446  |
| 1551   | 01--02 | 0.02   |      | 111306             | 107127      | 10012        | 10.69986017 |
| 1575   | 01--02 | 0.03   |      | 112024             | 108143      | 10278        | 10.52179412 |
| 6171   | 01--02 | 0.04   |      |                    | 111264      | 31850        | 3.493375196 |
| 6193   | 01--02 | 0.05   |      |                    | 112916      | 32001        | 3.528514734 |
| 6689   | 01--02 | 0.06   |      |                    | 111673      | 33952        | 3.289143497 |
| 6900   | 01--02 | 0.07   |      |                    | 111012      | 34946        | 3.176672581 |
| 6900   | 01--02 | 0.08   |      |                    | 111570      | 34937        | 3.193462518 |

The experiment is slow. Stay tuned. 

Review patches:

https://asterix-gerrit.ics.uci.edu/#/c/577  and  https://asterix-gerrit.ics.uci.edu/#/c/578

 
