Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • 1st. with BTree index on Tweet.create_at only, 
  • 2nd. with BTree index on User.create_at only, 
  • 3rd. with both indexes presents, consequently, the intersection is introduced.

The entire result is shared in google sheet

In Memory case:

Each query will run ten times. We record the time by average the last fives. 

...

We can see that intersection is the best one under current settings. The total time reduction is from 5% to 70%. If the two indexes are vary a lot in the selectivity, then the benefit of intersection may not that much. If the two indexes are of the samilar similar selectivity the intersection can achieve 60% ~ 70% total time reduction.

...

The test dataset is changing to the 8.2G dataset. In order to flush the cache, we load the same dataset to another ds_copy dataset. Every time when we run the selection, we scan this 8.2G ds_copy once to invalidate the cache pages. Due to the slowness of the on disk case, we warm up the query only once and record the average time of the next three times.

Table 4. Fix User.create_at condition for one month and increase the Tweet.creat_at range. 

   Scanuser time Indextweet time indexintersectionoverhead
resultmonthhourTime (Avg last 5)    
116601--0200--011134231124691397161515.60%
195301--0200--02 1118431628191517.63%
243801--0200--03 1123071802214318.92%
280901--0200--04 1108932105245316.53%
334701--0200--05 11953131842503-21.39%
438101--0200--06 1109122552306019.91%
580201--0200--07 1114883126393525.88%
759401--0200--08 1114844148499220.35%
984601--0200--09 1114995299624317.81%

Table 5.Fix Tweet.create_at condition for one hour and increase the User.creat_at range.

   Scanuser time Indextweet time indexintersectionoverhead
resultmonthhourTime (Avg last 5)    
116601--0200--011134231124691397161515.60%
175101--0300--01 1102071375180831.49%
226201--0400--01 1109651462173818.88%
285001--0500--01 1122911371183533.84%
375301--0600--01 1115871289189046.63%
467901--0700--01 1117691320202953.71%
554401--0800--01 1122091340207654.93%
625001--0900--01 1130291370227065.69%
695501--1000--01 1125351310232977.79%

Though the two access methods have very different execution time, the intersection tend to catch with the fastest one. The overhead comparing to the fastest path is from 15% to 78%, while the speed up comparing to the slowest one is about 5~10 times faster.

 

Review Patch: 

https://asterix-gerrit.ics.uci.edu/#/c/577  and  https://asterix-gerrit.ics.uci.edu/#/c/578

...