...
Physically, each ETS node introduces a thread, so the intersection operator must synchronize its upstream input threads to produce a correct result. To keep the operation pipelined, the intersection is implemented as a sort-merge, which requires each input to be sorted. Synchronization is handled by the thread of input 0: only that thread calls writer.open, writer.nextFrame, and writer.close. If arbitrary threads were allowed to push frames forward, the downstream operator would be confused, especially when synchronizing its locks. The core intersection logic is as follows:
- do
  - find the max input: maxInput = id of the input whose current record is the maximum; max = that record
  - match = 0
  - for each input i
    - while record_i < max: pop record_i
    - if record_i == max: match++; continue with the next input
    - if record_i > max: break
  - if match == inputArity
    - output the max record and pop it from every input
- while no input is closed
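A minimal, single-process sketch of the loop above, written in Python for brevity. It assumes each input is a sorted list of unique integer keys (standing in for sorted primary keys), models each upstream input thread as a `threading.Thread` feeding a `queue.Queue`, and lets the calling thread play the role of input 0's thread, so it is the only thread that calls the writer. `ListWriter`, `intersect`, `next_frame`, and `frame_size` are hypothetical names for illustration, not the actual operator's API.

```python
import queue
import threading
from typing import Iterable, List

_CLOSED = object()  # end-of-stream marker pushed by each producer thread


class ListWriter:
    """Hypothetical downstream writer mirroring open/nextFrame/close."""

    def __init__(self) -> None:
        self.frames: List[List[int]] = []
        self.opened = False
        self.closed = False

    def open(self) -> None:
        self.opened = True

    def next_frame(self, frame: List[int]) -> None:
        self.frames.append(frame)

    def close(self) -> None:
        self.closed = True


def _produce(records: Iterable[int], q: queue.Queue) -> None:
    # Upstream input thread: push its sorted records, then signal "closed".
    for r in records:
        q.put(r)
    q.put(_CLOSED)


def intersect(inputs: List[List[int]], writer: ListWriter, frame_size: int = 4) -> None:
    if not inputs:
        raise ValueError("intersection needs at least one input")
    arity = len(inputs)
    queues = [queue.Queue() for _ in range(arity)]
    producers = [
        threading.Thread(target=_produce, args=(records, q))
        for records, q in zip(inputs, queues)
    ]
    for t in producers:
        t.start()

    # Only this coordinating thread (the stand-in for input 0's thread) touches the writer.
    writer.open()
    heads = [q.get() for q in queues]  # current record of each input
    frame: List[int] = []
    while all(h is not _CLOSED for h in heads):  # while no input is closed
        max_rec = max(heads)
        match = 0
        for i in range(arity):
            # Pop records smaller than the current maximum.
            while heads[i] is not _CLOSED and heads[i] < max_rec:
                heads[i] = queues[i].get()
            if heads[i] is _CLOSED or heads[i] > max_rec:
                break  # input i skipped past max_rec; retry with a new max
            match += 1
        if match == arity:
            frame.append(max_rec)
            if len(frame) == frame_size:
                writer.next_frame(frame)
                frame = []
            heads = [q.get() for q in queues]  # pop the match from every input
    if frame:
        writer.next_frame(frame)  # flush the last partial frame
    writer.close()
    for t in producers:
        t.join()
```

For example, intersecting `[1, 3, 5, 7, 9]`, `[3, 4, 5, 9, 11]`, and `[0, 3, 5, 9]` writes the single frame `[3, 5, 9]`. The unbounded queues here are only a stand-in for frame handoff between threads; a real operator would exchange bounded frames of tuples rather than single records.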
...
Each query is run ten times, and we report the average of the last five runs. All times are in milliseconds.
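The measurement protocol can be sketched as follows, assuming a zero-argument callable `run_query` that submits one query and blocks until its result is returned. `run_query`, `time_query`, and `average_of_last` are hypothetical names, not part of the original test harness; discarding the first five runs is presumably meant to exclude warm-up effects such as cold caches.

```python
import time
from statistics import mean
from typing import Callable, Sequence


def average_of_last(timings_ms: Sequence[float], keep_last: int = 5) -> float:
    """Mean of the last `keep_last` measurements; earlier runs are treated as warm-up."""
    if not 0 < keep_last <= len(timings_ms):
        raise ValueError("keep_last must be between 1 and the number of runs")
    return mean(timings_ms[-keep_last:])


def time_query(run_query: Callable[[], object], runs: int = 10, keep_last: int = 5) -> float:
    """Run the query `runs` times and return the average of the last `keep_last` runs, in ms."""
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return average_of_last(timings_ms, keep_last)
```

For example, `average_of_last([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])` averages only `5, 4, 3, 2, 1` and returns 3.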
Table 1. User.create_at fixed to $month_start = 01 and $month_end = 02, with increasing Tweets.create_at selectivity
...
Scan result | Month | Radius | User time index (ms) | R-tree index (ms) | Intersection (ms) | Speedup
1390 | 01-02 | 0.01 | 111087 | 106159 | 9293 | 11.42
1551 | 01-02 | 0.02 | 111306 | 107127 | 10012 | 10.70
1575 | 01-02 | 0.03 | 112024 | 108143 | 10278 | 10.52
6171 | 01-02 | 0.04 | - | 111264 | 31850 | 3.49
6193 | 01-02 | 0.05 | - | 112916 | 32001 | 3.53
6689 | 01-02 | 0.06 | - | 111673 | 33952 | 3.29
6900 | 01-02 | 0.07 | - | 111012 | 34946 | 3.18
6900 | 01-02 | 0.08 | - | 111570 | 34937 | 3.19
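The Speedup column is consistent with dividing the single-index time by the intersection time (for the first row, 106159 / 9293 ≈ 11.42), using the R-tree index time where both index times are present. A small check that recomputes every reported value from the table; `speedup` and `ROWS` are hypothetical names introduced here.

```python
from typing import List, Tuple

# (single-index time in ms, intersection time in ms, reported speedup), copied from Table 1
ROWS: List[Tuple[int, int, float]] = [
    (106159, 9293, 11.4235446),
    (107127, 10012, 10.69986017),
    (108143, 10278, 10.52179412),
    (111264, 31850, 3.493375196),
    (112916, 32001, 3.528514734),
    (111673, 33952, 3.289143497),
    (111012, 34946, 3.176672581),
    (111570, 34937, 3.193462518),
]


def speedup(index_ms: float, intersection_ms: float) -> float:
    """How many times faster the intersection plan is than the single-index plan."""
    return index_ms / intersection_ms


for index_ms, inter_ms, reported in ROWS:
    print(f"{speedup(index_ms, inter_ms):.2f} (reported {reported:.2f})")
```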
The experiment is slow. Stay tuned.
...