...

Physically, each ETS node introduces a thread, so the intersection operator must synchronize its upstream input threads to produce the correct result. To keep the operation pipelined, the intersection is implemented in a sort-merge manner, which requires each input to be sorted. Synchronization is handled by the thread of input No. 0: only thread 0 calls the writer.open/nextFrame/close functions. If arbitrary threads were allowed to push frames forward, the downstream operators would be confused, especially when synchronizing their locks. The core logic of the intersection function is as follows:

  1. do
    1. Find the max input: the id of the input holding the maximum current record.
    2. For each input i:
      1. If record < max, keep popping until record >= max.
      2. If record == max, then match++; continue.
      3. If record > max, break.
    3. If match == inputArity:
      1. Output the max record.
  2. while no input is closed.
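
The loop above can be sketched in a few lines of Python. This is only an illustration of the sort-merge logic, not the operator's actual code: it assumes each input is an in-memory sorted list of integers, it leaves out threads, frames, and the writer calls entirely, and the name `sorted_intersection` is hypothetical.

```python
from typing import Iterator, List, Sequence


def sorted_intersection(inputs: Sequence[Sequence[int]]) -> Iterator[int]:
    """Yield the records that appear in every sorted input (hypothetical helper)."""
    if not inputs:
        return
    arity = len(inputs)
    pos: List[int] = [0] * arity  # cursor into each input
    # "while no input is closed": stop once any input is exhausted.
    while all(pos[i] < len(inputs[i]) for i in range(arity)):
        # Step 1: the maximum among the current head records.
        max_rec = max(inputs[i][pos[i]] for i in range(arity))
        match = 0
        for i, data in enumerate(inputs):
            # Step 2.1: pop records smaller than max.
            while pos[i] < len(data) and data[pos[i]] < max_rec:
                pos[i] += 1
            if pos[i] == len(data):
                return  # this input is closed; no further match is possible
            if data[pos[i]] == max_rec:
                match += 1  # Step 2.2: this input agrees with max
            else:
                break  # Step 2.3: a larger record exists; recompute max
        # Step 3: every input agrees on max, so emit it.
        if match == arity:
            yield max_rec
            for i in range(arity):
                pos[i] += 1  # move every input past the emitted record
```

For example, `list(sorted_intersection([[1, 3, 5, 7], [3, 4, 5, 8], [0, 3, 5, 9]]))` gives `[3, 5]`. Because every input only moves forward, each record is read once, which is what lets the operation stay pipelined.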

...

Each query is run ten times. We report the average time of the last five runs. All times are in milliseconds.
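
As a rough sketch of this measurement procedure, assuming a caller-supplied `run_query` callable that executes one query (both function names here are hypothetical and not part of the experiment scripts):

```python
import time
from statistics import mean
from typing import Callable, List, Sequence


def average_last(timings_ms: Sequence[float], keep: int = 5) -> float:
    """Average of the last `keep` timings; earlier runs are treated as warm-up."""
    return mean(timings_ms[-keep:])


def time_query(run_query: Callable[[], object], runs: int = 10, keep: int = 5) -> float:
    """Run the query `runs` times and average the last `keep` wall-clock times (ms)."""
    timings_ms: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return average_last(timings_ms, keep)
```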

Table 1. Fixing User.create_at to $month_start = 01, $month_end = 02, and increasing the Tweets.create_at selectivity

...

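Time (Avg last 5)

| result | month  | hour | radius | Scan | user time Index | Rtree Index | intersection | speedup     |
|--------|--------|------|--------|------|-----------------|-------------|--------------|-------------|
| 1390   | 01--02 |      | 0.01   |      | 111087          | 106159      | 9293         | 11.4235446  |
| 1551   | 01--02 |      | 0.02   |      | 111306          | 107127      | 10012        | 10.69986017 |
| 1575   | 01--02 |      | 0.03   |      | 112024          | 108143      | 10278        | 10.52179412 |
| 6171   | 01--02 |      | 0.04   |      | 111264          |             | 31850        | 3.493375196 |
| 6193   | 01--02 |      | 0.05   |      | 112916          |             | 32001        | 3.528514734 |
| 6689   | 01--02 |      | 0.06   |      | 111673          |             | 33952        | 3.289143497 |
| 6900   | 01--02 |      | 0.07   |      | 111012          |             | 34946        | 3.176672581 |
| 6900   | 01--02 |      | 0.08   |      | 111570          |             | 34937        | 3.193462518 |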

The experiment is slow. Stay tuned. 

...