Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Physically, each ETS node will introduce a thread. Thus, the intersection operator must synchronize the upstream input threads in order to generate the correct result. In order to have a pipeline operation, the intersection is implemented in a sort-merge manner. Therefore, each input is required to be sorted. The synchronization is handled by the thread of input No.0, which means the thread 0 will call the writer.open/nextFrame/close functions. If we authorize arbitrary threads to push forward, the downstream operator will be confused, especially in synchronizing their locks. The core logical intersection function is as below:

  1. do 
    1. find the max input: maxinput id of the maximum record
    2. for each input i
      1. if record < max keep popping 
      2. if record == max keep popping until it matches max. then match++; continue
      3. if > max, break
    3. If match == inputArity
      1. output max record
  2. while no input is closed.

...

Code Block
use dataverse twitter; 
let $ts_start := datetime("2015-11-23T17:$min_start:00.000Z") 
let $ts_end := datetime("2015-11-23T17:$min_end:03.000Z") 
let $ms_start := date("2010-$month_start-01") 
let $ms_end := date("2010-$month_end-28") 
let $result := for $t in dataset ds_tweets 
               where $t.user.create_at >= $ms_start and $t.user.create_at < $ms_end 
               and $t.create_at >= $ts_start and $t.create_at < $ts_end 
               and $t.place = "Unite State" return $t return count($result) 
               return $t
return count($result)

...

Each query will run ten times. We record the time by average the last fives.  The time unit is Milliseconds.

Table 1. Fix the User.create_at $month_start = 01, $month_end = 02, increasing the Tweets.create_at selectivity

...

Code Block
use dataverse twitter; 
let $ms_start := date("2010-$month_start-01") 
let $ms_end := date("2010-$month_end-28") 
let $region := create-circle(create-point(-118.125,33.939), $radius)
let $result := for $t in dataset ds_tweets 
               where $t.user.create_at >= $ms_start and $t.user.create_at < $ms_end 
               and spatial-intersect($t.place.bounding_box, $region)
               and $t.place = "Unite State" return $t return count($result)
               return $t
return count($result)

...

   Scanuser time IndexRtree Indexintersectionspeedup
resultmonthhourradiusTime (Avg last 5)    
139001--020.01  111087106159929311.4235446
155101--020.02  11130610712710012 10.69986017
157501--020.03  11202410814310278 10.52179412
617101--020.04  11126431850 3.493375196
619301--020.05  11291632001 3.528514734
668901--020.06  11167333952 3.289143497
690001--020.07  11101234946 3.176672581
690001--020.08  11157034937 3.193462518

The experiment is slow. Stay tuned. 

...