...

Combined with a replicate operator, the aggregation-based RangeMap is generated by a streaming algorithm that constructs the histogram dynamically.

 

 

 

use dataverse tpch;

let $rg := rg(
    for $d in dataset Lineitem
    return $d.l_extendedprice
)
return $rg
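The text above does not say which streaming algorithm builds the histogram. As a minimal sketch, assuming a fixed-size histogram that merges its two closest bins whenever it overflows, the following Python shows how split points for a range map could be derived in one pass. The class name `StreamingHistogram` and its methods are hypothetical and are not part of AsterixDB or Hyracks.

```python
import bisect


class StreamingHistogram:
    """One-pass, fixed-size histogram (hypothetical sketch, not AsterixDB code)."""

    def __init__(self, max_bins):
        self.max_bins = max_bins
        self.bins = []  # sorted list of [centroid, count]

    def add(self, value, count=1):
        # Insert the value as its own bin, or bump an existing bin with the same centroid.
        keys = [b[0] for b in self.bins]
        i = bisect.bisect_left(keys, value)
        if i < len(self.bins) and self.bins[i][0] == value:
            self.bins[i][1] += count
        else:
            self.bins.insert(i, [value, count])
        if len(self.bins) > self.max_bins:
            self._merge_closest()

    def _merge_closest(self):
        # Merge the adjacent pair of bins with the smallest gap between centroids.
        j = min(range(len(self.bins) - 1),
                key=lambda k: self.bins[k + 1][0] - self.bins[k][0])
        (v1, c1), (v2, c2) = self.bins[j], self.bins[j + 1]
        total = c1 + c2
        self.bins[j:j + 2] = [[(v1 * c1 + v2 * c2) / total, total]]

    def merge(self, other):
        # Fold another histogram into this one (e.g. one built on another partition).
        for value, count in other.bins:
            self.add(value, count)

    def split_points(self, num_partitions):
        # Walk the cumulative counts and emit a split point each time another
        # 1/num_partitions share of the data has been covered.
        total = sum(c for _, c in self.bins)
        points, running, target = [], 0, 1
        for value, count in self.bins:
            running += count
            while target < num_partitions and running >= target * total / num_partitions:
                points.append(value)
                target += 1
        return points


hist = StreamingHistogram(max_bins=8)
for price in range(1, 101):
    hist.add(float(price))
splits = hist.split_points(4)
```

The split points are approximate, because each bin only remembers a weighted centroid. The `merge` method is included because a histogram built per partition would need to be combined before split points are chosen.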

 

 

 

 

3. Parallel Sort

In general, the parallel sort is divided into five stages (replicate, local aggregation, global aggregation, forward, and sort and merge) so that the sort scales on Hyracks.

A parallel sort template can be given as:

use dataverse tpch;

for $d in dataset Lineitem
/*+ psort */
order by $d.l_extendedprice
return $d
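To make the stages named above concrete, here is a small single-process Python simulation. It is only a sketch under stated assumptions: partitions are plain lists, the local aggregation is a simple sorted sample rather than a real histogram, and "forward" is taken to mean handing the global split points to the routing step. The function `parallel_sort` and its parameters are hypothetical.

```python
import bisect


def parallel_sort(partitions, num_targets, sample_step=4):
    """Simulate a range-partitioned sort over in-memory partitions (illustrative only)."""
    # 1. Replicate: each partition feeds both the summary branch and the data
    #    branch. Here that simply means the same list is read twice.
    # 2. Local aggregation: summarize each partition with a sorted sample.
    local_summaries = [sorted(p)[::sample_step] for p in partitions]

    # 3. Global aggregation: merge the summaries and pick num_targets - 1 split points.
    merged = sorted(v for summary in local_summaries for v in summary)
    splits = []
    if merged:
        splits = [merged[len(merged) * i // num_targets] for i in range(1, num_targets)]

    # 4. Forward: the split points are handed to the data branch, which routes
    #    every value to the target partition that owns its range.
    targets = [[] for _ in range(num_targets)]
    for p in partitions:
        for v in p:
            targets[bisect.bisect_right(splits, v)].append(v)

    # 5. Sort and merge: sort each target locally, then concatenate in range
    #    order. The ranges are disjoint, so the concatenation is globally sorted.
    result = []
    for t in targets:
        result.extend(sorted(t))
    return result


parts = [[5.0, 1.0, 9.0], [3.0, 7.0, 2.0], [8.0, 4.0, 6.0]]
ordered = parallel_sort(parts, num_targets=3)
```

The quality of the split points only affects how evenly the work is balanced across targets. The output is sorted regardless, since routing by range keeps the targets disjoint and ordered.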


3.1 Four stages of parallel sort

 


 

 

4. Binary Inequality Join

 

use dataverse tpch;

for $d in dataset Lineitem
for $t in dataset Orders
/*+ psort */
where 4 * $d.l_extendedprice - $t.o_totalprice > -2 and 4 * $d.l_extendedprice - $t.o_totalprice < 2
return {"ok": $d.l_orderkey, "ln": $d.l_linenumber, "ep": $d.l_extendedprice, "tp": $t.o_totalprice}
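The predicate in this query is a band condition: the two join keys must differ by less than 2. The page does not spell out how the join is executed, so the following Python is only a sketch of one common approach, assuming range partitioning with replication of one side across partition boundaries, followed by a local sorted lookup. The function `band_join` and its parameters are hypothetical, and the keys are passed in precomputed (for the query above, `4 * l_extendedprice` on one side and `o_totalprice` on the other).

```python
import bisect


def band_join(left, right, splits, band):
    """Return all (a, b) pairs with -band < a - b < band (illustrative sketch)."""
    n = len(splits) + 1
    left_parts = [[] for _ in range(n)]
    right_parts = [[] for _ in range(n)]

    # Each left key goes to exactly one range partition.
    for a in left:
        left_parts[bisect.bisect_right(splits, a)].append(a)

    # Each right key is replicated to every partition whose range overlaps
    # its band (b - band, b + band), so no matching pair is lost at a boundary.
    for b in right:
        first = bisect.bisect_right(splits, b - band)
        last = bisect.bisect_right(splits, b + band)
        for p in range(first, last + 1):
            right_parts[p].append(b)

    # Local join: sort the right side, then look up the matching window
    # for each left key with two binary searches.
    out = []
    for lp, rp in zip(left_parts, right_parts):
        rp.sort()
        for a in sorted(lp):
            lo = bisect.bisect_right(rp, a - band)  # first b with b > a - band
            hi = bisect.bisect_left(rp, a + band)   # first b with b >= a + band
            out.extend((a, b) for b in rp[lo:hi])
    return out


left_keys = [4 * p for p in [1, 2, 3, 10, 25]]   # stands in for 4 * l_extendedprice
right_keys = [3, 5, 9, 13, 41, 99, 101, 200]     # stands in for o_totalprice
pairs = band_join(left_keys, right_keys, splits=[10, 50], band=2)
```

Because every left key lives in exactly one partition, each matching pair is produced once even though right keys near a split point are copied to two partitions.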


4.1 Five stages of parallel sort-based binary join

 


 

4.2 Histogram merging based on the densities of both sides
