You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 5 Next »

1. Motivation

To efficiently support the sort-dependent operators based on the Hyracks running framework, a multiple stages-based subdivision is used to dynamically construct the rangemap, cache the frames as well as parallelize the operations. Since the down-streaming operator could be connected with the sort-dependent operators in one stage, which means both those down-streaming operators and the sort-dependent operators can execute in pipeline. We should cache the frames from the down-streaming operators to the sort-dependent operators until the target rangemap is completely figured out. Based on a range partitioner, the caching frames can be forwarded onto their targets correspondingly.

2. RangeMap Generation Based on Aggregation

Combining a replicate operator, the aggregation-based RangeMap is generated based on a streaming algorithm to dynamically construct the histogram.

 

 

 

use dataverse tpch;

let $rg := rg(

for $d in dataset Lineitem

return $d.l_extendedprice

)

return $rg

 

 

 

 

3. Parallel Sort

In general, the parallel sort is divided into five stages, i.e., replicate, local aggregation, global aggregation, forward, sort and merge, to scale up the sort based on Hyracks.

A parallel sort template can be given as:

use dataverse tpch;

for $d in dataset Lineitem
/*+ psort */

order by $d.l_extendedprice

return $d


3.1 Five stages of parallel sort

 

 

4. Binary In-Equal Join

 

use dataverse tpch;

for $d in dataset Lineitem

for $t in dataset Orders
/*+ psort */

where $d.l_extendedprice - $t.o_totalprice > -2 and $d.l_extendedprice - $t.o_totalprice < 2

return

Unknown macro: {"ok"}

 


4.1 Five stages of parallel sort-based binary join

 

  • No labels