Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • frequency
    • hll
    • cpc
    • theta
  • frequent items
    • freq
  • histograms
    • kll

functionName

namedescription
sketchgenerates sketch data from input
estimatecomputes the estimate for frequency related sketches
union

aggregate function to merge multiple sketches

union_f

unions 2 sketches given in the arguments

nnumber of elements
cdfcumulative distribution
rankestimates the rank of the given element; returns a value in the range of 0~1
intersectaggregate to intersect multiple sketches
intersect_fintersect 2 sketches given in the arguments
stringifyreturns the the sketch in a more readable form

List declared sketch functions

...

desc function ds_freq_sketch;


Integration with materialized views

Sketch aggregation(s) are exposed to Calcite by some extensions - which could enable both the usage of an MV in a smaller dimension query; or could help in incremental updates.

Examples

#0 Prepare sample table

Code Block
languagesql
titlePrepare sample table
-- prepare input data
create temporary table sketch_input (id int, category char(1));
insert into table sketch_input values
  (1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'a'), (6, 'a'), (7, 'a'), (8, 'a'), (9, 'a'), (10, 'a'),
  (6, 'b'), (7, 'b'), (8, 'b'), (9, 'b'), (10, 'b'), (11, 'b'), (12, 'b'), (13, 'b'), (14, 'b'), (15, 'b');


-- prepare input data
create temporary table sketch_input (id int, category char(1));
insert into table sketch_input values
(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'a'), (6, 'a'), (7, 'a'), (8, 'a'), (9, 'a'), (10, 'a'),
(6, 'b'), (7, 'b'), (8, 'b'), (9, 'b'), (10, 'b'), (11, 'b'), (12, 'b'), (13, 'b'), (14, 'b'), (15, 'b');

-- build sketches per category
create temporary table sketch_intermediate (category char(1), sketch binary);
insert into sketch_intermediate select category, ds_hll_sketch(id) from sketch_input group by category;

-- get unique count estimates per category
select category, ds_hll_estimate(sketch) from sketch_intermediate;


-- union sketches across categories and get overall unique count estimate
select ds_hll_estimate(ds_hll_union(sketch)) from sketch_intermediate;


#1 HLL