...
```sql
-- GRANULARITY: MONTH
SELECT `floor_month`(`__time`), max(delta), sum(added)
FROM druid_table_1
GROUP BY `floor_month`(`__time`);
```
Basically, we group by a given time granularity and calculate the aggregation results for each resulting group. In particular, the `floor_month` function over the timestamp dimension `__time` represents the Druid month granularity format. Currently, we support the `floor_year`, `floor_quarter`, `floor_month`, `floor_week`, `floor_day`, `floor_hour`, `floor_minute`, and `floor_second` granularities. In addition, we support two special types of granularities, `all` and `none`, which we describe below. We plan to extend our integration work to support other important Druid custom granularity constructs, such as duration and period granularities.
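For instance, changing the granularity is just a matter of swapping the function. A minimal sketch over the same `druid_table_1` table, this time aggregating at day granularity:

```sql
-- GRANULARITY: DAY
SELECT `floor_day`(`__time`), max(delta), sum(added)
FROM druid_table_1
GROUP BY `floor_day`(`__time`);
```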
...
```
hive> EXPLAIN
> SELECT `floor_month`(`__time`), max(delta), sum(added)
> FROM druid_table_1
> GROUP BY `floor_month`(`__time`);
OK
Plan optimized by CBO.
Stage-0
Fetch Operator
limit:-1
Select Operator [SEL_1]
Output:["_col0","_col1","_col2"]
TableScan [TS_0]
Output:["__time","$f1","$f2"],
properties:{"druid.query.json":"{\"queryType\":\"timeseries\",\"dataSource\":\"wikiticker\",\"descending\":\"false\",\"granularity\":\"MONTH\",\"aggregations\":[{\"type\":\"longMax\",\"name\":\"$f1\",\"fieldName\":\"delta\"},{\"type\":\"longSum\",\"name\":\"$f2\",\"fieldName\":\"added\"}],\"intervals\":[\"-146136543-09-08T08:22:17.096-00:01:15/146140482-04-24T16:36:27.903+01:00\"]}","druid.query.type":"timeseries"}
Time taken: 0.116 seconds, Fetched: 10 row(s)
```
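For readability, the serialized Druid query embedded in the `druid.query.json` property of the plan above is the following timeseries query (pretty-printed here; the content is identical to the plan output):

```json
{
  "queryType": "timeseries",
  "dataSource": "wikiticker",
  "descending": "false",
  "granularity": "MONTH",
  "aggregations": [
    { "type": "longMax", "name": "$f1", "fieldName": "delta" },
    { "type": "longSum", "name": "$f2", "fieldName": "added" }
  ],
  "intervals": [
    "-146136543-09-08T08:22:17.096-00:01:15/146140482-04-24T16:36:27.903+01:00"
  ]
}
```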
...
```sql
SELECT `channel`, `floor_month`(`__time`), max(delta) as m, sum(added)
FROM druid_table_1
GROUP BY `channel`, `floor_month`(`__time`)
ORDER BY m DESC
LIMIT 10;
```
...
```
hive> SELECT `channel`, `floor_month`(`__time`), max(delta) as m, sum(added)
> FROM druid_table_1
> GROUP BY `channel`, `floor_month`(`__time`)
> ORDER BY m DESC
> LIMIT 10;
OK
#en.wikipedia 2015-09-01 01:00:00 199818 3045299
#ru.wikipedia 2015-09-01 01:00:00 102719 640698
#es.wikipedia 2015-09-01 01:00:00 94187 634670
#fr.wikipedia 2015-09-01 01:00:00 92182 642555
#ar.wikipedia 2015-09-01 01:00:00 73433 153605
#cs.wikipedia 2015-09-01 01:00:00 57174 132768
#de.wikipedia 2015-09-01 01:00:00 52923 522625
#hu.wikipedia 2015-09-01 01:00:00 49061 166101
#nl.wikipedia 2015-09-01 01:00:00 48573 145634
#ja.wikipedia 2015-09-01 01:00:00 47871 317242
Time taken: 1.038 seconds, Fetched: 10 row(s)
```
...
```sql
SELECT a.channel, b.col1
FROM
(
SELECT `channel`, max(delta) as m, sum(added)
FROM druid_table_1
GROUP BY `channel`, `floor_year`(`__time`)
ORDER BY m DESC
LIMIT 1000
) a
JOIN
(
SELECT col1, col2
FROM hive_table_1
) b
ON a.channel = b.col2;
```
...
```
hive> explain
> SELECT a.channel, b.col1
> FROM
> (
> SELECT `channel`, max(delta) as m, sum(added)
> FROM druid_table_1
> GROUP BY `channel`, `floor_year`(`__time`)
> ORDER BY m DESC
> LIMIT 1000
> ) a
> JOIN
> (
> SELECT col1, col2
> FROM hive_table_1
> ) b
> ON a.channel = b.col2;
OK
Plan optimized by CBO.
Vertex dependency in root stage
Map 2 <- Map 1 (BROADCAST_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Map 2
File Output Operator [FS_11]
Select Operator [SEL_10] (rows=1 width=0)
Output:["_col0","_col1"]
Map Join Operator [MAPJOIN_16] (rows=1 width=0)
Conds:RS_7._col0=SEL_6._col1(Inner),HybridGraceHashJoin:true,Output:["_col0","_col2"]
<-Map 1 [BROADCAST_EDGE]
BROADCAST [RS_7]
PartitionCols:_col0
Filter Operator [FIL_2] (rows=1 width=0)
predicate:_col0 is not null
Select Operator [SEL_1] (rows=1 width=0)
Output:["_col0"]
TableScan [TS_0] (rows=1 width=0)
druid@druid_table_1,druid_table_1,Tbl:PARTIAL,Col:NONE,Output:["channel"],properties:{"druid.query.json":"{\"queryType\":\"topN\",\"dataSource\":\"wikiticker\",\"granularity\":\"YEAR\",\"dimension\":\"channel\",\"metric\":\"$f2\",\"aggregations\":[{\"type\":\"longMax\",\"name\":\"$f2\",\"fieldName\":\"delta\"}],\"intervals\":[\"-146136543-09-08T08:22:17.096-00:01:15/146140482-04-24T16:36:27.903+01:00\"],\"threshold\":1000}","druid.query.type":"topN"}
<-Select Operator [SEL_6] (rows=1 width=15)
Output:["_col0","_col1"]
Filter Operator [FIL_15] (rows=1 width=15)
predicate:col2 is not null
TableScan [TS_4] (rows=1 width=15)
druid@hive_table_1,hive_table_1,Tbl:COMPLETE,Col:NONE,Output:["col1","col2"]
Time taken: 0.924 seconds, Fetched: 31 row(s)
hive> SELECT a.channel, b.col1
> FROM
> (
> SELECT `channel`, max(delta) as m, sum(added)
> FROM druid_table_1
> GROUP BY `channel`, `floor_year`(`__time`)
> ORDER BY m DESC
> LIMIT 1000
> ) a
> JOIN
> (
> SELECT col1, col2
> FROM hive_table_1
> ) b
> ON a.channel = b.col2;
Query ID = user1_20160818202329_e9a8b3e8-18d3-49c7-bfe0-99d38d2402d3
Total jobs = 1
Launching Job 1 out of 1
2016-08-18 20:23:30 Running Dag: dag_1471548210492_0001_1
2016-08-18 20:23:30 Starting to run new task attempt: attempt_1471548210492_0001_1_00_000000_0
Status: Running (Executing on YARN cluster with App id application_1471548210492_0001)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Map 2 .......... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 0.15 s
----------------------------------------------------------------------------------------------
2016-08-18 20:23:31 Completed running task attempt: attempt_1471548210492_0001_1_00_000000_0
OK
#en.wikipedia 1
Time taken: 1.835 seconds, Fetched: 2 row(s)
```
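Note in the plan above that only the inner query over `druid_table_1` is pushed to Druid, while the join itself is executed by Hive (the Map Join operator). The `druid.query.json` property on the Druid table scan shows that the subquery, with its GROUP BY, ORDER BY, and LIMIT, is translated into the following Druid topN query (pretty-printed here; the content is identical to the plan output):

```json
{
  "queryType": "topN",
  "dataSource": "wikiticker",
  "granularity": "YEAR",
  "dimension": "channel",
  "metric": "$f2",
  "aggregations": [
    { "type": "longMax", "name": "$f2", "fieldName": "delta" }
  ],
  "intervals": [
    "-146136543-09-08T08:22:17.096-00:01:15/146140482-04-24T16:36:27.903+01:00"
  ],
  "threshold": 1000
}
```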
...