...

Code Block
sql
-- GRANULARITY: MONTH
SELECT `floor_month`(`__time`), max(delta), sum(added)
FROM druid_table_1
GROUP BY `floor_month`(`__time`);

Basically, we group by a given time granularity and calculate the aggregation results for each resulting group. In particular, the floor_month function over the timestamp dimension __time represents the Druid month granularity format. Currently, we support the floor_year, floor_quarter, floor_month, floor_week, floor_day, floor_hour, floor_minute, and floor_second granularities. In addition, we support two special types of granularities, all and none, which we describe below. We plan to extend our integration work to support other important Druid custom granularity constructs, such as duration and period granularities.
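For instance, assuming the same druid_table_1 table as above, an analogous day-level rollup just swaps in floor_day; any of the other supported floor_* functions can be substituted the same way (a sketch, not output captured from the environment above):

Code Block
sql
-- GRANULARITY: DAY
SELECT `floor_day`(`__time`), max(delta), sum(added)
FROM druid_table_1
GROUP BY `floor_day`(`__time`);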

...

Code Block
hive> EXPLAIN
    > SELECT `floor_month`(`__time`), max(delta), sum(added)
    > FROM druid_table_1
    > GROUP BY `floor_month`(`__time`);
OK
Plan optimized by CBO.
Stage-0
  Fetch Operator
    limit:-1
    Select Operator [SEL_1]
      Output:["_col0","_col1","_col2"]
      TableScan [TS_0]
        Output:["__time","$f1","$f2"],
        properties:{"druid.query.json":"{\"queryType\":\"timeseries\",\"dataSource\":\"wikiticker\",\"descending\":\"false\",\"granularity\":\"MONTH\",\"aggregations\":[{\"type\":\"longMax\",\"name\":\"$f1\",\"fieldName\":\"delta\"},{\"type\":\"longSum\",\"name\":\"$f2\",\"fieldName\":\"added\"}],\"intervals\":[\"-146136543-09-08T08:22:17.096-00:01:15/146140482-04-24T16:36:27.903+01:00\"]}","druid.query.type":"timeseries"}
Time taken: 0.116 seconds, Fetched: 10 row(s)
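
For readability, this is the druid.query.json property from the plan above, pretty-printed; the content is unchanged:

Code Block
{
  "queryType": "timeseries",
  "dataSource": "wikiticker",
  "descending": "false",
  "granularity": "MONTH",
  "aggregations": [
    { "type": "longMax", "name": "$f1", "fieldName": "delta" },
    { "type": "longSum", "name": "$f2", "fieldName": "added" }
  ],
  "intervals": ["-146136543-09-08T08:22:17.096-00:01:15/146140482-04-24T16:36:27.903+01:00"]
}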

...

Code Block
sql
SELECT `channel`, `floor_month`(`__time`), max(delta) as m, sum(added)
FROM druid_table_1
GROUP BY `channel`, `floor_month`(`__time`)
ORDER BY m DESC
LIMIT 10;

...

Code Block
hive> SELECT `channel`, `floor_month`(`__time`), max(delta) as m, sum(added)
    > FROM druid_table_1
    > GROUP BY `channel`, `floor_month`(`__time`)
    > ORDER BY m DESC
    > LIMIT 10;
OK
#en.wikipedia	2015-09-01 01:00:00	199818	3045299
#ru.wikipedia	2015-09-01 01:00:00	102719	640698
#es.wikipedia	2015-09-01 01:00:00	94187	634670
#fr.wikipedia	2015-09-01 01:00:00	92182	642555
#ar.wikipedia	2015-09-01 01:00:00	73433	153605
#cs.wikipedia	2015-09-01 01:00:00	57174	132768
#de.wikipedia	2015-09-01 01:00:00	52923	522625
#hu.wikipedia	2015-09-01 01:00:00	49061	166101
#nl.wikipedia	2015-09-01 01:00:00	48573	145634
#ja.wikipedia	2015-09-01 01:00:00	47871	317242
Time taken: 1.038 seconds, Fetched: 10 row(s)

...

Code Block
sql
SELECT a.channel, b.col1
FROM
(
  SELECT `channel`, max(delta) as m, sum(added)
  FROM druid_table_1
  GROUP BY `channel`, `floor_year`(`__time`)
  ORDER BY m DESC
  LIMIT 1000
) a
JOIN
(
  SELECT col1, col2
  FROM hive_table_1
) b
ON a.channel = b.col2;

...

Code Block
hive> explain
    > SELECT a.channel, b.col1
    > FROM
    > (
    >   SELECT `channel`, max(delta) as m, sum(added)
    >   FROM druid_table_1
    >   GROUP BY `channel`, `floor_year`(`__time`)
    >   ORDER BY m DESC
    >   LIMIT 1000
    > ) a
    > JOIN
    > (
    >   SELECT col1, col2
    >   FROM hive_table_1
    > ) b
    > ON a.channel = b.col2;
OK
Plan optimized by CBO.
Vertex dependency in root stage
Map 2 <- Map 1 (BROADCAST_EDGE)
Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Map 2
      File Output Operator [FS_11]
        Select Operator [SEL_10] (rows=1 width=0)
          Output:["_col0","_col1"]
          Map Join Operator [MAPJOIN_16] (rows=1 width=0)
            Conds:RS_7._col0=SEL_6._col1(Inner),HybridGraceHashJoin:true,Output:["_col0","_col2"]
          <-Map 1 [BROADCAST_EDGE]
            BROADCAST [RS_7]
              PartitionCols:_col0
              Filter Operator [FIL_2] (rows=1 width=0)
                predicate:_col0 is not null
                Select Operator [SEL_1] (rows=1 width=0)
                  Output:["_col0"]
                  TableScan [TS_0] (rows=1 width=0)
                    druid@druid_table_1,druid_table_1,Tbl:PARTIAL,Col:NONE,Output:["channel"],properties:{"druid.query.json":"{\"queryType\":\"topN\",\"dataSource\":\"wikiticker\",\"granularity\":\"YEAR\",\"dimension\":\"channel\",\"metric\":\"$f2\",\"aggregations\":[{\"type\":\"longMax\",\"name\":\"$f2\",\"fieldName\":\"delta\"}],\"intervals\":[\"-146136543-09-08T08:22:17.096-00:01:15/146140482-04-24T16:36:27.903+01:00\"],\"threshold\":1000}","druid.query.type":"topN"}
          <-Select Operator [SEL_6] (rows=1 width=15)
              Output:["_col0","_col1"]
              Filter Operator [FIL_15] (rows=1 width=15)
                predicate:col2 is not null
                TableScan [TS_4] (rows=1 width=15)
                  druid@hive_table_1,hive_table_1,Tbl:COMPLETE,Col:NONE,Output:["col1","col2"]
Time taken: 0.924 seconds, Fetched: 31 row(s)
hive> SELECT a.channel, b.col1
    > FROM
    > (
    >   SELECT `channel`, max(delta) as m, sum(added)
    >   FROM druid_table_1
    >   GROUP BY `channel`, `floor_year`(`__time`)
    >   ORDER BY m DESC
    >   LIMIT 1000
    > ) a
    > JOIN
    > (
    >   SELECT col1, col2
    >   FROM hive_table_1
    > ) b
    > ON a.channel = b.col2;
Query ID = user1_20160818202329_e9a8b3e8-18d3-49c7-bfe0-99d38d2402d3
Total jobs = 1
Launching Job 1 out of 1
2016-08-18 20:23:30 Running Dag: dag_1471548210492_0001_1
2016-08-18 20:23:30 Starting to run new task attempt: attempt_1471548210492_0001_1_00_000000_0
Status: Running (Executing on YARN cluster with App id application_1471548210492_0001)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0
Map 2 .......... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 0.15 s
----------------------------------------------------------------------------------------------
2016-08-18 20:23:31 Completed running task attempt: attempt_1471548210492_0001_1_00_000000_0
OK
#en.wikipedia	1
Time taken: 1.835 seconds, Fetched: 2 row(s)
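
For reference, the Druid topN query embedded in the plan above (the druid.query.json property of TableScan [TS_0]), pretty-printed with the content unchanged:

Code Block
{
  "queryType": "topN",
  "dataSource": "wikiticker",
  "granularity": "YEAR",
  "dimension": "channel",
  "metric": "$f2",
  "aggregations": [
    { "type": "longMax", "name": "$f2", "fieldName": "delta" }
  ],
  "intervals": ["-146136543-09-08T08:22:17.096-00:01:15/146140482-04-24T16:36:27.903+01:00"],
  "threshold": 1000
}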

...