...

Info
titleVersion information

Druid integration was introduced in Hive 2.2.0 (HIVE-14217). Initially it was compatible with Druid 0.9.1.1, the latest stable release of Druid at that date.

Introduction

This page documents the work done for the integration between Druid and Hive, which was started in HIVE-14217.

...

Future work after this first step is listed in HIVE-14473. It includes, among others, the possibility to use Create Table As Select (CTAS) statements to create datasources in Druid from Hive (HIVE-14474). If you want to collaborate on this effort, a list of remaining issues can be found at the end of this document.

...

For the running examples, we use the wikiticker dataset included in the quickstart tutorial of Druid.

Discovery and management of Druid datasources from Hive

First we focus on the discovery and management of Druid datasources from Hive.

Create tables linked to existing Druid datasources

Assume that we have already indexed the wikiticker dataset mentioned previously in Druid, and the address of the Druid broker is 10.5.0.10:8082.
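As a sketch of what creating such a linked table might look like (the `hive.druid.broker.address.default` property and the `druid.datasource` table property shown here follow the storage handler's naming conventions, but verify them against your Hive version):

Code Block
sql
-- Point Hive at the Druid broker (property name assumed; adjust to your setup)
SET hive.druid.broker.address.default=10.5.0.10:8082;

-- Create an external table backed by the existing wikiticker datasource;
-- the "druid.datasource" table property names the datasource to link to
CREATE EXTERNAL TABLE druid_table_wikiticker
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "wikiticker");

Note that no column list is given: the schema is inferred from the Druid datasource itself, which is why the column groups described below appear automatically.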

...

We can see there are three different groups of columns corresponding to the Druid categories: the timestamp column (__time) mandatory in Druid, the dimension columns (whose type is STRING), and the metrics columns (all the rest).

Create tables linked to new Druid datasources

If we want to manage the data in the Druid datasources from Hive, there are multiple possible scenarios.

For instance, we might want to create an empty table backed by Druid using a CREATE TABLE statement and then append and overwrite data using INSERT and INSERT OVERWRITE Hive statements, respectively.

Code Block
sql
CREATE TABLE druid_table_1
(`__time` TIMESTAMP, `dimension1` STRING, `dimension2` STRING, `metric1` INT, `metric2` FLOAT)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler';

Observe that we still create three different groups of columns corresponding to the Druid categories: the timestamp column (__time) mandatory in Druid, the dimension columns (whose type is STRING), and the metrics columns (all the rest).

Another possible scenario is that our data is stored in Hive tables and we want to preprocess it and create Druid datasources from Hive to accelerate our SQL query workload. We can do that by executing a Create Table As Select (CTAS) statement, as sketched below.

Code Block
sql
CREATE TABLE druid_table_1
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
AS
<select `timecolumn` as `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`....>;
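As a concrete sketch of such a CTAS statement (the source table `login_events` and its column names are made up for illustration; the column aliases mirror the schema used in the previous example):

Code Block
sql
CREATE TABLE druid_table_2
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
AS
SELECT CAST(`logtime` AS timestamp) AS `__time`,  -- Druid requires a timestamp column named __time
       `username`   AS `dimension1`,
       `country`    AS `dimension2`,
       `duration`   AS `metric1`,
       `bytes_sent` AS `metric2`
FROM login_events;

Observe that the select clause must expose the mandatory `__time` column, typically by casting or renaming an existing time column from the source table.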

Insert data into Druid datasources from Hive

Once the table is created, we can append data to the Druid datasource using INSERT statements, or replace its contents using INSERT OVERWRITE.
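A sketch of both operations against the `druid_table_1` table created above (the source table `new_events` and its columns are assumptions for illustration):

Code Block
sql
-- Append new rows to the Druid datasource
INSERT INTO druid_table_1
SELECT CAST(`logtime` AS timestamp), `dimension1`, `dimension2`, `metric1`, `metric2`
FROM new_events;

-- Replace the contents of the datasource entirely
INSERT OVERWRITE TABLE druid_table_1
SELECT CAST(`logtime` AS timestamp), `dimension1`, `dimension2`, `metric1`, `metric2`
FROM new_events;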

Drop Hive tables associated to Druid datasources

When a Hive table associated with a Druid datasource is no longer needed, it can be dropped like any other Hive table.
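A minimal sketch (the exact effect on the backing datasource depends on whether the table is managed or external, so verify this behavior for your Hive version):

Code Block
sql
-- Drop the Hive table; for a managed table this is also expected to remove
-- the associated Druid datasource, while for an external table only the
-- Hive metadata is dropped (behavior assumed; verify for your version)
DROP TABLE druid_table_1;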

Querying Druid from Hive

Once we have created our first table stored in Druid using the DruidStorageHandler, we are ready to execute our queries against Druid.
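For instance, a simple aggregation over the `druid_table_1` table created earlier might look as follows (a sketch; Hive can push much of this work down to Druid as a native Druid query, though how much is pushed depends on the query and the Hive version):

Code Block
sql
-- Aggregate a metric per dimension value over the Druid-backed table
SELECT `dimension1`, SUM(`metric1`) AS total_metric1
FROM druid_table_1
GROUP BY `dimension1`
ORDER BY total_metric1 DESC
LIMIT 10;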

...