Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Assume that we have already indexed the wikiticker dataset mentioned previously, and the address of the Druid broker is 10.5.0.10:8082. Thus

First, you need to set the Hive property hive.druid.broker.address.default in your configuration file to point to the broker address:

Code Block
sql
sql
SET hive.druid.broker.address.default=10.5.0.10:8082;

Then, to create a table that we can query from Hive, we  we execute the following statement in Hive:

Code Block
sql
sql
CREATE EXTERNAL TABLE druid_table_1
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.address" = "10.5.0.10:8082", "druid.datasource" = "wikiticker");

Observe that you need to specify the broker address and  datasource as TBLPROPERTIES using the druid.address and the druid.datasource properties, respectively property. Further, observe that the table needs to be created as EXTERNAL, as data is stored in Druid. The table is just a logical entity that we will use to express our queries, but there is no data movement when we create the table. In fact, what happened under the hood when you execute that statement, is that Hive sends a segment metadata query to Druid in order to discover the schema (columns and their types) of the data source. Retrieval of other information that might be useful such as statistics e.g. number of rows, is in our roadmap, but it is not supported yet.  Finally, note that if we change the Hive property value for the default broker address, queries on this table will automatically run against the new broker address, as the address is not stored with the table.

If we execute a DESCRIBE statement, we can actually see the information about the table:

Code Block
hive> DESCRIBE FORMATTED druid_table_1;
OK
# col_name            	data_type           	comment
__time              	timestamp           	from deserializer
added               	bigint              	from deserializer
channel             	string              	from deserializer
cityname            	string              	from deserializer
comment             	string              	from deserializer
count               	bigint              	from deserializer
countryisocode      	string              	from deserializer
countryname         	string              	from deserializer
deleted             	bigint              	from deserializer
delta               	bigint              	from deserializer
isanonymous         	string              	from deserializer
isminor             	string              	from deserializer
isnew               	string              	from deserializer
isrobot             	string              	from deserializer
isunpatrolled       	string              	from deserializer
metrocode           	string              	from deserializer
namespace           	string              	from deserializer
page                	string              	from deserializer
regionisocode       	string              	from deserializer
regionname          	string              	from deserializer
user                	string              	from deserializer
user_unique         	string              	from deserializer
# Detailed Table Information
Database:           	druid
Owner:              	user1
CreateTime:         	Thu Aug 18 19:09:10 BST 2016
LastAccessTime:     	UNKNOWN
Retention:          	0
Location:           	hdfs:/tmp/user1/hive/warehouse/druid.db/druid_table_1
Table Type:         	EXTERNAL_TABLE
Table Parameters:
	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\"}
	EXTERNAL            	TRUE
	druid.address       	10.5.0.10:8082
	druid.datasource    	wikiticker
	numFiles            	0
	numRows             	0
	rawDataSize         	0
	storage_handler     	org.apache.hadoop.hive.druid.DruidStorageHandler
	totalSize           	0
	transient_lastDdlTime	1471543750
# Storage Information
SerDe Library:      	org.apache.hadoop.hive.druid.serde.DruidSerDe
InputFormat:        	null
OutputFormat:       	null
Compressed:         	No
Num Buckets:        	-1
Bucket Columns:     	[]
Sort Columns:       	[]
Storage Desc Params:
	serialization.format	1
Time taken: 0.111 seconds, Fetched: 55 row(s)

We can see there are three different groups of columns corresponding to the Druid categories: the timestamp column (__time) mandatory in Druid, the dimension columns (whose type is STRING), and the metrics columns (all the rest).

 

If you do not want to include the broker address for the creation of every Druid table, you can also set the Hive property hive.druid.address.default to point to the broker address:

...

SET hive.druid.address.default=10.5.0.10:8082;

Then, to create a table similar to the previous one, we would execute the following statement:

...

CREATE EXTERNAL TABLE druid_table_1
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "wikiticker");

In this case, the address is not stored with the table. Thus, if we change the Hive property value for the default broker address, queries on this table will automatically run against the new broker address.

Querying Druid from Hive

...