...
For instance, we might want to create an empty table backed by Druid using a CREATE TABLE statement and then append and overwrite data using INSERT and INSERT OVERWRITE Hive statements, respectively.
```sql
CREATE EXTERNAL TABLE druid_table_1
(`__time` TIMESTAMP, `dimension1` STRING, `dimension2` STRING, `metric1` INT, `metric2` FLOAT)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler';
```
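Appending to and overwriting such a table works with the usual Hive statements. The following is a hedged sketch, assuming a hypothetical Hive table `source_table` whose columns match the schema above:

```sql
-- Append rows to the Druid-backed table
-- (`source_table` is a hypothetical Hive table with matching columns).
INSERT INTO TABLE druid_table_1
SELECT `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`
FROM source_table;

-- Replace the existing contents of the Druid datasource.
INSERT OVERWRITE TABLE druid_table_1
SELECT `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`
FROM source_table;
```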
Another possible scenario is that our data is stored in Hive tables and we want to preprocess it and create Druid datasources from Hive to accelerate our SQL query workload. We can do that by executing a Create Table As Select (CTAS) statement. For example:
```sql
CREATE EXTERNAL TABLE druid_table_1
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
AS <select `timecolumn` AS `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`....>;
```
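To make the placeholder concrete, here is a sketch of one possible CTAS statement; the source table `hive_events` and its timestamp column `event_ts` are hypothetical names chosen for illustration:

```sql
-- Hypothetical example: build a Druid datasource from a Hive table `hive_events`,
-- mapping its `event_ts` column to Druid's required `__time` timestamp column.
CREATE EXTERNAL TABLE druid_table_1
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
AS
SELECT CAST(`event_ts` AS TIMESTAMP) AS `__time`,
       `dimension1`, `dimension2`, `metric1`, `metric2`
FROM hive_events;
```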
...
In both statements, the column types (either specified explicitly in the CREATE TABLE statement or inferred from the query result in the CTAS statement) are used to infer the corresponding Druid column category. Further, note that if we do not use EXTERNAL tables, we do not specify a value for the druid.datasource property; Hive automatically uses the fully qualified name of the table to create the corresponding datasource with the same name.
Info: Version 2.2.0 introduced the CREATE TABLE syntax for data managed via Hive:

```sql
CREATE TABLE druid_table_1
(`__time` TIMESTAMP, `dimension1` STRING, `dimension2` STRING, `metric1` INT, `metric2` FLOAT)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler';
```

NOTE: Before Hive 3.0.0, we do not use EXTERNAL tables and do not specify a value for the druid.datasource property. For versions 3.0.0+, all Druid tables are EXTERNAL (HIVE-20085).
Druid Kafka ingestion from Hive
...
INSERT, INSERT OVERWRITE and DROP statements
Info: Version 2.2.0: These statements are supported for Hive-managed tables (not external) backed by Druid. For versions 3.0.0+, all Druid tables are EXTERNAL (HIVE-20085) and these statements are supported for any table.
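As a minimal sketch of the DROP case, a Druid-backed table can be dropped like any other Hive table; this example reuses the `druid_table_1` table created earlier:

```sql
-- Drop the Hive table backed by Druid.
DROP TABLE druid_table_1;
```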
Querying Druid from Hive
Once we have created our first table stored in Druid using the DruidStorageHandler, we are ready to execute our queries against Druid.
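For instance, a sketch of a simple aggregation over the table created above, which Hive can translate into a query against the Druid datasource (the time range and grouping column are illustrative):

```sql
SELECT `dimension1`, SUM(`metric1`) AS total_metric1
FROM druid_table_1
WHERE `__time` BETWEEN '2010-01-01 00:00:00' AND '2011-01-01 00:00:00'
GROUP BY `dimension1`;
```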
...