...
For instance, we might want to create an empty table backed by Druid using a CREATE TABLE statement and then append and overwrite data using INSERT and INSERT OVERWRITE Hive statements, respectively.
```sql
CREATE EXTERNAL TABLE druid_table_1
(`__time` TIMESTAMP, `dimension1` STRING, `dimension2` STRING, `metric1` INT, `metric2` FLOAT)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler';
```
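Once the table is created, data can be appended or replaced with the Hive DML statements mentioned above. A minimal sketch, assuming the `druid_table_1` table from the previous statement and a hypothetical source Hive table named `src_events` with compatible columns:

```sql
-- Append new rows to the Druid-backed table
INSERT INTO TABLE druid_table_1
SELECT `timecolumn` AS `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`
FROM src_events;

-- Replace the existing contents of the datasource
INSERT OVERWRITE TABLE druid_table_1
SELECT `timecolumn` AS `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`
FROM src_events;
```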
Another possible scenario is that our data is stored in Hive tables and we want to preprocess it and create Druid datasources from Hive to accelerate our SQL query workload. We can do that by executing a Create Table As Select (CTAS) statement. For example:
```sql
CREATE EXTERNAL TABLE druid_table_1
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
AS
<select `timecolumn` as `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`....>;
```
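As a concrete sketch of this CTAS pattern (the source table `src_events` and its column names are hypothetical):

```sql
CREATE EXTERNAL TABLE druid_table_2
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
AS
SELECT `timecolumn` AS `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`
FROM src_events;
```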
...
In both statements, the column types (either specified statically for CREATE TABLE statements or inferred from the query result for CTAS statements) are used to infer the corresponding Druid column category.
Further, note that we do not specify a value for the druid.datasource property: Hive automatically uses the fully qualified name of the table to create the corresponding datasource with the same name.
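Conversely, to map a Hive table onto a Druid datasource whose name differs from the table name (for instance, one that already exists in Druid), the druid.datasource property can be supplied explicitly. A sketch, where the datasource name `my_datasource` is a placeholder:

```sql
CREATE EXTERNAL TABLE druid_table_1
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "my_datasource");
```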
Info: Before Hive 3.0.0, the CREATE TABLE syntax is used when data is managed via Hive:

```sql
CREATE TABLE druid_table_1
(`__time` TIMESTAMP, `dimension1` STRING, `dimension2` STRING, `metric1` INT, `metric2` FLOAT)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler';
```

NOTE: Before Hive 3.0.0, we do not use EXTERNAL tables and do not specify a value for the druid.datasource property.
Druid Kafka ingestion from Hive
...