Syntax
JdbcStorageHandler supports reading from a JDBC data source in Hive. Writing to a JDBC data source is currently not supported. To use it, create an external table that specifies JdbcStorageHandler as its storage handler. Here is a simple example:
...
```sql
ALTER TABLE student_jdbc SET TBLPROPERTIES ("hive.sql.dbcp.password" = "passwd");
```
Table Properties
In the create table statement, you are required to specify the following table properties:
...
hive.sql.catalog: JDBC catalog name (only valid if "hive.sql.table" is specified)
hive.sql.schema: JDBC schema name (only valid if "hive.sql.table" is specified)
hive.sql.jdbc.fetch.size: number of rows to fetch in a batch
hive.sql.dbcp.xxx: all DBCP parameters are passed through to commons-dbcp. See https://commons.apache.org/proper/commons-dbcp/configuration.html for the definition of each parameter. For example, if you specify hive.sql.dbcp.maxActive=1 as a table property, Hive passes maxActive=1 to commons-dbcp.
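As a hedged illustration of the pass-through described above, the optional properties can be set on an existing table with the same ALTER TABLE form used earlier for the password. The table name student_jdbc comes from this page; the fetch size of 1000 is an arbitrary value chosen only for this sketch.

```sql
-- Sketch only: the values are illustrative, not recommendations.
-- "hive.sql.dbcp.maxActive" reaches commons-dbcp as maxActive=1.
-- "hive.sql.jdbc.fetch.size" sets the rows fetched per batch (assumed value).
ALTER TABLE student_jdbc SET TBLPROPERTIES (
    "hive.sql.dbcp.maxActive" = "1",
    "hive.sql.jdbc.fetch.size" = "1000"
);
```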
Supported Data Type
The column data type for a Hive JdbcStorageHandler table can be:
...
Note: the complex data types struct, map, and array are not supported.
Column/Type Mapping
hive.sql.table / hive.sql.query defines tabular data with a schema. That schema definition has to be the same as the Hive table's schema definition. For example, the following create table statement will fail:
...
Hive will try to convert the double "gpa" column of the underlying table STUDENT to decimal(4,3) for the effective_gpa field of the student_jdbc table. If the conversion is not possible, Hive produces NULL for the field.
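One way to spot rows where the conversion failed is to look for NULLs in the converted column. The sketch below uses the student_jdbc table and effective_gpa column named in the paragraph above. A NULL can also come from a NULL in the source table, so this is only a first check. The second statement illustrates the same rule with a plain Hive CAST, on the assumption that the JDBC conversion behaves like Hive's decimal cast.

```sql
-- Count rows whose gpa could not be represented as decimal(4,3)
-- (or was already NULL in the underlying STUDENT table).
SELECT COUNT(*) AS unconverted_rows
FROM student_jdbc
WHERE effective_gpa IS NULL;

-- decimal(4,3) leaves one digit before the decimal point,
-- so 3.75 fits while 12.5 does not.
SELECT CAST(3.75 AS DECIMAL(4,3)) AS fits,
       CAST(12.5 AS DECIMAL(4,3)) AS does_not_fit;
```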
Auto Shipping
JdbcStorageHandler automatically ships the required jars to the MR/Tez/LLAP backend when it is used in a query, so users do not need to add them manually. It also ships the required JDBC driver jar to the backend if it detects one in the classpath (MySQL, PostgreSQL, Oracle, and MSSQL drivers are detected). However, users are still required to copy the JDBC driver jar to the Hive classpath (usually the lib directory in Hive).
Securing Password
In most cases, we don't want to store the JDBC password in clear text in the table property "hive.sql.dbcp.password". Instead, users can store the password in a Java keystore file on HDFS using the following command:
...
You need to protect the keystore file by authorizing only the intended users to read it, using an authorizer (such as Ranger). When a table is created or altered, Hive checks the permissions of the keystore file to make sure the user has read permission on it.
Partitioning
Hive can split the JDBC data source and process each split in parallel. Users can use the following table properties to decide whether to split and how many splits to create:
...
```
jdbc.JdbcInputFormat: Num input splits created 4
jdbc.JdbcInputFormat: split:interval:ikey[,70)
jdbc.JdbcInputFormat: split:interval:ikey[70,80)
jdbc.JdbcInputFormat: split:interval:ikey[80,90)
jdbc.JdbcInputFormat: split:interval:ikey[90,)
```
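The interval notation in the log can be read as range predicates on the split column. The following is a rough sketch of the equivalent queries, not the SQL that Hive actually generates. The table name src_table is hypothetical, and how NULL values of ikey are assigned to a split is not shown.

```sql
-- split 1: ikey[,70)   open lower bound
SELECT * FROM src_table WHERE ikey < 70;
-- split 2: ikey[70,80)
SELECT * FROM src_table WHERE ikey >= 70 AND ikey < 80;
-- split 3: ikey[80,90)
SELECT * FROM src_table WHERE ikey >= 80 AND ikey < 90;
-- split 4: ikey[90,)   open upper bound
SELECT * FROM src_table WHERE ikey >= 90;
```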
Computation Pushdown
Hive pushes computation down to the JDBC table aggressively, to make the best use of the native capabilities of the JDBC data source.
...
The derived MySQL query can be very complex, and in many cases we don't want to split the data source and thereby run the complex query once for each split. So if the computation is more than just filter and transform, Hive will not split the query result even if "hive.sql.numPartitions" is greater than 1.
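To see how much of a query is pushed down, one option is to inspect the query plan. A minimal sketch, assuming the student_jdbc table from earlier on this page; the exact plan output depends on the Hive version and is not reproduced here.

```sql
-- An aggregate is more than filter and transform, so per the rule above
-- Hive would not split this query even if hive.sql.numPartitions > 1.
EXPLAIN
SELECT COUNT(*) FROM student_jdbc;
```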
Using a Non-default Schema
The notion of a schema differs from DBMS to DBMS, for example among Oracle, MSSQL, MySQL, and PostgreSQL. Correct usage of the hive.sql.schema table property can prevent problems with client connections to external JDBC tables. For more information, see HIVE-25591. To create external tables based on a user-defined schema in a JDBC-compliant database, follow the examples below for the respective database.
MariaDB
```sql
CREATE SCHEMA bob;
CREATE TABLE bob.country
(
    id   int,
    name varchar(20)
);
INSERT INTO bob.country VALUES (1, 'India');
INSERT INTO bob.country VALUES (2, 'Russia');
INSERT INTO bob.country VALUES (3, 'USA');

CREATE SCHEMA alice;
CREATE TABLE alice.country
(
    id   int,
    name varchar(20)
);
INSERT INTO alice.country VALUES (4, 'Italy');
INSERT INTO alice.country VALUES (5, 'Greece');
INSERT INTO alice.country VALUES (6, 'China');
INSERT INTO alice.country VALUES (7, 'Japan');
```
MS SQL
```sql
CREATE DATABASE world;
USE world;

CREATE SCHEMA bob;
CREATE TABLE bob.country
(
    id   int,
    name varchar(20)
);
INSERT INTO bob.country VALUES (1, 'India');
INSERT INTO bob.country VALUES (2, 'Russia');
INSERT INTO bob.country VALUES (3, 'USA');

CREATE SCHEMA alice;
CREATE TABLE alice.country
(
    id   int,
    name varchar(20)
);
INSERT INTO alice.country VALUES (4, 'Italy');
INSERT INTO alice.country VALUES (5, 'Greece');
INSERT INTO alice.country VALUES (6, 'China');
INSERT INTO alice.country VALUES (7, 'Japan');
```
Create a user and associate them with a default schema. For example:
```sql
CREATE LOGIN greg WITH PASSWORD = 'GregPass123!$';
CREATE USER greg FOR LOGIN greg WITH DEFAULT_SCHEMA=bob;
```
Allow the user to connect to the database and run queries. For example:
```sql
GRANT CONNECT, SELECT TO greg;
```
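To check that the default schema takes effect, the user can be impersonated from an administrative session. This is a sketch that assumes the statements above were all run in the world database; EXECUTE AS USER and REVERT are standard T-SQL.

```sql
USE world;
-- Run the following statements with greg's permissions.
EXECUTE AS USER = 'greg';
-- The unqualified name resolves through greg's default schema, i.e. bob.country.
SELECT id, name FROM country ORDER BY id;
-- Return to the original security context.
REVERT;
```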
Oracle
In Oracle, dividing tables into different namespaces/schemas is achieved through different users. The CREATE SCHEMA statement exists in Oracle, but its semantics differ from those defined by the SQL standard and those adopted by other DBMSs.
To create "local" users in Oracle, you need to be connected to the Pluggable Database (PDB), not to the Container Database (CDB). The following example was tested on Oracle XE, using only the PDB XEPDB1.
```sql
ALTER SESSION SET CONTAINER = XEPDB1;
```
Create the bob schema/user and grant the privileges needed to connect to the database. For example:
```sql
CREATE USER bob IDENTIFIED BY bobpass;
ALTER USER bob QUOTA UNLIMITED ON users;
GRANT CREATE SESSION TO bob;

CREATE TABLE bob.country
(
    id   int,
    name varchar(20)
);
INSERT INTO bob.country VALUES (1, 'India');
INSERT INTO bob.country VALUES (2, 'Russia');
INSERT INTO bob.country VALUES (3, 'USA');
```
Create the alice schema/user and grant the privileges needed to connect to the database. For example:
```sql
CREATE USER alice IDENTIFIED BY alicepass;
ALTER USER alice QUOTA UNLIMITED ON users;
GRANT CREATE SESSION TO alice;

CREATE TABLE alice.country
(
    id   int,
    name varchar(20)
);
INSERT INTO alice.country VALUES (4, 'Italy');
INSERT INTO alice.country VALUES (5, 'Greece');
INSERT INTO alice.country VALUES (6, 'China');
INSERT INTO alice.country VALUES (7, 'Japan');
```
Without the SELECT ANY TABLE privilege, a user cannot see the tables/views of another user. In other words, when a user connects to the database with a specific user and schema, it is not possible to refer to tables in another user/schema namespace. You need to grant the SELECT ANY TABLE privilege. For example:
```sql
GRANT SELECT ANY TABLE TO bob;
GRANT SELECT ANY TABLE TO alice;
```
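With the privilege granted, a session connected as bob can read alice's table by qualifying it with the schema name. A minimal sketch, assuming the inserts above have been committed:

```sql
-- Run while connected as bob on the PDB XEPDB1.
SELECT id, name
FROM alice.country
ORDER BY id;
```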
Allow the users to perform inserts on any table/view in the database, not only those in their own schema. For example:
```sql
GRANT INSERT ANY TABLE TO bob;
GRANT INSERT ANY TABLE TO alice;
```
PostgreSQL
```sql
CREATE SCHEMA bob;
CREATE TABLE bob.country
(
    id   int,
    name varchar(20)
);
INSERT INTO bob.country VALUES (1, 'India');
INSERT INTO bob.country VALUES (2, 'Russia');
INSERT INTO bob.country VALUES (3, 'USA');

CREATE SCHEMA alice;
CREATE TABLE alice.country
(
    id   int,
    name varchar(20)
);
INSERT INTO alice.country VALUES (4, 'Italy');
INSERT INTO alice.country VALUES (5, 'Greece');
INSERT INTO alice.country VALUES (6, 'China');
INSERT INTO alice.country VALUES (7, 'Japan');
```
Create a user and associate them with a default schema (in PostgreSQL, the search_path). For example:
```sql
CREATE ROLE greg WITH LOGIN PASSWORD 'GregPass123!$';
ALTER ROLE greg SET search_path TO bob;
```
Grant the necessary permissions to access the schema. For example:
```sql
GRANT USAGE ON SCHEMA bob TO greg;
GRANT SELECT ON ALL TABLES IN SCHEMA bob TO greg;
```
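On the Hive side, the schema is then selected with hive.sql.schema. The following is a sketch rather than a tested statement. The Hive table name country_bob and the host, port, and database name in the JDBC URL are placeholders. The remaining property names (hive.sql.database.type, hive.sql.jdbc.driver, hive.sql.jdbc.url, hive.sql.dbcp.username) are assumed to be the connection properties JdbcStorageHandler requires. Adjust all of them to your environment.

```sql
-- Hive external table over bob.country in PostgreSQL (placeholder connection values).
CREATE EXTERNAL TABLE country_bob
(
    id   int,
    name varchar(20)
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
    "hive.sql.database.type" = "POSTGRES",
    "hive.sql.jdbc.driver"   = "org.postgresql.Driver",
    "hive.sql.jdbc.url"      = "jdbc:postgresql://localhost:5432/mydb",
    "hive.sql.dbcp.username" = "greg",
    "hive.sql.dbcp.password" = "GregPass123!$",
    "hive.sql.schema"        = "bob",
    "hive.sql.table"         = "country"
);
```

Pointing hive.sql.schema at alice would be expected to fail for greg in this setup, because he was only granted access to the bob schema.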