Hive Data Definition Language
Table of Contents |
---|
Overview
HiveQL DDL statements are documented here, including:
- CREATE DATABASE, SCHEMA, TABLE, VIEW, FUNCTION, INDEX
- DROP DATABASE, SCHEMA, TABLE, VIEW, INDEX
- TRUNCATE TABLE
- ALTER DATABASE, SCHEMA, TABLE, VIEW
- MSCK REPAIR TABLE (or ALTER TABLE RECOVER PARTITIONS)
Wiki Markup SHOW DATABASES, SCHEMAS, TABLES, TBLPROPERTIES, PARTITIONS, FUNCTIONS, INDEX\[ES\], COLUMNS, CREATE TABLE
- DESCRIBE DATABASE, table_name, view_name
PARTITION statements are options of TABLE statements, except for SHOW PARTITIONS.
For information about DDL in HCatalog and WebHCat, see HCatalog DDL and WebHCat DDL Resources.
Create/Drop/Alter Database
...
Code Block |
---|
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name [COMMENT database_comment] [LOCATION hdfs_path] [WITH DBPROPERTIES (property_name=property_value, ...)]; |
The use uses of SCHEMA and DATABASE are interchangeable – they mean the same thing.
...
Use STORED BY to create a non-native table, for example in HBase. See StorageHandlers for more information on this option.
Partitioned
...
Tables
Partitioned tables can be created using the PARTITIONED BY clause. A table can have one or more partition columns and a separate data directory is created for each distinct value combination in the partition columns. Further, tables or partitions can be bucketed using CLUSTERED BY columns, and data can be sorted within that bucket via SORT BY columns. This can improve performance on certain kinds of queries.
...
In contrast, the statement above creates a new empty_key_value_store table whose definition exactly matches the existing key_value_store in all particulars other than table name. The new table contains no rows.
Inserting Data
...
into Bucketed Tables
The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into a table – only how it is read. This means that users must be careful to insert data correctly by specifying the number of reducers to be equal to the number of buckets, and using CLUSTER BY and SORT BY commands in their query.
There is also an example of creating and populating bucketed tables.
Skewed
...
Tables
Info | ||
---|---|---|
| ||
Info | ||
---|---|---|
| ||
Read the design document for more information |
...
Removes all rows from a table or partition(s). Currently target table should be native/managed table or exception will be thrown.
User can specify partial partition_spec for truncating multiple partitions at once and omitting partition_spec will truncate all partitions in the table.
...
An error is thrown if the partition_spec for the table already exists. You can use IF NOT EXISTS to skip the error.
Recover
...
Partitions (MSCK REPAIR TABLE)
Hive stores a list of partitions for each table in its metastore. If, however, new partitions are directly added to HDFS (say by using hadoop fs -put
command), the metastore (and hence Hive) will not be aware of these partitions unless the user runs ALTER TABLE table_name ADD PARTITION
commands on each of the newly added partitions.
...
Alter Table Rename Partition
Info | ||
---|---|---|
| ||
As of Hive 0.9 |
Code Block |
---|
ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec; |
...
This statement lets you change the value of a partition column.
Create/Drop View
...
Info | ||
---|---|---|
| ||
View support is only available starting in Hive 0.6. |
Create View
Code Block |
---|
CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], ...) ] [COMMENT view_comment] [TBLPROPERTIES (property_name = property_value, ...)] AS SELECT ... |
...
In Hive 0.70 or later, DROP returns an error if the function doesn't exist, unless IF EXISTS is specified or the configuration variable hive.exec.drop.ignorenonexistent is set to true.
Create/Drop Index
...
Info | ||
---|---|---|
| ||
As of Hive 0.7 |
...
Create Index
Code Block |
---|
CREATE INDEX index_name ON TABLE base_table_name (col_name, ...) AS index_type [WITH DEFERRED REBUILD] [IDXPROPERTIES (property_name=property_value, ...)] [IN TABLE index_table_name] [ [ ROW FORMAT ...] STORED AS ... | STORED BY ... ] [LOCATION hdfs_path] [TBLPROPERTIES (...)] [COMMENT "index comment"] |
CREATE INDEX creates an index on a table using the given list of columns as keys. See http://wiki.apache.org/hadoop/Hive/IndexDev#CREATE_INDEXCREATE INDEX in the Indexes design document.
Drop Index
Code Block |
---|
DROP INDEX [IF EXISTS] index_name ON table_name |
DROP INDEX drops the index, as well as deleting the index table.
In Hive 0.70 7.0 or later, DROP returns an error if the index doesn't exist, unless IF EXISTS is specified or the configuration variable hive.exec.drop.ignorenonexistent is set to true.
...
SHOW DATABASES lists all of the databases defined in the metastore. The optional LIKE clause allows the list of databases to be filtered using a regular expression. The Wildcards in the regular expression may can only contain be '*' for any character(s) or '|' for a choice. Examples are 'employees', 'emp*', "'emp*|*ees', all of which will match the database named 'employees'.
...
SHOW TABLES
lists all the base tables and views in the current database (or the one explicitly named using the IN
clause) with names matching the optional regular expression. Regular Wildcards in the regular expression can contain only be '*' for any characters character(s) or '|' for a choice. Examples are 'page_view', 'page_v*', '*view|page*', all which will match the 'page_view' table. Matching tables are listed in alphabetical order. It is not an error if there are no matching tables found in metastore. If no regular expression is given then all tables in the selected database are listed.
...
SHOW FUNCTIONS lists all the user defined and builtin functions matching the regular expression. To get all functions use ".*"
Show Indexes
Info | ||
---|---|---|
| ||
As of Hive 0.7 |
Code Block |
---|
SHOW [FORMATTED] (INDEX|INDEXES) ON table_with_index [(FROM|IN) db_name] |
SHOW INDEXES shows all of the indexes on a certain column, as well as information about them: index name, table name, names of the columns used as keys, index table name, index type, and comment. If the FORMATTED keyword is used, then column titles are printed for each column. Not available until 0.7 release.
Show Columns
Info | ||
---|---|---|
| ||
As of Hive 0.10 |
...
SHOW CREATE TABLE shows the CREATE TABLE statement that creates a given table, or the CREATE VIEW statement that creates a given view.
Describe Database
Info | ||
---|---|---|
| ||
As of Hive 0.7 |
Code Block |
---|
DESCRIBE DATABASE db_name |
DESCRIBE DATABASE will show shows the name of the database, its comment (if one has been set), and its root location on the filesystem. Not available until the 0.7 release.
Describe Table/Column
Code Block |
---|
DESCRIBE [EXTENDED|FORMATTED] [db_name.]table_name[DOT col_name ( [DOT field_name] | [DOT '$elem$'] | [DOT '$key$'] | [DOT '$value$'] )* ] |
...
Code Block |
---|
DESCRIBE [EXTENDED|FORMATTED] [db_name.]table_name PARTITION partition_spec
|
This statement lists metadata for a given partition. The output is similar to that of DESCRIBE. Presently, the column information associated with a particular partition is not used while preparing plans.
...