Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: add overview and links to HCat & WebHCat DDL; add PARTITION keyword to DESCRIBE syntax; misc. edits

Hive Data Definition Language

Table of Contents

Overview

HiveQL DDL statements are documented here, including:

  • CREATE DATABASE, SCHEMA, TABLE, VIEW, FUNCTION, INDEX
  • DROP DATABASE, SCHEMA, TABLE, VIEW, INDEX
  • TRUNCATE TABLE
  • ALTER DATABASE, SCHEMA, TABLE, VIEW
  • MSCK REPAIR TABLE (or ALTER TABLE RECOVER PARTITIONS)
  • Wiki Markup
    SHOW DATABASES, SCHEMAS, TABLES, TBLPROPERTIES, PARTITIONS, FUNCTIONS, INDEX\[ES\], COLUMNS, CREATE TABLE
  • DESCRIBE DATABASE, table_name, view_name

PARTITION statements are options of TABLE statements, except for SHOW PARTITIONS.

For information about DDL in HCatalog and WebHCat, see HCatalog DDL and WebHCat DDL Resources.

Create/Drop/Alter Database

...

Code Block
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
  [COMMENT database_comment]
  [LOCATION hdfs_path]
  [WITH DBPROPERTIES (property_name=property_value, ...)];

The use uses of SCHEMA and DATABASE are interchangeable – they mean the same thing.

...

Use STORED BY to create a non-native table, for example in HBase. See StorageHandlers for more information on this option.

Partitioned

...

Tables

Partitioned tables can be created using the PARTITIONED BY clause. A table can have one or more partition columns and a separate data directory is created for each distinct value combination in the partition columns. Further, tables or partitions can be bucketed using CLUSTERED BY columns, and data can be sorted within that bucket via SORT BY columns. This can improve performance on certain kinds of queries.

...

In contrast, the statement above creates a new empty_key_value_store table whose definition exactly matches the existing key_value_store in all particulars other than table name. The new table contains no rows.

Inserting Data

...

into Bucketed Tables

The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into a table – only how it is read. This means that users must be careful to insert data correctly by specifying the number of reducers to be equal to the number of buckets, and using CLUSTER BY and SORT BY commands in their query.

There is also an example of creating and populating bucketed tables.

Skewed

...

Tables

Info
titleVersion information

As of Hive 0.10.0 (HIVE-3072 & HIVE-3073)

Info
titleDesign document

Read the design document for more information

...

Removes all rows from a table or partition(s). Currently target table should be native/managed table or exception will be thrown.
User can specify partial partition_spec for truncating multiple partitions at once and omitting partition_spec will truncate all partitions in the table.

...

An error is thrown if the partition_spec for the table already exists. You can use IF NOT EXISTS to skip the error.

Recover

...

Partitions (MSCK REPAIR TABLE)

Hive stores a list of partitions for each table in its metastore. If, however, new partitions are directly added to HDFS (say by using hadoop fs -put command), the metastore (and hence Hive) will not be aware of these partitions unless the user runs ALTER TABLE table_name ADD PARTITION commands on each of the newly added partitions.

...

Alter Table Rename Partition

Info
titleVersion information

As of Hive 0.9

Code Block
ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;

...

This statement lets you change the value of a partition column.

Create/Drop View

...

Info
titleVersion information

View support is only available starting in Hive 0.6.

Create View

Code Block
CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], ...) ]
[COMMENT view_comment]
[TBLPROPERTIES (property_name = property_value, ...)]
AS SELECT ...

...

In Hive 0.70 or later, DROP returns an error if the function doesn't exist, unless IF EXISTS is specified or the configuration variable hive.exec.drop.ignorenonexistent is set to true.

Create/Drop Index

...

Info
titleVersion information

As of Hive 0.7

...

Create Index

Code Block
CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS index_type
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[
   [ ROW FORMAT ...] STORED AS ...
   | STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
[COMMENT "index comment"]

CREATE INDEX creates an index on a table using the given list of columns as keys. See http://wiki.apache.org/hadoop/Hive/IndexDev#CREATE_INDEXCREATE INDEX in the Indexes design document.

Drop Index

Code Block
DROP INDEX [IF EXISTS] index_name ON table_name

DROP INDEX drops the index, as well as deleting the index table.

In Hive 0.70 7.0 or later, DROP returns an error if the index doesn't exist, unless IF EXISTS is specified or the configuration variable hive.exec.drop.ignorenonexistent is set to true.

...

SHOW DATABASES lists all of the databases defined in the metastore. The optional LIKE clause allows the list of databases to be filtered using a regular expression. The Wildcards in the regular expression may can only contain be '*' for any character(s) or '|' for a choice. Examples are 'employees', 'emp*', "'emp*|*ees', all of which will match the database named 'employees'.

...

SHOW TABLES lists all the base tables and views in the current database (or the one explicitly named using the IN clause) with names matching the optional regular expression. Regular Wildcards in the regular expression can contain only be '*' for any characters character(s) or '|' for a choice. Examples are 'page_view', 'page_v*', '*view|page*', all which will match the 'page_view' table. Matching tables are listed in alphabetical order. It is not an error if there are no matching tables found in metastore. If no regular expression is given then all tables in the selected database are listed.

...

SHOW FUNCTIONS lists all the user defined and builtin functions matching the regular expression. To get all functions use ".*"

Show Indexes

Info
titleVersion information

As of Hive 0.7

Code Block
SHOW [FORMATTED] (INDEX|INDEXES) ON table_with_index [(FROM|IN) db_name]

SHOW INDEXES shows all of the indexes on a certain column, as well as information about them: index name, table name, names of the columns used as keys, index table name, index type, and comment. If the FORMATTED keyword is used, then column titles are printed for each column. Not available until 0.7 release.

Show Columns

Info
titleVersion information

As of Hive 0.10

...

SHOW CREATE TABLE shows the CREATE TABLE statement that creates a given table, or the CREATE VIEW statement that creates a given view.

Describe Database

Info
titleVersion information

As of Hive 0.7

Code Block
DESCRIBE DATABASE db_name

DESCRIBE DATABASE will show shows the name of the database, its comment (if one has been set), and its root location on the filesystem. Not available until the 0.7 release.

Describe Table/Column

Code Block
DESCRIBE [EXTENDED|FORMATTED] [db_name.]table_name[DOT col_name ( [DOT field_name] | [DOT '$elem$'] | [DOT '$key$'] | [DOT '$value$'] )* ]

...

Code Block
DESCRIBE [EXTENDED|FORMATTED] [db_name.]table_name PARTITION partition_spec

This statement lists metadata for a given partition. The output is similar to that of DESCRIBE. Presently, the column information associated with a particular partition is not used while preparing plans.

...