...
Code Block |
---|
CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, ...)]
[
[ ROW FORMAT ...] STORED AS ...
| STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
[COMMENT "index comment"]
|
For details of the various clauses such as ROW FORMAT, see the Create Table section of LanguageManual DDL.
By default, index partitioning matches the partitioning of the base table. The PARTITIONED BY clause may be used to specify a subset of the table's partitioning columns (this column list may be empty to indicate that the index spans all partitions of the table). For example, a table may be partitioned by date+region even though the index is partitioned by date alone (each index partition spanning all regions).
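The partitioning rule above can be sketched as a small check. This is only an illustration under stated assumptions: the class `IndexPartitioningSketch`, the method `resolveIndexPartitionColumns`, and the column names `ds` and `region` are hypothetical and are not part of Hive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hive: models the rule that an index's
// PARTITIONED BY list must be a subset of the base table's partitioning columns.
final class IndexPartitioningSketch {

    // Returns the partitioning columns the index would use.
    // requested == null models an omitted PARTITIONED BY clause
    // (the default: same partitioning as the base table).
    // An empty list models an index that spans all partitions of the table.
    static List<String> resolveIndexPartitionColumns(
            List<String> basePartitionColumns, List<String> requested) {
        if (requested == null) {
            return new ArrayList<>(basePartitionColumns);
        }
        for (String col : requested) {
            if (!basePartitionColumns.contains(col)) {
                throw new IllegalArgumentException(
                    "Not a partitioning column of the base table: " + col);
            }
        }
        return new ArrayList<>(requested);
    }

    public static void main(String[] args) {
        // Base table partitioned by date and region (assumed column names).
        List<String> base = Arrays.asList("ds", "region");
        // No PARTITIONED BY clause: prints [ds, region]
        System.out.println(resolveIndexPartitionColumns(base, null));
        // PARTITIONED BY (ds): each index partition spans all regions; prints [ds]
        System.out.println(resolveIndexPartitionColumns(base, Arrays.asList("ds")));
        // Empty list: one index over all partitions; prints []
        System.out.println(resolveIndexPartitionColumns(base, Collections.<String>emptyList()));
    }
}
```

The sketch rejects any column that is not a partitioning column of the base table. How Hive itself reports such an error is not specified here.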
...
The diagram below shows the new metastore schema with index support:
http://issues.apache.org/jira/secure/attachment/12449601/idx2.png
The new IDXS table in the metastore schema contains one entry per index created. It has two relationships with the TBLS table:
...
Code Block |
---|
CREATE TABLE t(i int, j int);
CREATE INDEX x ON TABLE t(j)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
|
...
Code Block |
---|
CREATE INDEX x ON TABLE t(j)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
IN TABLE t_x;
|
...
The corresponding Java interface is defined below, together with a companion abstract base class which handlers should extend.
Code Block |
---|
package org.apache.hadoop.hive.ql.metadata;

import java.util.List;

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.hive.ql.plan.api.Task;

/**
 * HiveIndexHandler defines a pluggable interface for adding new
 * index handlers to Hive.
 */
public interface HiveIndexHandler extends Configurable
{
  /**
   * Determines whether this handler implements indexes by creating
   * an index table.
   *
   * @return true if index creation implies creation of an index table in Hive;
   * false if the index representation is not stored in a Hive table
   */
  boolean usesIndexTable();

  /**
   * Requests that the handler validate an index definition and
   * fill in additional information about its stored representation.
   *
   * @param baseTable the definition of the table being indexed
   *
   * @param index the definition of the index being created
   *
   * @param indexTable a partial definition of the index table to be used for
   * storing the index representation, or null if usesIndexTable() returns
   * false; the handler can augment the index's storage descriptor
   * (e.g. with information about input/output format)
   * and/or the index table's definition (typically with additional
   * columns containing the index representation, e.g. pointers into HDFS)
   *
   * @throws HiveException if the index definition is invalid with
   * respect to either the base table or the supplied index table definition
   */
  void analyzeIndexDefinition(
    org.apache.hadoop.hive.metastore.api.Table baseTable,
    org.apache.hadoop.hive.metastore.api.Index index,
    org.apache.hadoop.hive.metastore.api.Table indexTable)
      throws HiveException;

  /**
   * Requests that the handler generate a plan for building the index;
   * the plan should read the base table and write out the index representation.
   *
   * @param baseTable the definition of the table being indexed
   *
   * @param index the definition of the index
   *
   * @param partitions a list of specific partitions of the base
   * table for which the index should be built, or null if
   * an index for the entire table should be rebuilt
   *
   * @param indexTable the definition of the index table, or
   * null if usesIndexTable() returns false
   *
   * @return list of tasks to be executed in parallel for building
   * the index
   *
   * @throws HiveException if plan generation fails
   */
  List<Task<?>> generateIndexBuildTaskList(
    org.apache.hadoop.hive.metastore.api.Table baseTable,
    org.apache.hadoop.hive.metastore.api.Index index,
    List<org.apache.hadoop.hive.metastore.api.Partition> partitions,
    org.apache.hadoop.hive.metastore.api.Table indexTable)
      throws HiveException;
}

/**
 * Abstract base class for index handlers. This is provided as insulation
 * so that as HiveIndexHandler evolves, default implementations of new
 * methods can be added here in order to avoid breaking existing
 * plugin implementations.
 */
public abstract class AbstractIndexHandler implements HiveIndexHandler
{
}
|
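To make the contract of analyzeIndexDefinition more concrete, here is a minimal, self-contained sketch of a handler that validates an index definition and augments the index table. It does not use Hive's classes: `TableDef`, `IndexDef`, `IndexHandler`, `PointerColumnHandler`, and the column name `hdfs_pointer` are hypothetical stand-ins, and `IllegalArgumentException` stands in for HiveException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Everything in this sketch is a hypothetical stand-in, not Hive's API.
final class HandlerSketch {

    // Stand-in for a table definition: a name plus an ordered list of column names.
    static final class TableDef {
        final String name;
        final List<String> columns;
        TableDef(String name, List<String> columns) {
            this.name = name;
            this.columns = new ArrayList<>(columns);
        }
    }

    // Stand-in for an index definition: a name plus the indexed columns.
    static final class IndexDef {
        final String name;
        final List<String> indexedColumns;
        IndexDef(String name, List<String> indexedColumns) {
            this.name = name;
            this.indexedColumns = new ArrayList<>(indexedColumns);
        }
    }

    // Reduced version of the handler contract: only the two methods used here.
    interface IndexHandler {
        boolean usesIndexTable();
        void analyzeIndexDefinition(TableDef baseTable, IndexDef index, TableDef indexTable);
    }

    // A handler that stores its index in a table: the indexed columns
    // plus one extra column holding a pointer into HDFS.
    static final class PointerColumnHandler implements IndexHandler {
        @Override
        public boolean usesIndexTable() {
            return true;
        }

        @Override
        public void analyzeIndexDefinition(TableDef baseTable, IndexDef index, TableDef indexTable) {
            // Validate the index against the base table.
            for (String col : index.indexedColumns) {
                if (!baseTable.columns.contains(col)) {
                    throw new IllegalArgumentException(
                        "Column " + col + " does not exist in table " + baseTable.name);
                }
            }
            // Augment the partial index table definition.
            indexTable.columns.addAll(index.indexedColumns);
            indexTable.columns.add("hdfs_pointer");
        }
    }

    public static void main(String[] args) {
        TableDef t = new TableDef("t", Arrays.asList("i", "j"));
        IndexDef x = new IndexDef("x", Arrays.asList("j"));
        TableDef tx = new TableDef("t_x", new ArrayList<String>());
        new PointerColumnHandler().analyzeIndexDefinition(t, x, tx);
        System.out.println(tx.columns); // prints [j, hdfs_pointer]
    }
}
```

A handler whose usesIndexTable() returns false would instead receive null for the index table and would have to record its storage details elsewhere, for example on the index definition itself.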
...