Comment: Migrated to Confluence 5.3

Overview

There are presently three ways to issue HCatalog DDL commands:

  1. Command line interface
  2. REST APIs (upcoming)
  3. HiveMetaStore Client

Presently, Java developers go through the Hive metastore (HMS) client interface to issue HCatalog DDL commands. Though the HMS client interface is public, it is not intended for public users. According to the Hive user mailing list, the HMS client is not a public API and is subject to change in the future. So it would be a good idea to have a Java API in HCatalog that protects users from changes made to the Hive metastore client. Also, under the covers, either the REST APIs or the Hive metastore client can be used to provide end users with the required data.

Design


New Classes

HCatClient

The HCatClient is an abstract class containing all the APIs for the permitted HCatalog DDL commands. The implementation class will be provided as a configuration property, which will be used by the "create" method. In this way, the implementation details will be hidden from the users.

Code Block

public abstract class HCatClient {

    /**
     * Creates an instance of HCatClient.
     *
     * @param conf An instance of configuration.
     * @return An instance of HCatClient.
     * @throws IOException
     */
    public static HCatClient create(Configuration conf) throws IOException {
        HCatClient client = HCatUtil.getHCatClient(conf);
        if (client != null) {
            client.initialize(conf);
        }
        return client;
    }

    abstract void initialize(Configuration conf
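
The configuration-driven lookup described for the "create" method can be sketched in isolation. The following is only an illustration under stated assumptions: SketchClient, MetaStoreSketchClient, and the property keys "sketch.client.impl" and "sketch.metastore.uri" are hypothetical names, and java.util.Properties stands in for the Hadoop Configuration class. It is not the actual HCatUtil.getHCatClient logic.

```java
import java.util.Properties;

// Hypothetical stand-in for the abstract client; not the real HCatClient.
abstract class SketchClient {
    // Hypothetical property key that names the implementation class.
    static final String IMPL_KEY = "sketch.client.impl";

    // Reads the implementation class name from the configuration,
    // instantiates it reflectively, and initializes it.
    static SketchClient create(Properties conf) {
        String implName = conf.getProperty(IMPL_KEY);
        if (implName == null) {
            throw new IllegalArgumentException("Missing property: " + IMPL_KEY);
        }
        try {
            Class<? extends SketchClient> implClass =
                    Class.forName(implName).asSubclass(SketchClient.class);
            SketchClient client = implClass.getDeclaredConstructor().newInstance();
            client.initialize(conf);
            return client;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create " + implName, e);
        }
    }

    abstract void initialize(Properties conf);

    abstract String describe();
}

// One possible implementation; callers never reference it directly.
class MetaStoreSketchClient extends SketchClient {
    private String uri;

    @Override
    void initialize(Properties conf) {
        uri = conf.getProperty("sketch.metastore.uri", "unset");
    }

    @Override
    String describe() {
        return "metastore client for " + uri;
    }
}

class FactorySketchDemo {
    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(SketchClient.IMPL_KEY, MetaStoreSketchClient.class.getName());
        conf.setProperty("sketch.metastore.uri", "thrift://localhost:9083");
        System.out.println(SketchClient.create(conf).describe());
    }
}
```

Because callers only see the abstract type, swapping the implementation (for example a REST-backed one) would only require changing the configured class name.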

This document is a work in progress.

Overview

Templeton provides a REST-like web API for HCatalog and related Hadoop components. Developers can make HTTP requests to the Templeton web server to execute HCatalog DDL commands. With the REST APIs in place for HCatalog DDL commands, it is desirable to have a Java API in HCatalog that lets end users execute DDL commands without using the CLI.

Design

New Classes:

HCatClient

The HCatClient is an interface containing all the APIs for the permitted HCatalog DDL commands.

Code Block

package org.apache.hcatalog.api;

import java.util.List;

import org.apache.hcatalog.common.HCatException;
import org.apache.hcatalog.mapreduce.HCatDatabaseInfo;
import org.apache.hcatalog.mapreduce.HCatPartitionInfo;
import org.apache.hcatalog.mapreduce.HCatTableInfo;

/**
 * The Interface HCatClient containing APIs for HCatalog DDL commands.
 */
public interface HCatClient {

    /**
     * Gets the names of the databases that match a pattern.
     *
     * @param regex The regular expression. Providing "*" would retrieve all the names
     *              of the databases.
     * @return The list of all the database names.
     * @throws HCatException
     */
    public List<String> getDatabaseLike(String regex) throws HCatException;

    /**
     * Gets the database.
     *
     * @param dbName The name of the database.
     * @return An instance of HCatDatabaseInfo.
     * @throws HCatException
     */
    public HCatDatabaseInfo getDatabase(String dbName) throws HCatException;

    /**
     * Creates the database.
     *
     * @param dbInfo An instance of HCatCreateDBDesc.
     * @return true, if successful
     * @throws HCatException
     */
    public boolean createDatabase(HCatCreateDBDesc dbInfo)
            throws HCatException;

    /**
     * Deletes a database.
     *
     * @param dbName The name of the database to delete.
     * @param ifExists Hive returns an error if the database specified does not exist,
     *                 unless ifExists is set to true.
     * @param mode This is set to either "restrict" or "cascade". Restrict will
     *             remove the schema if all the tables are empty. Cascade removes
     *             everything including data and definitions.
     * @param userGroup The user group to use
     * @param permissions The permissions string to use. The format is "rwxrw-r-x".
     * @return true, if successful
     * @throws HCatException
     */
    public boolean deleteDatabase(String dbName, boolean ifExists, String mode,
            String userGroup, String permissions) throws HCatException;

    /**
     * Gets the tables like a pattern specified.
     *
     * @param dbName The name of the database.
     * @param regex The regular expression. Providing "*" would retrieve all the names
     *              of the table.
     * @return A list of all table names matching the specified pattern.
     * @throws HCatException
     */
    public List<String> getTablesLike(String dbName, String regex)
            throws HCatException;

    /**
     * Gets the table.
     *
     * @param dbName The name of the database.
     * @param tableName The name of the table.
     * @return An instance of HCatTableInfo.
     * @throws HCatException
     */
    public HCatTableInfo getTable(String dbName, String tableName)
            throws HCatException;

    /**
     * Creates the table.
     *
     * @param createTableDesc An instance of HCatCreateTableDesc class.
     * @return true, if successful.
     * @throws HCatException
     */
    public boolean createTable(HCatCreateTableDesc createTableDesc)
            throws HCatException;

    /**
     * Creates the table like an existing table.
     *
     * @param dbName The name of the database.
     * @param existingTblName The name of the existing table.
     * @param newTableName The name of the new table.
     * @param ifExists the if exists
     * @param isExternal Set to "true", if table has to be created at a different
     *                   location other than default.
     * @param location The location for the table.
     * @return true, if successful
     * @throws HCatException
     */
    public boolean createTableLike(String dbName, String existingTblName,
            String newTableName, boolean ifExists, boolean isExternal,
            String location) throws HCatException;

    /**
     * Delete a table.
     *
     * @param dbName The name of the database.
     * @param tableName The name of the table.
     * @param ifExists Hive returns an error if the table specified does not exist,
     *                 unless ifExists is set to true.
     * @param userGroup The user group to use.
     * @param permissions The permissions string to use. The format is "rwxrw-r-x".
     * @return true, if successful
     * @throws HCatException
     */
    public boolean deleteTable(String dbName, String tableName,
            boolean ifExists, String userGroup, String permissions)
            throws HCatException;

    /**
     * Renames a table.
     *
     * @param dbName The name of the database.
     * @param oldName The name of the table to be renamed.
     * @param newName The new name of the table.
     * @param userGroup The user group to use.
     * @param permissions The permissions string to use. The format is "rwxrw-r-x".
     * @return true, if successful
     * @throws HCatException
     */
    public boolean renameTable(String dbName, String oldName, String newName,
            String userGroup, String permissions) throws HCatException;

    /**
     * Gets all the partitions.
     *
     * @param dbName The name of the database.
     * @param tblName The name of the table.
     * @return A list of partitions.
     * @throws HCatException
     */
    public List<HCatPartitionInfo> getPartitions(String dbName, String tblName)
            throws HCatException;

    /**
     * Gets the partition.
     *
     * @param dbName The database name.
     * @param tableName The table name.
     * @param partitionName The partition name, Comma separated list of col_name='value'.
     * @return An instance of HCatPartitionInfo.
     * @throws HCatException
     */
    public HCatPartitionInfo getPartition(String dbName, String tableName,
            String partitionName) throws HCatException;

    /**
     * Adds the partition.
     *
     * @param partInfo An instance of HCatAddPartitionDesc.
     * @return true, if successful
     * @throws HCatException
     */
    public boolean addPartition(HCatAddPartitionDesc partInfo) throws HCatException;

    /**
     * Deletes partition.
     *
     * @param dbName The database name.
     * @param tableName The table name.
     * @param partitionName The partition name, Comma separated list of col_name='value'.
     * @param ifExists Hive returns an error if the partition specified does not exist,
     *                 unless ifExists is set to true.
     * @param userGroup The user group to use.
     * @param permissions The permissions string to use. The format is "rwxrw-r-x".
     * @return true, if successful
     * @throws HCatException
     */
    public boolean deletePartition(String dbName, String tableName,
            String partitionName, boolean ifExists, String userGroup,
            String permissions) throws HCatException;

    /**
     * List partitions by filter.
     *
     * @param dbName The database name.
     * @param tblName The table name.
     * @param filter The filter string,
     *    for example "part1 = \"p1_abc\" and part2 <= \"p2_test\"". Filtering can
     *    be done only on string partition keys.
     * @return list of partitions
     * @throws HCatException
     */
    public List<HCatPartitionInfo> listPartitionsByFilter(String dbName, String tblName,
            String filter) throws HCatException;

    /**
     * Get all existing databases that match the given
     * pattern. The matching occurs as per Java regular expressions.
     *
     * @param pattern
     *          java re pattern
     * @return list of database names
     * @throws HCatException
     */
    public abstract List<String> listDatabaseNamesByPattern(String pattern) throws HCatException;

    /**
     * Gets the database.
     *
     * @param dbName The name of the database.
     * @return An instance of HCatDatabase.
     * @throws HCatException
     */
    public abstract HCatDatabase getDatabase(String dbName) throws HCatException;

    /**
     * Creates the database.
     *
     * @param dbInfo An instance of HCatCreateDBDesc.
     * @throws HCatException
     */
    public abstract void createDatabase(HCatCreateDBDesc dbInfo)
            throws HCatException;

    /**
     * Drops a database.
     *
     * @param dbName The name of the database to delete.
     * @param ifExists Hive returns an error if the database specified does not exist,
     *                 unless ifExists is set to true.
     * @param mode This is set to either "restrict" or "cascade". Restrict will
     *             remove the schema if all the tables are empty. Cascade removes
     *             everything including data and definitions.
     * @throws HCatException
     */
    public abstract void dropDatabase(String dbName, boolean ifExists, String mode) throws HCatException;

    /**
     * Returns all existing tables from the specified database which match the given
     * pattern. The matching occurs as per Java regular expressions.
     * @param dbName
     * @param tablePattern
     * @return list of table names
     * @throws HCatException
     */
    public abstract List<String> listTableNamesByPattern(String dbName, String tablePattern)
            throws HCatException;

    /**
     * Gets the table.
     *
     * @param dbName The name of the database.
     * @param tableName The name of the table.
     * @return An instance of HCatTable.
     * @throws HCatException
     */
    public abstract HCatTable getTable(String dbName, String tableName)
            throws HCatException;

    /**
     * Creates the table.
     *
     * @param createTableDesc An instance of HCatCreateTableDesc class.
     * @throws HCatException
     */
    public abstract void createTable(HCatCreateTableDesc createTableDesc)
            throws HCatException;

    /**
     * Creates the table like an existing table.
     *
     * @param dbName The name of the database.
     * @param existingTblName The name of the existing table.
     * @param newTableName The name of the new table.
     * @param ifNotExists If true, the error raised when the table already exists is skipped.
     * @param isExternal Set to "true", if table has to be created at a different
     *                   location other than default.
     * @param location The location for the table.
     * @throws HCatException
     */
    public abstract void createTableLike(String dbName, String existingTblName,
            String newTableName, boolean ifNotExists, boolean isExternal,
            String location) throws HCatException;

    /**
     * Drop table.
     *
     * @param dbName The name of the database.
     * @param tableName The name of the table.
     * @param ifExists Hive returns an error if the table specified does not exist,
     *                 unless ifExists is set to true.
     * @throws HCatException
     */
    public abstract void dropTable(String dbName, String tableName,
            boolean ifExists) throws HCatException;

    /**
     * Renames a table.
     *
     * @param dbName The name of the database.
     * @param oldName The name of the table to be renamed.
     * @param newName The new name of the table.
     * @throws HCatException
     */
    public abstract void renameTable(String dbName, String oldName,
            String newName) throws HCatException;

}
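
The Javadoc for listDatabaseNamesByPattern and listTableNamesByPattern says that matching occurs as per Java regular expressions. The sketch below shows what that would mean for callers if names are matched in full with java.util.regex. NamePatternSketch and filterByPattern are hypothetical names, and full-string matching is an assumption, not a statement about how any HCatClient implementation filters names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; it only illustrates Java regular expression matching.
class NamePatternSketch {
    // Returns the names that match the given regular expression in full.
    static List<String> filterByPattern(List<String> names, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matches = new ArrayList<String>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("default", "sales", "sales_archive");
        // ".*" matches every name; a bare "*" is not a valid Java regular expression.
        System.out.println(filterByPattern(names, ".*"));
        System.out.println(filterByPattern(names, "sales.*"));
    }
}
```

Under this assumption, a caller who wants every name would pass ".*", since a bare "*" is rejected by java.util.regex.Pattern with a PatternSyntaxException.
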
HCatTempletonClient

This class implements the HCatClient interface.

HCatTempletonDriver

This class implements Hive's CommandProcessor interface.

Code Block

public interface CommandProcessor {
  public void init();

  public CommandProcessorResponse run(String command) throws CommandNeedRetryException;
}

The "run" method will consume the curl command as an input parameter and return the response.

HCatCommandDesc

This is an abstract class that helps in validating user input, building valid command descriptors and queries.

Code Block

/**
 * The Class HCatCommandDesc contains methods which help in validating,
 * building command descriptors and queries.
 */
public abstract class HCatCommandDesc{

    public abstract void validateCommandDesc() throws HCatException;
    abstract String buildQuery() throws HCatException;
    abstract boolean isValidationComplete();

}
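
The validate-then-build contract of HCatCommandDesc can be illustrated with a small self-contained sketch. CommandDescSketch and CreateDbDescSketch are hypothetical stand-ins, unchecked exceptions replace HCatException, and the generated statement is only an example of a "create database" query; none of this is the actual HCatCreateDBDesc implementation.

```java
// Hypothetical stand-in for the abstract descriptor; not the real HCatCommandDesc.
abstract class CommandDescSketch {
    private boolean validated;

    // Subclasses check their own fields and throw if something is missing.
    protected abstract void check();

    // Subclasses render the command text from their fields.
    protected abstract String render();

    public void validateCommandDesc() {
        check();
        validated = true;
    }

    boolean isValidationComplete() {
        return validated;
    }

    // The query may only be built from a validated descriptor.
    String buildQuery() {
        if (!validated) {
            throw new IllegalStateException("Descriptor has not been validated");
        }
        return render();
    }
}

// Hypothetical descriptor for a "create database" command.
class CreateDbDescSketch extends CommandDescSketch {
    private final String dbName;
    private final boolean ifNotExists;

    CreateDbDescSketch(String dbName, boolean ifNotExists) {
        this.dbName = dbName;
        this.ifNotExists = ifNotExists;
    }

    @Override
    protected void check() {
        if (dbName == null || dbName.isEmpty()) {
            throw new IllegalArgumentException("Database name is required");
        }
    }

    @Override
    protected String render() {
        return "CREATE DATABASE " + (ifNotExists ? "IF NOT EXISTS " : "") + dbName;
    }

    public static void main(String[] args) {
        CreateDbDescSketch desc = new CreateDbDescSketch("sales", true);
        desc.validateCommandDesc();
        System.out.println(desc.buildQuery());
    }
}
```

Keeping validation in the base class means every descriptor subclass is checked the same way before a query is ever produced.
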
HCatCreateTableDesc

This class is a subclass of HCatCommandDesc. Users will use it to create a descriptor for the "create table" command and validate it.

HCatCreateDBDesc

This class is a subclass of HCatCommandDesc. Users will use it to create a descriptor for the "create database" command and validate it.

HCatAddPartitionDesc

This class is a subclass of HCatCommandDesc. Users will use it to create a descriptor for the "add partition" command and validate it.

HCatDBInfo

Modification to Existing Classes:

HCatTableInfo
PartInfo

Usage

...

    /**
     * Gets all the partitions.
     *
     * @param dbName The name of the database.
     * @param tblName The name of the table.
     * @return A list of partitions.
     * @throws HCatException
     */
    public abstract List<HCatPartition> getPartitions(String dbName, String tblName)
            throws HCatException;

    /**
     * Gets the partition.
     *
     * @param dbName The database name.
     * @param tableName The table name.
     * @param partitionName The partition name, Comma separated list of col_name='value'.
     * @return An instance of HCatPartition.
     * @throws HCatException
     */
    public abstract HCatPartition getPartition(String dbName, String tableName,
            String partitionName) throws HCatException;

    /**
     * Adds the partition.
     *
     * @param partInfo An instance of HCatAddPartitionDesc.
     * @throws HCatException the h cat exception
     */
    public abstract void addPartition(HCatAddPartitionDesc partInfo) throws HCatException;

    /**
     * Drops partition.
     *
     * @param dbName The database name.
     * @param tableName The table name.
     * @param partitionName The partition name, Comma separated list of col_name='value'.
     * @param ifExists Hive returns an error if the partition specified does not exist, unless ifExists is set to true.
     * @throws HCatException
     */
    public abstract void dropPartition(String dbName, String tableName,
            String partitionName, boolean ifExists) throws HCatException;

    /**
     * List partitions by filter.
     *
     * @param dbName The database name.
     * @param tblName The table name.
     * @param filter The filter string,
     *    for example "part1 = \"p1_abc\" and part2 <= \"p2_test\"". Filtering can
     *    be done only on string partition keys.
     * @return list of partitions
     * @throws HCatException the h cat exception
     */
    public abstract List<HCatPartition> listPartitionsByFilter(String dbName, String tblName,
            String filter) throws HCatException;

    /**
     * Mark partition for event.
     *
     * @param dbName The database name.
     * @param tblName The table name.
     * @param partKVs The partition key-value pairs.
     * @param eventType The event type.
     * @throws HCatException
     */
    public abstract void markPartitionForEvent(String dbName, String tblName,
            Map<String, String> partKVs, PartitionEventType eventType)
            throws HCatException;

    /**
     * Checks if the partition is marked for the event.
     *
     * @param dbName The database name.
     * @param tblName The table name.
     * @param partKVs The partition key-value pairs.
     * @param eventType The event type.
     * @return true, if the partition is marked for the event
     * @throws HCatException
     */
    public abstract boolean isPartitionMarkedForEvent(String dbName, String tblName,
            Map<String, String> partKVs, PartitionEventType eventType)
            throws HCatException;

    /**
     * Gets the delegation token.
     *
     * @param owner the owner
     * @param renewerKerberosPrincipalName the renewer kerberos principal name
     * @return the delegation token
     * @throws HCatException the h cat exception
     */
    public abstract String getDelegationToken(String owner, String renewerKerberosPrincipalName) throws
        HCatException;

    /**
     * Renew delegation token.
     *
     * @param tokenStrForm the token str form
     * @return the long
     * @throws HCatException the h cat exception
     */
    public abstract long renewDelegationToken(String tokenStrForm) throws HCatException;

    /**
     * Cancel delegation token.
     *
     * @param tokenStrForm the token str form
     * @throws HCatException the h cat exception
     */
    public abstract void cancelDelegationToken(String tokenStrForm) throws HCatException;

    /**
     * Close the hcatalog client.
     *
     * @throws HCatException the h cat exception
     */
    public abstract void close() throws HCatException;
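
The partitionName parameter above is documented as a comma separated list of col_name='value', while the event methods take the same information as a Map of partition key-value pairs. The sketch below shows one way to derive the string form from such a map. PartitionNameSketch and toPartitionName are hypothetical names, the sample keys are made up, and quote characters inside values are not escaped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of the HCatClient API.
class PartitionNameSketch {
    // Joins the entries as col_name='value' pairs separated by commas,
    // in the iteration order of the map.
    static String toPartitionName(Map<String, String> partKVs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : partKVs.entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(entry.getKey()).append("='").append(entry.getValue()).append("'");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A LinkedHashMap keeps the partition columns in insertion order.
        Map<String, String> partKVs = new LinkedHashMap<String, String>();
        partKVs.put("dt", "2012-01-01");
        partKVs.put("country", "us");
        System.out.println(toPartitionName(partKVs));
    }
}
```
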
HCatCreateTableDesc

This class is a subclass of HCatCommandDesc. Users will use it to create a descriptor for the "create table" command and validate it.

HCatCreateDBDesc

This class is a subclass of HCatCommandDesc. Users will use it to create a descriptor for the "create database" command and validate it.

!createdb.png|

HCatAddPartitionDesc

This class is a subclass of HCatCommandDesc. Users will use it to create a descriptor for the "add partition" command and validate it.


HCatTable

This class encapsulates the table information returned by the HCatClient implementation class and provides a uniform view to the user.


HCatDatabase

This class encapsulates the database information returned by the HCatClient implementation class and provides a uniform view to the user.


HCatPartition

This class encapsulates the partition information returned by the HCatClient implementation class and provides a uniform view to the user.


Usage

Code Block

// Load the Hive configuration and create a client.
Configuration config = new Configuration();
config.addResource("hive-site.xml");
HCatClient client = HCatClient.create(config);

// Describe the columns of the new table.
ArrayList<HCatFieldSchema> cols = new ArrayList<HCatFieldSchema>();
cols.add(new HCatFieldSchema("id", Type.INT, "id column"));
cols.add(new HCatFieldSchema("value", Type.STRING, "value column"));

// "db" is the name of an existing database; "default" is assumed here.
String db = "default";
HCatCreateTableDesc tableDesc = HCatCreateTableDesc.create(db, "testtable", cols).fileFormat("rcfile").build();
client.createTable(tableDesc);

Discussion Topics