This page aims to catalogue and describe the various public-facing APIs exposed by Hive, to inform developers who wish to integrate their applications and frameworks with the Hive ecosystem. To date, the following APIs have been identified in the Hive project that are either considered public or are widely used in the public domain:
- HCatClient (Java)
- HCatalog storage handlers (Java)
- HCat CLI (Command line)
- Metastore (Java)
- Hive (Java)
- Driver (Java)
- WebHCat (REST)
- Streaming Data Ingest (Java)
- Streaming Mutation (Java)
- hive-jdbc (JDBC)
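Of the APIs listed above, hive-jdbc is used through the standard java.sql interfaces. The following is a minimal sketch, assuming a HiveServer2 instance on localhost:10000, the hive-jdbc driver on the classpath, and a user named hive. The host, port, user, class name, and method names are all assumptions made for illustration. It builds a jdbc:hive2:// URL and lists the tables of a database with SHOW TABLES.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class HiveJdbcShowTables {

    // Builds a HiveServer2 JDBC URL of the form jdbc:hive2://<host>:<port>/<database>.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Connects and returns the table names reported by SHOW TABLES.
    // Requires a reachable HiveServer2 and the hive-jdbc driver on the classpath.
    static List<String> showTables(String url, String user) throws SQLException {
        List<String> tables = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url, user, "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                tables.add(rs.getString(1));
            }
        }
        return tables;
    }

    public static void main(String[] args) {
        // Only prints the URL; call showTables(...) against a running server.
        System.out.println(jdbcUrl("localhost", 10000, "default"));
        // prints jdbc:hive2://localhost:10000/default
    }
}
```

The try-with-resources block closes the result set, statement, and connection in reverse order even if the query fails.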
HCatClient
This is a Java API that presents a number of DDL-type operations; however, it is not as comprehensive as the Metastore API. The HCatClient was intended to be the Java-based entry point to the WebHCat HCatalog API, although this was never realised. Currently HCatClientHMSImpl is the only concrete implementation of the API; it integrates directly with the Metastore using the Metastore API and does not utilise WebHCat at all, despite being packaged inside the WebHCat project. The HCatClientHMSImpl was originally provided as a reference implementation, but over time it has gained traction as a public client. Anecdotally, it is now the officially preferred API for issuing DDL-type operations from external programs, and feature contributions are encouraged. There is some minimal documentation on the wiki in the form of a design document describing the interface but not the implementation.
HCatalog storage handlers
This is well documented on the wiki.
HCat CLI
This is well documented on the wiki.
Metastore
A Thrift-based API with Java bindings, described by the IMetaStoreClient interface. The API decouples the metastore storage layer from other Hive internals. Because Hive itself uses this API internally, it is required to implement a comprehensive feature set, which makes it attractive to developers who might find the other APIs lacking. It was not originally intended to be a public API, although it became public in version 1.0.0 (HIVE-3280), and it has been proposed that it be documented more fully (HIVE-9363). Anecdotally, its use outside of the Hive project is not currently recommended.
Hive
Refers to the org.apache.hadoop.hive.ql.metadata.Hive class. It appears to be a distinct concrete implementation of a variation of the metastore API: it delegates to the metastore API but does not directly extend or implement it.
Driver
Refers to the org.apache.hadoop.hive.ql.Driver class.
WebHCat
WebHCat is a REST API for HCatalog. This is well documented on the wiki.
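As an illustration of calling the REST API from Java, the following minimal sketch requests the list of databases from the ddl/database resource. It assumes a WebHCat server on localhost at the default port 50111 and a user named hive. The host, user, class name, and method names are assumptions for illustration, and the response is returned as raw JSON without parsing.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class WebHCatListDatabases {

    // Builds the URI of WebHCat's "list databases" resource for the given user.
    static URI listDatabasesUri(String host, int port, String user) {
        String encodedUser = URLEncoder.encode(user, StandardCharsets.UTF_8);
        return URI.create("http://" + host + ":" + port
                + "/templeton/v1/ddl/database?user.name=" + encodedUser);
    }

    // Sends the GET request and returns the raw JSON response body.
    // Requires a reachable WebHCat server.
    static String listDatabases(String host, int port, String user) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(listDatabasesUri(host, port, user))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Only prints the request URI; call listDatabases(...) against a running server.
        System.out.println(listDatabasesUri("localhost", 50111, "hive"));
        // prints http://localhost:50111/templeton/v1/ddl/database?user.name=hive
    }
}
```

Because the interface is plain HTTP, the same request can be issued from any language or tool, which is the main difference from the Java-only APIs on this page.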
Streaming Data Ingest
A Java API focused on writing continuous streams of data into transactional tables using Hive’s ACID feature. New data is inserted into tables using small batches and short-lived transactions. It is documented on the wiki and has package-level Javadoc. Introduced in Hive version 0.13.0 (HIVE-5687).
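The pattern of small batches and short-lived transactions can be sketched in plain Java. This is an illustration of the idea only: the TxnSink interface and InMemorySink class below are hypothetical stand-ins and are not classes from Hive's streaming package, whose actual types and method names should be taken from its Javadoc.

```java
import java.util.ArrayList;
import java.util.List;

public class MicroBatchSketch {

    // Hypothetical stand-in for a transactional sink; NOT part of Hive's API.
    interface TxnSink {
        void begin();
        void write(String record);
        void commit();
    }

    // Keeps every committed batch in memory so the pattern can be observed.
    static class InMemorySink implements TxnSink {
        final List<List<String>> committed = new ArrayList<>();
        private List<String> open;

        public void begin() { open = new ArrayList<>(); }
        public void write(String record) { open.add(record); }
        public void commit() { committed.add(open); open = null; }
    }

    // Writes records in batches of at most batchSize, one short transaction per batch.
    static void ingest(Iterable<String> records, int batchSize, TxnSink sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        int inBatch = 0;
        for (String record : records) {
            if (inBatch == 0) {
                sink.begin();          // open a new short-lived transaction
            }
            sink.write(record);
            if (++inBatch == batchSize) {
                sink.commit();         // batch is full: commit and start over
                inBatch = 0;
            }
        }
        if (inBatch > 0) {
            sink.commit();             // commit the final, partially filled batch
        }
    }

    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink();
        ingest(List.of("a", "b", "c", "d", "e"), 2, sink);
        System.out.println(sink.committed); // prints [[a, b], [c, d], [e]]
    }
}
```

Committing after every small batch keeps each transaction short, so newly written records become visible to readers soon after they arrive rather than only at the end of the stream.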
Streaming Mutation
A Java API focused on mutating (inserting, updating, and deleting) records in transactional tables using Hive’s ACID feature. Large volumes of mutations are applied atomically in a single long-lived transaction. It is documented with package-level Javadoc. Scheduled for release in Hive version 2.0.0 (HIVE-10165).