Installing, configuring and running Hive
You can install a stable release of Hive by downloading and unpacking a tarball, or you can download the source code and build Hive using Maven (release 3.6.3 and later).
Prerequisites
- Java 8.
- Maven 3.6.3
- Protobuf 2.5
- Hadoop 3.3.6 (as preparation, configure it as a single-node cluster in pseudo-distributed mode)
- Tez. The default execution engine is MapReduce, but we will change it to Tez.
- Hive is commonly used in production on Linux, and Mac is a common development environment. The instructions in this document apply to both Linux and Mac.
...
You will have to build Protobuf 2.5 later, and it does not compile with an ARM JDK. We will therefore install an Intel-architecture JDK with brew and configure Maven to use it, which enables us to compile Protobuf.
JDK install on Apple ARM:
Code Block | ||
---|---|---|
| ||
brew install homebrew/cask-versions/adoptopenjdk8 --cask
brew untap adoptopenjdk/openjdk |
...
Tez requires some additional steps. Hadoop uses a Tez tarball, but it expects a different directory structure inside the archive than the one in the released tarball. So we will extract the tarball and compress it again. We will also put the extracted jars into HDFS. After that, we set the necessary environment variables.
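The repackaging and upload described above might look roughly like the following sketch. The HDFS target /apps/tez and the name apache-tez-0.10.2-bin come from the tez.lib.uris values used in this guide. The function names, the scratch directory, and the choice to archive the contents at the top level of the new tarball are assumptions made for illustration, so adapt them to your setup.

```shell
#!/bin/sh
# Sketch only: the layout and names below are assumptions, not the official procedure.
TEZ_NAME=apache-tez-0.10.2-bin

# Extract the release tarball, then re-archive its contents so that the
# files sit at the top level of the new archive (assumed target layout).
repack_tez() {
  src_tarball=$1   # path to the downloaded release tarball
  work_dir=$2      # scratch directory used for extraction
  mkdir -p "$work_dir" &&
  tar -xzf "$src_tarball" -C "$work_dir" &&
  ( cd "$work_dir/$TEZ_NAME" && tar -czf "../$TEZ_NAME.tar.gz" . )
}

# Upload the repacked tarball and the extracted directory to HDFS.
# Requires a running HDFS and the hadoop command on the PATH.
upload_tez() {
  work_dir=$1
  hadoop fs -mkdir -p /apps/tez &&
  hadoop fs -put "$work_dir/$TEZ_NAME.tar.gz" /apps/tez/ &&
  hadoop fs -put "$work_dir/$TEZ_NAME" /apps/tez/
}
```

With the tarball in the current directory, a call such as `repack_tez ./apache-tez-0.10.2-bin.tar.gz ./tez-work && upload_tez ./tez-work` would run both steps in order.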
...
Code Block | ||
---|---|---|
| ||
export TEZ_HOME=/yourpathtotez/apache-tez-0.10.2-bin
export HADOOP_CLASSPATH=$TEZ_HOME/*:$TEZ_HOME/conf |
...
Code Block | ||
---|---|---|
| ||
<configuration>
<property>
<name>tez.lib.uris</name>
<value>hdfs://localhost:9000/apps/tez/apache-tez-0.10.2-bin.tar.gz,hdfs://localhost:9000/apps/tez/apache-tez-0.10.2-bin/lib,hdfs://localhost:9000/apps/tez/apache-tez-0.10.2-bin</value>
</property>
<property>
<name>tez.use.cluster.hadoop-libs</name>
<value>true</value>
</property>
</configuration> |
Extra Hadoop configuration to make everything work
Modify $HADOOP_HOME/etc/hadoop/core-site.xml
Code Block | ||
---|---|---|
| ||
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.proxyuser.yourusername.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.yourusername.hosts</name>
<value>*</value>
</property>
</configuration> |
Modify $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Code Block | ||
---|---|---|
| ||
# JAVA_HOME
export JAVA_HOME=/yourpathtojavahome/javahome
# tez
export TEZ_CONF_DIR=/yourpathtotezconf/conf
export TEZ_JARS=/yourpathtotez/apache-tez-0.10.2-bin
export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*:${HADOOP_CLASSPATH}:${JAVA_JDBC_LIBS}:${MAPREDUCE_LIBS} |
Modify $HADOOP_HOME/etc/hadoop/mapred-site.xml
Code Block | ||
---|---|---|
| ||
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration> |
Modify $HADOOP_HOME/etc/hadoop/yarn-site.xml
Code Block | ||
---|---|---|
| ||
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
</configuration> |
Installing Hive from a Tarball
...
No Format |
---|
cd apache-hive-4.0.0-beta-1-bin
export HIVE_HOME=/yourpathtohive/apache-hive-4.0.0-beta-1-bin |
...
Code Block | ||
---|---|---|
| ||
<configuration>
<property>
<name>hive.tez.container.size</name>
<value>1024</value>
</property>
<property>
<name>hive.metastore.warehouse.external.dir</name>
<value>/yourpathtowarehousedirectory/warehouse</value>
</property>
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>
<property>
<name>tez.lib.uris</name>
<value>hdfs://localhost:9000/apps/tez/apache-tez-0.10.2-bin.tar.gz,hdfs://localhost:9000/apps/tez/apache-tez-0.10.2-bin/lib,hdfs://localhost:9000/apps/tez/apache-tez-0.10.2-bin</value>
</property>
<property>
<name>tez.configuration</name>
<value>/yourpathtotez/apache-tez-0.10.2-bin/conf/tez-site.xml</value>
</property>
<property>
<name>tez.use.cluster.hadoop-libs</name>
<value>true</value>
</property>
</configuration> |
...
Code Block | ||
---|---|---|
| ||
$HIVE_HOME/bin/schematool -dbType derby -initSchema --verbose |
Run HiveServer2
Code Block | ||
---|---|---|
| ||
$HIVE_HOME/bin/hiveserver2 & |
Run beeline:
Code Block | ||
---|---|---|
| ||
bin/beeline -u 'jdbc:hive2://localhost:10000/' -n yourusername |
As a test, create a table and insert a value:
Code Block | ||
---|---|---|
| ||
create table test (message string);
insert into test values ('Hello, from Hive!'); |
Installing from Source Code
Configuration is the same as for the tarball installation. The only difference is that we have to build Hive ourselves, and the compiled binaries end up in a different directory.
Hive is available via Git at https://github.com/apache/hive. You can download it by running the following command.
Code Block | ||
---|---|---|
| ||
$ git clone git@github.com:apache/hive.git |
If you want a specific release branch, such as the one for 4.0.0, you can run this command instead:
Code Block | ||
---|---|---|
| ||
$ git clone -b branch-4.0 --single-branch git@github.com:apache/hive.git |
...
It will create the subdirectory packaging/target/apache-hive-<release_string>-bin/apache-hive-<release_string>-bin/ (example: packaging/target/apache-hive-4.0.0-beta-2-SNAPSHOT-bin/apache-hive-4.0.0-beta-2-SNAPSHOT-bin). That will be your HIVE_HOME directory.
Its contents look like this:
- bin/: directory containing all the shell scripts
- lib/: directory containing all required jar files
- conf/: directory with configuration files
- examples/: directory with sample input and query files
That directory should contain all the files necessary to run Hive. You can run it from there or copy it to a different location, if you prefer.
In order to run Hive, you must have Hadoop in your path or have defined the environment variable HADOOP_HOME with the Hadoop installation directory.
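As a quick sanity check before starting Hive, the requirement above can be verified with a few lines of shell. This is a minimal sketch: the function name hadoop_available is made up for illustration, and it assumes the usual layout in which the launcher lives at $HADOOP_HOME/bin/hadoop.

```shell
#!/bin/sh
# Return 0 if Hadoop can be located either on the PATH or through HADOOP_HOME.
hadoop_available() {
  # Case 1: the hadoop launcher is on the PATH.
  if command -v hadoop >/dev/null 2>&1; then
    return 0
  fi
  # Case 2: HADOOP_HOME is set and contains an executable bin/hadoop.
  [ -n "${HADOOP_HOME:-}" ] && [ -x "$HADOOP_HOME/bin/hadoop" ]
}
```

A typical use would be `hadoop_available || echo "Hadoop not found: add it to PATH or set HADOOP_HOME" >&2`.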
From here, you can follow the steps described in the section Installing Hive from a Tarball. Moreover, we strongly advise users to create the HDFS directories /tmp and /user/hive/warehouse (also known as hive.metastore.warehouse.dir) and set them chmod g+w before tables are created in Hive.
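Creating the two HDFS directories and making them group-writable could be scripted as in the sketch below. The helper names and the RUN switch are assumptions for illustration: by default the script only prints the hadoop fs commands so they can be reviewed, and it executes them through $HADOOP_HOME/bin/hadoop only when RUN=1 is set.

```shell
#!/bin/sh
# Print the HDFS command (default) or execute it when RUN=1 is set.
hdfs_cmd() {
  if [ "${RUN:-0}" = "1" ]; then
    "$HADOOP_HOME/bin/hadoop" fs "$@"
  else
    echo "hadoop fs $*"
  fi
}

# Create /tmp and /user/hive/warehouse in HDFS and make them group-writable.
prepare_hive_dirs() {
  for dir in /tmp /user/hive/warehouse; do
    hdfs_cmd -mkdir -p "$dir" &&
    hdfs_cmd -chmod g+w "$dir" || return 1
  done
}
```

Running `prepare_hive_dirs` prints the four commands, and `RUN=1 prepare_hive_dirs` applies them to a running HDFS.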
Next Steps
You can begin using Hive as soon as it is installed; it should work on your computer. The following sections contain some extra information.
Beeline CLI
The Hive home directory is packaging/target/apache-hive-<release_string>-bin/apache-hive-<release_string>-bin/.
HiveServer2 has a CLI called Beeline (see Beeline – New Command Line Shell). To use Beeline, execute the following command in the Hive home directory:
Code Block |
---|
$ bin/beeline |
Hive Metastore
Metadata is stored in a relational database. In our example (and by default) it is an embedded Derby database. By default, its disk storage location is ./metastore_db (see conf/hive-default.xml). You can change it by modifying the configuration variable javax.jdo.option.ConnectionURL.
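To illustrate how the storage location is encoded in javax.jdo.option.ConnectionURL, the sketch below builds an embedded-Derby JDBC URL for a chosen directory and prints a matching property element that could be pasted into hive-site.xml. The function names and the example path are hypothetical; the URL shape follows the usual embedded-Derby form jdbc:derby:;databaseName=<path>;create=true.

```shell
#!/bin/sh
# Build an embedded-Derby JDBC URL for the given metastore directory.
derby_url() {
  printf 'jdbc:derby:;databaseName=%s;create=true\n' "$1"
}

# Print a hive-site.xml property that points the metastore at that directory.
metastore_property() {
  cat <<EOF
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>$(derby_url "$1")</value>
</property>
EOF
}
```

For example, `metastore_property /yourpathtometastore/metastore_db` prints a property block for that (placeholder) location.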
Using Derby in embedded mode allows at most one user at a time. To configure Derby to run in server mode, see Hive Using Derby in Server Mode.
...
Next Step: Configuring Hive.
HCatalog and WebHCat
HCatalog
If you install Hive from the binary tarball, the hcat command is available in the hcatalog/bin directory. However, most hcat commands can be issued as hive commands except for "hcat -g" and "hcat -p". Note that the hcat command uses the -p flag for permissions but hive uses it to specify a port number. The HCatalog CLI is documented here and the Hive CLI is documented here.
HCatalog installation is documented here.
WebHCat (Templeton)
If you install Hive from the binary tarball, the WebHCat server command webhcat_server.sh is located at hcatalog/webhcat/svr/src/main/bin/webhcat_server.sh.
...