Installing Hive
You can install a stable release of Hive by downloading and unpacking a tarball, or you can download the source code and build Hive using Maven (version 3.6.3 or later).
Prerequisites
- Java 8
- Maven 3.6.3 or later
- Protobuf 2.5.0
- Hadoop 3.3.6, configured beforehand as a single-node cluster in pseudo-distributed mode
- Tez 0.10.2. Hive's default execution engine is MapReduce, but this guide changes it to Tez.
- Linux or Mac. Hive is commonly run on Linux in production, and Mac is a common development environment; the instructions in this document apply to both.
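Before installing anything, it can help to check which versions are already on the machine. The following is a minimal POSIX shell sketch; version_ge and java_major are hypothetical helpers introduced here (not part of Hive, Maven, or Java), and they handle plain dot-separated numeric versions only.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hive or Maven):
# succeeds when version $1 is greater than or equal to version $2.
# Components are compared numerically, so 3.10.0 >= 3.6.3 holds;
# only plain dot-separated numeric versions are handled.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0   # missing components count as 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0                            # all components are equal
  }'
}

# Hypothetical helper: prints the Java major version for a version string
# such as "1.8.0_292" (old numbering, prints 8) or "11.0.2" (prints 11).
java_major() {
  case "$1" in
    1.*) printf '%s\n' "$1" | cut -d. -f2 ;;
    *)   printf '%s\n' "$1" | cut -d. -f1 ;;
  esac
}
```

For example, version_ge "$(mvn -version | awk 'NR==1 {print $3}')" 3.6.3 checks the installed Maven, assuming the first line of mvn -version has the usual form Apache Maven 3.6.3 (...).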
Install the prerequisites
Java 8
Building Hive requires JDK 8. If you have an ARM chipset (Apple M1 or later), note the following: you will have to build Protobuf 2.5.0 later, and it does not compile with an ARM JDK. Therefore, install an Intel (x86_64) JDK with Homebrew and configure Maven to use it. This makes it possible to compile Protobuf.
Installing the JDK on Apple ARM:
brew install homebrew/cask-versions/adoptopenjdk8 --cask
brew untap adoptopenjdk/openjdk
Maven
Install Maven and set JAVA_HOME to your JDK 8 installation.
Note for ARM: once everything is configured correctly, you should see output similar to this:
mvn -version
Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
Maven home: /Users/yourusername/programs/apache-maven-3.6.3
Java version: 1.8.0_292, vendor: AdoptOpenJDK, runtime: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre
Default locale: en_HU, platform encoding: UTF-8
OS name: "mac os x", version: "10.16", arch: "x86_64", family: "mac"
As you can see, even though the processor is ARM, Maven reports an Intel (x86_64) architecture, because it runs on the Intel JDK.
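To check this without reading the whole output, the arch and Java version fields can be extracted from mvn -version. The sketch below is a hypothetical pair of helpers that read that output on standard input; they assume the field layout shown in the example output, which may differ between Maven versions.

```shell
#!/bin/sh
# Hypothetical helpers: read `mvn -version` output on stdin.
# They assume the "Java version: ..., vendor: ..." and
# "OS name: ..., arch: "...", ..." layout of the example output.

# Prints the architecture reported by Maven's JVM (x86_64 on an Intel JDK).
mvn_arch() {
  sed -n 's/.*arch: "\([^"]*\)".*/\1/p'
}

# Prints the Java version Maven runs on (1.8.0_292 in the example).
mvn_java_version() {
  sed -n 's/^Java version: \([^,]*\),.*/\1/p'
}
```

On an Apple ARM machine, mvn -version | mvn_arch should print x86_64 once Maven runs on the Intel JDK. On Mac, export JAVA_HOME=$(/usr/libexec/java_home -v 1.8) is one way to point JAVA_HOME at the installed JDK 8.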
Protobuf
You have to download and compile Protobuf 2.5.0 and install it into the local Maven repository. Protobuf 2.5.0 does not support ARM out of the box, so on that chipset you need an extra step.
wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.bz2
tar -xvf protobuf-2.5.0.tar.bz2
cd protobuf-2.5.0
./configure
On ARM, edit src/google/protobuf/stubs/platform_macros.h and add an ARM branch to the processor architecture detection section, after the last #elif branch:
#elif defined(__arm64__)
#define GOOGLE_PROTOBUF_ARCH_ARM 1
#define GOOGLE_PROTOBUF_ARCH_64_BIT 1
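If you prefer to script this edit, the following is a minimal sketch. patch_protobuf_arm64 is a hypothetical helper; it assumes that the first line consisting only of #else in platform_macros.h is the one that closes the processor architecture #if/#elif chain, so compare the result against the .orig backup it leaves before building.

```shell
#!/bin/sh
# Hypothetical helper: adds the arm64 branch to protobuf 2.5.0's
# platform_macros.h. ASSUMPTION: the first line that is exactly "#else"
# closes the processor architecture detection chain; verify with
# diff "$f.orig" "$f" afterwards.
patch_protobuf_arm64() {
  f=${1:-src/google/protobuf/stubs/platform_macros.h}
  if grep -q '__arm64__' "$f"; then
    echo "already patched: $f"
    return 0
  fi
  cp "$f" "$f.orig" || return 1
  # Print the three new lines just before the first "#else",
  # then copy every original line unchanged.
  awk '
    !done && $0 == "#else" {
      print "#elif defined(__arm64__)"
      print "#define GOOGLE_PROTOBUF_ARCH_ARM 1"
      print "#define GOOGLE_PROTOBUF_ARCH_64_BIT 1"
      done = 1
    }
    { print }
  ' "$f.orig" > "$f" || return 1
  if ! grep -q '__arm64__' "$f"; then
    echo "no #else line found in $f; edit it by hand" >&2
    return 1
  fi
}
```

Run it from the protobuf-2.5.0 directory, then inspect the change with diff src/google/protobuf/stubs/platform_macros.h.orig src/google/protobuf/stubs/platform_macros.h.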
Now, you can compile and install protobuf:
make
make check
sudo make install
You can validate your installation:
protoc --version
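The check can also be scripted. protoc_is_250 below is a hypothetical helper that takes the output of protoc --version and succeeds only for release 2.5.0; a different protoc earlier on your PATH (for example, one installed by a package manager) would make it fail.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given `protoc --version` output
# reports libprotoc 2.5.0, the release built in this guide.
protoc_is_250() {
  case "$1" in
    "libprotoc 2.5.0") return 0 ;;
    *) echo "expected libprotoc 2.5.0, got: ${1:-nothing}" >&2; return 1 ;;
  esac
}
```

For example, protoc_is_250 "$(protoc --version)" && command -v protoc confirms the version and prints which protoc binary is in use; with the default ./configure prefix that is usually /usr/local/bin/protoc.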
Hadoop
First, follow the single-node, pseudo-distributed setup instructions in the official Hadoop documentation: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation.
After that, set up HADOOP_HOME:
export HADOOP_HOME=/yourpathtohadoop/hadoop-3.3.6
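A quick sanity check for this step is to confirm that the directory really contains the Hadoop launcher before exporting it. hadoop_env is a hypothetical helper (not part of Hadoop) that does this and prints the export lines, also adding bin and sbin to PATH; it assumes the install path contains no spaces, because the printed lines are meant to be passed to eval.

```shell
#!/bin/sh
# Hypothetical helper: verifies that $1 has an executable bin/hadoop and
# prints export lines for HADOOP_HOME and PATH. The paths are not quoted,
# so this assumes the install path contains no spaces.
hadoop_env() {
  if [ -z "$1" ]; then
    echo "usage: hadoop_env /path/to/hadoop-3.3.6" >&2
    return 1
  fi
  if [ ! -x "$1/bin/hadoop" ]; then
    echo "no executable bin/hadoop under $1" >&2
    return 1
  fi
  printf 'export HADOOP_HOME=%s\n' "$1"
  # $PATH is printed literally so that it is expanded later, by eval.
  printf 'export PATH=%s/bin:%s/sbin:$PATH\n' "$1" "$1"
}
```

For example, eval "$(hadoop_env /yourpathtohadoop/hadoop-3.3.6)" followed by hadoop version should print the Hadoop version.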
Tez
Tez requires some additional steps. Hadoop uses a Tez tarball, but it expects a different directory structure inside the archive than the one in the released tarball, which wraps everything in a top-level apache-tez-0.10.2-bin directory. So we extract the tarball and compress its contents again from inside that directory. We also put the tarball and the extracted jars into HDFS, and after that we set the necessary environment variables.
Download Tez, then extract and re-compress the tarball:
wget https://dlcdn.apache.org/tez/0.10.2/apache-tez-0.10.2-bin.tar.gz
tar -xzvf apache-tez-0.10.2-bin.tar.gz
cd apache-tez-0.10.2-bin
tar zcvf ../apache-tez-0.10.2-bin.tar.gz * && cd ..
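The point of the re-compression is that the new archive no longer wraps its contents in a top-level apache-tez-0.10.2-bin directory. The hypothetical helper below checks that property by listing the archive; it is only a sketch and assumes the wrapper directory name starts with apache-tez-.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when no entry of the gzipped tarball $1
# sits under an apache-tez-*/ top-level directory, i.e. the jars and
# lib/ are at the archive root.
tez_tarball_is_flat() {
  list=$(tar -tzf "$1") || { echo "cannot list $1" >&2; return 1; }
  if printf '%s\n' "$list" | grep -q '^apache-tez-[^/]*/'; then
    echo "$1 still has an apache-tez-*/ top-level directory" >&2
    return 1
  fi
  return 0
}
```

Run tez_tarball_is_flat apache-tez-0.10.2-bin.tar.gz from the directory that holds the re-compressed tarball; it succeeds silently when the layout is flat.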
Add the necessary Tez files to HDFS:
$HADOOP_HOME/bin/hadoop fs -mkdir -p /apps/tez
$HADOOP_HOME/bin/hadoop fs -put apache-tez-0.10.2-bin.tar.gz /apps/tez # copy the tarball
$HADOOP_HOME/bin/hadoop fs -put apache-tez-0.10.2-bin /apps/tez # copy the whole folder
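To confirm the upload, you can test for both paths in HDFS. check_tez_on_hdfs is a hypothetical helper built on hadoop fs -test -e; the HADOOP_CMD override is an assumption added here only so that the check can be exercised without a running cluster.

```shell
#!/bin/sh
# Hypothetical helper: checks that the tarball and the extracted folder
# both exist under /apps/tez. HADOOP_CMD (an assumption of this sketch)
# defaults to $HADOOP_HOME/bin/hadoop.
check_tez_on_hdfs() {
  cmd=${HADOOP_CMD:-$HADOOP_HOME/bin/hadoop}
  missing=0
  for p in /apps/tez/apache-tez-0.10.2-bin.tar.gz /apps/tez/apache-tez-0.10.2-bin; do
    # `hadoop fs -test -e` exits with 0 when the path exists.
    if $cmd fs -test -e "$p"; then
      echo "ok: $p"
    else
      echo "missing: $p" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

With HADOOP_HOME set as above, running check_tez_on_hdfs prints one ok: line per uploaded path.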
Set the TEZ_HOME environment variable:
export TEZ_HOME=/yourpathtotez/apache-tez-0.10.2-bin
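Depending on the setup, Hadoop may also need to be told where Tez lives. The Apache Tez installation guide uses a tez-site.xml whose tez.lib.uris property points at the tarball in HDFS, plus a HADOOP_CLASSPATH containing the Tez configuration directory and jars. The sketch below assumes TEZ_CONF_DIR=$TEZ_HOME/conf; write_tez_site and tez_classpath are hypothetical helpers, and the tez.lib.uris value assumes the /apps/tez upload described above.

```shell
#!/bin/sh
# Hypothetical helper: writes a minimal tez-site.xml into directory $1.
# tez.lib.uris points at the re-compressed tarball uploaded to
# /apps/tez; ${fs.defaultFS} is left for Hadoop to resolve.
write_tez_site() {
  mkdir -p "$1" || return 1
  cat > "$1/tez-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/apps/tez/apache-tez-0.10.2-bin.tar.gz</value>
  </property>
</configuration>
EOF
}

# Hypothetical helper: prints a HADOOP_CLASSPATH value with the Tez conf
# dir ($1), the Tez jars and lib/ jars under $2, then any existing entries.
# The trailing /* entries are Java classpath wildcards, not shell globs.
tez_classpath() {
  printf '%s:%s/*:%s/lib/*%s\n' "$1" "$2" "$2" "${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"
}
```

A possible usage is export TEZ_CONF_DIR=$TEZ_HOME/conf, then write_tez_site "$TEZ_CONF_DIR" and export HADOOP_CLASSPATH="$(tez_classpath "$TEZ_CONF_DIR" "$TEZ_HOME")". Hive itself is switched to Tez through the hive.execution.engine property, set to tez.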
Installing Hive from a Tarball
Start by downloading the most recent stable release of Hive from one of the Apache download mirrors (see Hive Releases).
Next, unpack the tarball. This creates a subdirectory named hive-x.y.z (where x.y.z is the release number):
$ tar -xzvf hive-x.y.z.tar.gz
Set the environment variable HIVE_HOME to point to the installation directory:
$ cd hive-x.y.z
$ export HIVE_HOME=$(pwd)
Finally, add $HIVE_HOME/bin to your PATH:
$ export PATH=$HIVE_HOME/bin:$PATH
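These exports only last for the current shell session. One way to make them persistent is to append them to your shell's startup file; append_once is a hypothetical helper that does so without duplicating lines when the setup is run again, and ~/.bashrc below is only an example (zsh users would typically use ~/.zshrc).

```shell
#!/bin/sh
# Hypothetical helper: appends line $2 to file $1 unless the file already
# contains exactly that line, so repeated runs do not add duplicates.
append_once() {
  touch "$1" || return 1
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}
```

For example, append_once ~/.bashrc "export HIVE_HOME=$HIVE_HOME" records the current value, while append_once ~/.bashrc 'export PATH=$HIVE_HOME/bin:$PATH' uses single quotes so that the startup file expands the variables itself.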
Installing from Source Code
Hive is available via Git at https://github.com/apache/hive. You can clone it by running the following command:
$ git clone git@github.com:apache/hive.git
If you want a specific release branch, such as branch-4.0 for release 4.0.0, run:
$ git clone -b branch-4.0 --single-branch git@github.com:apache/hive.git
To build Hive, run the following command in the base directory:
$ mvn clean install -Pdist,itests,iceberg -DskipTests
This creates the subdirectory packaging/target/apache-hive-<release_string>-bin/apache-hive-<release_string>-bin/ (for example, packaging/target/apache-hive-4.0.0-beta-2-SNAPSHOT-bin/apache-hive-4.0.0-beta-2-SNAPSHOT-bin) with the following contents:
- bin/: directory containing all the shell scripts
- lib/: directory containing all required jar files
- conf/: directory with configuration files
- examples/: directory with sample input and query files
That directory should contain all the files necessary to run Hive. You can run it from there or copy it to a different location, if you prefer.
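Because the directory name contains the release string, a small glob saves typing it. find_hive_dist is a hypothetical helper that prints the first matching distribution directory that contains a bin/ subdirectory; run it from the base directory of the source tree or pass that directory as the argument.

```shell
#!/bin/sh
# Hypothetical helper: prints the unpacked distribution directory that the
# -Pdist build leaves under packaging/target, whatever the release string
# is (for example 4.0.0-beta-2-SNAPSHOT).
find_hive_dist() {
  root=${1:-.}
  for d in "$root"/packaging/target/apache-hive-*-bin/apache-hive-*-bin; do
    # If the glob matches nothing, $d is the literal pattern and fails -d.
    if [ -d "$d/bin" ]; then
      printf '%s\n' "$d"
      return 0
    fi
  done
  echo "no built distribution under $root/packaging/target" >&2
  return 1
}
```

For example, HIVE_HOME=$(find_hive_dist) && export HIVE_HOME sets HIVE_HOME only when a build was found.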
To run Hive, you must either have Hadoop on your PATH or set the environment variable HADOOP_HOME to the Hadoop installation directory.
In addition, we strongly advise creating the HDFS directories /tmp and /user/hive/warehouse (the latter is the default value of hive.metastore.warehouse.dir) and making them group-writable (chmod g+w) before any tables are created in Hive.
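The commands for this step can be wrapped as follows. setup_hive_hdfs_dirs is a hypothetical helper that runs hadoop fs -mkdir -p and hadoop fs -chmod g+w on the two directories named above; the HADOOP_CMD override is an assumption of this sketch that lets you substitute another launcher, or echo for a dry run.

```shell
#!/bin/sh
# Hypothetical helper: creates /tmp and /user/hive/warehouse in HDFS and
# makes them group-writable. HADOOP_CMD is an assumption of this sketch;
# it defaults to $HADOOP_HOME/bin/hadoop (set HADOOP_CMD=echo for a dry run).
setup_hive_hdfs_dirs() {
  cmd=${HADOOP_CMD:-$HADOOP_HOME/bin/hadoop}
  for dir in /tmp /user/hive/warehouse; do
    # -p creates missing parents and does not fail if the directory exists.
    $cmd fs -mkdir -p "$dir" || return 1
    $cmd fs -chmod g+w "$dir" || return 1
  done
}
```

Run it once after HDFS is up; because of -p, running it again is harmless.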
Next Steps
You can begin using Hive as soon as it is installed, although you will probably want to configure it first.
Beeline CLI
If you built Hive from source, the Hive home directory is packaging/target/apache-hive-<release_string>-bin/apache-hive-<release_string>-bin/.
HiveServer2 has a CLI called Beeline (see Beeline – New Command Line Shell). To use Beeline, execute the following command in the Hive home directory:
$ bin/beeline
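Beeline is usually pointed at a running HiveServer2 with a JDBC URL passed to its -u option. The helper below is a hypothetical sketch that builds such a URL; it assumes HiveServer2 listens on localhost on its default Thrift port 10000 unless you pass other values.

```shell
#!/bin/sh
# Hypothetical helper: prints a HiveServer2 JDBC URL of the form
# jdbc:hive2://host:port/database. Defaults assume a local HiveServer2
# on its default Thrift port (10000) and the "default" database.
hs2_url() {
  printf 'jdbc:hive2://%s:%s/%s\n' "${1:-localhost}" "${2:-10000}" "${3:-default}"
}
```

For example, after starting HiveServer2 with bin/hiveserver2, bin/beeline -u "$(hs2_url)" -n "$USER" connects to the default database as the current user.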
Hive Metastore
Metadata is stored in an embedded Derby database whose disk storage location is determined by the Hive configuration variable named javax.jdo.option.ConnectionURL. By default, this location is ./metastore_db (see conf/hive-default.xml).
Using Derby in embedded mode allows at most one user at a time. To configure Derby to run in server mode, see Hive Using Derby in Server Mode.
To configure a database other than Derby for the Hive metastore, see Hive Metastore Administration.
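Before first use, the embedded Derby metastore schema typically has to be initialized with Hive's schematool (bin/schematool -dbType derby -initSchema). Because ./metastore_db is relative to the directory Hive is started from, the hypothetical helper below refuses to re-initialize when that directory already exists in the current working directory; defaulting its argument to $HIVE_HOME is an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical helper: initializes the embedded Derby metastore schema in
# the current directory, unless ./metastore_db is already there.
# $1 is the Hive home directory (defaults to $HIVE_HOME).
init_derby_metastore() {
  hive_home=${1:-$HIVE_HOME}
  if [ -d metastore_db ]; then
    echo "metastore_db already exists in $(pwd); not re-initializing" >&2
    return 0
  fi
  "$hive_home/bin/schematool" -dbType derby -initSchema
}
```

Run it from the same directory you will later start Hive from, so that both use the same ./metastore_db.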
Next Step: Configuring Hive.
HCatalog and WebHCat
HCatalog
If you install Hive from the binary tarball, the hcat command is available in the hcatalog/bin directory. However, most hcat commands can be issued as hive commands, except for "hcat -g" and "hcat -p". Note that the hcat command uses the -p flag for permissions, but hive uses it to specify a port number. The HCatalog CLI is documented here and the Hive CLI is documented here.
HCatalog installation is documented here.
WebHCat (Templeton)
If you install Hive from the binary tarball, the WebHCat server command webhcat_server.sh is located at hcatalog/webhcat/svr/src/main/bin/webhcat_server.sh.
WebHCat installation is documented here.