This page is out of date, but retained in case people want/need to try setting up Impala from scratch without the automated Chef script.
These instructions cover installing the prerequisite packages and configuration for Impala. Currently we have guides for building on Ubuntu 14.04 and CentOS 6.5.
### Java
Download the Oracle Java 7 JDK.
On Ubuntu 14.04 this can be done with the following commands:
Code Block |
---|
sudo add-apt-repository ppa:webupd8team/java -y |
sudo apt-get update -y |
# Will have to agree to License |
sudo apt-get install oracle-jdk7-installer -y |
On CentOS 6.5, [this page has a good guide](https://www.digitalocean.com/community/tutorials/how-to-install-java-on-centos-and-fedora).
### Required packages
On Ubuntu 14.04
Code Block |
---|
sudo apt-get install git build-essential cmake bison flex pkg-config libsasl2-dev autoconf automake libtool maven subversion doxygen libbz2-dev zlib1g-dev python-pip python-setuptools python-dev libssl-dev libboost-all-dev postgresql liblzo2-dev lzop -y |
sudo pip install allpairs pytest pytest-xdist paramiko texttable prettytable sqlparse psutil==0.7.1 pywebhdfs gitpython jenkinsapi boto3 |
On CentOS 6.5
Code Block |
---|
sudo yum groupinstall "Development Tools" |
sudo yum -y install git ant libevent-devel automake libtool flex bison gcc-c++ openssl-devel make cmake doxygen.x86_64 glib-devel python-devel bzip2-devel svn libevent-devel krb5-workstation openldap-devel db4-devel python-setuptools python-pip cyrus-sasl* postgresql postgresql-server ant-nodeps lzo-devel lzop |
sudo pip-python install allpairs pytest pytest-xdist paramiko texttable prettytable sqlparse psutil==0.7.1 pywebhdfs gitpython jenkinsapi boto3 |
Configuring PostgreSQL
If you are installing Impala on a fresh machine, you'll need to initialize PostgreSQL. On CentOS 6.5, this can be done by running:
Code Block |
---|
sudo service postgresql initdb |
You need to make a configuration change to allow HBase and the Hive metastore to function correctly. Edit the following file as root.
On Ubuntu 14.04
/etc/postgresql/*/main/pg_hba.conf
On CentOS 6.5
/var/lib/pgsql/data/pg_hba.conf
In the following lines at the end of the file, change `peer` or `ident` to `trust`.
Code Block |
---|
# Database administrative login by UNIX sockets |
local all all ident |
# TYPE DATABASE USER CIDR-ADDRESS METHOD |
# "local" is for Unix domain socket connections only |
local all all ident |
# IPv4 local connections: |
host all all 127.0.0.1/32 md5 |
# IPv6 local connections: |
host all all ::1/128 md5 |
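As a rough sketch of that edit, the following shell function prints a copy of a `pg_hba.conf` file with `peer` or `ident` replaced by `trust` on uncommented `local` and `host` lines. The function name `pg_hba_to_trust` is made up for this example, and it assumes the authentication method is the last field on the line (no trailing options), so review the output before replacing the real file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Impala or PostgreSQL).
# Prints the given pg_hba.conf to stdout with a trailing "peer" or "ident"
# method rewritten to "trust" on uncommented local/host lines.
# The input file itself is not modified.
pg_hba_to_trust() {
  local conf_file="$1"
  sed -E 's/^((local|host)[[:space:]].*[[:space:]])(peer|ident)[[:space:]]*$/\1trust/' "$conf_file"
}
```

For example, `pg_hba_to_trust /var/lib/pgsql/data/pg_hba.conf > /tmp/pg_hba.conf.new`, run as a user that can read the file, writes the result to a scratch file that can be inspected and then copied into place as root.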
To make Postgres aware of these changes, either restart the service or run: pg_ctl reload
##### Creating the Hive metastore user
Code Block |
---|
sudo -u postgres psql postgres |
Then, at the `postgres` command prompt:
Code Block |
---|
CREATE ROLE hiveuser LOGIN PASSWORD 'password'; |
ALTER ROLE hiveuser WITH CREATEDB; |
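One way to confirm the role was created as intended is to query the `pg_roles` catalog from the shell. This is only a sketch: `hive_role_can_createdb` is a hypothetical helper name, and it assumes the `postgres` system user and database exist, as in the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: asks PostgreSQL whether a role has CREATEDB.
# Prints "t" if the role exists with CREATEDB, "f" if it exists without it,
# and nothing if the role does not exist.
hive_role_can_createdb() {
  local role="${1:-hiveuser}"
  # -t: rows only, -A: unaligned output, -c: run a single command
  sudo -u postgres psql -tAc \
    "SELECT rolcreatedb FROM pg_roles WHERE rolname = '${role}';" postgres
}
```

Calling `hive_role_can_createdb` with no argument checks the `hiveuser` role.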
Maven 3
On some older systems you may need to download Maven 3 from https://maven.apache.org/ and install it:
Code Block |
---|
tar xvf apache-maven-3.0.5-bin.tar.gz && sudo mv apache-maven-3.0.5 /usr/local |
Environment variables
Put these in your `.bashrc` or elsewhere:
On Ubuntu 14.04/CentOS 6.5
Code Block |
---|
export JAVA_HOME=/usr/lib/jvm/java-7-oracle |
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu |
export LC_ALL="en_US.UTF-8" |
export M2_HOME=/usr/local/apache-maven-3.0.5 |
export M2=$M2_HOME/bin |
export PATH=$M2:$PATH |
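After opening a new shell, it can help to check that these variables point at real locations. The function below is a minimal sketch; `check_build_env` is a name invented here, and it assumes Maven was installed under `M2_HOME` as described above (a distribution-packaged Maven would not need `M2_HOME`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports settings from the list above that look wrong.
# Returns 0 if JAVA_HOME is a directory and $M2_HOME/bin/mvn is executable,
# 1 otherwise, with a message on stderr for each problem found.
check_build_env() {
  local status=0
  if [ -z "${JAVA_HOME:-}" ] || [ ! -d "$JAVA_HOME" ]; then
    echo "JAVA_HOME is unset or not a directory: ${JAVA_HOME:-<unset>}" >&2
    status=1
  fi
  if [ -z "${M2_HOME:-}" ] || [ ! -x "$M2_HOME/bin/mvn" ]; then
    echo "No executable mvn under M2_HOME: ${M2_HOME:-<unset>}" >&2
    status=1
  fi
  return "$status"
}
```

Running `check_build_env && mvn -version` then confirms that the `mvn` found on `PATH` actually starts.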
Add a path for HDFS domain sockets
Code Block |
---|
sudo mkdir /var/lib/hadoop-hdfs/ |
sudo chown <user> /var/lib/hadoop-hdfs/ |
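If this step may be run more than once, a repeatable variant could look like the sketch below. The helper name `ensure_socket_dir` and the `SUDO` variable are assumptions made for this example; the owner defaults to the current user, standing in for the `<user>` placeholder above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: creates the domain-socket directory if it is missing
# and gives ownership to a user. Safe to run repeatedly because of mkdir -p.
# SUDO defaults to "sudo"; set SUDO="" if you already have write access.
ensure_socket_dir() {
  local dir="${1:-/var/lib/hadoop-hdfs}"
  local owner="${2:-$(id -un)}"
  local sudo_cmd="${SUDO-sudo}"
  # $sudo_cmd is intentionally unquoted so an empty value expands to nothing.
  $sudo_cmd mkdir -p "$dir" && $sudo_cmd chown "$owner" "$dir"
}
```

Called with no arguments, it creates `/var/lib/hadoop-hdfs` through `sudo` and assigns it to the invoking user.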
Start local SSH server
Code Block |
---|
sudo service ssh start |
Enable password-less SSH for HBase
Code Block |
---|
ssh-keygen -t dsa |
# Do not enter a passphrase; just press Enter. |
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys |
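Key-based login can still fail if `~/.ssh` or `authorized_keys` is writable by other users, since sshd checks these permissions by default. The sketch below tightens them and then tests a non-interactive login; both function names are invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restricts the SSH directory and authorized_keys
# to the owning user, as sshd's default StrictModes setting expects.
secure_ssh_dir() {
  local ssh_dir="${1:-$HOME/.ssh}"
  chmod 700 "$ssh_dir" && chmod 600 "$ssh_dir/authorized_keys"
}

# Hypothetical helper: succeeds only if login works without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
can_ssh_without_password() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "${1:-localhost}" true
}
```

Run `secure_ssh_dir` first, then `can_ssh_without_password`. Because `BatchMode` also suppresses the host-key confirmation prompt, connect to `localhost` interactively once beforehand so its host key is already known.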
Set up NTP for Kudu
On CentOS 7
Code Block |
---|
sudo yum install ntp |
sudo systemctl start ntpd |