The main development environment for Impala uses Ubuntu 16.04 with the bash shell. The bin/bootstrap_development.sh script initializes a fresh Impala development environment on Ubuntu 16.04. It alters your system configuration, including ~/.ssh/config and /etc/hosts, so consider running it in a VM or container.

It takes 6-7 hours in total to load all of the test data and run all of the tests. See the comments in the script for more information.
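
If you run the script outside a VM or container, one option is to copy the files it alters somewhere safe first. The following is a minimal sketch, not part of the Impala repository: backup_files is a hypothetical helper name, and the destination directory is whatever you choose to pass in.

```shell
#!/bin/bash
# Sketch only: copy files into a backup directory before a script alters them.
# backup_files is a hypothetical helper, not something shipped with Impala.
# Usage: backup_files DEST_DIR FILE...
backup_files() {
  local dest="$1"
  shift
  mkdir -p "$dest"
  local f
  for f in "$@"; do
    # Skip files that do not exist yet (e.g. no ~/.ssh/config on a new machine)
    if [ -e "$f" ]; then
      cp -p "$f" "$dest/"
    fi
  done
}
```

For example, running backup_files ~/impala-bootstrap-backup ~/.ssh/config /etc/hosts before the bootstrap script keeps a copy of both files as they were.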

If you are running Ubuntu 16.04, you can try this:

#!/bin/bash
 
# Bootstrap an Impala development environment on Ubuntu 16.04. Takes 3-5 hours.
 
# tmux and mosh: keep the tests running if you get disconnected
# emacs: for any changes you need to make
# ccache and ninja: for rebuilding, but see http://gerrit.cloudera.org:8080/6942
sudo apt-get --yes install tmux mosh emacs-nox ccache ninja-build
# TODO: config ccache

# TODO: check that there is enough space on disk to do a data load
 
# Some things I use in my tmux setup.
cat >~/.tmux.conf <<EOF
set-window-option -g xterm-keys on
unbind-key -n C-left
unbind-key -n C-right
bind -n M-up new-window
bind -n M-right next-window
bind -n M-left previous-window
bind-key -n C-S-Left swap-window -t -1
bind-key -n C-S-Right swap-window -t +1
EOF

# Stop here and run the rest in tmux.
exit 1

git clone http://gerrit.cloudera.org:8080/Impala-ASF Impala
cd Impala

# Install Oracle Java 7. Untested: OpenJDK 7. Oracle Java 8 fails (see IMPALA-5344).
sudo add-apt-repository --yes ppa:webupd8team/java
sudo apt-get update
# Allow scripted installation; this agrees to a EULA. Or not, I don't know; I'm a script
# not an attorney.
echo "oracle-java7-installer shared/accepted-oracle-license-v1-1 select true" | sudo debconf-set-selections
sudo apt-get --yes install oracle-java7-installer
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
echo 'export JAVA_HOME=/usr/lib/jvm/java-7-oracle' >> ~/.bashrc

# Some other requirements from bootstrap_build.sh
sudo apt-get --yes install g++ gcc git libsasl2-dev libssl-dev make maven python-dev python-setuptools

# IMPALA-3932, IMPALA-3926
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH
echo 'export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH' >> ~/.bashrc

# Set up PostgreSQL for the Hive Metastore (HMS)
sudo apt-get --yes install postgresql
sudo -u postgres psql -c "CREATE ROLE hiveuser LOGIN PASSWORD 'password';" postgres
sudo -u postgres psql -c "ALTER ROLE hiveuser WITH CREATEDB;" postgres
# TODO: What are the security implications of this?
sudo sed -i 's/local   all             all                                     peer/local   all             all                                     trust/g' /etc/postgresql/9.5/main/pg_hba.conf
sudo service postgresql restart

# Set up SSH keys so that ssh to localhost works without a password
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -q -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh-keyscan -H github.com >> ~/.ssh/known_hosts
echo "NoHostAuthenticationForLocalhost yes" >> ~/.ssh/config

# Workarounds for HDFS networking issues
echo "127.0.0.1 $(hostname -s) $(hostname)" | sudo tee -a /etc/hosts
sudo sed -i 's/127.0.1.1/127.0.0.1/g' /etc/hosts
 
sudo mkdir -p /var/lib/hadoop-hdfs
sudo chown $(whoami) /var/lib/hadoop-hdfs/

echo "*               hard    nofile          1048576" | sudo tee -a /etc/security/limits.conf
echo "*               soft    nofile          1048576" | sudo tee -a /etc/security/limits.conf

export IMPALA_HOME="$(pwd)"
 
# LZO is not needed to compile or run Impala, but it is needed for the data load
sudo apt-get --yes install liblzo2-dev
cd ~
git clone https://github.com/cloudera/impala-lzo.git
ln -s impala-lzo Impala-lzo
git clone https://github.com/cloudera/hadoop-lzo.git
cd hadoop-lzo/
time -p ant package
cd "$IMPALA_HOME"

export MAX_PYTEST_FAILURES=0
source bin/impala-config.sh
export NUM_CONCURRENT_TESTS=$(nproc)
(time -p ./buildall.sh -noclean -format -testdata -build_shared_libs ; echo $?) &>> test-result.txt&
tail -F test-result.txt

 

For instructions specific to Docker, see Impala Development Environment inside Docker.

Machine recommendations:

  • Impala requires 120GB of available disk space for a fully functional environment. An SSD is strongly recommended.
  • Impala compilation is CPU intensive. At least 4 CPUs are recommended. More CPUs will speed compilation.
  • Some Impala tests are memory intensive. 32GB of memory is recommended to be able to run all tests locally.
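
A quick way to compare a Linux machine against these recommendations is sketched below. It is an illustration only: check_min is a hypothetical helper, the thresholds are the ones from the list above, and the disk check looks at the filesystem holding IMPALA_HOME if that variable is set and the directory exists, otherwise the current directory.

```shell
#!/bin/bash
# Sketch only: compare this machine against the recommendations above.
# check_min is a hypothetical helper that prints OK or WARNING.
check_min() {
  local label="$1" actual="$2" wanted="$3"
  if [ "$actual" -ge "$wanted" ]; then
    echo "OK: $label = $actual (recommended: at least $wanted)"
  else
    echo "WARNING: $label = $actual (recommended: at least $wanted)"
  fi
}

# Directory whose filesystem is checked for free space (must already exist)
target="${IMPALA_HOME:-$PWD}"

cpus=$(nproc)
# MemTotal is in kB and is usually a little below the installed RAM,
# so a machine sold as 32GB may be reported as 31 here.
mem_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
# Available space in whole GB (GNU df); keep only the digits of e.g. " 123G"
disk_gb=$(df -BG --output=avail "$target" | tail -n 1 | tr -dc '0-9')

check_min "CPUs" "$cpus" 4
check_min "memory in GB" "$mem_gb" 32
check_min "free disk in GB" "$disk_gb" 120
```

A WARNING line does not mean the build will fail; it only flags a value below the recommendation.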

Quick start commands:

# Pick a location to use for your Impala environment
export IMPALA_HOME=your/desired/directory

# Get the source
git clone https://gitbox.apache.org/repos/asf/impala.git ${IMPALA_HOME}

# Run the bootstrap script
${IMPALA_HOME}/bin/bootstrap_development.sh

Note:

The source, a full build, and creating and importing all the test data require approximately 120GB of available space. If you see an error in the console while running bootstrap_development.sh similar to "FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)", and a warning in hdfs-namenode.log similar to "org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 3", you may not have enough available space. This can happen even if system utilities show free disk space.

If you encounter this error when rebuilding an existing cluster, clean up accumulated files in ${IMPALA_HOME}/logs and ${IMPALA_HOME}/be/build, and remove older versions of cdh_components-xxxxxx from ${IMPALA_HOME}/toolchain. Use df to check disk space.
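
Before deleting anything, it can help to see how much space each of these locations uses. The sketch below only reads and reports; report_usage is a hypothetical helper, and the paths are the ones named in the paragraph above.

```shell
#!/bin/bash
# Sketch only: report the space used by the cleanup candidates named above.
# report_usage is a hypothetical helper; it deletes nothing.
report_usage() {
  local path
  for path in "$@"; do
    if [ -e "$path" ]; then
      du -sh "$path"
    else
      echo "missing: $path"
    fi
  done
}

if [ -n "${IMPALA_HOME:-}" ]; then
  report_usage "${IMPALA_HOME}/logs" "${IMPALA_HOME}/be/build"
  # List cdh_components-* directories in the toolchain, oldest first
  ls -dtr "${IMPALA_HOME}"/toolchain/cdh_components-* 2>/dev/null
  # Free space on the filesystem holding IMPALA_HOME
  df -h "${IMPALA_HOME}"
fi
```

Which directories are safe to remove depends on your checkout, so review the output before cleaning up by hand.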

...