This page documents how to do Impala development inside a Docker container. This allows you to isolate your development environment from the rest of your system. If you want to build a containerized version of Impala suitable for production deployment with one daemon process per container, see Build and Test for Daemon Docker Containers.


If you don't have an Ubuntu 14.04 or 16.04 environment available, you can develop inside Docker. First, install Docker as you normally would. Make sure the resource limit of your Docker Engine is at least 4 CPU cores and 8 GB of RAM (more is better). For example, in Docker for Mac, go to Preferences → Advanced to adjust these limits.
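The limits can also be checked from a terminal. The following is a minimal sketch, not part of Impala or Docker: check_docker_resources is a hypothetical helper, and the 7 GiB memory threshold is an assumption, chosen because the Docker VM often reports slightly less memory than the configured value.

```shell
# Hypothetical helper: succeeds if the given CPU count and memory size
# (in bytes) meet the minimums suggested above.
check_docker_resources() {
  local cpus=$1 mem_bytes=$2
  local min_cpus=4
  # Assumed tolerance: accept anything from 7 GiB up as "about 8 GB".
  local min_mem_bytes=$((7 * 1024 * 1024 * 1024))
  if [ "$cpus" -lt "$min_cpus" ]; then
    echo "too few CPUs: $cpus (want at least $min_cpus)"
    return 1
  fi
  if [ "$mem_bytes" -lt "$min_mem_bytes" ]; then
    echo "too little memory: $mem_bytes bytes (want at least $min_mem_bytes)"
    return 1
  fi
  echo "ok"
}

# Usage (requires a running Docker Engine):
#   set -- $(docker info --format '{{.NCPU}} {{.MemTotal}}')
#   check_docker_resources "$1" "$2"
```

Adjust the thresholds if your builds need more headroom.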

Then, back in your terminal:

docker pull ubuntu:16.04
# SYS_TIME is required for kudu to work. The container will be able to change the time of the host.
# The -p options expose the container's ports to the host. Add more as needed.
# If you need to share files between the container and the host, add another -v option, e.g. "-v ~/Downloads/:/HostShared"
docker run --cap-add SYS_TIME --interactive --tty --name impala-dev -p 25000:25000 -p 25010:25010 -p 25020:25020 ubuntu:16.04 bash
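If you expect to publish more ports, the repeated -p options can be generated instead of typed by hand. This is only a sketch: publish_flags is a hypothetical helper, and it assumes each port is mapped to the same port number on the host.

```shell
# Hypothetical helper: print one "-p PORT:PORT" option per argument.
publish_flags() {
  local flags="" port
  for port in "$@"; do
    flags="$flags -p $port:$port"
  done
  # Strip the leading space left by the first iteration.
  echo "${flags# }"
}

# Usage (21050 is only an example of an extra port):
#   docker run --cap-add SYS_TIME --interactive --tty --name impala-dev \
#     $(publish_flags 25000 25010 25020 21050) ubuntu:16.04 bash
```

The command substitution is left unquoted on purpose so that the shell splits the result into separate options.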


Now, within the container:

apt-get update
apt-get install sudo
adduser --disabled-password --gecos '' impdev
echo 'impdev ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
su - impdev


Then, as impdev in the container:

sudo apt-get --yes install git
git clone https://git-wip-us.apache.org/repos/asf/impala.git ~/Impala
cd ~/Impala
export IMPALA_HOME=`pwd`
# See https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala for developing Impala.
$IMPALA_HOME/bin/bootstrap_development.sh

or

# See https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala for testing Impala.
$IMPALA_HOME/bin/bootstrap_system.sh
source $IMPALA_HOME/bin/impala-config.sh
$IMPALA_HOME/buildall.sh -noclean -notests
$IMPALA_HOME/bin/create-test-configuration.sh -create_metastore -create_sentry_policy_db
$IMPALA_HOME/testdata/bin/run-all.sh
$IMPALA_HOME/bin/start-impala-cluster.py


When that's done, start developing! When you're ready to pause, run the following in a new terminal on the host:

docker commit impala-dev && docker stop impala-dev


When you're ready to get back to work:

docker start --interactive impala-dev


If you just want to detach from the container instead of committing your work and stopping it, press Ctrl-P followed by Ctrl-Q. You can re-attach using the start command.

Each time you restart the container, remember to run $IMPALA_HOME/bin/bootstrap_system.sh to launch all the services Impala depends on.

Troubleshooting

1. Make processes are killed with errors like "collect2: error: ld terminated with signal 9 [Killed]", or fail with a "No space left on device" error.

You need to allocate more disk space or RAM to your Docker container. If your host machine doesn't have enough RAM, try lowering the concurrency of make by setting IMPALA_BUILD_THREADS to a smaller number (it defaults to the number of CPUs).
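One way to pick a smaller value is to cap the thread count by available memory. The sketch below is an illustration only: pick_build_threads is a hypothetical helper, and the "one thread per 2 GB of RAM" rule is an assumed rule of thumb, not an Impala recommendation.

```shell
# Hypothetical helper: print a thread count that is at most the CPU count
# and at most one thread per 2 GB of RAM (assumed rule of thumb).
pick_build_threads() {
  local cpus=$1 mem_gb=$2
  local by_mem=$((mem_gb / 2))
  # Always allow at least one build thread.
  if [ "$by_mem" -lt 1 ]; then
    by_mem=1
  fi
  if [ "$by_mem" -lt "$cpus" ]; then
    echo "$by_mem"
  else
    echo "$cpus"
  fi
}

# Usage inside the container, before running buildall.sh:
#   mem_gb=$(awk '/MemTotal/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
#   export IMPALA_BUILD_THREADS=$(pick_build_threads "$(nproc)" "$mem_gb")
```

If the linker is still killed, lower the value further or raise the container's memory limit.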


2. The build fails with the following error:

Creating postgresql database for Hive metastore
dropdb: could not connect to database template1: could not connect to server: Connection refused
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
createdb: could not connect to database template1: could not connect to server: Connection refused
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
ERROR in /home/impdev/Impala/bin/create-test-configuration.sh at line 149: createdb -U hiveuser ${METASTORE_DB}

This usually happens when you restart your Docker container. You just need to start PostgreSQL manually. You can find how to start it in $IMPALA_HOME/bin/bootstrap_system.sh. For example, on Ubuntu:

sudo service postgresql start
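PostgreSQL may take a moment to accept connections after it is started, so it can help to wait before rerunning the build. The following is a sketch under assumptions: retry is a hypothetical helper, and the usage line assumes the pg_isready client tool that ships with PostgreSQL is on your PATH.

```shell
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between failed attempts. Succeeds as soon as the command does.
retry() {
  local attempts=$1 delay=$2
  shift 2
  local i=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage: wait up to about 10 seconds for the local server to accept connections.
#   retry 10 1 pg_isready -q
```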


3. Cannot start HBase in the minicluster:

localhost: ssh: connect to host localhost port 22: Cannot assign requested address
running master, logging to /home/impdev/Impala/logs/cluster/hbase/hbase-impdev-master-4846cae1a5dd.out
: running regionserver, logging to /home/impdev/Impala/logs/cluster/hbase/hbase-impdev-regionserver-4846cae1a5dd.out
running regionserver, logging to /home/impdev/Impala/logs/cluster/hbase/hbase-impdev-2-regionserver-4846cae1a5dd.out
running regionserver, logging to /home/impdev/Impala/logs/cluster/hbase/hbase-impdev-3-regionserver-4846cae1a5dd.out
Contents of HDFS root: []
Connecting to Zookeeper host(s).
No handlers could be found for logger "kazoo.client"
Could not connect to Zookeeper: Connection time-out
ERROR in /home/impdev/Impala/testdata/bin/run-hbase.sh at line 136: ${CLUSTER_BIN}/check-hbase-nodes.py
Generated: /home/impdev/Impala/logs/extra_junit_xml_logs/generate_junitxml.buildall.run-hbase.20190705_00_25_47.xml
ERROR in testdata/bin/run-all.sh at line 64: tee ${IMPALA_CLUSTER_LOGS_DIR}/run-hbase.log
Generated: /home/impdev/Impala/logs/extra_junit_xml_logs/generate_junitxml.buildall.run-all.20190705_00_25_47.xml


The first line of the error output shows the root cause. Check whether you can ssh to localhost by running "ssh localhost whoami". Make sure the sshd service is started and that passwordless login is configured correctly. Both are set up in $IMPALA_HOME/bin/bootstrap_system.sh. For example, on Ubuntu:

sudo service ssh start

Verify that these steps from $IMPALA_HOME/bin/bootstrap_system.sh have taken effect:

mkdir -p ~/.ssh
chmod go-rwx ~/.ssh
if ! [[ -f ~/.ssh/id_rsa ]]
then
  ssh-keygen -t rsa -N '' -q -f ~/.ssh/id_rsa
fi

{ echo "" | cat - ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys; } && chmod 0600 ~/.ssh/authorized_keys
echo -e "\nNoHostAuthenticationForLocalhost yes" >> ~/.ssh/config && chmod 0600 ~/.ssh/config

Usually, these errors can be avoided if you run $IMPALA_HOME/bin/bootstrap_system.sh after restarting the container.
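If you only need the two services from items 2 and 3 running again, they can be started together. This is a sketch, not a replacement for the bootstrap script: ensure_services is a hypothetical helper, and it assumes each init script's status action exits non-zero when the service is stopped.

```shell
# Hypothetical helper: start each named service unless its status action
# already reports success.
ensure_services() {
  local svc
  for svc in "$@"; do
    if sudo service "$svc" status >/dev/null 2>&1; then
      echo "$svc is already running"
    else
      echo "starting $svc"
      sudo service "$svc" start
    fi
  done
}

# Usage on Ubuntu, as impdev inside the container:
#   ensure_services postgresql ssh
#   ssh -o BatchMode=yes localhost whoami   # should print "impdev" without prompting
```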

Developing Impala with Dev Container

Refer to this doc.

 
