This page documents how to do Impala development inside a Docker container. This allows you to isolate your development environment from the rest of your system. If you want to build a containerized version of Impala suitable for production deployment with one daemon process per container, see Build and Test for Daemon Docker Containers.
If you don't have an Ubuntu 14.04 or 16.04 environment available, you can use Docker to develop. First, install Docker as you normally would. Make sure the resource limits of your Docker Engine are at least 4 CPU cores and 8 GB of RAM (the more the better). For example, for Docker on Mac, go to Preferences → Advanced to adjust them.
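As a rough way to confirm the limits from the host terminal, the sketch below reads the CPU count and memory that the Docker Engine reports and compares them with the minimums stated above. The function name `check_resources` is hypothetical, and rounding memory to the nearest GiB is an assumption made here because the engine typically reports slightly less than the configured value.

```shell
# Minimums taken from the text above: 4 CPU cores and 8 GB of RAM.
MIN_CORES=4
MIN_MEM_GB=8

# check_resources CORES MEM_BYTES (hypothetical helper)
# Succeeds when both values meet the minimums. Memory is rounded to the
# nearest GiB (an assumption, see the lead-in).
check_resources() {
  local cores=$1 mem_bytes=$2
  local mem_gb=$(( (mem_bytes + 536870912) / 1073741824 ))
  [ "$cores" -ge "$MIN_CORES" ] && [ "$mem_gb" -ge "$MIN_MEM_GB" ]
}

# Only query the engine when the docker CLI is installed.
if command -v docker >/dev/null 2>&1; then
  cores=$(docker info --format '{{.NCPU}}' 2>/dev/null)
  mem_bytes=$(docker info --format '{{.MemTotal}}' 2>/dev/null)
  if check_resources "${cores:-0}" "${mem_bytes:-0}"; then
    echo "OK: ${cores} cores, ${mem_bytes} bytes of RAM"
  else
    echo "WARNING: Docker Engine is below 4 cores / 8 GB, or is not running"
  fi
fi
```

If the warning appears, raise the limits in the Docker settings before starting the build.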
Then, back in your terminal:
Code Block | ||
---|---|---|
| ||
docker pull ubuntu:16.04
# SYS_TIME is required for Kudu to work. The container will be able to change the time of the host.
# -p options expose the container's ports to the host. Add more as needed.
# If you need to share files between the container and the host, add another -v option, e.g. "-v ~/Downloads/:/HostShared"
docker run --cap-add SYS_TIME --interactive --tty --name impala-dev -p 25000:25000 -p 25010:25010 -p 25020:25020 ubuntu:16.04 bash |
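If you expect to vary the published ports or the shared directory, a small helper can assemble the command for you. The sketch below only prints the command so you can review it before running it. The function name `build_run_cmd` is hypothetical, and port 21050 (Impala's default HiveServer2 client port) is just an illustrative extra port.

```shell
# build_run_cmd NAME VOLUME PORT... (hypothetical helper)
# Prints, but does not execute, a docker run command that publishes each
# PORT on the same host port. VOLUME may be empty to skip the -v option.
build_run_cmd() {
  local name=$1 volume=$2
  shift 2
  local cmd="docker run --cap-add SYS_TIME --interactive --tty --name $name"
  local port
  for port in "$@"; do
    cmd="$cmd -p $port:$port"
  done
  if [ -n "$volume" ]; then
    cmd="$cmd -v $volume"
  fi
  echo "$cmd ubuntu:16.04 bash"
}

# The three ports from the command above, one illustrative extra port,
# and the shared directory from the comment above.
build_run_cmd impala-dev "$HOME/Downloads/:/HostShared" 25000 25010 25020 21050
```

Copy the printed line into your terminal once it looks right.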
Now, within the container:
...
Code Block | ||
---|---|---|
| ||
sudo apt-get --yes install git
git clone https://git-wip-us.apache.org/repos/asf/incubator-impala.git ~/Impala
cd ~/Impala
export IMPALA_HOME=`pwd` |
Code Block | ||
---|---|---|
| ||
# See https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala for developing Impala.
$IMPALA_HOME/bin/bootstrap_development.sh |
or
Code Block | ||
---|---|---|
| ||
# See https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala for testing Impala.
$IMPALA_HOME/bin/bootstrap_system.sh
source $IMPALA_HOME/bin/impala-config.sh
$IMPALA_HOME/buildall.sh -noclean -notests
$IMPALA_HOME/bin/create-test-configuration.sh -create_metastore -create_sentry_policy_db
$IMPALA_HOME/testdata/bin/run-all.sh
$IMPALA_HOME/bin/start-impala-cluster.py |
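Once the cluster has started, one quick sanity check is to see whether the three ports published in the docker run command (25000, 25010 and 25020, the default debug web UI ports of impalad, statestored and catalogd) accept connections. The sketch below uses bash's `/dev/tcp` pseudo-device. The function name `port_open` is hypothetical, and the 2-second timeout is an arbitrary choice.

```shell
# port_open HOST PORT (hypothetical helper)
# Succeeds when a TCP connection to HOST:PORT can be opened within 2 seconds.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Report the state of each published port without aborting on failure.
for port in 25000 25010 25020; do
  if port_open 127.0.0.1 "$port"; then
    echo "port $port: listening"
  else
    echo "port $port: not reachable"
  fi
done
```

A port that is not reachable usually means the corresponding daemon did not start; the logs under $IMPALA_HOME/logs are the place to look next.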
When that's done, start developing! When you're ready to pause, in a new terminal in the host:
...
If you just want to detach from the container instead of committing your work and stopping it, press ctrl-p ctrl-q. You can re-attach using the start command.
Each time you restart the container, remember to run $IMPALA_HOME/bin/bootstrap_system.sh to launch all the services that Impala depends on.
Troubleshooting
1. Make processes are killed with errors like "collect2: error: ld terminated with signal 9 [Killed]", or fail with a "No space left on device" error.
You need to allocate more disk space or RAM to your Docker container. If your host machine doesn't have enough RAM, try lowering the concurrency of make by setting IMPALA_BUILD_THREADS to a smaller number (it defaults to the number of CPUs).
2. The build fails with the following error:
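One way to pick a smaller value is to cap the thread count by available memory. The sketch below is meant to run inside the container. The function name `pick_build_threads` is hypothetical, and the budget of 2 GiB of RAM per build thread is an assumed rule of thumb, not a figure from the Impala build scripts; adjust it to what your build actually needs.

```shell
# pick_build_threads CORES MEM_KB [KB_PER_THREAD] (hypothetical helper)
# Prints the smaller of CORES and MEM_KB / KB_PER_THREAD, but at least 1.
# KB_PER_THREAD defaults to 2 GiB, an assumed rule of thumb.
pick_build_threads() {
  local cores=$1 mem_kb=$2 kb_per_thread=${3:-2097152}
  local threads=$(( mem_kb / kb_per_thread ))
  if [ "$threads" -gt "$cores" ]; then
    threads=$cores
  fi
  if [ "$threads" -lt 1 ]; then
    threads=1
  fi
  echo "$threads"
}

# Read the CPU count and total memory (in kB) as seen inside the container.
cores=$(nproc)
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
export IMPALA_BUILD_THREADS=$(pick_build_threads "$cores" "${mem_kb:-0}")
echo "IMPALA_BUILD_THREADS=${IMPALA_BUILD_THREADS}"
```

Run the build in the same shell afterwards so that it picks up the exported variable. For the disk-space error, `df -h` inside the container shows how much room is left.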
Code Block |
---|
Creating postgresql database for Hive metastore
dropdb: could not connect to database template1: could not connect to server: Connection refused
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
createdb: could not connect to database template1: could not connect to server: Connection refused
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
ERROR in /home/impdev/Impala/bin/create-test-configuration.sh at line 149: createdb -U hiveuser ${METASTORE_DB} |
This usually happens after you restart your Docker container. You just need to start PostgreSQL manually. To find out how, look in $IMPALA_HOME/bin/bootstrap_system.sh. For example, on Ubuntu:
Code Block |
---|
sudo service postgresql start |
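To tell in advance whether the server needs starting, you can look for the Unix domain socket named in the error message. This is a minimal sketch; the function name `pg_socket_present` is hypothetical, and a socket file that exists does not by itself guarantee that the server accepts connections (PostgreSQL's `pg_isready` tool gives a stronger check if it is installed).

```shell
# Socket path taken from the error message above.
PG_SOCKET=/var/run/postgresql/.s.PGSQL.5432

# pg_socket_present [PATH] (hypothetical helper)
# Succeeds when PATH (default: $PG_SOCKET) exists and is a socket.
pg_socket_present() {
  [ -S "${1:-$PG_SOCKET}" ]
}

if pg_socket_present; then
  echo "PostgreSQL socket found"
else
  echo "PostgreSQL socket missing; start the server with: sudo service postgresql start"
fi
```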
3. Cannot start HBase in the minicluster:
Code Block |
---|
localhost: ssh: connect to host localhost port 22: Cannot assign requested address
running master, logging to /home/impdev/Impala/logs/cluster/hbase/hbase-impdev-master-4846cae1a5dd.out
: running regionserver, logging to /home/impdev/Impala/logs/cluster/hbase/hbase-impdev-regionserver-4846cae1a5dd.out
running regionserver, logging to /home/impdev/Impala/logs/cluster/hbase/hbase-impdev-2-regionserver-4846cae1a5dd.out
running regionserver, logging to /home/impdev/Impala/logs/cluster/hbase/hbase-impdev-3-regionserver-4846cae1a5dd.out
Contents of HDFS root: []
Connecting to Zookeeper host(s).
No handlers could be found for logger "kazoo.client"
Could not connect to Zookeeper: Connection time-out
ERROR in /home/impdev/Impala/testdata/bin/run-hbase.sh at line 136: ${CLUSTER_BIN}/check-hbase-nodes.py
Generated: /home/impdev/Impala/logs/extra_junit_xml_logs/generate_junitxml.buildall.run-hbase.20190705_00_25_47.xml
ERROR in testdata/bin/run-all.sh at line 64: tee ${IMPALA_CLUSTER_LOGS_DIR}/run-hbase.log
Generated: /home/impdev/Impala/logs/extra_junit_xml_logs/generate_junitxml.buildall.run-all.20190705_00_25_47.xml |
The first line of the output shows the root cause. Check whether you can ssh to localhost by running "ssh localhost whoami". Make sure the sshd service is started and that passwordless login is configured correctly; both are set up in $IMPALA_HOME/bin/bootstrap_system.sh. For example, on Ubuntu:
Code Block |
---|
sudo service ssh start |
Verify that these steps from $IMPALA_HOME/bin/bootstrap_system.sh have taken effect:
Code Block |
---|
mkdir -p ~/.ssh
chmod go-rwx ~/.ssh
if ! [[ -f ~/.ssh/id_rsa ]]
then
  ssh-keygen -t rsa -N '' -q -f ~/.ssh/id_rsa
fi
{ echo "" | cat - ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys; } && chmod 0600 ~/.ssh/authorized_keys
echo -e "\nNoHostAuthenticationForLocalhost yes" >> ~/.ssh/config && chmod 0600 ~/.ssh/config |
Usually, these errors can be avoided if you run $IMPALA_HOME/bin/bootstrap_system.sh after restarting the container.
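The SSH setup from item 3 can also be checked without changing anything. The sketch below inspects the files that the snippet above creates and reports what is missing. The function name `check_ssh_files` is hypothetical, and the checks are inferred from that snippet rather than taken from the Impala scripts.

```shell
# check_ssh_files [DIR] (hypothetical helper)
# Reports problems with the key pair, authorized_keys and config in DIR
# (default: ~/.ssh). Returns 0 when everything looks in place, 1 otherwise.
check_ssh_files() {
  local dir=${1:-$HOME/.ssh} status=0
  if [ ! -f "$dir/id_rsa" ]; then
    echo "missing $dir/id_rsa"
    status=1
  fi
  if [ -f "$dir/id_rsa.pub" ] && [ -f "$dir/authorized_keys" ]; then
    # The public key must appear as a whole line in authorized_keys.
    if ! grep -qxF "$(cat "$dir/id_rsa.pub")" "$dir/authorized_keys"; then
      echo "public key not listed in $dir/authorized_keys"
      status=1
    fi
  else
    echo "missing $dir/id_rsa.pub or $dir/authorized_keys"
    status=1
  fi
  if ! grep -qs "NoHostAuthenticationForLocalhost yes" "$dir/config"; then
    echo "NoHostAuthenticationForLocalhost not set in $dir/config"
    status=1
  fi
  return "$status"
}

check_ssh_files || echo "SSH setup incomplete; re-run \$IMPALA_HOME/bin/bootstrap_system.sh"
```

When the function reports no problems but HBase still fails to start, confirm that sshd itself is running with "ssh localhost whoami".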
Developing Impala with Dev Container
Refer to this doc.