This page contains useful miscellaneous information that may make your life easier while developing Impala.

Capturing Crash Dumps

To capture crash dump files automatically, first set the maximum core-dump size to unlimited. When a crash occurs, the core dump is written to your current directory in a file named 'core'. You can then open and inspect this file with gdb. For example:

ulimit -c unlimited

be/build/debug/service/runquery -query="some query that causes crash"
gdb be/build/debug/service/runquery ./core

  # view the call stack
  (gdb) backtrace
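If no core file appears after a crash, the kernel may be redirecting dumps to a collector instead of the current directory. A quick sanity check (Linux-specific):

```shell
# Confirm core dumps are enabled in the current shell ("unlimited"
# after the ulimit call above).
ulimit -c
# The kernel's core_pattern decides where the dump is written: a bare
# "core" means the current directory, while a leading "|" pipes the
# dump to a collector such as apport or systemd-coredump instead.
cat /proc/sys/kernel/core_pattern
```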

Logging

We use the Google Logging Library. See https://github.com/google/glog/blob/master/README.rst 

The library defines the logging severities INFO, WARNING, and ERROR. We also use verbose logging, which can be turned on with environment variables, e.g.:

export GLOG_v=1

Or for specific modules:

export GLOG_vmodule="hdfs*=2,run=1"

You can enable verbose logging in general and turn it off for specific modules that may be too chatty at that level:

export GLOG_v=3
export GLOG_vmodule="state-store=0"

We have defined levels 1, 2, and 3 to represent information that is logged per connection/query, per file, or per row:

#define VLOG_CONNECTION VLOG(1)
#define VLOG_QUERY      VLOG(1)
#define VLOG_FILE       VLOG(2)
#define VLOG_ROW        VLOG(3)
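Putting the two together: the GLOG_v value (or a per-module vmodule override) selects the most detailed of these levels that gets emitted. A sketch of the mapping:

```shell
# GLOG_v=1 emits VLOG_CONNECTION and VLOG_QUERY messages,
# GLOG_v=2 additionally emits VLOG_FILE messages,
# GLOG_v=3 additionally emits VLOG_ROW messages (very verbose).
export GLOG_v=2   # per-connection/query and per-file, but not per-row
```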

Logging generally goes to files under /logs. The file naming is:

/logs/cluster/<program_name>.<host_name>.log.[ERROR|INFO|WARNING].<timestamp>.<pid>

The most recent log of each kind for a program is symlinked to:

/logs/cluster/<program_name>.[ERROR|INFO|WARNING]

VLOG output goes into the INFO file. You can send it to standard error instead by setting the environment variable:

export GLOG_logtostderr=1

Call Trace

Sometimes you may want to know which code path leads into a function. In the backend, you can log the result of GetStackTrace():

VLOG_QUERY << "args: " << your_interested_var << std::endl << GetStackTrace();

Make sure "util/debug-util.h" is included in your file.

In the frontend, you can log a newly constructed Exception, which captures the call trace:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class YourInterestedClass {
  private final static Logger LOG = LoggerFactory.getLogger(YourInterestedClass.class);

  public void yourInterestedFunc() {
    ...
    LOG.info("some message", new Exception("call trace"));
    ...
  }
}

Existing Hadoop Installations

If you can avoid it, don't install or run another Hadoop distribution on your system. This is an easy source of problems during development, as out-of-date binaries, headers, and configuration files can get silently picked up.

Developer Tooling

Impala developers use the following tooling to work on the code base:


Starting Minicluster with SSL

To start the minicluster with SSL you need an SSL certificate/key pair. It can be self-signed:

# Make sure you specify your Common Name as your host's FQDN
openssl req -newkey rsa:2048 -nodes -keyout key.pem -x509 -days 365 -out certificate.pem
 
# After building, you can start your Impala cluster with the same flags as documented in
# http://impala.apache.org/docs/build/html/topics/impala_ssl.html
# Note that we are setting the --catalog_service_host and --state_store_host to avoid them defaulting to localhost.
# SSL won't tolerate a mismatched Common Name.
 
$IMPALA_HOME/bin/start-impala-cluster.py \
  --impalad_args="--backend_client_rpc_timeout_ms=10000 --catalog_service_host=$(hostname -f) --state_store_host=$(hostname -f) --ssl_server_certificate=$IMPALA_HOME/certificate.pem --ssl_private_key=$IMPALA_HOME/key.pem --ssl_client_ca_certificate=$IMPALA_HOME/certificate.pem" \
  --catalogd_args="--catalog_service_host=$(hostname -f) --state_store_host=$(hostname -f) --ssl_server_certificate=$IMPALA_HOME/certificate.pem --ssl_private_key=$IMPALA_HOME/key.pem --ssl_client_ca_certificate=$IMPALA_HOME/certificate.pem" \
  --state_store_args="--catalog_service_host=$(hostname -f) --state_store_host=$(hostname -f) --ssl_server_certificate=$IMPALA_HOME/certificate.pem --ssl_private_key=$IMPALA_HOME/key.pem --ssl_client_ca_certificate=$IMPALA_HOME/certificate.pem"

Note that the arguments are in double quotes so that $(hostname -f) and $IMPALA_HOME are expanded by the shell before being passed to the daemons.
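Because a Common Name mismatch fails the handshake, it may help to generate the pair non-interactively with the CN pinned to this host's FQDN and double-check it before starting the cluster. This is a sketch; the /tmp paths are illustrative:

```shell
# Generate a self-signed pair with the Common Name set to this host's
# FQDN, without interactive prompts.
openssl req -newkey rsa:2048 -nodes -keyout /tmp/key.pem -x509 -days 365 \
    -subj "/CN=$(hostname -f)" -out /tmp/certificate.pem

# Confirm the subject CN matches what the Impala daemons will use.
openssl x509 -in /tmp/certificate.pem -noout -subject
```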

Running Automated Code Quality Checks Locally

When a patchset is published in Gerrit, automated code checks are run. To run these checks on local code before pushing, follow these steps.

Python

From the Impala home directory, run:

./bin/jenkins/critique-gerrit-review.py --dryrun 

C++

From the Impala home directory, run clang tidy (note: this runs a full build and thus takes a few minutes):

./bin/run_clang_tidy.sh

