
This page contains useful miscellaneous information that may make your life easier while developing Impala.

Capturing Crash Dumps

To capture crash dump files automatically, you must first set the maximum size for core dumps to unlimited. When a crash occurs, the core dump will be placed in your current directory and will be named 'core'. You can then open and inspect this file with gdb. For example:

ulimit -c unlimited

be/build/debug/service/runquery -query="some query that causes crash"
gdb be/build/debug/service/runquery ./core

  # view call stack
  (gdb) backtrace
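
If no core file appears even with the unlimited limit, your kernel may be piping core dumps to a crash handler such as apport or systemd-coredump instead of writing a plain file. This depends on your distribution; the following is only a sketch of how to check and temporarily override that setting:

# Show where the kernel currently sends core dumps
cat /proc/sys/kernel/core_pattern

# Temporarily write plain 'core' files into the crashing process's working directory
sudo sysctl -w kernel.core_pattern=core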

Logging

We use the Google Logging Library. See http://google-glog.googlecode.com/svn/trunk/doc/glog.html

The library defines the logging levels ERROR, INFO and WARNING. We also use verbose logging, which can be turned on with environment variables, e.g.:

export GLOG_v=1

Or for specific modules:

export GLOG_vmodule="hdfs*=2,run=1"

You can also enable verbose logging in general and turn it off for specific modules that are too chatty at that level:

export GLOG_v=3
export GLOG_vmodule="state-store=0"

We have defined levels 1, 2 and 3 to represent information that is logged per connection/query, per file, or per row:

#define VLOG_CONNECTION VLOG(1)  // per-connection information
#define VLOG_QUERY      VLOG(1)  // per-query information
#define VLOG_FILE       VLOG(2)  // per-file information
#define VLOG_ROW        VLOG(3)  // per-row information

Log output generally goes to files under /logs. The naming scheme is:

/logs/cluster/<program_name>.<host_name>.log.[ERROR|INFO|WARNING].<timestamp>.<pid>

The most recent log file for a particular program will be linked to:

/logs/cluster/<program_name>.[ERROR|INFO|WARNING]

VLOG output goes into the INFO file. You can send it to standard error instead by setting this environment variable:

export GLOG_logtostderr=1
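
A convenient way to watch verbose output from a running daemon is to follow the INFO symlink described above. This is just an illustrative sketch; 'impalad' stands in for whichever program name you are interested in:

# Follow INFO (and VLOG) output as it is written; impalad is used as an example program name
tail -F /logs/cluster/impalad.INFO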

Existing Hadoop Installations

If you can avoid it, don't install or run another Hadoop distribution on your system. It is an easy source of problems during development, as out-of-date binaries, headers and configuration files can get silently picked up.
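
If you suspect a stray installation is interfering, the following rough checks can help track it down (the package-manager command is an assumption and depends on your distribution):

# Is another hadoop binary on the PATH?
which hadoop && hadoop version

# Are Hadoop-related environment variables pointing somewhere unexpected?
env | grep -i hadoop

# Any system-wide Hadoop packages? (Debian/Ubuntu; use 'rpm -qa' on RHEL/CentOS)
dpkg -l | grep -i hadoop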

 

Useful Bookmarklets for Your Browser

Mirror upstream JIRA to downstream

We track all upstream P1 and P2 issues in our downstream JIRA. Replication happens automatically, but it can also be triggered manually by running the jira-mirror Jenkins job. To trigger that job from any public JIRA page, you can use this bookmarklet:

javascript:location.href='http://golden.jenkins.cloudera.com/view/Impala/job/jira-mirror/parambuild/?UPSTREAM_ISSUE='+document.location.href;

Navigate from upstream JIRA to downstream

Often you find yourself navigating between upstream and downstream JIRA pages related to the same issue. All downstream JIRAs should have a link to the upstream JIRA. To find the corresponding downstream JIRA from an upstream page, you can use this bookmarklet:

javascript:location.href='https://jira.cloudera.com/issues/?jql=text%20~%20%22'+document.location["pathname"].split('/')[2]+'%22';

It works on pages like https://issues.cloudera.org/browse/IMPALA-3641.

Navigate to latest version of docs

Google search will often send you to documentation of older releases. To navigate to the latest version of a documentation page, you can use this bookmarklet:

javascript:location.href='http://www.cloudera.com/documentation/enterprise/latest/topics/' + document.location["pathname"].substring(document.location["pathname"].lastIndexOf('/') + 1);

It works on pages like https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_parquet.html.

 
