Environment setup scripts

bin/bootstrap_system.sh

  • This script bootstraps a system for Impala development from almost nothing. It installs all the dependencies needed to compile Impala and run all the tests.
  • If you only want to compile Impala without running any tests, you need only a subset of its commands. For example, you don't need to install PostgreSQL, since it is only used by Hive and Ranger for testing.
  • Note: Read the script before running it. It doesn't accept any options (no -h or --help).
  • Supported OS
    • Ubuntu 16.04, 18.04, 20.04
    • Red Hat/CentOS 6, 7, 8
  • Usage: run it directly
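
Since the script takes no options and installs packages system-wide, it can help to confirm the OS first. The following is a minimal sketch, not part of Impala: the function name is_supported_os is hypothetical, and it assumes /etc/os-release is present, which may not be the case on older releases such as CentOS 6.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Impala): reports whether a distro ID and
# version match the supported list above.
is_supported_os() {
  local id="$1" version="$2"
  case "${id}:${version}" in
    ubuntu:16.04|ubuntu:18.04|ubuntu:20.04) return 0 ;;
    rhel:[678]|rhel:[678].*|centos:[678]|centos:[678].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Read ID and VERSION_ID from /etc/os-release when it exists. The subshell
# keeps the sourced variables out of the calling shell.
if [ -r /etc/os-release ]; then
  (
    . /etc/os-release
    if is_supported_os "${ID:-}" "${VERSION_ID:-}"; then
      echo "OS looks supported: ${ID} ${VERSION_ID}"
    else
      echo "OS not in the supported list: ${ID:-unknown} ${VERSION_ID:-unknown}"
    fi
  )
fi
```

A "not in the supported list" result does not necessarily mean the bootstrap will fail, only that the combination is not one listed above.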

bin/impala-config.sh

  • Source this file from the $IMPALA_HOME directory to set up your environment. If $IMPALA_HOME is undefined, this script sets it to the current working directory.
  • It references two other scripts: bin/impala-config-branch.sh and bin/impala-config-local.sh. You can add customized variables there without modifying impala-config.sh. Read the script header for more details.
  • Usage (the two forms are equivalent):

    source bin/impala-config.sh
    . bin/impala-config.sh
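
The fallback to the current working directory is why the directory you source from matters. The sketch below illustrates that behavior with standard shell parameter expansion. It is not copied from impala-config.sh, and resolve_impala_home is a hypothetical name used only here.

```shell
# Illustrative only: mimics the documented behavior, not the actual script.
# If IMPALA_HOME is unset (or empty), fall back to the current directory.
resolve_impala_home() {
  echo "${IMPALA_HOME:-$PWD}"
}

echo "IMPALA_HOME would resolve to: $(resolve_impala_home)"
```

If the printed path is not the root of your Impala checkout, cd there (or export IMPALA_HOME) before sourcing the config script.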

Build scripts

buildall.sh

  • Driver script to build everything you need. See more in Building Impala and Tips for Faster Impala Builds.
  • Examples

    # Compile Impala in RELEASE mode without building any tests or loading test data.
    # Remove the -ninja option if you use make instead of ninja.
    ./buildall.sh -noclean -notests -release -ninja

Cluster management scripts

testdata/bin/run-all.sh, testdata/bin/kill-all.sh

  • Launch/Stop the minicluster for testing. It includes HDFS, YARN, Kudu, HBase, Hive, Ranger, etc.

testdata/bin/run-mini-dfs.sh, testdata/bin/kill-mini-dfs.sh

  • Launch/Stop HDFS, YARN, Kudu, KMS in the minicluster.

testdata/bin/run-hbase.sh, testdata/bin/kill-hbase.sh

  • Launch/Stop HBase in the minicluster. It will also launch/stop Zookeeper.

testdata/bin/run-hive-server.sh, testdata/bin/kill-hive-server.sh

  • Launch/Stop Hive (HMS and HiveServer2) in the minicluster.

testdata/bin/run-ranger-server.sh, testdata/bin/kill-ranger-server.sh

  • Launch/Stop Ranger Admin server in the minicluster.

bin/start-impala-cluster.py

  • Control script for the Impala cluster.
  • Use the -h option to see all usage information.
  • Examples

    # Restart only the impalad processes. Catalogd stays alive, so metadata doesn't need to be reloaded.
    bin/start-impala-cluster.py -r
    
    # Launch the Impala cluster with only one impalad.
    bin/start-impala-cluster.py -s 1
    
    # Stop the Impala cluster
    bin/start-impala-cluster.py --kill
    
    # Launch the Impala cluster using release build type. Default is latest.
    bin/start-impala-cluster.py --build_type release
    
    # Launch the Impala cluster with customized flags.
    bin/start-impala-cluster.py --impalad_args="--use_local_catalog=true" --catalogd_args="--catalog_topic_mode=minimal"

Useful One-Liners

# Launch only the services needed for simple e2e tests, e.g. HDFS, HMS, Impala
testdata/bin/run-mini-dfs.sh && testdata/bin/run-hive-server.sh -only_metastore && bin/start-impala-cluster.py
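
When the chain above fails, the && form does not say which step stopped it. A minimal sketch of a wrapper that keeps the same stop-on-first-failure behavior but names the failing step follows. The function run_steps is hypothetical and not part of the Impala repository.

```shell
# Hypothetical wrapper: runs each command in order and stops at the first
# failure, like the && chain above, but reports which step failed.
run_steps() {
  for step in "$@"; do
    echo "==> ${step}"
    if ! sh -c "${step}"; then
      echo "FAILED: ${step}" >&2
      return 1
    fi
  done
}

# Example invocation from $IMPALA_HOME (commented out so the sketch has no
# side effects when pasted into a shell):
# run_steps \
#   "testdata/bin/run-mini-dfs.sh" \
#   "testdata/bin/run-hive-server.sh -only_metastore" \
#   "bin/start-impala-cluster.py"
```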


Test scripts

bin/impala-py.test

Util scripts

bin/impala-shell.sh

  • Launch the impala-shell. Use the -h option to see all usage information.

bin/load-data.py

  • This script loads the proper datasets for the specified workloads. It loads all data via Hive except for Parquet data, which needs to be loaded via Impala. Most DDL commands are executed by Impala.
  • Use the -h option to see all usage information. See more in Impala Test Data.
  • Examples

    # Load data of a specific workload (functional-query) with a specific strategy (core)
    bin/load-data.py -e core -w functional-query
    
    # Load data only in the specific formats.
    bin/load-data.py -e core -w functional-query --table_formats=text/none,parquet/none,orc/def/block
    
    # Force reload (-f) data of specific tables in the specific formats
    bin/load-data.py -e core -w tpch -f --table_formats=parquet/none --table_names=lineitem
    
    # Load TPCDS text, parquet, and orc tables at scale factor 30.
    bin/load-data.py -e core -w tpcds --scale_factor=30 --table_formats=text/none,parquet/none,orc/def/block
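
The --table_formats value in these examples is a comma-separated list of format specs. As a minimal sketch, the hypothetical helper below (load_cmd is not part of Impala) joins specs given as separate arguments and prints the resulting command without running it, so the line can be checked before a long load. It uses only the flags shown in the examples above and hard-codes the core strategy.

```shell
# Hypothetical helper: joins format specs with commas and prints (does not
# run) the corresponding load-data.py command.
load_cmd() {
  workload="$1"; shift
  formats=""
  for f in "$@"; do
    # Prepend a comma only when the list is already non-empty.
    formats="${formats:+${formats},}${f}"
  done
  echo "bin/load-data.py -e core -w ${workload} --table_formats=${formats}"
}

load_cmd tpch text/none parquet/none
# prints: bin/load-data.py -e core -w tpch --table_formats=text/none,parquet/none
```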

testdata/bin/compute-table-stats.sh

  • Runs compute table stats over a curated set of Impala test tables.


