Impala currently depends on the ORC C++ library to read ORC files. This document describes how to compile Impala against a customized ORC branch, so that you can at least test the integration with the latest ORC library.

Compile ORC library

Check out your customized ORC branch and compile it with the same compiler that the Impala toolchain uses. Here we use the master branch as an example:

git clone https://github.com/apache/orc.git
cd orc
mkdir build && cd build

# Export CC and CXX to let cmake use Impala's gcc
export CC="${IMPALA_HOME}/toolchain/gcc-${IMPALA_GCC_VERSION}/bin/gcc"
export CXX="${IMPALA_HOME}/toolchain/gcc-${IMPALA_GCC_VERSION}/bin/g++"

# Use Impala's cmake. Don't build the java lib and libhdfspp.
${IMPALA_HOME}/toolchain/cmake-${IMPALA_CMAKE_VERSION}/bin/cmake .. -DBUILD_JAVA=OFF -DBUILD_LIBHDFSPP=OFF -DINSTALL_VENDORED_LIBS=OFF
# Then compile with parallel jobs. $(nproc) is the number of virtual CPU cores.
make -j $(nproc)

# If the build succeeds, you should find the library at c++/src/liborc.a
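
The check described in the last comment can be scripted before moving on. The following is a minimal sketch, not part of the original steps: `check_orc_build` is a hypothetical helper name, and it only tests that a non-empty `liborc.a` exists under the given build directory.

```shell
# Hypothetical helper (not part of ORC or Impala).
# $1: the ORC build directory, i.e. the one where cmake and make were run.
check_orc_build() {
  lib="$1/c++/src/liborc.a"
  # -s is true only if the file exists and is not empty.
  if [ -s "${lib}" ]; then
    echo "found ${lib}"
  else
    echo "missing ${lib}" >&2
    return 1
  fi
}
```

Run from inside the build directory, this would be called as `check_orc_build .`; a non-zero exit status suggests the build did not produce the library.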

Link Impala with your customized ORC library

Manually replace the ORC library in Impala's toolchain directory with your customized one, then recompile Impala. Here ${ORC_HOME} is the directory where you cloned the ORC repo.

cd $IMPALA_HOME/toolchain
# Back up the existing library
cp -r orc-${IMPALA_ORC_VERSION} orc-${IMPALA_ORC_VERSION}-bak
# Replace the library
cp ${ORC_HOME}/build/c++/src/liborc.a orc-${IMPALA_ORC_VERSION}/lib/liborc.a
# Replace the header files
rm orc-${IMPALA_ORC_VERSION}/include/orc/*
cp ${ORC_HOME}/build/c++/include/orc/orc-config.hh orc-${IMPALA_ORC_VERSION}/include/orc/
cp ${ORC_HOME}/c++/include/orc/*.hh orc-${IMPALA_ORC_VERSION}/include/orc/

# Recompile Impala
make -j $(nproc) impalad
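
To return to the stock ORC library after testing, the backup directory created above can be copied back. The following is a minimal sketch, not part of the original steps: `restore_orc_backup` is a hypothetical helper name, and it assumes the `-bak` directory from the backup step still exists.

```shell
# Hypothetical helper (not part of Impala).
# $1: the toolchain directory, $2: the ORC version string.
restore_orc_backup() {
  orc_dir="$1/orc-$2"
  # Refuse to delete anything unless the backup is present.
  if [ ! -d "${orc_dir}-bak" ]; then
    echo "no backup at ${orc_dir}-bak" >&2
    return 1
  fi
  rm -rf "${orc_dir}"
  cp -r "${orc_dir}-bak" "${orc_dir}"
}
```

With the variables used on this page, the call would be `restore_orc_backup "${IMPALA_HOME}/toolchain" "${IMPALA_ORC_VERSION}"`, followed by recompiling impalad as shown above.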

