Impala currently depends on the ORC C++ library to read ORC files. This document describes how to compile Impala with a customized ORC branch, so that you can test the integration with the latest ORC library.
Compile ORC library
Check out your customized ORC branch and compile it with the same compiler that the Impala toolchain uses. Here we use the master branch as an example:

    git clone https://github.com/apache/orc.git
    cd orc
    mkdir build && cd build
    # Export CC and CXX so that cmake uses Impala's gcc
    export CC="${IMPALA_HOME}/toolchain/gcc-${IMPALA_GCC_VERSION}/bin/gcc"
    export CXX="${IMPALA_HOME}/toolchain/gcc-${IMPALA_GCC_VERSION}/bin/g++"
    # Use Impala's cmake. Don't build the Java library or libhdfspp.
    ${IMPALA_HOME}/toolchain/cmake-${IMPALA_CMAKE_VERSION}/bin/cmake .. -DBUILD_JAVA=OFF -DBUILD_LIBHDFSPP=OFF -DINSTALL_VENDORED_LIBS=OFF
    # Compile with multiple parallel jobs. $(nproc) is the number of virtual CPU cores.
    make -j $(nproc)
    # If the build succeeds, the static library is at c++/src/liborc.a
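Before touching Impala's toolchain, it can help to confirm that the build produced the files the next section copies. The following is a minimal sketch, not part of the ORC or Impala tooling: `check_orc_build` is a hypothetical helper name, and the two paths it checks are taken from the copy commands in this document.

```shell
# Hypothetical helper: verify that an ORC build tree contains the generated
# files that the linking step copies into Impala's toolchain.
# Usage: check_orc_build <path to the ORC checkout>
check_orc_build() {
  orc_home="$1"
  missing=0
  for f in \
      "${orc_home}/build/c++/src/liborc.a" \
      "${orc_home}/build/c++/include/orc/orc-config.hh"; do
    if [ ! -f "$f" ]; then
      # Report every missing file instead of stopping at the first one.
      echo "missing: $f" >&2
      missing=1
    fi
  done
  # Returns 0 when both files exist, 1 otherwise.
  return "$missing"
}
```

Assuming ${ORC_HOME} points at the cloned repo, calling check_orc_build "${ORC_HOME}" returns a non-zero status and lists any file that is absent.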
Link Impala with your customized ORC library
Manually replace the ORC library in Impala's toolchain directory with your customized one, then recompile Impala. In the commands below, ${ORC_HOME} is the directory where you cloned the ORC repo.

    cd $IMPALA_HOME/toolchain
    # Back up the existing library
    cp -r orc-${IMPALA_ORC_VERSION} orc-${IMPALA_ORC_VERSION}-bak
    # Replace the library
    cp ${ORC_HOME}/build/c++/src/liborc.a orc-${IMPALA_ORC_VERSION}/lib/liborc.a
    # Replace the header files
    rm orc-${IMPALA_ORC_VERSION}/include/orc/*
    cp ${ORC_HOME}/build/c++/include/orc/orc-config.hh orc-${IMPALA_ORC_VERSION}/include/orc/
    cp ${ORC_HOME}/c++/include/orc/*.hh orc-${IMPALA_ORC_VERSION}/include/orc/
    # Recompile Impala from the top of the source tree
    cd $IMPALA_HOME
    make -j $(nproc) impalad
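To return to the stock ORC library after testing, the backup directory created above can be moved back into place. This is a rough sketch under that assumption: `restore_orc_backup` is a hypothetical helper, not an Impala script, and it relies only on the `-bak` naming used in the backup command.

```shell
# Hypothetical helper: restore the ORC directory that was backed up as
# orc-<version>-bak inside the given toolchain directory.
# Usage: restore_orc_backup <toolchain dir> <ORC version>
restore_orc_backup() {
  toolchain_dir="$1"
  orc_version="$2"
  # Refuse to run with empty arguments so rm -rf never sees a bogus path.
  if [ -z "$toolchain_dir" ] || [ -z "$orc_version" ]; then
    echo "usage: restore_orc_backup <toolchain dir> <ORC version>" >&2
    return 1
  fi
  orc_dir="${toolchain_dir}/orc-${orc_version}"
  backup_dir="${orc_dir}-bak"
  if [ ! -d "$backup_dir" ]; then
    echo "no backup found at $backup_dir" >&2
    return 1
  fi
  # Drop the customized copy, then move the backup back under its old name.
  rm -rf "$orc_dir"
  mv "$backup_dir" "$orc_dir"
}
```

For example, restore_orc_backup "${IMPALA_HOME}/toolchain" "${IMPALA_ORC_VERSION}" followed by another make -j $(nproc) impalad from ${IMPALA_HOME} should relink Impala against the original library.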