How to Build
Set up an environment with the dependencies installed
Make sure you have run the following command to install the developer tools:
xcode-select --install
Install dependencies on Mac (with Xcode installed)
brew install protobuf protobuf-c Gsasl openssl boost thrift json-c ccache snappy libyaml libevent python maven
brew tap brona/iproute2mac
brew install iproute2mac
brew install postgresql
brew install cmake
brew install lcov
# Make sure to install exactly version 4.0 of pygresql.
sudo pip install pygresql==4.0
sudo pip install unittest2 pycrypto lockfile paramiko psi pyyaml
sudo pip install http://sourceforge.net/projects/pychecker/files/pychecker/0.8.19/pychecker-0.8.19.tar.gz/download
sudo pip install http://darcs.idyll.org/~t/projects/figleaf-0.6.1.tar.gz
brew uninstall postgresql
Note for Installing Dependencies
- Some dependencies require brew install <packagename> --universal. If HAWQ complains that an already installed package is required, try that.
- El Capitan issues: boost cannot be installed with --universal (the command shell will hang); you need to follow the manual steps at http://www.boost.org/doc/libs/1_61_0/more/getting_started/unix-variants.html#easy-build-and-install
OS requirement
- Use a text editor to edit the /etc/sysctl.conf file. Add or edit each of the following parameter definitions to set the required value.
kern.sysv.shmmax=2147483648
kern.sysv.shmmin=1
kern.sysv.shmmni=64
kern.sysv.shmseg=16
kern.sysv.shmall=524288
kern.maxfiles=65535
kern.maxfilesperproc=65536
- Reboot to apply the change.
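Before rebooting, it can help to confirm that the file really contains every required setting. The following is a minimal sketch, not part of the HAWQ tooling: check_sysctl_conf and REQUIRED_SYSCTL are hypothetical names, and the values are copied from the list above.

```shell
# Required macOS kernel settings, copied from the list above.
REQUIRED_SYSCTL="kern.sysv.shmmax=2147483648
kern.sysv.shmmin=1
kern.sysv.shmmni=64
kern.sysv.shmseg=16
kern.sysv.shmall=524288
kern.maxfiles=65535
kern.maxfilesperproc=65536"

# Print every required setting that is absent from the given file.
# Returns 0 when all settings are present, 1 otherwise.
check_sysctl_conf() {
    conf_file=$1
    missing=0
    for setting in $REQUIRED_SYSCTL; do
        # Strip spaces and tabs so "key = value" also matches "key=value".
        if ! tr -d ' \t' < "$conf_file" | grep -qxF "$setting"; then
            printf 'missing: %s\n' "$setting"
            missing=1
        fi
    done
    return $missing
}
```

For example, check_sysctl_conf /etc/sysctl.conf prints nothing and returns 0 when the file is complete. Commented-out lines do not count as present because whole lines are compared.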
Install Xcode
Xcode includes the tools, compiler and SDK for building HAWQ.
Install dependencies on CentOS 7.X
Dependencies
(CentOS 7 users can follow the easy steps provided by Zhanwei Wang)
curl -L "https://bintray.com/wangzw/rpm/rpm" -o /etc/yum.repos.d/bintray-wangzw-rpm.repo
yum install -y epel-release
yum makecache
yum install -y man passwd sudo tar which git mlocate links make bzip2 net-tools \
  autoconf automake libtool m4 gcc gcc-c++ gdb bison flex gperf maven indent \
  libuuid-devel krb5-devel libgsasl-devel expat-devel libxml2-devel \
  perl-ExtUtils-Embed pam-devel python-devel libcurl-devel snappy-devel \
  thrift-devel libyaml-devel libevent-devel bzip2-devel openssl-devel \
  openldap-devel protobuf-devel readline-devel net-snmp-devel apr-devel \
  libesmtp-devel xerces-c-devel python-pip json-c-devel libhdfs3-devel \
  apache-ivy java-1.7.0-openjdk-devel \
  openssh-clients openssh-server
yum install -y postgresql-devel
pip --retries=50 --timeout=300 install pg8000 simplejson unittest2 pycrypto pygresql pyyaml lockfile paramiko psi
pip --retries=50 --timeout=300 install http://darcs.idyll.org/~t/projects/figleaf-0.6.1.tar.gz
pip --retries=50 --timeout=300 install http://sourceforge.net/projects/pychecker/files/pychecker/0.8.19/pychecker-0.8.19.tar.gz/download
yum erase -y postgresql postgresql-libs postgresql-devel
yum install lcov
You also need to install cmake >=3.0. (Download from https://cmake.org/files/)
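After a long package installation, a quick check that the main build tools are on PATH can save a failed configure run. This is only a sketch: missing_tools is a hypothetical helper, and the command names in the example call (g++, mvn, and so on) are assumptions about what the packages above provide.

```shell
# Print each command from the argument list that is not found on PATH.
# Returns non-zero if at least one command is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return $status
}

# Example call (command names are assumptions, adjust as needed):
# missing_tools git make gcc g++ bison flex gperf mvn cmake pip
```

An empty output means every listed command was found. It does not check versions, so the cmake >=3.0 requirement still has to be verified separately.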
OS requirement
- Use a text editor to edit the /etc/sysctl.conf file. Add or edit each of the following parameter definitions to set the required value.
kernel.shmmax = 1000000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 0
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 200000
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1281 65535
net.core.netdev_max_backlog = 200000
vm.overcommit_memory = 2
fs.nr_open = 3000000
kernel.threads-max = 798720
kernel.pid_max = 798720
# increase network
net.core.rmem_max=2097152
net.core.wmem_max=2097152
- Execute the following command to apply your updated /etc/sysctl.conf file to the operating system configuration:
sysctl -p
- Use a text editor to edit the /etc/security/limits.conf file. Add the following definitions in the exact order that they are listed:
* soft nofile 2900000
* hard nofile 2900000
* soft nproc 131072
* hard nproc 131072
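Changes to limits.conf normally apply only to new login sessions, so it is worth checking the limits a fresh shell actually reports. The sketch below assumes a hypothetical helper, limit_ok, which compares one value printed by ulimit with a required minimum and treats "unlimited" as sufficient.

```shell
# Return 0 if a limit value reported by ulimit meets the required minimum.
# $1: current value (a number or the word "unlimited")
# $2: required minimum
limit_ok() {
    current=$1
    required=$2
    if [ "$current" = "unlimited" ]; then
        return 0
    fi
    [ "$current" -ge "$required" ]
}

# Example calls from a new login shell (ulimit -u is a bash option):
# limit_ok "$(ulimit -n)" 2900000 || echo "nofile is too low"
# limit_ok "$(ulimit -u)" 131072  || echo "nproc is too low"
```

The thresholds in the example calls are the nofile and nproc values listed above.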
Build dependencies yourself (tested on Redhat 6.X).
Dependencies
You must install several dependencies, including gcc, before building Apache HAWQ (see the following tables). The libraries are tested on the given versions. Most of the dependencies can be installed through yum; the others must be installed from a source tarball. Typically you can use "./configure && make && make install" to install from a source tarball.
Libraries that must be installed using source tarball.
Name | Version | Download URL |
---|---|---|
json-c | 0.9 | http://oss.metaparadigm.com/json-c/json-c-0.9.tar.gz |
boost | 1.56.0 | http://sourceforge.net/projects/boost/files/boost/1.56.0/boost_1_56_0.tar.bz2 |
thrift | 0.9.1 | http://archive.apache.org/dist/thrift/0.9.1/thrift-0.9.1.tar.gz (requires boost 1.56) |
protobuf | 2.5.0 | https://github.com/google/protobuf/tree/v2.5.0 |
curl | 7.44.0 | https://curl.haxx.se/download/curl-7.44.0.tar.gz |
maven | | http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo |
cmake | >=3.0 | https://cmake.org/files/ |
You might need to run "ldconfig <LIBRARY_INSTALL_PATH>" after installing them so that the dynamic linker can find the new libraries.
For the thrift build, you might need to pass "--without-tests" to configure.
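The usual tarball workflow can be wrapped in a small function. This is a sketch under stated assumptions: build_from_tarball, run, and DRY_RUN are hypothetical names, and the tarball is assumed to unpack into a directory named after the file (for example thrift-0.9.1). It fits packages that ship a configure script; boost uses its own bootstrap script, so the sketch does not apply to it as-is.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Download, unpack, configure, build, and install one source tarball.
# $1: tarball URL; any further arguments are passed to ./configure.
build_from_tarball() {
    (
        # The subshell keeps the caller's working directory unchanged.
        url=$1
        shift
        tarball=${url##*/}          # e.g. thrift-0.9.1.tar.gz
        srcdir=${tarball%.tar.*}    # e.g. thrift-0.9.1
        run wget "$url" -O "$tarball" &&
        run tar -xf "$tarball" &&
        run cd "$srcdir" &&
        run ./configure "$@" &&
        run make &&
        run make install
    )
}
```

Setting DRY_RUN=1 before calling build_from_tarball http://archive.apache.org/dist/thrift/0.9.1/thrift-0.9.1.tar.gz --without-tests prints the six commands without running them, which is a convenient way to review the steps first.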
Install maven:
sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
sudo sed -i s/\$releasever/6/g /etc/yum.repos.d/epel-apache-maven.repo
sudo yum install -y apache-maven
Install pip:
wget https://bootstrap.pypa.io/get-pip.py
python get-pip.py
Libraries that can be installed through yum.
Name | Version |
---|---|
epel-release | 6-8 |
make | |
gcc | >=4.7.2 |
gcc-c++ | |
| 1.875 |
| >2.5.4 |
lcov | 1.12 |
The default version of gcc in Redhat/CentOS 6.X is 4.4.7 or lower. You can quickly upgrade gcc by following the instructions below:
cd /etc/yum.repos.d
# make sure you have root permission
wget -O /etc/yum.repos.d/slc6-devtoolset.repo http://linuxsoft.cern.ch/cern/devtoolset/slc6-devtoolset.repo
# install higher version using devtoolset-2
yum install devtoolset-2-gcc devtoolset-2-binutils devtoolset-2-gcc-c++
# Start using software collections
scl enable devtoolset-2 bash
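To confirm that the active compiler meets the gcc >=4.7.2 requirement, the reported version can be compared with the minimum. The sketch below uses a hypothetical helper, version_ge, and relies on "sort -V" (version sort) from GNU coreutils, which is an assumption about the host. The same helper works for the cmake >=3.0 requirement.

```shell
# Return 0 if version $1 is greater than or equal to version $2.
# "sort -V" orders version strings numerically, so 4.10.0 sorts after 4.7.2.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Example: check the gcc found first on PATH against the 4.7.2 minimum.
# version_ge "$(gcc -dumpversion)" 4.7.2 && echo "gcc is new enough"
```

The check passes when the smaller of the two versions is the required minimum, which also covers the case where both are equal.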
You will need to install the same Python packages as those required for Redhat/CentOS 7.
OS requirement
- Use a text editor to edit the /etc/sysctl.conf file. Add or edit each of the following parameter definitions to set the required value.
kernel.shmmax = 1000000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 0
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 200000
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1281 65535
net.core.netdev_max_backlog = 200000
vm.overcommit_memory = 2
fs.nr_open = 3000000
kernel.threads-max = 798720
kernel.pid_max = 798720
# increase network
net.core.rmem_max=2097152
net.core.wmem_max=2097152
- Execute the following command to apply your updated /etc/sysctl.conf file to the operating system configuration:
sysctl -p
- Use a text editor to edit the /etc/security/limits.conf file. Add the following definitions in the exact order that they are listed:
* soft nofile 2900000
* hard nofile 2900000
* soft nproc 131072
* hard nproc 131072
Build with Prebuilt Docker Image
Probably the simplest way to get started with the build is to use the community-developed Docker image, which has all the project dependencies pre-installed.
To use the Docker image, follow the steps at: https://hub.docker.com/r/mayjojo/hawq-devel/
Install Hadoop
Please follow the steps here: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
Note that you might need to build Hadoop from source on Redhat/CentOS 6.x if the downloaded Hadoop package requires a higher glibc version. When that happens, you will probably see the warning below when running start-dfs.sh.
"WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform"
You will also need to set the port for fs.defaultFS to 8020 in etc/hadoop/core-site.xml (the example in the linked guide sets it to 9000).
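A quick way to confirm the port is to read the fs.defaultFS value back out of core-site.xml. This is a rough sketch, not a real XML parser: default_fs and uses_port_8020 are hypothetical helpers, and they assume the <name> and <value> elements sit on consecutive lines, as in the example from the Hadoop guide.

```shell
# Print the value of fs.defaultFS from a core-site.xml file.
# Assumes <value> is on the line directly after <name>fs.defaultFS</name>.
default_fs() {
    sed -n '/<name>fs\.defaultFS<\/name>/{
        n
        s/.*<value>\(.*\)<\/value>.*/\1/p
    }' "$1"
}

# Return 0 if fs.defaultFS ends with port 8020.
uses_port_8020() {
    case "$(default_fs "$1")" in
        *:8020|*:8020/) return 0 ;;
        *) return 1 ;;
    esac
}

# Example call, run from the Hadoop installation directory:
# uses_port_8020 etc/hadoop/core-site.xml || echo "fs.defaultFS is not on port 8020"
```

If the file is laid out differently (for instance, everything on one line), default_fs prints nothing and the check fails, so an empty result means "inspect the file by hand" rather than "misconfigured".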
HDFS is a must, but YARN is optional. YARN is only needed when you want to use YARN as the global resource manager.
You need to verify that HDFS works.
# start HDFS
start-dfs.sh
# Do some basic tests to make sure HDFS works
hadoop fs -ls /
hadoop fs -mkdir /test
hadoop fs -put ./testfile /
hadoop fs -get /testfile .
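The put and get steps become a stronger check if the file that comes back is compared with the one that was sent. The following sketch does that round trip. hdfs_round_trip and the HADOOP_CMD variable are hypothetical names (HADOOP_CMD defaults to "hadoop"), and the remote file name is an arbitrary choice.

```shell
# Write a small file to HDFS, read it back, and compare the two copies.
# Returns 0 only if the round trip succeeds and the contents match.
hdfs_round_trip() {
    hadoop_cmd=${HADOOP_CMD:-hadoop}
    remote=/hawq_hdfs_check.$$
    workdir=$(mktemp -d) || return 1
    echo "hawq hdfs check" > "$workdir/sent"

    if "$hadoop_cmd" fs -put "$workdir/sent" "$remote" &&
       "$hadoop_cmd" fs -get "$remote" "$workdir/received" &&
       cmp -s "$workdir/sent" "$workdir/received"; then
        result=0
    else
        result=1
    fi

    # Clean up the remote file and the local scratch directory.
    "$hadoop_cmd" fs -rm "$remote" >/dev/null 2>&1
    rm -rf "$workdir"
    return $result
}
```

Run hdfs_round_trip after start-dfs.sh; a non-zero return means either a Hadoop command failed or the downloaded copy differs from the original.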
Get the HAWQ code and Compile
Once you have an environment with the necessary dependencies installed and Hadoop is ready, the next step is to get the code and build HAWQ.
# The Apache HAWQ source code can be obtained from the following links:
# Apache Repo: https://git-wip-us.apache.org/repos/asf/incubator-hawq.git or
# GitHub Mirror: https://github.com/apache/incubator-hawq.
git clone https://git-wip-us.apache.org/repos/asf/incubator-hawq.git

# The code directory is incubator-hawq.
CODE_BASE=`pwd`/incubator-hawq
cd $CODE_BASE

# Run command to generate makefile.
./configure

# Or you could use --prefix=/hawq/install/path to change the Apache HAWQ install path,
# and you can also add some optional components using options (--with-python --with-perl)
# For El Capitan (Mac OS 10.11), if configure cannot find some components, you may need to do:
# export CPPFLAGS="-I/usr/local/include -L/usr/local/lib"
./configure --prefix=/hawq/install/path --with-python --with-perl

# You can also run the command with --help for more configuration.
./configure --help

# Run command to build and install
# To build concurrently, run make with the -j option. For example, make -j8
# On a Linux system without large memory, you will probably encounter errors like
# "Error occurred during initialization of VM" and/or "Could not reserve enough space for object heap"
# and/or "out of memory". Try to set vm.overcommit_memory = 1 temporarily, and/or avoid the "-j" build,
# and/or add more memory and then rebuild.
# On Mac OS, you will probably see this error: "'openssl/ssl.h' file not found".
# "brew link openssl --force" should be able to solve the issue.
make -j8

# Install HAWQ
make install
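Instead of hard-coding -j8, the job count can follow the number of CPUs on the build host. This is a minimal sketch: build_jobs is a hypothetical helper, and it assumes getconf supports _NPROCESSORS_ONLN, which is the case on common Linux and macOS systems but is not guaranteed everywhere.

```shell
# Print the number of online CPUs, or 1 if it cannot be determined.
build_jobs() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*|0) n=1 ;;   # fall back when the answer is not a positive number
    esac
    printf '%s\n' "$n"
}

# Example use in place of "make -j8":
# make -j"$(build_jobs)"
```

On machines with little memory, where the build hits the out-of-memory errors described above, a plain make without -j remains the safer choice regardless of CPU count.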
Init/Start/Stop HAWQ
# Before initializing HAWQ, you need to install HDFS and make sure it works.
# You also need to set up password-less ssh on the systems.
source /hawq/install/path/greenplum_path.sh
hawq init cluster
# After initialization, HAWQ is started by default.

# Now you can stop, restart, or start the cluster with the corresponding command:
# hawq stop cluster
# hawq restart cluster
# hawq start cluster

# HAWQ master and segments are completely decoupled, so you can also init, start, or stop the master and segments separately.
# For example, to init: hawq init master, then hawq init segment
# to stop: hawq stop master, then hawq stop segment
# to start: hawq start master, then hawq start segment
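When the master and segment are handled separately, the two commands always come as a pair, master first. The sketch below wraps that pair. hawq_each and the HAWQ_CMD variable are hypothetical names introduced for illustration (HAWQ_CMD defaults to "hawq"); only the init, start, and stop actions named above are accepted.

```shell
# Apply one action (init, start, or stop) to the master and then to the
# segment, in the order given above. Stops at the first failure.
hawq_each() {
    action=$1
    hawq_cmd=${HAWQ_CMD:-hawq}
    case "$action" in
        init|start|stop) ;;
        *) echo "usage: hawq_each init|start|stop" >&2; return 2 ;;
    esac
    "$hawq_cmd" "$action" master && "$hawq_cmd" "$action" segment
}
```

For example, hawq_each stop runs "hawq stop master" followed by "hawq stop segment". Setting HAWQ_CMD=echo prints the two commands instead of running them.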
Connect and Run basic queries
psql -d postgres
create table t ( i int );
insert into t values(1);
insert into t select generate_series(1,10000);
select count(*) from t;
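The final count should be 10001: one row from the single insert plus 10000 rows from generate_series(1,10000). A scripted version of that check could look like the sketch below. count_rows and the PSQL_CMD variable are hypothetical names (PSQL_CMD defaults to "psql"); the psql options used are -A (unaligned output), -t (rows only), and -c (run one command).

```shell
# Print the number of rows in the given table of the postgres database.
count_rows() {
    psql_cmd=${PSQL_CMD:-psql}
    "$psql_cmd" -d postgres -A -t -c "select count(*) from $1;"
}

# One row from "insert into t values(1)" plus 10000 from generate_series.
expected_rows=$((1 + 10000))

# Example check after running the statements above:
# [ "$(count_rows t)" = "$expected_rows" ] && echo "row count matches"
```

Running the insert statements more than once increases the count, so the comparison only holds for a freshly created table.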
Test HAWQ
# Unit test. To run the unit tests, go to src/backend and run unittest-check.
cd $CODE_BASE/src/backend
make unittest-check

# Code coverage
cd $CODE_BASE
# For a debug build:
./configure --enable-coverage --enable-debug
# Or, for an opt build:
# ./configure --enable-coverage
make -j8
make install
# Run some tests to exercise HAWQ (e.g., unit test, install check, feature test, etc.)
# To see summary code coverage information in the console, and detailed code coverage information in CodeCoverageReport (HTML format):
make coverage-show
# To see code coverage for specific files or directories:
make coverage-show filter="./src/backend/executor/nodeAgg.c -d ./src/backend/commands"
# To clear code coverage statistics:
make coverage-reset

# Installcheck-good test. After installing HAWQ, please ensure HDFS works before initializing HAWQ.
source /install/dir/greenplum_path.sh
hawq init cluster
make installcheck-good

# Feature test
cd $CODE_BASE
make feature-test
# To run all feature tests:
./feature-test
# Or, to run only the test suite TestCommonLib.TestSqlUtil:
./feature-test --gtest_filter=TestCommonLib.TestSqlUtil