How to Build
Setup an environment with the dependencies installed
OS requirement
- Use a text editor to edit the /etc/sysctl.conf file. Add or edit each of the following parameter definitions to set the required value:
kern.sysv.shmmax=2147483648
kern.sysv.shmmin=1
kern.sysv.shmmni=64
kern.sysv.shmseg=16
kern.sysv.shmall=524288
kern.maxfiles=65535
kern.maxfilesperproc=65536
- Restart the machine to apply the change.
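Before restarting, it can help to confirm that the file really holds the intended values. The following is a minimal sketch, not part of the HAWQ tooling: `sysctl_conf_value` is a hypothetical helper name, and it assumes the file uses plain `key=value` lines like the ones listed above.

```shell
# Hypothetical helper (not part of HAWQ): print the value assigned to KEY
# in a sysctl.conf-style FILE. Comment lines are skipped, spaces around "="
# are ignored, and the last assignment in the file wins.
sysctl_conf_value() {
    awk -v key="$2" '
        /^[ \t]*#/ { next }                  # skip comment lines
        index($0, "=") > 0 {
            k = substr($0, 1, index($0, "=") - 1)
            v = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)    # trim the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)    # trim the value
            if (k == key) found = v
        }
        END { if (found != "") print found }
    ' "$1"
}

# Example: compare one entry against the value required above.
# [ "$(sysctl_conf_value /etc/sysctl.conf kern.sysv.shmmax)" = "2147483648" ] \
#     || echo "kern.sysv.shmmax is not set to 2147483648"
```

The helper only reads the file; it does not tell you what the running kernel uses, which is why the restart is still needed.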
Install dependencies on Mac (with Xcode installed)
brew install protobuf protobuf-c Gsasl openssl boost thrift json-c ccache snappy libyaml libevent python
brew tap brona/iproute2mac
brew install iproute2mac
brew install postgresql
sudo pip install pygresql
sudo pip install unittest2 pycrypto lockfile paramiko psi
sudo pip install http://sourceforge.net/projects/pychecker/files/pychecker/0.8.19/pychecker-0.8.19.tar.gz/download
brew uninstall postgresql
Install libhdfs3:
git clone https://github.com/Pivotal-DataFabric/libhdfs3
cd libhdfs3
mkdir build
cd build
../bootstrap --prefix=/usr/local/
make -j8
make install
Install dependencies on CentOS7.x
OS requirement
- Use a text editor to edit the /etc/sysctl.conf file. Add or edit each of the following parameter definitions to set the required value:
kernel.shmmax = 1000000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 0
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 200000
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1281 65535
net.core.netdev_max_backlog = 200000
vm.overcommit_memory = 2
fs.nr_open = 3000000
kernel.threads-max = 798720
kernel.pid_max = 798720
# increase network
net.core.rmem_max=2097152
net.core.wmem_max=2097152
- Execute the following command to apply your updated /etc/sysctl.conf file to the operating system configuration:
sysctl -p
- Use a text editor to edit the /etc/security/limits.conf file. Add the following definitions in the exact order that they are listed:
* soft nofile 2900000
* hard nofile 2900000
* soft nproc 131072
* hard nproc 131072
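Changes to limits.conf only take effect for sessions started after the edit, so it is worth checking the limits from a fresh login shell. A minimal sketch, assuming a POSIX shell; `limit_at_least` is a hypothetical helper, not a HAWQ command.

```shell
# Hypothetical helper: succeed if CURRENT (as printed by ulimit) is
# "unlimited" or a number greater than or equal to REQUIRED.
limit_at_least() {
    current=$1
    required=$2
    [ "$current" = "unlimited" ] && return 0
    # A non-numeric CURRENT makes the test fail, which counts as "too low".
    [ "$current" -ge "$required" ] 2>/dev/null
}

# Example, run from a new login shell after editing limits.conf:
# limit_at_least "$(ulimit -n)" 2900000 || echo "nofile limit is too low"
# limit_at_least "$(ulimit -u)" 131072  || echo "nproc limit is too low"   # -u is available in bash
```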
Dependencies
(CentOS 7 users can follow the steps below, provided by Zhanwei Wang.)
curl -L "https://bintray.com/wangzw/rpm/rpm" -o /etc/yum.repos.d/bintray-wangzw-rpm.repo
yum install -y epel-release
yum makecache
yum install -y man passwd sudo tar which git mlocate links make bzip2 net-tools \
  autoconf automake libtool m4 gcc gcc-c++ gdb bison flex cmake gperf maven indent \
  libuuid-devel krb5-devel libgsasl-devel expat-devel libxml2-devel \
  perl-ExtUtils-Embed pam-devel python-devel libcurl-devel snappy-devel \
  thrift-devel libyaml-devel libevent-devel bzip2-devel openssl-devel \
  openldap-devel protobuf-devel readline-devel net-snmp-devel apr-devel \
  libesmtp-devel xerces-c-devel python-pip json-c-devel libhdfs3-devel \
  apache-ivy java-1.7.0-openjdk-devel \
  openssh-clients openssh-server
yum install -y postgresql-devel
pip --retries=50 --timeout=300 install pg8000 simplejson unittest2 pycrypto pygresql pyyaml lockfile paramiko psi
pip --retries=50 --timeout=300 install http://darcs.idyll.org/~t/projects/figleaf-0.6.1.tar.gz
pip --retries=50 --timeout=300 install http://sourceforge.net/projects/pychecker/files/pychecker/0.8.19/pychecker-0.8.19.tar.gz/download
yum erase -y postgresql postgresql-libs postgresql-devel
Build dependencies yourself (tested on Red Hat 6.x)
OS requirement
- Use a text editor to edit the /etc/sysctl.conf file. Add or edit each of the following parameter definitions to set the required value:
kernel.shmmax = 1000000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 0
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 200000
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1281 65535
net.core.netdev_max_backlog = 200000
vm.overcommit_memory = 2
fs.nr_open = 3000000
kernel.threads-max = 798720
kernel.pid_max = 798720
# increase network
net.core.rmem_max=2097152
net.core.wmem_max=2097152
- Execute the following command to apply your updated /etc/sysctl.conf file to the operating system configuration:
sysctl -p
- Use a text editor to edit the /etc/security/limits.conf file. Add the following definitions in the exact order that they are listed:
* soft nofile 2900000
* hard nofile 2900000
* soft nproc 131072
* hard nproc 131072
Dependencies
You must install several dependencies (listed in the following tables), including gcc, before building Apache HAWQ. The libraries have been tested with the versions given. Most of the dependencies can be installed through yum; the others must be installed from a source tarball, typically with "./configure && make && make install".
Libraries that must be installed using source tarball.
Name | Version | Download URL |
---|---|---|
 | 0.9 | |
boost | 1.56.0 | http://sourceforge.net/projects/boost/files/boost/1.56.0/boost_1_56_0.tar.bz2 |
thrift | 0.9.1 | http://archive.apache.org/dist/thrift/0.9.1/thrift-0.9.1.tar.gz |
protobuf | 2.5.0 | https://github.com/google/protobuf/tree/v2.5.0 |
curl | 7.44.0 | http://www.curl.haxx.se/download/curl-7.44.0.tar.gz |
libhdfs3 | | https://github.com/PivotalRD/libhdfs3.git |
Libraries that can be installed through yum.
Name | Version |
---|---|
epel-release | 6-8 |
make | |
gcc | |
gcc-c++ | |
Build with Prebuilt Docker Image
Probably the simplest way to get started with the build is to use the community-developed Docker image, which has all the project dependencies pre-installed.
To use the Docker image, follow the steps at: https://hub.docker.com/r/mayjojo/hawq-devel/
Get the code and Compile
Once you have an environment with the necessary dependencies installed, the next step is to get the code and build HAWQ.
- The Apache HAWQ source code can be obtained from GitHub: https://github.com/apache/incubator-hawq.
- Get the source code:
git clone https://git-wip-us.apache.org/repos/asf/incubator-hawq.git
- The code directory is CODEHOME/incubator-hawq. Change to CODEHOME/incubator-hawq and build Apache HAWQ from this directory.
- Install libyarn:
cd CODEHOME/incubator-hawq/depends/libyarn
mkdir build
cd build
../bootstrap --prefix=/usr/local/
make -j8
make install
- Run the following command to generate the makefile:
./configure
- Alternatively, use --prefix=/hawq/install/path to change the Apache HAWQ install path:
./configure --prefix=/hawq/install/path
- You can also run the command with --help to see more configuration options:
./configure --help
- Note: If ./configure complains that libyarn is missing, it is provided under ./depends/libyarn. Please follow the README file to install libyarn. You may need to run ldconfig after libyarn is installed.
- Run the following command to build HAWQ:
make
- To build in parallel, run make with the -j option:
make -j8
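The value 8 above is only an example job count. A minimal sketch for matching it to the build machine instead, assuming at least one of `getconf`, `nproc`, or (on Mac) `sysctl -n hw.ncpu` is available; `cpu_count` is a hypothetical helper name.

```shell
# Hypothetical helper: print the number of online CPUs, falling back to 1
# when none of the probes below is available.
cpu_count() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) \
        || n=$(nproc 2>/dev/null) \
        || n=$(sysctl -n hw.ncpu 2>/dev/null) \
        || n=1
    echo "${n:-1}"
}

# Example: one make job per CPU.
# make -j"$(cpu_count)"
```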
Install HAWQ
- To install Apache HAWQ, run:
make install
Test In HAWQ
- Unit test. To run the unit tests, go to src/backend and run the unittest-check target:
cd src/backend
make unittest-check
- Installcheck-good test. After installing HAWQ, make sure HDFS works before initializing HAWQ:
source /install/dir/greenplum_path.sh
hawq init cluster
make installcheck-good
Install YARN (Optional)
If you want to integrate with YARN for resource management, you need to install YARN first.
Init and Start/Stop Apache HAWQ
- Before initializing HAWQ, you need to install HDFS and make sure it works
- source /install/dir/greenplum_path.sh
- hawq init cluster (after initialization, HAWQ is started by default)
- Now you can stop/restart/start the cluster by using: hawq stop/restart/start cluster
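The first steps above can be combined so that initialization only runs when HDFS answers a simple request. This is a sketch under assumptions: it uses `hdfs dfs -ls /` as the health probe, and `hdfs_ready` and the `HDFS_CMD` override are hypothetical names introduced here for illustration, not part of HAWQ.

```shell
# Hypothetical helper: succeed only if the HDFS client can list the root
# directory. HDFS_CMD is an assumed override, mainly useful for testing.
hdfs_ready() {
    "${HDFS_CMD:-hdfs}" dfs -ls / >/dev/null 2>&1
}

# Example: initialize HAWQ only when HDFS responds.
# source /install/dir/greenplum_path.sh
# if hdfs_ready; then
#     hawq init cluster
# else
#     echo "HDFS is not reachable; fix HDFS before running hawq init" >&2
# fi
```

Listing the root directory is a light check; it shows that the NameNode answers, not that the cluster has enough healthy DataNodes.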
Connect and Run basic queries
- psql -d postgres
- create table t ( i int );
- insert into t values(1);
- select * from t;
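The same statements can also be run non-interactively, for example as a quick smoke test after the cluster starts. A minimal sketch; `smoke_sql` is a hypothetical helper name, and `ON_ERROR_STOP` is the standard psql variable that makes psql stop at the first failing statement.

```shell
# Hypothetical helper: print the basic queries above, one per line.
smoke_sql() {
    cat <<'SQL'
create table t ( i int );
insert into t values(1);
select * from t;
SQL
}

# Example: feed the statements to psql and stop at the first error.
# smoke_sql | psql -d postgres -v ON_ERROR_STOP=1
```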