...
1. How to Build
1.1 Get the code of HAWQ
brew install protobuf protobuf-c gsasl openssl boost thrift json-c ccache snappy libyaml libevent python cmake lcov
brew install iproute2mac
brew cask install java
brew install maven
sudo easy_install pip
sudo pip install pycrypto
brew install cpanm
sudo cpanm install JSON
# get libesmtp from http://linuxfromscratch.org/blfs/view/svn/general/libesmtp.html
tar jxvf libesmtp-1.0.6.tar.bz2
cd libesmtp-1.0.6
./configure && make
sudo make install
Please refer to the section below titled Running catalog tidycat perl modules for installing the perl-JSON module on Mac OS.
OS requirement
Use a text editor to edit the /etc/sysctl.conf file. Add or edit each of the following parameter definitions to set the required value.
kern.sysv.shmmax=2147483648
kern.sysv.shmmin=1
kern.sysv.shmmni=64
kern.sysv.shmseg=16
kern.sysv.shmall=524288
kern.maxfiles=65535
kern.maxfilesperproc=65536
Install Xcode and command line tools
After installing or updating Xcode, run 'xcode-select --install' to install the command line tools, then open Xcode to make sure the installation completed. MUST: turn off rootless System Integrity Protection on OS X El Capitan 10.11+. If you do not, you may encounter some tricky LIBRARY_PATH problems, e.g. HAWQ-513. Follow the instructions here: http://osxdaily.com/2015/10/05/disable-rootless-system-integrity-protection-mac-os-x
Name | Version | Download URL |
---|---|---|
json-c | 0.9 | http://oss.metaparadigm.com/json-c/json-c-0.9.tar.gz |
boost | 1.56.0 | http://sourceforge.net/projects/boost/files/boost/1.56.0/boost_1_56_0.tar.bz2 |
thrift | 0.9.1 | http://archive.apache.org/dist/thrift/0.9.1/thrift-0.9.1.tar.gz (requires boost 1.56) |
protobuf | 2.5.0 | https://github.com/google/protobuf/tree/v2.5.0 |
curl | 7.44.0 | https://curl.haxx.se/download/curl-7.44.0.tar.gz |
maven | | http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo |
cmake | >=3.0 | https://cmake.org/files/ |
You might need to run "ldconfig <LIBRARY_INSTALL_PATH>" after installing them so the dynamic linker can find the new libraries.
For the thrift build, you might need to pass "--without-tests" to configure.
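As a quick sanity check after installing the libraries above, you can inspect the dynamic linker cache; this is a minimal sketch, and the grep pattern is only an example of what to look for:

```shell
# Look for the freshly installed dependencies in the linker cache.
# If nothing matches, re-run `ldconfig` (as root) after `make install`
# and check again.
ldconfig -p | grep -iE 'thrift|protobuf' || echo "not found in linker cache"
```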
Install maven:
sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
sudo sed -i s/\$releasever/6/g /etc/yum.repos.d/epel-apache-maven.repo
sudo yum install -y apache-maven
Install pip:
wget https://bootstrap.pypa.io/get-pip.py
python get-pip.py
pip --retries=50 --timeout=300 install pycrypto
Libraries that can be installed through yum:
Name | Version |
---|---|
epel-release | 6-8 |
make | |
gcc | >=4.7.2 |
gcc-c++ | |
bison | 1.875 |
flex | >2.5.4 |
lcov | 1.12 |
libesmtp-devel | 1.0.4 |
perl-JSON | 2.15 |
The default version of gcc on Red Hat/CentOS 6.x is 4.4.7 or lower; you can quickly upgrade gcc by following the instructions below:
cd /etc/yum.repos.d
# make sure you have root permission
wget -O /etc/yum.repos.d/slc6-devtoolset.repo http://linuxsoft.cern.ch/cern/devtoolset/slc6-devtoolset.repo
# install higher version using devtoolset-2
yum install devtoolset-2-gcc devtoolset-2-binutils devtoolset-2-gcc-c++
# Start using software collections
scl enable devtoolset-2 bash
You will need to install the same python packages as those required for Red Hat/CentOS 7.
OS requirement
Use a text editor to edit the /etc/sysctl.conf file. Add or edit each of the following parameter definitions to set the required value.
kernel.shmmax = 1000000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 200000
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1281 65535
net.core.netdev_max_backlog = 200000
vm.overcommit_memory = 2
fs.nr_open = 3000000
kernel.threads-max = 798720
kernel.pid_max = 798720
# increase network
net.core.rmem_max=2097152
net.core.wmem_max=2097152
- Execute the following command to apply your updated /etc/sysctl.conf file to the operating system configuration:
sysctl -p
- Use a text editor to edit the /etc/security/limits.conf file. Add the following definitions in the exact order in which they are listed:
* soft nofile 2900000
* hard nofile 2900000
* soft nproc 131072
* hard nproc 131072
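Once sysctl -p has been run, you can spot-check a few of the values without any extra tools, since the kernel exposes each parameter under /proc/sys with dots mapped to slashes; the parameters below are taken from the list above:

```shell
# Each sysctl appears under /proc/sys with '.' replaced by '/'.
cat /proc/sys/kernel/shmmax
cat /proc/sys/net/ipv4/ip_local_port_range
# Current file-descriptor limits for this shell; the limits.conf entries
# take effect for new login sessions.
ulimit -Sn
ulimit -Hn
```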
Build with Prebuilt Docker Image
Apache HAWQ source code contains Dockerfiles to help developers set up a building and testing environment with docker. To use the docker image, follow the steps at: https://github.com/apache/incubator-hawq/tree/master/contrib/hawq-docker
...
# The Apache HAWQ source code can be obtained from the following link:
# Apache Repo: https://git.apache.org/repos/asf/hawq.git
# GitHub Mirror: https://github.com/apache/hawq.git
# Gitee Mirror: https://gitee.com/mirrors/hawq.git
git clone https://git.apache.org/repos/asf/hawq.git
1.2 Setup an environment with the dependencies installed
1.3 Compile and Install HAWQ
Once you have an environment with the necessary dependencies installed and Hadoop is ready, the next step is to build and install HAWQ.
# The code directory is hawq.
CODE_BASE=`pwd`/hawq
cd $CODE_BASE

# Run command to generate makefile.
./configure

# Or you could use --prefix=/hawq/install/path to change the Apache HAWQ install path,
# and you can also add some optional components using options (--with-python --with-perl).
# For El Capitan (Mac OS 10.11), if configure cannot find some components, you may need to:
# export CPPFLAGS="-I/usr/local/include -L/usr/local/lib"
./configure --prefix=/hawq/install/path --with-python --with-perl

# You can also run the command with --help for more configuration.
./configure --help

# Run command to build and install.
# To build concurrently, run make with the -j option, for example, make -j8.
# On a Linux system without large memory, you will probably encounter errors like
# "Error occurred during initialization of VM", "Could not reserve enough space for object heap"
# and/or "out of memory"; try setting vm.overcommit_memory = 1 temporarily, avoiding the -j build,
# and/or adding more memory, and then rebuild.
# On Mac OS, you will probably see the error "'openssl/ssl.h' file not found";
# "brew link openssl --force" should solve the issue.
make -j8

# Install HAWQ
make install
2. Init/Start/Stop HAWQ
2.1 Install and Start Hadoop
Please follow the steps here: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
Note:
- You might need to build hadoop from source on Red Hat/CentOS 6.x if the downloaded hadoop package has a higher glibc version requirement. When that happens, you will probably see the following warning when running start-dfs.sh: "WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform"
- You will also need to set the port for fs.defaultFS to 8020 in etc/hadoop/core-site.xml (the example above sets it as 9000).
- HDFS is a must, but YARN is optional. YARN is only needed when you want to use YARN as the global resource manager.
- You must set up passphraseless ssh, otherwise "hawq init cluster" will run into problems in the following step.
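The fs.defaultFS change described above can be sketched as a core-site.xml fragment; "localhost" is a placeholder here, so substitute your NameNode host in a real cluster:

```xml
<!-- etc/hadoop/core-site.xml: point clients at the NameNode on port 8020. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
```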
You need to verify that your HDFS works.
# start HDFS
start-dfs.sh
# Do some basic tests to make sure HDFS works
echo "test data" >> ./localfile
hadoop fs -mkdir /test
hadoop fs -put ./localfile /test
hadoop fs -ls /
hadoop fs -get /test/localfile ./hdfsfile
2.2 Init/Start/Stop HAWQ
# Before initializing HAWQ, you need to install HDFS and make sure it works.
source /hawq/install/path/greenplum_path.sh
# Besides, you need to set password-less ssh on the systems.
# Exchange SSH keys between the hosts host1, host2, and host3:
hawq ssh-exkeys -h host1 -h host2 -h host3
hawq init cluster
# After initialization, HAWQ is started by default.
# Now you can stop/restart/start the cluster by using:
hawq stop/restart/start cluster

# HAWQ master and segments are completely decoupled. So you can also init,
# start or stop the master and segments separately.
# For example, to init:  hawq init master,  then hawq init segment
#              to stop:  hawq stop master,  then hawq stop segment
#              to start: hawq start master, then hawq start segment
3. Connect and Run basic queries
psql -d postgres

create table t ( i int );
insert into t values(1);
insert into t select generate_series(1,10000);
select count(*) from t;
4. Query external hadoop data (optional)
You will need to use PXF to query external hadoop/hive/hbase data. Refer to the PXF Build & Install document.
5. Test HAWQ
# Unit test. To do unit test, go to src/backend and run unittest.
cd $CODE_BASE/src/backend
make unittest-check

# Code coverage
cd $CODE_BASE
./configure --enable-coverage --enable-debug   # for debug build
# or: ./configure --enable-coverage            # for opt build
make -j8
make install
# Run some tests to exercise hawq (i.e., unit test, install check, feature test, etc.)
make coverage-show
# to see summary code coverage information in console, and detailed code coverage
# information in CodeCoverageReport (html format)
make coverage-show filter="./src/backend/executor/nodeAgg.c -d ./src/backend/commands"
# to see code coverage for specific files or directories
make coverage-reset
# to clear code coverage statistics

# Installcheck-good test. After installing HAWQ, please ensure HDFS works
# before initializing HAWQ.
source /install/dir/greenplum_path.sh
hawq init cluster
make installcheck-good

# Feature test
cd $CODE_BASE
make feature-test
cd src/test/feature
./feature-test
# to run all feature tests, or
./feature-test --gtest_filter=TestCommonLib.TestSqlUtil
# to run test suite TestCommonLib.TestSqlUtil
6. Running catalog tidycat perl modules (optional)
The JSON Perl module is required to run the set of Perl scripts in src/include/catalog. The versioned JSON formatted catalog files are stored in tools/bin/gppylib/data/<version>.json. In order to install the JSON module, the developer will need to make the module available from CPAN. The following was validated on a MacBook Pro OS X 10.11.6 using the information from the Perl on Mac OSX section (http://www.cpan.org/modules/INSTALL.html). Below you will see the session which performs the following steps:
...
Fatal Error: The required package JSON is not installed -- please download it from www.cpan.org
00:02 $ perl tidycat.pl -dd 2.0.json -df json *.h
Fatal Error: The required package JSON is not installed -- please download it from www.cpan.org
00:02 $
00:02 $ cpan install JSON
[many output stuff....]
00:05 $
00:05 $ perl tidycat.pl -dd foo.json -df json *.h
Fatal Error: The required package JSON is not installed -- please download it from www.cpan.org
00:05 $
00:05 $ PATH="/Users/espino/perl5/bin${PATH:+:${PATH}}"; export PATH;
00:05 $ PERL5LIB="/Users/espino/perl5/lib/perl5${PERL5LIB:+:${PERL5LIB}}"; export PERL5LIB;
00:05 $ PERL_LOCAL_LIB_ROOT="/Users/espino/perl5${PERL_LOCAL_LIB_ROOT:+:${PERL_LOCAL_LIB_ROOT}}"; export PERL_LOCAL_LIB_ROOT;
00:05 $ PERL_MB_OPT="--install_base \"/Users/espino/perl5\""; export PERL_MB_OPT;
00:05 $ PERL_MM_OPT="INSTALL_BASE=/Users/espino/perl5"; export PERL_MM_OPT;
00:05 $
00:05 $ perl tidycat.pl -dd 2.0.json -df json *.h
00:05 $
7. Build optional extension modules (optional)
Extension | How to enable | Pre-build steps on Mac |
---|---|---|
PL/R | ./configure --with-r | # install R before build: brew tap homebrew/science; brew install r |
PL/Python | ./configure --with-python | |
PL/Java | ./configure --with-java | |
PL/PERL | ./configure --with-perl | |
pgcrypto | ./configure --with-pgcrypto --with-openssl | |
gporca | ./configure --enable-orca | |
rps | ./configure --enable-rps | brew install tomcat@6 |