Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
languagebash
# Clone hawq repository if you haven't previously done
git clone https://git-wip-us.apache.org/repos/asf/incubator-hawq.git
 
# Head to PXF code
cd incubator-hawq/pxf
 
# Compile & Test PXF
make
 
# Simply Run unittest
make unittest

Setup Prerequisites

Setup HAWQ and Hadoop

PXF requires HAWQ and Hadoop. Please follow the steps here to Setup HAWQ and refer to Install Hadoop section to setup Hadoop.

...

If you simply need to test PXF, you can do so using the Demo profile which doesn't require any requisites as all it does is test against static data from PXF.

Setup Hadoop

Please follow the steps here: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html

Note:

  • you might need to build hadoop from source on Red Hat/CentOS 6.x if the downloaded hadoop package has higher glibc version requirement. When that happens, you will probably see the warning below when running start-dfs.sh." WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform"
  • You will also need to set  the port for fs.defaultFS to 8020 in etc/hadoop/core-site.xml (The example above set it as 9000.)
  • HDFS is a must, but YARN is optional. YARN is only needed when you want to use YARN as the global resource manager.
  • must setup passphraseless ssh, otherwise there will be some problems of "hawq init cluster" in the following step.

Your need to verify your HDFS works.

Code Block
languagebash
# start HDFS
start-dfs.sh
 
# Do some basic tests to make sure HDFS works
echo "test data" >> ./localfile
hadoop fs -mkdir /test
hadoop fs -put ./localfile /test
hadoop fs -ls /
hadoop fs -get /test/localfile ./hdfsfile

Setup HAWQ (If you use HAWQ)

Please follow the steps here to Setup HAWQ

Setup Hive (If you need Hive)

Hive needs to be installed only if you wish to run HAWQ against Hive tables

...

Code Block
languagebash
cd $CODE_BASE/pxf
 
# Set PXF home directory.
# GPHOME must have been set previously as per the HAWQ Build instructions
export PXF_HOME=$GPHOME/pxf
 
# Install PXF If you wish to install PXF for GPDB, please refer to Build PXF for other databases section below instead.
make install
# This would create the necessary artifacts under PXF_HOME
 

PXF RPM (Optional)

If you prefer using the rpm approach of installing PXF on an environment that has the required rpm dependancies installed, you can do the following

Code Block
# Create rpm artifacts
make rpm
# Install apache tomcat rpm
rpm -ivh distribution/apache-tomcat-*.rpm
# Install the pxf rpms
rpm -ivh build/distribution/pxf*

 

Configure PXF

You will see the PXF configuration files in $PXF_HOME/conf

...


Init/Start/Stop PXF

 

Code Block
languagebash
# Deploy PXF
$PXF_HOME/bin/pxf init
# If you get an error "WARNING: instance already exists in ..." make sure you clean up pxf-service directory under $PXF_HOME/bin/pxf and rerun init
 
# Create PXF Log Dir
mkdir $PXF_HOME/logs
 
# Start PXF
$PXF_HOME/bin/pxf start
 
# Check Status
$PXF_HOME/bin/pxf status
# You can also check if the service is running by using the following request to check API version 
curl "localhost:51200/pxf/ProtocolVersion"
 
# To stop PXF $PXF_HOME/bin/pxf stop
 
## Note: If you see a failure 
 

Test PXF

Below are steps which demonstrates accessing a HDFS file from HAWQ.

Code Block
languagebash
# Create an HDFS directory for PXF example data files
$HADOOP_HOME/bin/hadoop fs -mkdir -p /data/pxf_examples
$HADOOP_HOME/bin/hadoop fs -mkdir -p /data/pxf_examples
 
# Create a delimited plain text data file named pxf_hdfs_simple.txt:
echo 'Prague,Jan,101,4875.33' > /tmp/pxf_hdfs_simple.txt
echo 'Rome,Mar,87,1557.39' >> /tmp/pxf_hdfs_simple.txt
echo 'Bangalore,May,317,8936.99' >> /tmp/pxf_hdfs_simple.txt
echo 'Beijing,Jul,411,11600.67' >>> /tmp/pxf_hdfs_simple.txt

# Add the data file to HDFS:
$HADOOP_HOME/bin/hadoop fs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/

#Display the contents of the pxf_hdfs_simple.txt file stored in HDFS:
$HADOOP_HOME/bin/hadoop fs -cat /data/pxf_examples/pxf_hdfs_simple.txt

Now you can access the hdfs file from HAWQ using the HdfsTextSimple profile as shown below.

Code Block
languagesql
postgres=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
            LOCATION ('pxf://localhost:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
          FORMAT 'TEXT' (delimiter=E',');
postgres=# SELECT * FROM pxf_hdfs_textsimple;          

   location    | month | num_orders | total_sales 
---------------+-------+------------+-------------
 Prague        | Jan   |        101 |     4875.33
 Rome          | Mar   |         87 |     1557.39
 Bangalore     | May   |        317 |     8936.99
 Beijing       | Jul   |        411 |    11600.67
(4 rows)

Below are steps which demonstrates accessing a Hive table from HAWQ

Code Block
languagebash
# Create a Hive table to expose our sample data set.
hive> CREATE TABLE sales_info (location string, month string,
        number_of_orders int, total_sales double)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        STORED AS textfile;

# Load the pxf_hive_datafile.txt sample data file into the sales_info table you just created:
hive> LOAD DATA LOCAL INPATH '/tmp/pxf_hive_datafile.txt'
        INTO TABLE sales_info;

# Perform a query from hive on sales_info to verify that the data was loaded successfully:
hive> SELECT * FROM sales_info;

# Query the table from HAWQ to access the hive table
postgres=# SELECT * FROM hcatalog.default.sales_info

   location    | month | num_orders | total_sales
---------------+-------+------------+-------------
 Prague        | Jan   |        101 |     4875.33
 Rome          | Mar   |         87 |     1557.39
 Bangalore     | May   |        317 |     8936.99
 ...

Build PXF for other databases

PXF can be deployed to different environments, for different databases. Thus it's convenient to tailor PXF build for some specific default configuration parameters, such as - default PXF user, default log and run directories.

All supported databases are stored in hawq/pxf/gradle/profiles. By default, HAWQ databases is used.

To build PXF bundle for GPDB:

Code Block
languagebash
make install DATABASE=gpdb