You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 12 Next »

How to Build

PXF uses Gradle. For simplicity we've encapsulated it with simple make commands. Make sure Java is installed prior to building PXF.

# Clone hawq repository if you haven't previously done
git clone https://git-wip-us.apache.org/repos/asf/incubator-hawq.git
 
# Head to PXF code
cd incubator-hawq/pxf
 
# Compile & Test PXF
make
 
# Simply Run unittest
make unittest

Setup Prerequisites

Setup HAWQ and Hadoop

PXF requires HAWQ and Hadoop. Please follow the steps here to Setup HAWQ and refer to Install Hadoop section to setup Hadoop.

Setup Hive

Hive needs to be installed only if you wish to run HAWQ against Hive tables

Download hive release from one of the Apache download mirrors http://www.apache.org/dyn/closer.cgi/hive/

# Extract Hive tarbal
tar -xzvf apache-hive-x.y.z-bin.tar.gz
 
# Set Hadoop path in $HADOOP_HOME to your hadoop deployment location
 
# Set Hive path
cd apache-hive-x.y.z-bin
export HIVE_HOME=$PWD
 
# Create HDFS temp directory
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w   /tmp

# Create Hive Warehouse
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
 
# Set Metastore URI in $HIVE_HOME/conf/hive-site.xml
<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://localhost:9083</value>
    </property>
</configuration>

# Start Hive Metastore as a background process
$HIVE_HOME/bin/hive --service metastore &
# Use Hive CLI/Shell
$HIVE_HOME/bin/hive
hive> CREATE TABLE hivetest (id INT, name STRING);
hive> SHOW TABLES;
hive> DESCRIBE hivetest;

Install PXF

cd $CODE_BASE/pxf
 
# Set PXF home directory
export PXF_HOME=$GPHOME/pxf
 
# Install PXF
make install
# This would create the necessary artifacts under PXF_HOME
 

Configure PXF

You will see the PXF configuration files in $PXF_HOME/conf

Update the following files based on your environment and hadoop directly layout.

  1. pxf-env.sh
    1. Set LD_LIBRARY_PATH to ${HADOOP_HOME}/lib/native
    2. Set PXF_LOGDIR to ${PXF_HOME}/logs
    3. Set PXF_RUNDIR to ${PXF_HOME}
    4. Set PXF_USER to your username
  2. pxf-log4j.properties
    1. Set log4j.appender.ROLLINGFILE.File to the expanded path of $PXF_HOME/logs/pxf-service.log. (Don't use the environment variable in this file)
  3. pxf-private.classpath
    1. Update the library and configuration paths of hadoop,hive,pxf, etc. Use only absolute paths without referring to environment variables


Init/Start/Stop PXF

 

# Deploy PXF
$PXF_HOME/bin/pxf init
 
# Create PXF Log Dir
mkdir $PXF_HOME/logs
 
# Start PXF
$PXF_HOME/bin/pxf start
 
# Check Status
$PXF_HOME/bin/pxf status
# You can also check if the service is running by using the following request to check API version 
curl "localhost:51200/pxf/ProtocolVersion"
 
# To stop PXF $PXF_HOME/bin/pxf stop
 

Test PXF

Below are steps which demonstrates accessing a HDFS file from HAWQ.

# Create an HDFS directory for PXF example data files
$HADOOP_HOME/bin/hadoop fs -mkdir -p /data/pxf_examples
$HADOOP_HOME/bin/hadoop fs -mkdir -p /data/pxf_examples
 
# Create a delimited plain text data file named pxf_hdfs_simple.txt:
echo 'Prague,Jan,101,4875.33 Rome,Mar,87,1557.39 Bangalore,May,317,8936.99 Beijing,Jul,411,11600.67' > /tmp/pxf_hdfs_simple.txt

# Add the data file to HDFS:
$HADOOP_HOME/bin/hadoop -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/

#Display the contents of the pxf_hdfs_simple.txt file stored in HDFS:
$HADOOP_HOME/bin/hadoop -cat /data/pxf_examples/pxf_hdfs_simple.txt

Now you can access the hdfs file from HAWQ using the HdfsTextSimple profile as shown below.

postgres=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
            LOCATION ('pxf://localhost:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
          FORMAT 'TEXT' (delimiter=E',');
postgres=# SELECT * FROM pxf_hdfs_textsimple;          

   location    | month | num_orders | total_sales 
---------------+-------+------------+-------------
 Prague        | Jan   |        101 |     4875.33
 Rome          | Mar   |         87 |     1557.39
 Bangalore     | May   |        317 |     8936.99
 Beijing       | Jul   |        411 |    11600.67
(4 rows)

Below are steps which demonstrates accessing a Hive table from HAWQ

# Create a Hive table to expose our sample data set.
hive> CREATE TABLE sales_info (location string, month string,
        number_of_orders int, total_sales double)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        STORED AS textfile;

# Load the pxf_hive_datafile.txt sample data file into the sales_info table you just created:
hive> LOAD DATA LOCAL INPATH '/tmp/pxf_hive_datafile.txt'
        INTO TABLE sales_info;

# Perform a query from hive on sales_info to verify that the data was loaded successfully:
hive> SELECT * FROM sales_info;

# Query the table from HAWQ to access the hive table
postgres=# SELECT * FROM hcatalog.default.sales_info

   location    | month | num_orders | total_sales
---------------+-------+------------+-------------
 Prague        | Jan   |        101 |     4875.33
 Rome          | Mar   |         87 |     1557.39
 Bangalore     | May   |        317 |     8936.99
 ...
  • No labels