You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 6 Next »

How to Build

PXF uses Gradle. For simplicity we've encapsulated it with simple make commands. Make sure Java is installed prior to building PXF.

git clone https://git-wip-us.apache.org/repos/asf/incubator-hawq.git
 
# Head to PXF code
cd incubator-hawq/pxf
 
# Compile & Test PXF
make
 
# Simply Run unittest
make unittest

Setup Prerequisites

Setup HAWQ and Hadoop

PXF requires HAWQ and Hadoop. Please follow the steps here to Setup HAWQ and refer to Install Hadoop section to setup Hadoop.

Setup Hive

Hive needs to be installed only if you wish to run HAWQ against Hive tables

Download hive release from one of the Apache download mirrors http://www.apache.org/dyn/closer.cgi/hive/

# Extract Hive tarbal
tar -xzvf apache-hive-x.y.z-bin.tar.gz
 
# Set Hadoop path in $HADOOP_HOME to your hadoop deployment location
 
# Set Hive path
cd apache-hive-x.y.z-bin
export HIVE_HOME=$PWD
 
# Create HDFS temp directory
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w   /tmp

# Create Hive Warehouse
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
 
# Set Metastore URI in $HIVE_HOME/conf/hive-site.xml
<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://localhost:9083</value>
    </property>
</configuration>

# Use Hive CLI/Shell
$HIVE_HOME/bin/hive
hive> CREATE TABLE hivetest (id INT, name STRING);
hive> SHOW TABLES;
hive> DESCRIBE hivetest;

Install PXF

cd $CODE_BASE/pxf
 
# Set PXF home directory
export PXF_HOME=$GPHOME/pxf
 
# Install PXF
make install
# This would create the necessary artifacts under PXF_HOME
 

Configure PXF

You will see the PXF configuration files in $PXF_HOME/conf

Update the following files based on your environment and hadoop directly layout.

  1. pxf-env.sh
    1. Set LD_LIBRARY_PATH to ${HADOOP_HOME}/lib/native
    2. Set PXF_LOGDIR to ${PXF_HOME}/logs
    3. Set PXF_RUNDIR to ${PXF_HOME}
    4. Set PXF_USER to your username
  2. pxf-log4j.properties
    1. Set log4j.appender.ROLLINGFILE.File to the expanded path of $PXF_HOME/logs/pxf-service.log. (Don't use the environment variable in this file)
  3. pxf-private.classpath
    1. Update the library and configuration paths of hadoop,hive,pxf, etc. Use only absolute paths without referring to environment variables


Init/Start/Stop PXF

 

# Deploy PXF
$PXF_HOME/bin/pxf init
 
# Start PXF
$PXF_HOME/bin/pxf start
 
# Check Status
$PXF_HOME/bin/pxf status
# You can also check if the service is running by using the following request to check API version 
curl "localhost:51200/pxf/ProtocolVersion"
 
# To stop PXF $PXF_HOME/bin/pxf stop
 

Test PXF

# Create an HDFS directory for PXF example data files
$HADOOP_HOME fs -mkdir -p /data/pxf_examples
 
# Create a delimited plain text data file named pxf_hdfs_simple.txt:
$ echo 'Prague,Jan,101,4875.33 Rome,Mar,87,1557.39 Bangalore,May,317,8936.99 Beijing,Jul,411,11600.67' > /tmp/pxf_hdfs_simple.txt

# Add the data file to HDFS:
$ hdfs dfs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/

#Display the contents of the pxf_hdfs_simple.txt file stored in HDFS:
$ hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt

Now you can access the hdfs file from HAWQ using the HdfsTextSimple profile as shown below.

gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
            LOCATION ('pxf://localhost:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
          FORMAT 'TEXT' (delimiter=E',');
gpadmin=# SELECT * FROM pxf_hdfs_textsimple;          

   location    | month | num_orders | total_sales 
---------------+-------+------------+-------------
 Prague        | Jan   |        101 |     4875.33
 Rome          | Mar   |         87 |     1557.39
 Bangalore     | May   |        317 |     8936.99
 Beijing       | Jul   |        411 |    11600.67
(4 rows)
  • No labels