How to Build
PXF is built with Gradle. For convenience, the Gradle build is wrapped in simple make commands. Make sure Java is installed before building PXF.
git clone https://git-wip-us.apache.org/repos/asf/incubator-hawq.git

# Head to the PXF code
cd incubator-hawq/pxf

# Compile & test PXF
make

# Run only the unit tests
make unittest
Setup Prerequisites
Setup HAWQ and Hadoop
PXF requires HAWQ and Hadoop. Please follow the steps here to set up HAWQ, and refer to the Install Hadoop section to set up Hadoop.
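Before building, it is worth confirming the prerequisites are actually in place. A minimal sanity check, assuming $HADOOP_HOME points at your Hadoop deployment and HDFS has already been started:

# Java is required for the PXF build
java -version

# Hadoop should be installed and HDFS running
$HADOOP_HOME/bin/hadoop version
$HADOOP_HOME/bin/hdfs dfsadmin -report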
Setup Hive
Hive needs to be installed only if you wish to run HAWQ queries against Hive tables.
Download a Hive release from one of the Apache download mirrors: http://www.apache.org/dyn/closer.cgi/hive/
# Extract the Hive tarball
tar -xzvf apache-hive-x.y.z-bin.tar.gz

# Set $HADOOP_HOME to your Hadoop deployment location

# Set Hive path
cd apache-hive-x.y.z-bin
export HIVE_HOME=$PWD

# Create HDFS temp directory
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp

# Create the Hive warehouse
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse

# Set the Metastore URI in $HIVE_HOME/conf/hive-site.xml
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
</configuration>

# Use the Hive CLI/shell
$HIVE_HOME/bin/hive
hive> CREATE TABLE hivetest (id INT, name STRING);
hive> SHOW TABLES;
hive> DESCRIBE hivetest;
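Note that the hive.metastore.uris value above points clients at a standalone metastore service on port 9083, so that service must be running before anything can connect to it. A minimal sketch, assuming $HIVE_HOME as exported above:

# Start the Hive metastore service in the background
# (listens on port 9083 by default, matching hive-site.xml above)
$HIVE_HOME/bin/hive --service metastore &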
Install PXF
cd $CODE_BASE/pxf

# Set PXF home directory
export PXF_HOME=$GPHOME/pxf

# Install PXF
make install

# This creates the necessary artifacts under $PXF_HOME
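To confirm the install produced the expected layout, you can list the two directories this guide relies on later (bin for the pxf control script, conf for the configuration files):

# Both directories should exist after `make install`
ls $PXF_HOME/bin $PXF_HOME/conf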
Configure PXF
You will find the PXF configuration files in $PXF_HOME/conf.
Update the following files based on your environment and Hadoop directory layout (a sample pxf-env.sh is sketched after the list).
- pxf-env.sh
  - Set LD_LIBRARY_PATH to ${HADOOP_HOME}/lib/native
  - Set PXF_LOGDIR to ${PXF_HOME}/logs
  - Set PXF_RUNDIR to ${PXF_HOME}
  - Set PXF_USER to your username
- pxf-log4j.properties
  - Set log4j.appender.ROLLINGFILE.File to the expanded path of $PXF_HOME/logs/pxf-service.log (don't use the environment variable in this file)
- pxf-private.classpath
  - Update the library and configuration paths of Hadoop, Hive, PXF, etc. Use only absolute paths without referring to environment variables.
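For reference, here is what the pxf-env.sh settings above might look like once filled in. The values are illustrative assumptions; substitute your own Hadoop location and username:

# pxf-env.sh -- example values only, adjust for your environment
export LD_LIBRARY_PATH=${HADOOP_HOME}/lib/native
export PXF_LOGDIR=${PXF_HOME}/logs
export PXF_RUNDIR=${PXF_HOME}
export PXF_USER=gpadmin   # replace with your username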
Init/Start/Stop PXF
# Deploy PXF
$PXF_HOME/bin/pxf init

# Start PXF
$PXF_HOME/bin/pxf start

# Check status
$PXF_HOME/bin/pxf status

# You can also verify the service is running by requesting the API version
curl "localhost:51200/pxf/ProtocolVersion"

# To stop PXF
$PXF_HOME/bin/pxf stop
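If the status or curl check fails, the service log (written under PXF_LOGDIR, as configured above) is the first place to look:

# Inspect recent PXF service log output
tail -n 50 $PXF_HOME/logs/pxf-service.log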
Test PXF
# Create an HDFS directory for PXF example data files
$HADOOP_HOME/bin/hdfs dfs -mkdir -p /data/pxf_examples

# Create a delimited plain text data file named pxf_hdfs_simple.txt
echo 'Prague,Jan,101,4875.33
Rome,Mar,87,1557.39
Bangalore,May,317,8936.99
Beijing,Jul,411,11600.67' > /tmp/pxf_hdfs_simple.txt

# Add the data file to HDFS
hdfs dfs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/

# Display the contents of the pxf_hdfs_simple.txt file stored in HDFS
hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt
Now you can access the HDFS file from HAWQ using the HdfsTextSimple profile, as shown below.
gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
          LOCATION ('pxf://localhost:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
          FORMAT 'TEXT' (delimiter=E',');
gpadmin=# SELECT * FROM pxf_hdfs_textsimple;
   location    | month | num_orders | total_sales
---------------+-------+------------+-------------
 Prague        | Jan   |        101 |     4875.33
 Rome          | Mar   |         87 |     1557.39
 Bangalore     | May   |        317 |     8936.99
 Beijing       | Jul   |        411 |    11600.67
(4 rows)
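As a quick follow-up check, the external table can be filtered like any other HAWQ table, which is also convenient for scripted tests. A minimal sketch run from the shell, assuming HAWQ's psql is on your PATH; the database name "postgres" is an assumption, adjust -d for your setup:

# Query the PXF external table non-interactively
psql -d postgres -c "SELECT location, total_sales FROM pxf_hdfs_textsimple WHERE num_orders > 100;"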