...
# Clone the hawq repository if you haven't previously done so
git clone https://git-wip-us.apache.org/repos/asf/incubator-hawq.git
# Head to PXF code
cd incubator-hawq/pxf
# Compile & Test PXF
make
# Run only the unit tests
make unittest
...
Init/Start/Stop PXF
# Deploy PXF
$PXF_HOME/bin/pxf init
# If you get an error "WARNING: instance already exists in ...", clean up the pxf-service directory under $PXF_HOME/bin/pxf and rerun init
# Create PXF Log Dir
mkdir $PXF_HOME/logs
# Start PXF
$PXF_HOME/bin/pxf start
# Check Status
$PXF_HOME/bin/pxf status
# You can also verify the service is running by requesting the API version
curl "localhost:51200/pxf/ProtocolVersion"
# To stop PXF
$PXF_HOME/bin/pxf stop
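Because `pxf start` can return before the service is actually ready to answer requests, scripts that follow it with queries may want to poll the version endpoint first. Below is a minimal sketch of such a helper; the function name is hypothetical, and it assumes the default port 51200 used elsewhere on this page.

```shell
# Block until the PXF REST endpoint answers (or give up after N retries).
# pxf_wait is a hypothetical helper, not part of the PXF distribution.
pxf_wait() {
  retries=${1:-30}
  i=0
  until curl -s "localhost:51200/pxf/ProtocolVersion" > /dev/null; do
    i=$((i + 1))
    if [ "$i" -ge "$retries" ]; then
      echo "PXF did not respond after $retries attempts" >&2
      return 1
    fi
    sleep 2
  done
  echo "PXF is up"
}
# Usage once "pxf start" has been issued:
#   pxf_wait 30 && echo ready
```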
Test PXF
The steps below demonstrate accessing an HDFS file from HAWQ.
# Create an HDFS directory for PXF example data files
$HADOOP_HOME/bin/hadoop fs -mkdir -p /data/pxf_examples
# Create a delimited plain text data file named pxf_hdfs_simple.txt:
echo 'Prague,Jan,101,4875.33' > /tmp/pxf_hdfs_simple.txt
echo 'Rome,Mar,87,1557.39' >> /tmp/pxf_hdfs_simple.txt
echo 'Bangalore,May,317,8936.99' >> /tmp/pxf_hdfs_simple.txt
echo 'Beijing,Jul,411,11600.67' >> /tmp/pxf_hdfs_simple.txt
# Add the data file to HDFS:
$HADOOP_HOME/bin/hadoop fs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/
# Display the contents of the pxf_hdfs_simple.txt file stored in HDFS:
$HADOOP_HOME/bin/hadoop fs -cat /data/pxf_examples/pxf_hdfs_simple.txt
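Before involving HAWQ, you can sanity-check the sample file locally, e.g. by counting the rows and summing the fourth (total_sales) column with awk:

```shell
# Recreate the sample file if the echo steps above were not run in this shell.
test -f /tmp/pxf_hdfs_simple.txt || printf '%s\n' \
  'Prague,Jan,101,4875.33' 'Rome,Mar,87,1557.39' \
  'Bangalore,May,317,8936.99' 'Beijing,Jul,411,11600.67' > /tmp/pxf_hdfs_simple.txt
# Count rows and sum the total_sales column.
awk -F',' '{ n++; s += $4 } END { printf "%d rows, total_sales %.2f\n", n, s }' /tmp/pxf_hdfs_simple.txt
# -> 4 rows, total_sales 26970.38
```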
Now you can access the HDFS file from HAWQ using the HdfsTextSimple profile, as shown below.
postgres=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
LOCATION ('pxf://localhost:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');
postgres=# SELECT * FROM pxf_hdfs_textsimple;
location | month | num_orders | total_sales
---------------+-------+------------+-------------
Prague | Jan | 101 | 4875.33
Rome | Mar | 87 | 1557.39
Bangalore | May | 317 | 8936.99
Beijing | Jul | 411 | 11600.67
(4 rows)
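The LOCATION clause above packs four pieces of information into one URI: the PXF host, its port, the absolute HDFS path, and the profile. A small sketch that assembles such a URI makes the pieces explicit (the helper name is hypothetical, for illustration only):

```shell
# Assemble a PXF LOCATION URI of the form pxf://<host>:<port><hdfs-path>?PROFILE=<profile>.
pxf_location() {
  # $1 host, $2 port, $3 absolute HDFS path, $4 PXF profile
  echo "pxf://$1:$2$3?PROFILE=$4"
}
pxf_location localhost 51200 /data/pxf_examples/pxf_hdfs_simple.txt HdfsTextSimple
# -> pxf://localhost:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple
```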
The steps below demonstrate accessing a Hive table from HAWQ.
# Create a Hive table to expose our sample data set.
hive> CREATE TABLE sales_info (location string, month string,
number_of_orders int, total_sales double)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS textfile;
# Load the pxf_hive_datafile.txt sample data file into the sales_info table you just created:
hive> LOAD DATA LOCAL INPATH '/tmp/pxf_hive_datafile.txt'
INTO TABLE sales_info;
# Perform a query from hive on sales_info to verify that the data was loaded successfully:
hive> SELECT * FROM sales_info;
# Query the table from HAWQ to access the hive table
postgres=# SELECT * FROM hcatalog.default.sales_info;
 location  | month | number_of_orders | total_sales
-----------+-------+------------------+-------------
 Prague    | Jan   |              101 |     4875.33
 Rome      | Mar   |               87 |     1557.39
 Bangalore | May   |              317 |     8936.99
...
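The LOAD DATA step above assumes /tmp/pxf_hive_datafile.txt already exists; its actual contents are not shown on this page. A minimal comma-delimited file matching the sales_info schema (location, month, number_of_orders, total_sales) can be generated like this; the rows here are illustrative, not the real sample data set:

```shell
# Write four illustrative rows in the sales_info column order.
cat > /tmp/pxf_hive_datafile.txt <<'EOF'
Prague,Jan,101,4875.33
Rome,Mar,87,1557.39
Bangalore,May,317,8936.99
Beijing,Jul,411,11600.67
EOF
```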
Build PXF for other databases
PXF can be deployed to different environments and different databases, so it is convenient to tailor the PXF build with database-specific defaults, such as the default PXF user and the default log and run directories.
All supported database profiles are stored in incubator-hawq/pxf/gradle/profiles. By default, the HAWQ profile is used.
To build PXF bundle for GPDB:
make install DATABASE=gpdb
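If you need bundles for more than one database, the same make target can be scripted over several profiles. A sketch, assuming it is run from incubator-hawq/pxf; "hawq" and "gpdb" are the profiles mentioned above, and `build_all` is a hypothetical wrapper, not a PXF target:

```shell
# Build a PXF bundle for each database profile passed as an argument.
build_all() {
  for db in "$@"; do
    make install DATABASE="$db" || { echo "PXF build for $db failed" >&2; return 1; }
  done
}
# Usage from incubator-hawq/pxf:
#   build_all hawq gpdb
```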