The 0.12.0 release of Apache Knox had a focus on the KnoxShell module of the product. This module has been getting some uptake recently, and a number of improvements were made in its security, API classes, credential collectors, and even structure and packaging.

The KnoxShell release artifact provides a small footprint client environment that removes all unnecessary server dependencies, configuration, binary scripts, etc. It is comprised of a few different things that empower different sorts of users:
- A set of SDK-type classes for providing access to Hadoop resources over HTTP
- A Groovy-based DSL for scripting access to Hadoop resources based on the underlying SDK classes
- KnoxShell Token-based sessions to provide a CLI SSO session for executing multiple scripts
While testing the KnoxShell examples for the 0.14.0 Apache Knox release, I realized that using the KnoxShell for access to HiveServer2 was not easily done.
This is due to the fact that we are leveraging the knoxshell executable jar, which makes it difficult to add additional classes and jars to the classpath for the executing script.
I needed to create a launch script that called the main class of the executable jar while also being able to set the classpath with additional jars for Apache Hive clients.
This article will go over the creation of a simple SQL client that we will call "knoxline", using the KnoxShell Groovy-based DSL.
This particular article should work using the 0.14.0 knoxshell download and with previous gateway server releases as well.
Download
In the 0.14.0 release, you may get to the knoxshell download through the Apache Knox site.
From the above page, click the Gateway client binary archive link or just use the one here.
Unzip this file into your preferred location; this will result in a knoxshell-0.14.0 directory, which we will refer to as {GATEWAY_HOME}.
cd {GATEWAY_HOME}
You should see something similar to the following:
home:knoxshell-0.14.0 larry$ ls -l
total 160
-rw-r--r--@   1 larry  staff  71714 Dec  6 18:32 LICENSE
-rw-r--r--@   1 larry  staff    164 Dec  6 18:32 NOTICE
-rw-r--r--@   1 larry  staff   1452 Dec  6 18:32 README
drwxr-xr-x@   6 larry  staff    204 Dec 14 18:06 bin
drwxr--r--@   3 larry  staff    102 Dec 14 18:06 conf
drwxr-xr-x@   3 larry  staff    102 Dec 14 18:06 logs
drwxr-xr-x@  19 larry  staff    646 Dec 14 18:06 samples
Directory | Description |
---|---|
bin | contains the main knoxshell jar and related shell scripts |
conf | contains only the log4j config |
logs | contains the knoxshell.log file |
samples | has numerous examples to help you get started |
Setup Truststore for Client
Get/setup truststore for the target Knox instance or fronting load balancer
...
NOTE: if you see errors related to SSL and PKIX, your truststore is not properly set up
...
Add Hive Client Libraries
To verify your client setup, execute an example script from the {GATEWAY_HOME}/samples directory - for instance:
bin/knoxshell.sh samples/ExampleWebHdfsLs.groovy
home:knoxshell-0.14.0 larry$ bin/knoxshell.sh samples/ExampleWebHdfsLs.groovy
Enter username: guest
Enter password:
[app-logs, apps, mapred, mr-history, tmp, user]
At this point, you should have seen something similar to the above output - probably with different directories listed. Now create a new directory under {GATEWAY_HOME} to hold the Hive client jars:
Directory | Description |
---|---|
lib | To contain external jars to add to the classpath for things like HiveDriver |
Next we will download the hive standalone client jar which will contain nearly everything we need.
For this article, we will download Hive 1.2.1 standalone jar and copy it to the newly created lib directory.
You can use whatever version client jar is appropriate for your Hive deployment.
Add Commons Logging Jar
Add Launch Script

Create a launch script that calls the main class of the knoxshell executable jar while adding the lib directory jars to the classpath:

java -Dlog4j.configuration=conf/knoxshell-log4j.properties -cp bin/knoxshell.jar:lib/* org.apache.hadoop.gateway.shell.Shell bin/hive2.groovy "$@"
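Saved as, say, bin/hive2.sh (the name is my choice, not part of the release), the script might look like the following sketch; the heredoc just writes it out to a demo location:

```shell
# bin/hive2.sh is a hypothetical name of our choosing; the script adds the
# lib/ jars (HiveDriver, commons-logging) to the knoxshell classpath.
mkdir -p /tmp/knoxshell-0.14.0/bin
cat > /tmp/knoxshell-0.14.0/bin/hive2.sh <<'EOF'
#!/bin/sh
# Run from {GATEWAY_HOME} so the relative conf/, bin/ and lib/ paths resolve.
# Java expands the lib/* classpath wildcard itself (supported since Java 6).
java -Dlog4j.configuration=conf/knoxshell-log4j.properties \
  -cp bin/knoxshell.jar:lib/* \
  org.apache.hadoop.gateway.shell.Shell bin/hive2.groovy "$@"
EOF
chmod +x /tmp/knoxshell-0.14.0/bin/hive2.sh
```

Invoke it from {GATEWAY_HOME} as bin/hive2.sh, optionally passing the five arguments the Groovy script accepts.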
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.sql.DriverManager
import org.apache.hadoop.gateway.shell.Credentials
gatewayHost = "localhost";
gatewayPort = 8443;
trustStore = System.getProperty('user.home') + "/gateway-client-trust.jks";
trustStorePassword = "changeit";
contextPath = "gateway/sandbox/hive";

if (args.length == 0) {
// accept defaults
} else if (args[0] == "?" || args[0] == "help") {
System.out.println("\nExpected arguments: {host, port, truststore, truststore-pass, context-path}\n")
System.exit(0);
} else if (args.length == 5) {
gatewayHost = args[0];
gatewayPort = args[1].toInteger();
trustStore = args[2];
trustStorePassword = args[3];
contextPath = args[4];
} else if (args.length > 0) {
System.out.println("\nERROR: Expected arguments: NONE for defaults or {host, port, truststore, truststore-pass, context-path}\n")
System.exit(1);
}

connectionString = String.format(
    "jdbc:hive2://%s:%d/;ssl=true;sslTrustStore=%s;trustStorePassword=%s?hive.server2.transport.mode=http;hive.server2.thrift.http.path=/%s",
    gatewayHost, gatewayPort, trustStore, trustStorePassword, contextPath );
credentials = new Credentials()
credentials.add("ClearInput", "Enter username: ", "user")
           .add("HiddenInput", "Enter password: ", "pass")
credentials.collect()

user = credentials.get("user").string()
pass = credentials.get("pass").string()
// Load Hive JDBC Driver
Class.forName( "org.apache.hive.jdbc.HiveDriver" );

// Configure JDBC connection
connection = DriverManager.getConnection( connectionString, user, pass );

while(1) {
def sql = System.console().readLine 'knoxline> '
if (!sql.equals("")) {
System.out.println(sql)
rs = true;
statement = connection.createStatement();
try {
if (statement.execute( sql )) {
resultSet = statement.getResultSet()
int colcount = 0
colcount = resultSet.getMetaData().getColumnCount();
row = 0
header = "| "
while ( resultSet.next() ) {
line = "| "
for (int i = 1; i <= colcount; i++) {
colvalue = resultSet.getString( i )
if (colvalue == null) colvalue = ""
colsize = colvalue.length()
headerSize = resultSet.getMetaData().getColumnLabel( i ).length()
if (headerSize > colsize) colsize = headerSize
if (row == 0) {
header += resultSet.getMetaData().getColumnLabel( i ).center(colsize) + " | ";
}
line += colvalue.center(colsize) + " | ";
}
if (row == 0) {
System.out.println("".padLeft(header.length(), "="))
System.out.println(header);
System.out.println("".padLeft(header.length(), "="))
}
System.out.println(line);
row++
}
System.out.println("\nRows: " + row + "\n");
resultSet.close();
}
}
catch(Exception e) {
e.printStackTrace()
connection = DriverManager.getConnection( connectionString, user, pass );
}
statement.close();
}
}
connection.close();

Execute SQL Commands using KnoxLine
Let's check for existing tables:

knoxline> show tables

Let's create a table by loading a file from the local disk of the cluster machine:
knoxline> CREATE TABLE logs(column1 string, column2 string, column3 string, column4 string, column5 string, column6 string, column7 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
Show the created table:

knoxline> show tables

Show the table description:

knoxline> describe logs
Load the data from the sample.log file in /tmp:
knoxline> LOAD DATA LOCAL INPATH '/tmp/sample.log' OVERWRITE INTO TABLE logs
Query the loaded data:

knoxline> select * from logs where column2='20:11:56' and column4='[TRACE]'
Some things to note about this sample:
- The gateway URL defaults to the sandbox topology
- alternatives would be passing it as an argument to the script, using an environment variable or prompting for it with a ClearInput credential collector
- Credential collectors are used to gather credentials or other input from various sources. In this sample, the HiddenInput and ClearInput collectors prompt the user for input with the provided prompt text, and the values are acquired by a subsequent get call with the provided name value.
- standard Java classes for JDBC are used rather than the Hadoop session object used for access to the pure REST APIs
- The resultSet is rendered in the familiar table format of other command line interfaces but shows how to access it for doing whatever scripting needs you have
- Error handling is more or less non-existent in this example
I hope to bring "knoxline" to the KnoxShell module in a future release as a simple way to do some quick queries from your KnoxShell environment. A follow-up article will cover the use of the Knox Token service and related KnoxShell commands and credential collectors.