Building from sources
Check out the sources and switch to the sqoop2 branch:
$ git clone https://git-wip-us.apache.org/repos/asf/sqoop.git sqoop2
$ cd sqoop2
$ git checkout sqoop2
Then build Sqoop using Maven (mvn):
$ mvn package
Optionally, you can skip the tests during the build:
$ mvn package -DskipTests
Creating binaries
Now build and package Sqoop2 as a distribution:
$ mvn package
This process creates a directory and a tarball under the dist/target directory. The directory (named sqoop-2.0.0-SNAPSHOT as of this writing) contains the binaries necessary to run Sqoop2, and its structure looks something like this:
--+ bin --+ sqoop.sh
  |
  + client --+ lib --+ sqoop-common.jar
  |                  |
  |                  + sqoop-client.jar
  |                  |
  |                  + (3rd-party client dependency jars)
  |
  + server --+ bin --+ setenv.sh
  |          |
  |          + conf --+ sqoop_bootstrap.properties
  |          |        |
  |          |        + sqoop.properties
  |          |
  |          + webapps --+ ROOT
  |                      |
  |                      + sqoop.war
  |
  + ...
As part of this process, a copy of the Tomcat server is also downloaded and placed under the server directory in the above structure.
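The layout above can be sanity-checked after a build with a small helper. This is only a sketch: the function name check_sqoop2_dist is made up for illustration, and it assumes the file names in your build match the listing exactly (real builds may add version suffixes to the JAR names).

```shell
# Hypothetical helper: verify that a Sqoop2 distribution directory contains
# the files shown in the listing. Prints each missing path and returns
# non-zero if anything is missing.
check_sqoop2_dist() {
    dist_dir="$1"
    missing=0
    for path in \
        bin/sqoop.sh \
        client/lib/sqoop-common.jar \
        client/lib/sqoop-client.jar \
        server/bin/setenv.sh \
        server/conf/sqoop_bootstrap.properties \
        server/conf/sqoop.properties \
        server/webapps/sqoop.war
    do
        if [ ! -e "$dist_dir/$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_sqoop2_dist dist/target/sqoop-2.0.0-SNAPSHOT prints nothing and succeeds when every listed file is present.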
Installing Sqoop2 on remote server
To install the generated binaries on a remote server, simply copy the sqoop-2.0.0-SNAPSHOT directory to that server:
scp -r dist/target/sqoop-2.0.0-SNAPSHOT remote-server.company.org:/remote/path/
Install dependencies
The Sqoop server depends on Hadoop binaries, but they are not part of the distribution, so you need to install them into the Sqoop server manually. Only version 2.0 is currently supported; other versions will be added later. To install the Hadoop libraries, run the addtowar.sh command with the arguments -hadoop $version $location. The following example is for the Cloudera distribution, version 4 (CDH4):
./bin/addtowar.sh -hadoop 2.0 /usr/lib/hadoop/client/
If you are running the original MapReduce implementation (MR1), you also need to install its JAR:
./bin/addtowar.sh -jars /usr/lib/hadoop-0.20-mapreduce/hadoop-2.0.0-mr1-cdh4.1.1-core.jar
You can install arbitrary JARs (connectors, JDBC drivers) using the -jars argument, which takes a list of JARs separated by ":". Here is an example that installs the MySQL JDBC driver into the Sqoop server:
./bin/addtowar.sh -jars /path/to/jar/mysql-connector-java-5.1.21-bin.jar
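Since -jars takes a ":"-separated list, several JARs can be passed in one call. The helper below is a minimal sketch of building that list; join_jars is a hypothetical name, and the paths in the usage comment are placeholders rather than real files.

```shell
# Hypothetical helper: join the given paths with ":" to form the value
# expected by the -jars argument.
join_jars() {
    joined=""
    for jar in "$@"; do
        if [ -z "$joined" ]; then
            joined="$jar"
        else
            joined="$joined:$jar"
        fi
    done
    printf '%s\n' "$joined"
}

# Usage (placeholder paths):
# ./bin/addtowar.sh -jars "$(join_jars /path/to/jar/first.jar /path/to/jar/second.jar)"
```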
Starting/Stopping Sqoop2 server
To start the Sqoop2 server, invoke the sqoop shell script:
cd dist/target/sqoop-2.0.0-SNAPSHOT
bin/sqoop.sh server start
The Sqoop2 server is then running as a web application within the Tomcat server.
Similarly, to stop the Sqoop2 server, run:
bin/sqoop.sh server stop
Starting/Running Sqoop2 client
To start an interactive shell, run:
bin/sqoop.sh client
This will bring up an interactive client ready for input commands:
Sqoop Shell: Type 'help' or '\h' for help.
sqoop:000>
Commands in the shell client take the form <command> <function> <options>:
- set
  - set server
    - set server --host <host>
    - set server --port <port>
    - set server --webapp <webapp>
- show
  - show version
    - show version --all
    - show version --server
    - show version --client
    - show version --protocol
Type "help" to list all available commands.
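As a concrete instance of the <command> <function> <options> form, the sketch below prints the client commands that point the shell at a particular server and then ask for version information. The function name and the host, port, and webapp values in the usage comment are illustrative assumptions, not defaults taken from this document; the printed lines are meant to be typed at the sqoop:000> prompt.

```shell
# Hypothetical helper: print the Sqoop2 client commands for a given server.
# Arguments: host, port, webapp name.
sqoop_client_commands() {
    host="$1"
    port="$2"
    webapp="$3"
    printf 'set server --host %s\n' "$host"
    printf 'set server --port %s\n' "$port"
    printf 'set server --webapp %s\n' "$webapp"
    printf 'show version --all\n'
}

# Usage (placeholder values):
# sqoop_client_commands sqoop.example.org 12000 sqoop
```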
Modifying configuration
Both the default bootstrap configuration sqoop_bootstrap.properties and the main configuration sqoop.properties are located under the conf directory in the Sqoop2 distribution directory.
The bootstrap configuration sqoop_bootstrap.properties controls which mechanism provides the configuration:
sqoop.config.provider=org.apache.sqoop.core.PropertiesConfigurationProvider
The main configuration sqoop.properties controls which repository mechanism is used, where the log files are written, what the logging levels are, and so on:
# Log4J system
org.apache.sqoop.log4j.appender.file=org.apache.log4j.RollingFileAppender
org.apache.sqoop.log4j.appender.file.File=logs/sqoop.log
org.apache.sqoop.log4j.appender.file.MaxFileSize=25MB
org.apache.sqoop.log4j.appender.file.MaxBackupIndex=5
org.apache.sqoop.log4j.appender.file.layout=org.apache.log4j.PatternLayout
org.apache.sqoop.log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} [%l] %m%n
org.apache.sqoop.log4j.debug=true
org.apache.sqoop.log4j.rootCategory=WARN, file
org.apache.sqoop.log4j.category.org.apache.sqoop=DEBUG
org.apache.sqoop.log4j.category.org.apache.derby=INFO

# Repository
org.apache.sqoop.repository.provider=org.apache.sqoop.repository.JdbcRepositoryProvider
org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.derby.DerbyRepositoryHandler
org.apache.sqoop.repository.jdbc.transaction.isolation=READ_COMMITTED
org.apache.sqoop.repository.jdbc.maximum.connections=10
org.apache.sqoop.repository.jdbc.url=jdbc:derby:repository/db;create=true
org.apache.sqoop.repository.jdbc.create.schema=true
org.apache.sqoop.repository.jdbc.driver=org.apache.derby.jdbc.EmbeddedDriver
org.apache.sqoop.repository.jdbc.user=sa
org.apache.sqoop.repository.jdbc.password=
org.apache.sqoop.repository.sysprop.derby.stream.error.file=logs/derbyrepo.log
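To check which value a given key currently has (for example, where the log file goes), the properties file can be queried from the shell. The following is a minimal sketch, not part of Sqoop: get_property is a hypothetical helper that only understands plain key=value lines and skips "#" comments; it does not handle line continuations or escapes.

```shell
# Hypothetical helper: print the value of a key from a simple .properties file.
# Arguments: file path, exact key name.
get_property() {
    file="$1"
    key="$2"
    awk -v k="$key" '
        /^[ \t]*#/ { next }                 # skip comment lines
        {
            i = index($0, "=")              # split at the first "=" only
            if (i > 0 && substr($0, 1, i - 1) == k) {
                print substr($0, i + 1)     # value may itself contain "="
                exit
            }
        }
    ' "$file"
}

# Usage (path is an assumption based on the layout shown earlier):
# get_property server/conf/sqoop.properties org.apache.sqoop.log4j.appender.file.File
```

Splitting at the first "=" keeps values such as the JDBC URL, which contains "=" itself, intact.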
Debugging information
The logs of the Tomcat server are located under the server/logs directory in the Sqoop2 distribution directory.
The logs of the Sqoop2 server and the Derby repository are written to sqoop.log and derbyrepo.log, respectively (by default, unless changed by the above configuration), under the logs directory in the Sqoop2 distribution directory.