...
Check out the sources and switch to the sqoop2 branch:
No Format |
---|
$ git clone https://git-wip-us.apache.org/repos/asf/sqoop.git sqoop2
$ cd sqoop2
$ git checkout sqoop2 |
Setting up a build environment with Eclipse
...
Code Block |
---|
mvn clean integration-test -Dtest=org.apache.sqoop.integration.connector.jdbc.generic.FromRDBMSToHDFSTest -DfailIfNoTests=false |
Build Sqoop:
No Format |
---|
$ mvn package
|
Optionally, you can build Sqoop while skipping tests (both unit tests and integration tests):
Code Block |
---|
$ mvn package -DskipTests |
Other handy commands that build and run all tests from scratch:
Code Block |
---|
mvn verify
or
mvn clean install |
Creating Sqoop binaries
Now build and package the Sqoop2 binary distribution:
No Format |
---|
$ mvn package -Pbinary
or
$ mvn package -DskipTests=true -Dmaven.javadoc.skip=true -Pbinary -Dhadoop.profile=200 // for a specific hadoop profile |
This process creates a directory and a tarball under the dist/target directory. The directory (named sqoop-2.0.0-SNAPSHOT or sqoop-2.0.0-SNAPSHOT-bin-hadoop200, depending on the Hadoop profile used) contains the binaries necessary to run Sqoop2, and its structure looks something like the following.
Warning |
---|
VB: There is NO lib folder under the client in the latest code as of this writing |
No Format |
---|
--+ bin --+ sqoop.sh
  |
  + client --+ lib --+ sqoop-common.jar
  |                  |
  |                  + sqoop-client.jar
  |                  |
  |                  + (3rd-party client dependency jars)
  |
  + server --+ bin --+ setenv.sh
  |          |
  |          + conf --+ sqoop_bootstrap.properties
  |          |        |
  |          |        + sqoop.properties
  |          |
  |          + webapps --+ ROOT
  |                      |
  |                      + sqoop.war
  |
  + ... |
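As a small convenience, the directory name described above can be computed in a script. The following is only a sketch: the function name `sqoop_dist_dir` is hypothetical, and the `-bin-hadoop<profile>` suffix pattern is an assumption generalized from the single `hadoop200` example in the text.

```shell
# Print the expected distribution directory under dist/target.
# Usage: sqoop_dist_dir [hadoop_profile]
# Assumption: the suffix is "-bin-hadoop<profile>" when a profile is given,
# as in the sqoop-2.0.0-SNAPSHOT-bin-hadoop200 example.
sqoop_dist_dir() {
  version="2.0.0-SNAPSHOT"   # version string taken from the text above
  if [ -n "${1:-}" ]; then
    printf 'dist/target/sqoop-%s-bin-hadoop%s\n' "$version" "$1"
  else
    printf 'dist/target/sqoop-%s\n' "$version"
  fi
}
```

For example, `cd "$(sqoop_dist_dir 200)"` would enter the profile-specific directory after a build with `-Dhadoop.profile=200`.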
...
The Sqoop server depends on Hadoop binaries, which are not part of the distribution, so you need to install them into the Sqoop server manually. The latest Hadoop version we support is 2.5.2.
Warning |
---|
VB: There is no addtowar.sh in the latest code under sqoop-2.0.0-SNAPSHOT/bin as of this writing |
To install the Hadoop libraries, execute the addtowar.sh command with the argument -hadoop $version $location. The following example is for Cloudera distribution version 4 (CDH4):
...
Code Block |
---|
cd dist/target/sqoop-2.0.0-SNAPSHOT-bin-hadoop200
or
cd dist/target/sqoop-2.0.0-SNAPSHOT
./bin/addtowar.sh -hadoop-version cdh4mr1 -hadoop-path /usr/lib
|
...
The main configuration file, sqoop.properties, controls the following:
- Where the log files are and what the logging levels are
- Which repository is used
- Which execution engine is used
No Format |
---|
# Log4J system
org.apache.sqoop.log4j.appender.file=org.apache.log4j.RollingFileAppender
org.apache.sqoop.log4j.appender.file.File=logs/sqoop.log
org.apache.sqoop.log4j.appender.file.MaxFileSize=25MB
org.apache.sqoop.log4j.appender.file.MaxBackupIndex=5
org.apache.sqoop.log4j.appender.file.layout=org.apache.log4j.PatternLayout
org.apache.sqoop.log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} [%l] %m%n
org.apache.sqoop.log4j.debug=true
org.apache.sqoop.log4j.rootCategory=WARN, file
org.apache.sqoop.log4j.category.org.apache.sqoop=DEBUG
org.apache.sqoop.log4j.category.org.apache.derby=INFO

# Repository
org.apache.sqoop.repository.provider=org.apache.sqoop.repository.JdbcRepositoryProvider
org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.derby.DerbyRepositoryHandler
org.apache.sqoop.repository.jdbc.transaction.isolation=READ_COMMITTED
org.apache.sqoop.repository.jdbc.maximum.connections=10
org.apache.sqoop.repository.jdbc.url=jdbc:derby:repository/db;create=true
org.apache.sqoop.repository.jdbc.create.schema=true
org.apache.sqoop.repository.jdbc.driver=org.apache.derby.jdbc.EmbeddedDriver
org.apache.sqoop.repository.jdbc.user=sa
org.apache.sqoop.repository.jdbc.password=
org.apache.sqoop.repository.sysprop.derby.stream.error.file=logs/derbyrepo.log |
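When checking which values a given server is actually using, it can help to read a single key out of sqoop.properties from the shell. The helper below is a minimal sketch, not part of Sqoop: the function name `get_prop` is hypothetical, and it assumes simple `key=value` lines with `#` comments (no line continuations or escaped characters).

```shell
# Print the value of one key from a Java-style properties file.
# Usage: get_prop <file> <key>
# Assumption: plain key=value lines; '#' starts a comment line.
get_prop() {
  awk -v k="$2" '
    /^[[:space:]]*#/ { next }            # skip comment lines
    {
      i = index($0, "=")                 # split on the first "=" only
      if (i == 0) next
      name = substr($0, 1, i - 1)
      gsub(/^[[:space:]]+|[[:space:]]+$/, "", name)
      if (name == k) { print substr($0, i + 1); exit }
    }
  ' "$1"
}

# Example (path is relative to the distribution directory):
#   get_prop server/conf/sqoop.properties org.apache.sqoop.log4j.appender.file.File
```

Splitting on the first `=` only keeps values such as the JDBC URL, which itself contains `=`, intact.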
...
Debug Logs information
- The logs of the Tomcat server are located under the server/logs directory in the Sqoop2 distribution directory.
- The logs of the Sqoop2 server are stored as sqoop.log (by default, unless changed by the above sqoop.properties configuration file) under the (LOGS) directory in the Sqoop2 distribution directory.
- The logs for the Derby repository are stored as derbyrepo.log (by default, unless changed by the above sqoop.properties configuration file) under the (LOGS) directory in the Sqoop2 distribution directory.
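To see quickly which of these log files exist in a given installation, a small shell helper can list them. This is only a sketch: the function name `list_sqoop_logs` is hypothetical, and the default `logs` directory is an assumption taken from the relative paths in the sample sqoop.properties (pass a second argument if your configuration differs).

```shell
# List whichever of the expected log files exist under a distribution directory.
# Usage: list_sqoop_logs <dist_dir> [logs_dir]
list_sqoop_logs() {
  dist=$1
  logs=${2:-logs}   # assumption: relative path from the sample sqoop.properties
  for f in "$dist/server/logs"/* "$dist/$logs/sqoop.log" "$dist/$logs/derbyrepo.log"; do
    # An unmatched glob stays literal and is skipped by the -f check.
    [ -f "$f" ] && printf '%s\n' "$f"
  done
  return 0
}
```

The output can be passed to `tail`, for example `tail -n 50 $(list_sqoop_logs dist/target/sqoop-2.0.0-SNAPSHOT)`, provided the paths contain no spaces.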