...

Check out the sources and switch to the sqoop2 branch:

No Format
$ git clone https://git-wip-us.apache.org/repos/asf/sqoop.git sqoop2
$ cd sqoop2
$ git checkout sqoop2

 

Setting up a build environment with Eclipse

...

Code Block
mvn clean integration-test -Dtest=org.apache.sqoop.integration.connector.jdbc.generic.FromRDBMSToHDFSTest -DfailIfNoTests=false

 

Build Sqoop:

No Format
$ mvn package

 

Optionally, you can build Sqoop while skipping the tests (both unit tests and integration tests):

Code Block
$ mvn package -DskipTests

 

Other handy commands that build and run all tests from scratch:

Code Block
mvn verify
# or
mvn clean install

Creating Sqoop binaries

Now build and package the Sqoop2 binary distribution:

No Format
$ mvn package -Pbinary

 
or, for a specific Hadoop profile:

$ mvn package -DskipTests=true -Dmaven.javadoc.skip=true -Pbinary -Dhadoop.profile=200

This process creates a directory and a tarball under the dist/target directory. The directory (named sqoop-2.0.0-SNAPSHOT or sqoop-2.0.0-SNAPSHOT-bin-hadoop200, depending on the Hadoop profile used) contains the binaries needed to run Sqoop2, and its structure looks something like this:

Warning

VB: There is NO lib folder under the client in the latest code as of this writing

No Format
--+ bin --+ sqoop.sh
  |
  + client --+ lib --+ sqoop-common.jar
  |                  |
  |                  + sqoop-client.jar
  |                  |
  |                  + (3rd-party client dependency jars)
  |
  + server --+ bin --+ setenv.sh
  |          |
  |          + conf --+ sqoop_bootstrap.properties
  |          |        |
  |          |        + sqoop.properties
  |          |
  |          + webapps --+ ROOT
  |                      |
  |                      + sqoop.war
  |
  + ...
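
As a quick sanity check after the build, the layout above can be compared with what actually landed under dist/target. The following is a minimal sketch and not part of Sqoop: the function name check_sqoop_dist is hypothetical, and the list of expected paths is copied from the tree above, which may not match the latest code.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Sqoop): report which of the
# expected entries exist in a Sqoop2 distribution directory.
# The path list mirrors the tree shown above and is an assumption.
check_sqoop_dist() {
  dist_dir="$1"
  missing=0
  for entry in \
      bin/sqoop.sh \
      server/bin/setenv.sh \
      server/conf/sqoop_bootstrap.properties \
      server/conf/sqoop.properties \
      server/webapps/sqoop.war
  do
    if [ -e "$dist_dir/$entry" ]; then
      echo "ok       $entry"
    else
      echo "missing  $entry"
      missing=$((missing + 1))
    fi
  done
  # Succeed only when every expected entry was found
  [ "$missing" -eq 0 ]
}
```

For example, check_sqoop_dist dist/target/sqoop-2.0.0-SNAPSHOT prints one line per expected entry and returns a non-zero status if any of them is missing.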

...

The Sqoop server depends on Hadoop binaries, but they are not part of the distribution, so you need to install them into the Sqoop server manually. The latest Hadoop version we support is 2.5.2.

Warning

VB: There is no addtowar.sh under sqoop-2.0.0-SNAPSHOT/bin in the latest code as of this writing

 

To install the Hadoop libraries, run addtowar.sh with the argument -hadoop $version $location. The following example is for Cloudera distribution version 4 (CDH4):

...

Code Block
cd dist/target/sqoop-2.0.0-SNAPSHOT-bin-hadoop200   # or: cd dist/target/sqoop-2.0.0-SNAPSHOT
./bin/addtowar.sh -hadoop-version cdh4mr1 -hadoop-path /usr/lib

...

The main configuration file, sqoop.properties, controls the following:

  • Where are the log files, and what are the logging levels?
  • What repository is used?
  • What execution engine is used?

 

No Format
# Log4J system
org.apache.sqoop.log4j.appender.file=org.apache.log4j.RollingFileAppender
org.apache.sqoop.log4j.appender.file.File=logs/sqoop.log
org.apache.sqoop.log4j.appender.file.MaxFileSize=25MB
org.apache.sqoop.log4j.appender.file.MaxBackupIndex=5
org.apache.sqoop.log4j.appender.file.layout=org.apache.log4j.PatternLayout
org.apache.sqoop.log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} [%l] %m%n
org.apache.sqoop.log4j.debug=true
org.apache.sqoop.log4j.rootCategory=WARN, file
org.apache.sqoop.log4j.category.org.apache.sqoop=DEBUG
org.apache.sqoop.log4j.category.org.apache.derby=INFO

# Repository
org.apache.sqoop.repository.provider=org.apache.sqoop.repository.JdbcRepositoryProvider
org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.derby.DerbyRepositoryHandler
org.apache.sqoop.repository.jdbc.transaction.isolation=READ_COMMITTED
org.apache.sqoop.repository.jdbc.maximum.connections=10
org.apache.sqoop.repository.jdbc.url=jdbc:derby:repository/db;create=true
org.apache.sqoop.repository.jdbc.create.schema=true
org.apache.sqoop.repository.jdbc.driver=org.apache.derby.jdbc.EmbeddedDriver
org.apache.sqoop.repository.jdbc.user=sa
org.apache.sqoop.repository.jdbc.password=
org.apache.sqoop.repository.sysprop.derby.stream.error.file=logs/derbyrepo.log
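
When checking which value is in effect, it can help to read a single key straight out of the file. The following is a minimal sketch, not a Sqoop tool: the function name get_prop is hypothetical, and it handles only plain key=value lines (no line continuations or escape sequences).

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key from a simple
# Java-style .properties file. Usage: get_prop FILE KEY
get_prop() {
  awk -v key="$2" '
    /^[[:space:]]*#/ { next }              # skip comment lines
    {
      pos = index($0, "=")                 # split on the first "=" only
      if (pos == 0) next                   # not a key=value line
      k = substr($0, 1, pos - 1)
      gsub(/^[[:space:]]+|[[:space:]]+$/, "", k)
      if (k == key) {                      # keep the last definition seen
        value = substr($0, pos + 1)
        found = 1
      }
    }
    END { if (found) print value }
  ' "$1"
}
```

With the configuration shown above, get_prop server/conf/sqoop.properties org.apache.sqoop.log4j.appender.file.File should print logs/sqoop.log. Splitting on the first "=" only matters for values such as the JDBC URL, which itself contains an "=".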

 

...

Debug Logs information

  • The logs of the Tomcat server are located under the server/logs directory in the Sqoop2 distribution directory.
  • The logs of the Sqoop2 server are stored as sqoop.log (by default, unless changed in the sqoop.properties configuration file above) under the (LOGS) directory in the Sqoop2 distribution directory.
  • The logs for the Derby repository are stored as derbyrepo.log (by default, unless changed in the sqoop.properties configuration file above) under the (LOGS) directory in the Sqoop2 distribution directory.
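
When debugging, a quick count of messages per level shows whether a log is worth reading in full. The following is a minimal sketch under one assumption: log lines follow the ConversionPattern from the sqoop.properties example above, so that the date and time occupy the first two whitespace-separated fields and the level is the third. The function name count_log_level is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a log file whose level field
# matches LEVEL. Usage: count_log_level FILE LEVEL
# Assumes "<date> <time> <LEVEL> ..." lines, as produced by the
# ConversionPattern in the example configuration.
count_log_level() {
  awk -v level="$2" '
    $3 == level { n++ }      # third field holds the log level
    END { print n + 0 }      # print 0 when nothing matched
  ' "$1"
}
```

For example, count_log_level logs/sqoop.log ERROR prints the number of ERROR lines; a different log path or pattern in sqoop.properties would need the file name or field position adjusted.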