
...

Note

Similar steps need to be followed with IntelliJ IDEA as well.

Setting up the Code Formatter 

See Sqoop 2 Coding Guidelines

Quick commands to compile and run tests

...

Code Block
mvn clean integration-test -Dtest=org.apache.sqoop.integration.connector.jdbc.generic.FromRDBMSToHDFSTest -DfailIfNoTests=false

 

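A single unit test class can be run in a similar way with Maven's -Dtest switch; the class name below is only a placeholder, so substitute the test you are working on:

Code Block
mvn test -Dtest=TestFooConnector -DfailIfNoTests=false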

If you want to run tests against the PostgreSQL repository, have a working PostgreSQL installation and point the tests at it. In the following example there is a working PostgreSQL installation at

postgresql://postgresql.ent.cloudera.com/sqoop_test

Code Block
mvn clean integration-test -pl repository/repository-postgresql -Dsqoop.provider.class=org.apache.sqoop.common.test.db.PostgreSQLProvider -Dsqoop.provider.postgresql.jdbc=jdbc:postgresql://postgresql.ent.cloudera.com/sqoop_test -Dsqoop.provider.postgresql.username=sqoop -Dsqoop.provider.postgresql.password=sqoop -Dpostgresql

Sadly, as of this writing this command does not actually run the integration tests; it only runs the unit tests.
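For reference, one way to provision a matching database and role on the PostgreSQL host, assuming administrative access (the names mirror the -Dsqoop.provider.postgresql.* values above):

Code Block
createuser --pwprompt sqoop     # enter "sqoop" as the password to match the flags above
createdb --owner=sqoop sqoop_test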

 

Build Sqoop:

No Format
mvn package

 

Optionally, you can build Sqoop while skipping tests (both unit tests and integration tests):

Code Block
mvn package -DskipTests

 

Other handy commands that build and run all tests from scratch:

Code Block
mvn verify
or
mvn clean install
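When you are iterating on a single module, Maven's standard reactor options can narrow the build; the module path below is only an example:

Code Block
mvn clean install -pl repository/repository-postgresql -am -DskipTests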

Creating Sqoop binaries

Now build and package Sqoop2 binary distribution:

No Format
mvn package -Pbinary
or
mvn package -DskipTests=true -Dmaven.javadoc.skip=true -Pbinary -Dhadoop.profile=200  # for a specific hadoop profile

This process creates a directory and a tarball under the dist/target directory. The directory (named sqoop-2.0.0-SNAPSHOT or sqoop-2.0.0-SNAPSHOT-bin-hadoop200, depending on the hadoop profile used) contains the binaries needed to run Sqoop2, and its structure looks like the one below.

Warning

VB: There is NO lib folder under the client in the latest code as of this writing

No Format
--+ bin --+ sqoop.sh
  |
  + client --+ lib --+ sqoop-common.jar
  |                  |
  |                  + sqoop-client.jar
  |                  |
  |                  + (3rd-party client dependency jars)
  |
  + server --+ bin --+ setenv.sh
  |          |
  |          + conf --+ sqoop_bootstrap.properties
  |          |        |
  |          |        + sqoop.properties
  |          |
  |          + webapps --+ ROOT
  |                      |
  |                      + sqoop.war
  |
  + ...

As part of this process, a copy of the Tomcat server is also downloaded and put under the server directory in the above structure.

Note

If you are on a particular release branch such as 1.99.4, all the artifacts will be created with the 1.99.4 build version, for instance sqoop-1.99.4-bin-hadoop200.tar.gz.
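For example, after a successful build you can list what was produced under the dist/target directory (the exact names depend on the build version and hadoop profile used):

Code Block
ls dist/target/
# e.g. sqoop-2.0.0-SNAPSHOT-bin-hadoop200/  sqoop-2.0.0-SNAPSHOT-bin-hadoop200.tar.gz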

Installing Sqoop2 on remote server

To install the generated binaries on a remote server, simply copy the directory sqoop-2.0.0-SNAPSHOT to your remote server:

Code Block
scp -r dist/target/sqoop-2.0.0-SNAPSHOT remote-server.company.org:/remote/path/
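Alternatively, you can copy the generated tarball and unpack it on the remote host; the tarball name and remote path below are illustrative:

Code Block
scp dist/target/sqoop-2.0.0-SNAPSHOT-bin-hadoop200.tar.gz remote-server.company.org:/remote/path/
ssh remote-server.company.org 'cd /remote/path && tar -xzf sqoop-2.0.0-SNAPSHOT-bin-hadoop200.tar.gz'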


Install dependencies

The Sqoop server depends on Hadoop binaries, but they are not part of the distribution, so you need to install them into the Sqoop server manually. The latest Hadoop version we support is 2.5.2.
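To check which Hadoop version is available on the target machine (assuming the hadoop command is on the PATH):

Code Block
hadoop version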

Warning

VB: There is no addtowar.sh in the latest code under sqoop-2.0.0-SNAPSHOT/bin as of this writing

 

To install the hadoop libraries, execute the addtowar.sh command with the arguments -hadoop $version $location. The following example is for Cloudera's Distribution version 4 (CDH4):

Code Block
 ./bin/addtowar.sh -hadoop 2.0 /usr/lib/hadoop/client/


If you're running CDH4 MR1:

Code Block
cd dist/target/sqoop-2.0.0-SNAPSHOT-bin-hadoop200 or cd dist/target/sqoop-2.0.0-SNAPSHOT
./bin/addtowar.sh -hadoop-version cdh4mr1 -hadoop-path /usr/lib

If you're running the original MapReduce implementation (MR1), you will also need to install its jar:

Code Block
 ./bin/addtowar.sh -jars /usr/lib/hadoop-0.20-mapreduce/hadoop-2.0.0-mr1-cdh4.1.1-core.jar

You can install arbitrary jars (connectors, JDBC drivers) using the -jars argument, which takes a list of jars separated by ":". Here is an example of installing the MySQL JDBC driver into the Sqoop server:

Code Block
  ./bin/addtowar.sh -jars /path/to/jar/mysql-connector-java-5.1.21-bin.jar


Installing a new connector to Sqoop2

If you are contributing or adding a new connector, say sqoop-foo-connector, to Sqoop2, here are the steps to follow.

 

Step 1: Create a sqoop-foo-connector.jar. Make sure the jar contains a sqoopconnector.properties file so that it is picked up by Sqoop.

A typical sqoopconnector.properties for a Sqoop2 connector looks like this:

Code Block
# Foo Connector Properties
org.apache.sqoop.connector.class = org.apache.sqoop.connector.foo.FooConnector
org.apache.sqoop.connector.name = sqoop-foo-connector
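A quick way to confirm the descriptor actually made it into the jar is the JDK's jar tool (the jar name matches the example above):

Code Block
jar tf sqoop-foo-connector.jar | grep sqoopconnector.properties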

 

Step 2: Add this jar to a folder on your installation machine and set the path to that folder in the sqoop.properties file (located under the server/conf directory of the Sqoop2 installation) for the key

org.apache.sqoop.connector.external.loadpath

Code Block
#
# External connectors load path
# "/path/to/external/connectors/": Add all the connector JARs in the specified folder
#
org.apache.sqoop.connector.external.loadpath=/path/to/connector

 

Step 3: Start the server. While initializing, the server should load this jar into Sqoop's classpath and register the connector in the Sqoop repository.
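Once the server is up, you can check that the connector was registered, for example from the Sqoop2 shell; this assumes the bin/sqoop.sh wrapper shown in the directory layout above:

Code Block
./bin/sqoop.sh client
sqoop:000> show connector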

Starting/Stopping Sqoop2 server

...