Note

Please see the docs for the latest 1.99.* release at http://sqoop.apache.org/docs/. Some of the information below might be outdated.

Building from sources

Check out the sources and switch to the sqoop2 branch:

$ git clone https://git-wip-us.apache.org/repos/asf/sqoop.git sqoop2
$ cd sqoop2
$ git checkout sqoop2

 

Setting up a build environment with Eclipse

Installation

  • Install Eclipse
  • Install Maven, if it is not already on your machine
  • Install Oracle's JDK

Set up 

  • Run the following commands from the root of the sqoop2 checkout:
mvn eclipse:configure-workspace -Declipse.workspace=<path to your eclipse workspace> \
  && mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true
  • Import the project into Eclipse by going to File > Import... > General > Existing Projects into Workspace > Next.
  • In the next wizard window, click the Browse button next to "Select root directory" and browse to the root of the directory where you checked out sqoop2. This will populate about 10 projects into your workspace, all of which are different modules within Sqoop 2. Click the Finish button to bring these projects into the workspace and start working.

Note - if this is the first time you are setting up Eclipse for a Maven project, the import will show classpath problems due to the missing variable M2_REPO (Unbound classpath variable: 'M2_REPO/...'). To fix this error, go to Preferences > Java > Build Path > Classpath Variables. Click on New..., enter the name M2_REPO, click on Folder and browse to the directory ~/.m2/repository. Click OK and close the preferences dialog. This will force a rebuild of the workspace and all projects should turn green.

Similar steps need to be followed with IntelliJ IDEA as well.

Quick commands to compile and run tests

Sqoop clean:

mvn clean

Sqoop compile:

mvn compile

Run all unit tests:

mvn test

Run all integration tests:

Running the integration tests takes up a lot of CPU, since these tests run on the actual execution engine (such as Hadoop MR), especially when running org.apache.sqoop.integration.connector.jdbc.generic.PartitionerTest.

mvn clean integration-test

Run one integration test:

mvn clean integration-test -Dtest=org.apache.sqoop.integration.connector.jdbc.generic.FromRDBMSToHDFSTest -DfailIfNoTests=false
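Since the single-test command differs only in the class name, it can be wrapped in a small shell function. The following is a minimal sketch, not something shipped with Sqoop: the function name integration_test_cmd is hypothetical, and it only prints the command (using the same flags as above) rather than running Maven.

```shell
# Hypothetical helper (not part of Sqoop): print the Maven command that runs
# a single integration test class, using the same flags as the command above.
integration_test_cmd() {
  test_class="$1"
  printf 'mvn clean integration-test -Dtest=%s -DfailIfNoTests=false\n' "$test_class"
}

# Print the command for one test class; copy it or pipe it to sh to run it.
integration_test_cmd org.apache.sqoop.integration.connector.jdbc.generic.PartitionerTest
```

Printing instead of executing keeps the helper safe to try out, given how CPU-heavy the integration tests can be.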

 

Build Sqoop:

$ mvn package

 

Optionally, you can build Sqoop while skipping the tests (both unit tests and integration tests):

$ mvn package -DskipTests

 

Another handy command that builds Sqoop and runs all tests:

mvn verify

Creating Sqoop binaries

Now build and package the Sqoop2 binary distribution:

$ mvn package -Pbinary

This process will create a directory and a tarball under the dist/target directory. The directory (named sqoop-2.0.0-SNAPSHOT or sqoop-2.0.0-SNAPSHOT-bin-hadoop200, depending on the Hadoop profile used) contains the binaries necessary to run Sqoop2, and its structure looks something like the one below.

VB: There is NO lib folder under the client in the latest code as of this writing

--+ bin --+ sqoop.sh
  |
  + client --+ lib --+ sqoop-common.jar
  |                  |
  |                  + sqoop-client.jar
  |                  |
  |                  + (3rd-party client dependency jars)
  |
  + server --+ bin --+ setenv.sh
  |          |
  |          + conf --+ sqoop_bootstrap.properties
  |          |        |
  |          |        + sqoop.properties
  |          |
  |          + webapps --+ ROOT
  |                      |
  |                      + sqoop.war
  |
  + ...

As part of this process, a copy of the Tomcat server is also downloaded and put under the server directory in the above structure.

If you are on a particular release branch such as 1.99.4, all the artifacts will be created with the 1.99.4 build version, for instance sqoop-1.99.4-bin-hadoop200.tar.gz.
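The naming scheme described above can be illustrated with a short shell function. This is only a sketch based on the example names on this page: the function name dist_name and its arguments are hypothetical, and the real name is determined by the Maven build, not by this helper.

```shell
# Hypothetical helper: derive the distribution directory name from the build
# version and an optional Hadoop profile suffix, following the examples on
# this page (sqoop-<version> or sqoop-<version>-bin-hadoop<profile>).
dist_name() {
  version="$1"
  hadoop_profile="${2:-}"
  if [ -n "$hadoop_profile" ]; then
    printf 'sqoop-%s-bin-hadoop%s\n' "$version" "$hadoop_profile"
  else
    printf 'sqoop-%s\n' "$version"
  fi
}

dist_name 2.0.0-SNAPSHOT   # prints sqoop-2.0.0-SNAPSHOT
dist_name 1.99.4 200       # prints sqoop-1.99.4-bin-hadoop200
```

A helper like this can be handy in scripts that need to cd into dist/target/<name> without hard-coding the version.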

Installing Sqoop2 on remote server

To install the generated binaries on a remote server, simply copy the directory sqoop-2.0.0-SNAPSHOT to that server:

scp -r dist/target/sqoop-2.0.0-SNAPSHOT remote-server.company.org:/remote/path/

Install dependencies

The Sqoop server depends on the Hadoop binaries, but they are not part of the distribution, so you need to install them into the Sqoop server manually. The latest Hadoop version we support is 2.5.2.

VB: There is no addtowar.sh in the latest code under sqoop-2.0.0-SNAPSHOT/bin as of this writing

 

To install the Hadoop libraries, execute the addtowar.sh command with the argument -hadoop $version $location. The following example is for the Cloudera distribution version 4 (CDH4):

 ./bin/addtowar.sh -hadoop 2.0 /usr/lib/hadoop/client/

If you're running CDH4 MR1:

cd dist/target/sqoop-2.0.0-SNAPSHOT-bin-hadoop200
./bin/addtowar.sh -hadoop-version cdh4mr1 -hadoop-path /usr/lib

If you're running the original MapReduce implementation (MR1), you will also need to install its jar:

 ./bin/addtowar.sh -jars /usr/lib/hadoop-0.20-mapreduce/hadoop-2.0.0-mr1-cdh4.1.1-core.jar

You can install arbitrary jars (connectors, JDBC drivers) using the -jars argument, which takes a list of jars separated by ":". Here is an example of installing the MySQL JDBC driver into the Sqoop server:

  ./bin/addtowar.sh -jars /path/to/jar/mysql-connector-java-5.1.21-bin.jar
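Because -jars expects a single ":"-separated list, it can help to build that argument from individual paths. The sketch below assumes nothing about addtowar.sh beyond the -jars format described above; the function name join_jars and the jar paths in the usage comment are hypothetical.

```shell
# Hypothetical helper: join its arguments into one ":"-separated string,
# the list format that the -jars argument is described as accepting.
join_jars() {
  # "$*" joins the arguments with the first character of IFS; the subshell
  # keeps the IFS change from leaking into the caller.
  ( IFS=":"; printf '%s\n' "$*" )
}

# Example usage (paths are placeholders):
#   ./bin/addtowar.sh -jars "$(join_jars /path/to/jar/first.jar /path/to/jar/second.jar)"
```

Note that jar paths containing ":" cannot be expressed in this list format.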

Installing a new connector to Sqoop 2

// todo : VB

 

Starting/Stopping Sqoop2 server

To start the Sqoop2 server, invoke the sqoop shell script:

cd dist/target/sqoop-2.0.0-SNAPSHOT
bin/sqoop.sh server start

The Sqoop2 server is then running as a web application within the Tomcat server.

Similarly, to stop the Sqoop2 server, run:

bin/sqoop.sh server stop

Starting/Running Sqoop2 client

To start an interactive shell, run:

bin/sqoop.sh client

This will bring up an interactive client ready for input commands:

Sqoop Shell: Type 'help' or '\h' for help.

sqoop:000>

Please see the 5 min Demo Guide or the Command Line Shell Guide for the latest 1.99.* release at http://sqoop.apache.org/docs/

Modifying configuration

Both the default bootstrap configuration sqoop_bootstrap.properties and the main configuration sqoop.properties are located under the server/conf directory in the Sqoop2 distribution directory.

The bootstrap configuration sqoop_bootstrap.properties controls which mechanism provides configuration to the different managers in Sqoop.

sqoop.config.provider=org.apache.sqoop.core.PropertiesConfigurationProvider

The main configuration sqoop.properties controls, among other things, where the log files are written, what the logging levels are, which repository is used and how it is accessed, and which execution engine is used.

 

# Log4J system
org.apache.sqoop.log4j.appender.file=org.apache.log4j.RollingFileAppender
org.apache.sqoop.log4j.appender.file.File=logs/sqoop.log
org.apache.sqoop.log4j.appender.file.MaxFileSize=25MB
org.apache.sqoop.log4j.appender.file.MaxBackupIndex=5
org.apache.sqoop.log4j.appender.file.layout=org.apache.log4j.PatternLayout
org.apache.sqoop.log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} [%l] %m%n
org.apache.sqoop.log4j.debug=true
org.apache.sqoop.log4j.rootCategory=WARN, file
org.apache.sqoop.log4j.category.org.apache.sqoop=DEBUG
org.apache.sqoop.log4j.category.org.apache.derby=INFO

# Repository
org.apache.sqoop.repository.provider=org.apache.sqoop.repository.JdbcRepositoryProvider
org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.derby.DerbyRepositoryHandler
org.apache.sqoop.repository.jdbc.transaction.isolation=READ_COMMITTED
org.apache.sqoop.repository.jdbc.maximum.connections=10
org.apache.sqoop.repository.jdbc.url=jdbc:derby:repository/db;create=true
org.apache.sqoop.repository.jdbc.create.schema=true
org.apache.sqoop.repository.jdbc.driver=org.apache.derby.jdbc.EmbeddedDriver
org.apache.sqoop.repository.jdbc.user=sa
org.apache.sqoop.repository.jdbc.password=
org.apache.sqoop.repository.sysprop.derby.stream.error.file=logs/derbyrepo.log
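When checking which values a server is actually configured with, it can be convenient to read a single key out of sqoop.properties from the command line. The following is a minimal sketch using awk; the function name get_prop is hypothetical, and it assumes simple key=value lines like those shown above (no line continuations or escaped characters).

```shell
# Hypothetical helper: print the value of one key from a properties file.
# Assumes plain "key=value" lines; comment lines never match a real key.
get_prop() {
  props_file="$1"
  key="$2"
  # Match the key exactly on the text before the first "=", then strip
  # everything up to and including that "=" so that values which contain
  # "=" themselves (such as the JDBC URL) are printed whole.
  awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' "$props_file"
}

# Example usage, run from the Sqoop2 distribution directory:
#   get_prop server/conf/sqoop.properties org.apache.sqoop.repository.jdbc.url
```

Matching on the exact key avoids the pitfall of a plain grep, where the dots in the key would act as regular-expression wildcards.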

Debugging information

The logs of the Tomcat server are located under the server/logs directory in the Sqoop2 distribution directory.

The logs of the Sqoop2 server and the Derby repository are written to sqoop.log and derbyrepo.log, respectively (the defaults, unless changed in the configuration above), under the (LOGS) directory in the Sqoop2 distribution directory.
