...
Check out the sources and switch to the sqoop2 branch:
No Format |
---|
$ git clone https://git-wip-us.apache.org/repos/asf/sqoop.git sqoop2
$ cd sqoop2
$ git checkout sqoop2 |
Setting up a build environment with Eclipse
...
- Import the project into Eclipse by going to File > Import... > General > Existing Projects into Workspace > Next.
- In the next wizard window, click the Browse button next to "Select root directory" and browse to the root of the workspace where you checked out sqoop2. This will populate about 10 projects into your workspace, all of which are different modules within Sqoop2. Click the Finish button to bring these projects into the workspace and start working.
...
Note |
---|
Similar steps apply to IntelliJ IDEA as well. |
Setting up the Code Formatter
Quick commands to compile and run tests
...
Code Block |
---|
mvn clean integration-test -Dtest=org.apache.sqoop.integration.connector.jdbc.generic.FromRDBMSToHDFSTest -DfailIfNoTests=false |
To run tests against the PostgreSQL repository, you need a working PostgreSQL installation to point the tests at. The following example assumes a working installation at
postgresql://postgresql.ent.cloudera.com/sqoop_test
Code Block |
---|
mvn clean integration-test -pl repository/repository-postgresql -Dsqoop.provider.class=org.apache.sqoop.common.test.db.PostgreSQLProvider -Dsqoop.provider.postgresql.jdbc=jdbc:postgresql://postgresql.ent.cloudera.com/sqoop_test -Dsqoop.provider.postgresql.username=sqoop -Dsqoop.provider.postgresql.password=sqoop -Dpostgresql |
Sadly, as of this writing, this command does not actually run the integration tests; it runs only the unit tests.
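The PostgreSQL command above hardcodes the JDBC URL and credentials. As a minimal sketch, assuming you want to point the same tests at a different installation, the helper below prints the command for a given URL, username and password. The function name `postgres_test_cmd` is hypothetical; the `-D` properties are the ones used in the command above.

```shell
#!/bin/sh
# Sketch: print the Maven command for the PostgreSQL repository tests.
# postgres_test_cmd is a hypothetical helper, not part of Sqoop.
postgres_test_cmd() {
    jdbc_url=$1      # e.g. jdbc:postgresql://<host>/<database>
    db_user=$2
    db_password=$3
    # Print every argument followed by a space, then the final flag and a newline.
    printf '%s ' \
        mvn clean integration-test \
        -pl repository/repository-postgresql \
        -Dsqoop.provider.class=org.apache.sqoop.common.test.db.PostgreSQLProvider \
        "-Dsqoop.provider.postgresql.jdbc=$jdbc_url" \
        "-Dsqoop.provider.postgresql.username=$db_user" \
        "-Dsqoop.provider.postgresql.password=$db_password"
    printf '%s\n' -Dpostgresql
}
```

Review the printed command before running it, since the helper only assembles the command line and does not start Maven itself.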
Build Sqoop:
No Format |
---|
$ mvn package
|
Optionally, you can build Sqoop while skipping tests (both unit tests and integration tests):
Code Block |
---|
$ mvn package -DskipTests |
Other handy commands that build and run all tests from scratch:
Code Block |
---|
mvn verify
or
mvn clean install |
Creating Sqoop binaries
Now build and package the Sqoop2 binary distribution:
No Format |
---|
$ mvn package -Pbinary
or
$ mvn package -DskipTests=true -Dmaven.javadoc.skip=true -Pbinary -Dhadoop.profile=200 // for a specific hadoop profile |
This process creates a directory and a tarball under the dist/target directory. The directory (named sqoop-2.0.0-SNAPSHOT or sqoop-2.0.0-SNAPSHOT-bin-hadoop200, depending on the Hadoop profile used) contains the binaries necessary to run Sqoop2, and its structure looks something like the following.
Warning |
---|
VB: There is NO lib folder under the client in the latest code as of this writing |
No Format |
---|
--+ bin --+ sqoop.sh
  |
  + client --+ lib --+ sqoop-common.jar
  |                  |
  |                  + sqoop-client.jar
  |                  |
  |                  + (3rd-party client dependency jars)
  |
  + server --+ bin --+ setenv.sh
  |          |
  |          + conf --+ sqoop_bootstrap.properties
  |          |        |
  |          |        + sqoop.properties
  |          |
  |          + webapps --+ ROOT
  |                      |
  |                      + sqoop.war
  |
  + ... |
...
The Sqoop server depends on Hadoop binaries, which are not part of the distribution, so you need to install them into the Sqoop server manually. The latest Hadoop version we support is 2.5.2.
Warning |
---|
VB: There is no addtowar.sh under sqoop-2.0.0-SNAPSHOT/bin in the latest code as of this writing |
To install the Hadoop libraries, run the addtowar.sh command with the argument -hadoop $version $location. The following example is for Cloudera distribution version 4 (CDH4):
...
Code Block |
---|
cd dist/target/sqoop-2.0.0-SNAPSHOT-bin-hadoop200
or
cd dist/target/sqoop-2.0.0-SNAPSHOT
./bin/addtowar.sh -hadoop-version cdh4mr1 -hadoop-path /usr/lib
|
...
Code Block |
---|
./bin/addtowar.sh -jars /path/to/jar/mysql-connector-java-5.1.21-bin.jar |
Installing a new connector to Sqoop2
If you are contributing or adding a new connector, say sqoop-foo-connector, to Sqoop2, here are the steps to follow.
Step 1: Create a sqoop-foo-connector.jar. Make sure the jar contains the sqoopconnector.properties file so that it is picked up by Sqoop.
A typical sqoopconnector.properties for a Sqoop2 connector looks like the following:
Code Block |
---|
# Foo Connector Properties
org.apache.sqoop.connector.class = org.apache.sqoop.connector.foo.FooConnector
org.apache.sqoop.connector.name = sqoop-foo-connector |
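As a minimal sketch of Step 1, the helper below writes a sqoopconnector.properties file with the two keys shown above into a directory that you would then package into the connector jar. The function name `write_connector_properties` and its arguments are assumptions made for illustration; only the two property keys come from the example.

```shell
#!/bin/sh
# Sketch: generate sqoopconnector.properties for a connector.
# write_connector_properties is a hypothetical helper, not part of Sqoop.
write_connector_properties() {
    out_dir=$1         # directory whose contents end up at the root of the jar
    class_name=$2      # fully qualified connector class
    connector_name=$3  # connector name registered with Sqoop
    mkdir -p "$out_dir" || return 1
    {
        printf 'org.apache.sqoop.connector.class = %s\n' "$class_name"
        printf 'org.apache.sqoop.connector.name = %s\n' "$connector_name"
    } > "$out_dir/sqoopconnector.properties"
}
```

For the example connector, a call might look like `write_connector_properties target/classes org.apache.sqoop.connector.foo.FooConnector sqoop-foo-connector`, where `target/classes` is assumed to be the directory that gets packaged into the jar.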
Step 2: Add this jar to a folder on your installation machine and set the path to this folder as the value of the key org.apache.sqoop.connector.external.loadpath in the sqoop.properties file, located under the server/conf directory of the Sqoop2 distribution.
Code Block |
---|
#
# External connectors load path
# "/path/to/external/connectors/": Add all the connector JARs in the specified folder
#
org.apache.sqoop.connector.external.loadpath=/path/to/connector
|
Step 3: Start the server. While the server initializes, this jar should be loaded onto Sqoop's classpath and registered in the Sqoop repository.
Code Block |
---|
// todo : VB |
Starting/Stopping Sqoop2 server
...
Note |
---|
Please see the 5 min Demo Guide or the Command Line Shell Guide for the latest release 1.99.* http://sqoop.apache.org/docs/ |
...
Sqoop configuration files
Both the default bootstrap configuration sqoop_bootstrap.properties and the main configuration sqoop.properties are located under the server/conf directory in the Sqoop2 distribution directory.
...
The main configuration sqoop.properties controls:
- Where the log files are and what the logging levels are
- Which repository is used
- Which submission/execution engine is used
- Which authentication mechanism is used
...
No Format |
---|
# Logging Configuration
# Any property that starts with the prefix
# org.apache.sqoop.log4j is parsed out by the configuration
# system and passed to the log4j subsystem. This allows you
# to specify log4j configuration properties from within the
# Sqoop configuration.
#
org.apache.sqoop.log4j.appender.file=org.apache.log4j.RollingFileAppender
org.apache.sqoop.log4j.appender.file.File=@LOGDIR@/sqoop.log
org.apache.sqoop.log4j.appender.file.MaxFileSize=25MB
org.apache.sqoop.log4j.appender.file.MaxBackupIndex=5
org.apache.sqoop.log4j.appender.file.layout=org.apache.log4j.PatternLayout
org.apache.sqoop.log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} [%l] %m%n
org.apache.sqoop.log4j.debug=true
org.apache.sqoop.log4j.rootCategory=WARN, file
org.apache.sqoop.log4j.category.org.apache.sqoop=DEBUG
org.apache.sqoop.log4j.category.org.apache.derby=INFO
#
# Audit Loggers Configuration
# Multiple audit loggers could be given here. To specify an
# audit logger, you should at least add org.apache.sqoop.
# auditlogger.[LoggerName].class. You could also provide
# more configuration options by using org.apache.sqoop.
# auditlogger.[LoggerName] prefix, then all these options
# are parsed to the logger class.
#
org.apache.sqoop.auditlogger.default.class=org.apache.sqoop.audit.FileAuditLogger
org.apache.sqoop.auditlogger.default.file=@LOGDIR@/default.audit
#
# Repository configuration
# The Repository subsystem provides the special prefix which
# is "org.apache.sqoop.repository.sysprop". Any property that
# is specified with this prefix is parsed out and set as a
# system property. For example, if the built in Derby repository
# is being used, the sysprop prefixed properties can be used
# to affect Derby configuration at startup time by setting
# the appropriate system properties.
#
# Repository provider
org.apache.sqoop.repository.provider=org.apache.sqoop.repository.JdbcRepositoryProvider
# Repository upgrade
# If set to true, it will not upgrade the sqoop repository schema, by default it will initiate the upgrade on server start-up
org.apache.sqoop.repository.schema.immutable=false
# JDBC repository provider configuration
org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.derby.DerbyRepositoryHandler
org.apache.sqoop.repository.jdbc.transaction.isolation=READ_COMMITTED
org.apache.sqoop.repository.jdbc.maximum.connections=10
org.apache.sqoop.repository.jdbc.url=jdbc:derby:@BASEDIR@/repository/db;create=true
org.apache.sqoop.repository.jdbc.create.schema=true
org.apache.sqoop.repository.jdbc.driver=org.apache.derby.jdbc.EmbeddedDriver
org.apache.sqoop.repository.jdbc.user=sa
org.apache.sqoop.repository.jdbc.password=
# System properties for embedded Derby configuration
org.apache.sqoop.repository.sysprop.derby.stream.error.file=@LOGDIR@/derbyrepo.log
#
# Sqoop Connector configuration
# If set to true will initiate Connectors config upgrade during server startup
#
org.apache.sqoop.connector.autoupgrade=false
#
# Sqoop Driver configuration
# If set to true will initiate the Driver config upgrade during server startup
#
org.apache.sqoop.driver.autoupgrade=false
# Sleeping period for reloading configuration file (once a minute)
org.apache.sqoop.core.configuration.provider.properties.sleep=60000
#
# Submission engine configuration
#
# Submission engine class
org.apache.sqoop.submission.engine=org.apache.sqoop.submission.mapreduce.MapreduceSubmissionEngine
# Number of milliseconds, submissions created before this limit will be removed, default is one day
#org.apache.sqoop.submission.purge.threshold=
# Number of milliseconds for purge thread to sleep, by default one day
#org.apache.sqoop.submission.purge.sleep=
# Number of milliseconds for update thread to sleep, by default 5 minutes
#org.apache.sqoop.submission.update.sleep=
#
# Configuration for Mapreduce submission engine (applicable if it's configured)
#
# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/etc/hadoop/conf/
#
# Execution engine configuration
#
org.apache.sqoop.execution.engine=org.apache.sqoop.execution.mapreduce.MapreduceExecutionEngine
#
# Authentication configuration
#
#org.apache.sqoop.authentication.type=SIMPLE
#org.apache.sqoop.authentication.handler=org.apache.sqoop.security.SimpleAuthenticationHandler
#org.apache.sqoop.anonymous=true
#org.apache.sqoop.authentication.type=KERBEROS
#org.apache.sqoop.authentication.handler=org.apache.sqoop.security.KerberosAuthenticationHandler |
...
#org.apache.sqoop.authentication.kerberos.principal=sqoop/_HOST@NOVALOCAL
#org.apache.sqoop.authentication.kerberos.keytab=/home/kerberos/sqoop.keytab
#org.apache.sqoop.authentication.kerberos.http.principal=HTTP/_HOST@NOVALOCAL
#org.apache.sqoop.authentication.kerberos.http.keytab= |
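Because many settings in sqoop.properties ship commented out, it can help to list only the properties that are actually in effect for one subsystem. The following is a minimal sketch, assuming a plain key=value file; the function name `active_properties` is hypothetical.

```shell
#!/bin/sh
# Sketch: print the uncommented properties that start with a given prefix.
# active_properties is a hypothetical helper, not part of Sqoop.
active_properties() {
    props=$1    # path to sqoop.properties
    prefix=$2   # e.g. org.apache.sqoop.repository.
    # Skip comment and blank lines, keep lines that begin with the prefix.
    awk -v p="$prefix" '!/^[ \t]*(#|$)/ && index($0, p) == 1' "$props"
}
```

For example, `active_properties server/conf/sqoop.properties org.apache.sqoop.authentication.` prints nothing for the file shown above, since every authentication line there is commented out.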
Debug Logs information
- The logs of the Tomcat server are located under the server/logs directory in the Sqoop2 distribution directory; the most relevant one is catalina.out.
- The logs of the Sqoop2 server are written to sqoop.log (by default, unless changed in the sqoop.properties configuration file above) under the (@LOGDIR) directory in the Sqoop2 distribution directory.
- The logs for the Derby repository are written to derbyrepo.log (by default, unless changed in the sqoop.properties configuration file above) under the (@LOGDIR) directory in the Sqoop2 distribution directory.
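Since the Sqoop2 server log location can be changed in sqoop.properties, a quick way to find it is to read the configured value. The sketch below assumes the log4j file appender key shown in the configuration above and a plain key=value file; the function name `sqoop_log_path` is hypothetical, and the printed value may still contain an unresolved placeholder such as @LOGDIR@.

```shell
#!/bin/sh
# Sketch: print the configured Sqoop2 server log file from sqoop.properties.
# sqoop_log_path is a hypothetical helper, not part of Sqoop.
sqoop_log_path() {
    props=$1   # path to sqoop.properties
    key=org.apache.sqoop.log4j.appender.file.File
    awk -v key="$key" '
        # Remember the value of the last uncommented assignment of the key.
        index($0, key "=") == 1 { value = substr($0, length(key) + 2) }
        END { if (value != "") print value }
    ' "$props"
}
```

Once the placeholder is resolved for your installation, the resulting path can be passed to `tail -f` to follow the server log while reproducing a problem.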