WORK IN PROGRESS
Some well-known quirks of the Sqoop2 integration test suite are documented here so that developers know what to expect when running the tests.
How to run the integration tests?
We recommend not running the integration tests from your IDE, since this can lead to strange and unexpected errors.
Use the command line to run the tests:
mvn clean integration-test
To run a specific test:
mvn -Dtest=org.apache.sqoop.integration.connector.kafka.FromRDBMSToKafkaTest -DfailIfNoTests=false verify
You can also run with local MapReduce, which is faster and in theory lets you attach a debugger.
HadoopLocalRunner has some quirks of its own, though, so this mode is not always recommended:
mvn -Dsqoop.hadoop.runner.class=org.apache.sqoop.test.hadoop.HadoopLocalRunner -Dtest=org.apache.sqoop.integration.connector.kafka.FromRDBMSToKafkaTest -DfailIfNoTests=false verify
How does the integration test suite work?
The tests use the Hadoop MiniCluster behind the scenes to simulate the MapReduce (MR) execution engine environment.
Which database do the integration tests use today for storing the Sqoop entities?
An embedded Apache Derby network server, started by DerbyProvider:
public class DerbyProvider extends DatabaseProvider {
  @Override
  public void start() {
    // Start embedded server
    try {
      port = NetworkUtils.findAvailablePort();
      LOG.info("Will bind to port " + port);
      server = new NetworkServerControl(InetAddress.getByName("localhost"), port);
      server.start(new LoggerWriter(LOG, Level.INFO));
      // start() won't throw an exception if the server fails to start; one
      // has to explicitly call ping() to verify that the server is up.
      // See DERBY-1465 for more details.
      server.ping();
    } catch (Exception e) {
      LOG.error("Can't start Derby network server", e);
      throw new RuntimeException("Can't start Derby server", e);
    }
    super.start();
  }
  // ... remaining members omitted ...
}
NOTE: Even though there are other providers such as MySQLProvider and PostgreSQLProvider, they are not used in any of the tests.
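DerbyProvider.start() depends on two steps worth understanding in isolation: choosing a free port (NetworkUtils.findAvailablePort(), whose implementation is not shown here) and explicitly confirming that the server is up, because start() does not report failure. The following is a minimal JDK-only sketch of both ideas, assuming a free port can be obtained by binding to port 0 and letting the operating system choose. PortProbe and its methods are hypothetical names, not Sqoop or Derby APIs, and a plain TCP connect is a weaker check than Derby's ping(), which speaks the server's own protocol.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper (not a Sqoop or Derby API) illustrating two steps of
// DerbyProvider.start(): picking a free port and verifying a server is up.
final class PortProbe {

  private PortProbe() {}

  // Binds to port 0 so the OS assigns a free ephemeral port, then releases it.
  // Another process may grab the port after close(); callers should retry.
  static int findAvailablePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    }
  }

  // Polls host:port until a TCP connection succeeds or the timeout expires.
  // This only proves that something is listening; Derby's ping() is stronger.
  static boolean waitForServer(String host, int port, long timeoutMillis)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (System.currentTimeMillis() < deadline) {
      try (Socket socket = new Socket()) {
        socket.connect(new InetSocketAddress(host, port), 200);
        return true;
      } catch (IOException notListeningYet) {
        Thread.sleep(50);
      }
    }
    return false;
  }
}
```

Binding to port 0 and then closing the socket leaves a small window in which another process can take the port, so a real provider should be prepared to retry on a bind failure.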
Which datasets are used in some of the integration tests?
Any class that extends the following base class:
public abstract class DataSet { ..}
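The body of DataSet is elided above, so the following is only a hypothetical, self-contained illustration of the pattern it suggests: an abstract base class that concrete datasets extend to create their tables and load rows. SqlSink, SketchDataSet, CitiesSketch and RecordingSink are made-up names, and the createTables()/loadBasicData() methods are assumptions rather than the real DataSet API. Statements are recorded in memory instead of being sent to a database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical target for SQL statements (stands in for a database provider).
interface SqlSink {
  void execute(String sql);
}

// Hypothetical base class; method names are assumptions, not Sqoop's DataSet API.
abstract class SketchDataSet {
  protected final SqlSink sink;

  protected SketchDataSet(SqlSink sink) {
    this.sink = sink;
  }

  // Creates the tables this dataset needs.
  public abstract SketchDataSet createTables();

  // Inserts a small, fixed set of rows.
  public abstract SketchDataSet loadBasicData();
}

// Hypothetical concrete dataset.
class CitiesSketch extends SketchDataSet {
  CitiesSketch(SqlSink sink) {
    super(sink);
  }

  @Override
  public SketchDataSet createTables() {
    sink.execute("CREATE TABLE cities (id INT PRIMARY KEY, name VARCHAR(64))");
    return this;
  }

  @Override
  public SketchDataSet loadBasicData() {
    sink.execute("INSERT INTO cities VALUES (1, 'Alpha')");
    sink.execute("INSERT INTO cities VALUES (2, 'Beta')");
    return this;
  }
}

// Records statements in memory instead of running them against a database.
class RecordingSink implements SqlSink {
  final List<String> statements = new ArrayList<>();

  @Override
  public void execute(String sql) {
    statements.add(sql);
  }
}
```

Returning this from each step lets a test chain setup calls, e.g. new CitiesSketch(sink).createTables().loadBasicData().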
Where can I find the logs of MR jobs run by the integration tests?
Look under /test/target in your source folder. Inside each MiniMRCluster_XXXX folder there will be subfolders containing the logs, for example:
/path/to/sqoop2/test/target/MiniMRCluster_96106422
  MiniMRCluster_96106422-localDir-nm-0_0
  MiniMRCluster_96106422-localDir-nm-0_1
  MiniMRCluster_96106422-localDir-nm-0_2
  MiniMRCluster_96106422-localDir-nm-0_3
  MiniMRCluster_96106422-logDir-nm-0_0
  MiniMRCluster_96106422-logDir-nm-0_1
  MiniMRCluster_96106422-logDir-nm-0_2
  MiniMRCluster_96106422-logDir-nm-0_3
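To collect these logs programmatically, a minimal sketch using java.nio.file can walk the target directory and keep only the files under the -logDir- subfolders. MiniClusterLogs and findLogFiles are hypothetical names, and the sketch assumes the logs of interest live under the MiniMRCluster_XXXX-logDir-* folders shown above; widen the filter if you also need files from the -localDir- folders.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists regular files under MiniMRCluster_*-logDir-* folders.
final class MiniClusterLogs {

  private MiniClusterLogs() {}

  // Walks targetDir recursively and returns matching files in sorted order.
  static List<Path> findLogFiles(Path targetDir) throws IOException {
    try (Stream<Path> paths = Files.walk(targetDir)) {
      return paths
          .filter(Files::isRegularFile)
          .filter(MiniClusterLogs::isUnderLogDir)
          .sorted()
          .collect(Collectors.toList());
    }
  }

  // True if any path element looks like MiniMRCluster_<id>-logDir-<nm>.
  private static boolean isUnderLogDir(Path file) {
    for (Path part : file) {
      String name = part.toString();
      if (name.startsWith("MiniMRCluster_") && name.contains("-logDir-")) {
        return true;
      }
    }
    return false;
  }
}
```

For example, MiniClusterLogs.findLogFiles(Paths.get("test/target")) run from the source root would list the NodeManager log files from every MiniMRCluster run still on disk.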
What happens when the integration tests are abruptly terminated by CTRL+C or by failures?
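Stray Java processes (for example, YarnChild task JVMs) may be left running. Find them with:
ps -ef | grep java
and kill them with:
killall -9 java
Note that this kills every Java process you are permitted to signal, not just the test JVMs. A more selective approach that kills only the YarnChild processes:
for p in `ps aux | grep java | grep YarnChild | sed -re "s/^<username> +([0-9]+) .*/\1/"`; do echo $p; kill -9 $p; done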
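If you prefer to stay within the JDK (Java 9 or later), a rough equivalent of the ps/grep/kill loop above can be built on ProcessHandle. This is a sketch, not part of the Sqoop tooling: YarnChildKiller is a hypothetical name, matching on a command line that contains both "java" and "YarnChild" is an assumption mirroring the grep filters, and destroyForcibly() only succeeds for processes the current user is allowed to signal. Passing dryRun=true only lists the matching PIDs.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical JDK-only (Java 9+) counterpart of the ps/grep/kill loop.
// Not part of Sqoop; the matching rule below is an assumption.
final class YarnChildKiller {

  private YarnChildKiller() {}

  // Mirrors "grep java | grep YarnChild".
  static boolean isYarnChild(String commandLine) {
    return commandLine.contains("java") && commandLine.contains("YarnChild");
  }

  // Finds matching processes, prints their PIDs and, unless dryRun is true,
  // kills them forcibly (comparable to kill -9). Returns the matched PIDs.
  static List<Long> killYarnChildren(boolean dryRun) {
    long self = ProcessHandle.current().pid();
    List<ProcessHandle> targets = ProcessHandle.allProcesses()
        .filter(p -> p.pid() != self)
        .filter(p -> p.info().commandLine().map(YarnChildKiller::isYarnChild).orElse(false))
        .collect(Collectors.toList());
    for (ProcessHandle p : targets) {
      System.out.println(p.pid());
      if (!dryRun) {
        // Returns false if termination could not be requested,
        // e.g. for a process owned by another user.
        p.destroyForcibly();
      }
    }
    return targets.stream().map(ProcessHandle::pid).collect(Collectors.toList());
  }
}
```

Running with dryRun=true first is a cheap way to check that the filter matches only the processes you expect before actually killing anything.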
Related tickets that address some of these quirks:
- SQOOP-1840 - DerbyProvider quirks
- SQOOP-1844
- SQOOP-1832
- SQOOP-1831 - MR file names now logged