...

How does the integration test suite work? 

Minicluster (pseudo-distributed mode)

  • The tests use the Hadoop Minicluster behind the scenes to simulate the MR execution engine environment. The Minicluster runs the tests as if on a real distributed cluster; the only difference is that everything happens in the same JVM. Hence it is also referred to as the pseudo-distributed mode.
  • Read more about the Minicluster here: http://gdfm.me/2010/08/03/how-to-run-a-minicluster-based-junit-test-with-eclipse/
  • The integration tests are tightly tied to the MR execution engine at this point. Some rework will be needed to get them working in a Spark execution engine context.

LocalMode (localRunner mode)

  • When the option -Dsqoop.hadoop.runner.class=org.apache.sqoop.test.hadoop.HadoopLocalRunner is set, the tests do not use the Minicluster and

...
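The runner option above suggests the test harness chooses its Hadoop runner from a system property. The following is a minimal sketch of that kind of selection, assuming a fallback to a Minicluster-based runner when `sqoop.hadoop.runner.class` is unset. Only the property name comes from the text; the `HadoopRunner` interface, `MiniClusterRunner`, `LocalRunner` and `selectRunner` are hypothetical stand-ins, not Sqoop's actual test classes.

```java
import java.util.Properties;

public class RunnerSelection {

  // Property name taken from the -D option described above.
  public static final String RUNNER_CLASS_PROPERTY = "sqoop.hadoop.runner.class";

  /** Hypothetical abstraction over the ways of running MR jobs in tests. */
  public interface HadoopRunner {
    String describe();
  }

  /** Hypothetical stand-in for a runner backed by the Hadoop Minicluster. */
  public static class MiniClusterRunner implements HadoopRunner {
    @Override
    public String describe() {
      return "minicluster";
    }
  }

  /** Hypothetical stand-in for a runner that uses Hadoop's local job runner. */
  public static class LocalRunner implements HadoopRunner {
    @Override
    public String describe() {
      return "local";
    }
  }

  /**
   * Instantiates the runner class named by the property, or the default runner
   * when the property is unset. Reflection failures are rethrown unchecked so a
   * misconfigured property fails test setup immediately.
   */
  public static HadoopRunner selectRunner(Properties props,
                                          Class<? extends HadoopRunner> defaultRunner) {
    String className = props.getProperty(RUNNER_CLASS_PROPERTY);
    try {
      Class<? extends HadoopRunner> runnerClass = (className == null)
          ? defaultRunner
          : Class.forName(className).asSubclass(HadoopRunner.class);
      return runnerClass.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate runner " + className, e);
    }
  }

  public static void main(String[] args) {
    // System.getProperties() includes any -Dsqoop.hadoop.runner.class=... passed to the JVM.
    HadoopRunner runner = selectRunner(System.getProperties(), MiniClusterRunner.class);
    System.out.println("Using runner: " + runner.describe());
  }
}
```

In this sketch the property value must be the binary name of a class on the classpath that implements the hypothetical `HadoopRunner` interface (for a nested class, something like `RunnerSelection$LocalRunner`); any other value fails fast with an `IllegalArgumentException`.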

In our code, this is how we detect that the job is running in local mode (localRunner):

Code Block
/**
 * Detect MapReduce local mode.
 *
 * @return True if we're running in local mode
 */
private boolean isLocal() {
  // If framework is set to YARN, then we can't be running in local mode
  if ("yarn".equals(globalConfiguration.get("mapreduce.framework.name"))) {
    return false;
  }
  // If job tracker address is "local" then we're running in local mode
  return "local".equals(globalConfiguration.get("mapreduce.jobtracker.address"))
      || "local".equals(globalConfiguration.get("mapred.job.tracker"));
}
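To exercise the detection logic above without a running engine, the check can be lifted into a static helper. This sketch replaces the configuration object with a plain `Map<String, String>`, assuming (as the `get()` calls suggest) that `globalConfiguration` is a Hadoop `Configuration`; the map is used only so the example runs without Hadoop on the classpath. The `LocalModeDetector` class name is hypothetical; the property keys and their precedence are the same as in `isLocal()`.

```java
import java.util.HashMap;
import java.util.Map;

public class LocalModeDetector {

  /** Same decision as isLocal() above, reading from a plain map instead of a Hadoop Configuration. */
  public static boolean isLocal(Map<String, String> conf) {
    // An explicit YARN framework always wins: never local.
    if ("yarn".equals(conf.get("mapreduce.framework.name"))) {
      return false;
    }
    // Otherwise local mode is signalled by a "local" job tracker address,
    // under either the newer (mapreduce.*) or the deprecated (mapred.*) key.
    return "local".equals(conf.get("mapreduce.jobtracker.address"))
        || "local".equals(conf.get("mapred.job.tracker"));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("mapred.job.tracker", "local");
    System.out.println(isLocal(conf)); // true

    conf.put("mapreduce.framework.name", "yarn");
    System.out.println(isLocal(conf)); // false: YARN overrides the tracker setting
  }
}
```

Note that an empty configuration is reported as not local here, because neither tracker key is set; the real method depends on whatever defaults the loaded Hadoop configuration supplies.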

 

Further reading on Hadoop testing with the Minicluster: www.lopakalogic.com/articles/hadoop-articles/hadoop-testing-with-minicluster/

Which database do the integration tests use today for storing the Sqoop entities?

By default, it is embedded Derby, started through the DerbyProvider class:

 

Code Block
public class DerbyProvider extends DatabaseProvider {
  // Fields (LOG, server, port) and the remaining methods are omitted from this excerpt.

  @Override
  public void start() {
    // Start the embedded Derby network server
    try {
      port = NetworkUtils.findAvailablePort();
      LOG.info("Will bind to port " + port);
      server = new NetworkServerControl(InetAddress.getByName("localhost"), port);
      server.start(new LoggerWriter(LOG, Level.INFO));
      // start() won't throw an exception if the server fails to start; one
      // has to explicitly call ping() to verify that the server is up.
      // See DERBY-1465 for more details.
      server.ping();
    } catch (Exception e) {
      LOG.error("Can't start Derby network server", e);
      throw new RuntimeException("Can't start Derby server", e);
    }
    super.start();
  }
}
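The provider follows a start-then-verify pattern: pick a free port, start the server, then actively ping it, because `start()` alone does not report failures. This sketch reproduces the pattern with only the JDK. Binding to port 0 is one common way to get a free port, but it is an assumption about what `NetworkUtils.findAvailablePort()` does, not its confirmed implementation; a plain `ServerSocket` stands in for the Derby server, and a TCP connect stands in for `NetworkServerControl.ping()`. The `StartThenPing` class and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class StartThenPing {

  /**
   * Asks the OS for a free ephemeral port by binding to port 0, then releases it.
   * Another process could grab the port before we bind again, so callers should
   * still be prepared for the later bind to fail.
   */
  public static int findAvailablePort() {
    try (ServerSocket probe = new ServerSocket(0)) {
      return probe.getLocalPort();
    } catch (IOException e) {
      throw new UncheckedIOException("Could not find a free port", e);
    }
  }

  /** Returns true if something accepts TCP connections on host:port within the timeout. */
  public static boolean ping(InetAddress host, int port, int timeoutMillis) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMillis);
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    InetAddress localhost = InetAddress.getByName("localhost");
    int port = findAvailablePort();
    // A bare ServerSocket stands in for the database server in this sketch.
    try (ServerSocket server = new ServerSocket(port, 50, localhost)) {
      // Binding succeeded, but verify reachability explicitly before using the
      // server, mirroring the server.ping() call in DerbyProvider.start().
      if (!ping(localhost, port, 1000)) {
        throw new IllegalStateException("Server did not come up on port " + port);
      }
      System.out.println("Server is up on port " + port);
    }
  }
}
```

The explicit ping turns a silent startup failure into an immediate, clearly labelled exception, which is the same reason the Derby code calls `ping()` right after `start()`.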

NOTE: Even though there are other providers, such as MySQLProvider and PostgreSQLProvider, they are not used in any of the tests.

Which datasets do we use in some of the integration tests?

Any class that extends the following base class:

Code Block
public abstract class DataSet { ... }
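The body of `DataSet` is elided above, so the following is only a rough illustration of the base-class pattern: each concrete data set knows how to create its table and load a fixed set of rows, and tests work with it through the base type. Everything here is hypothetical: the method names `createTables()` and `loadBasicData()`, the `CitiesDataSet` subclass, and the in-memory list standing in for a database table are assumptions, not Sqoop's actual API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DataSetSketch {

  /** Hypothetical base class: concrete data sets describe how to populate their table. */
  public abstract static class DataSet {
    // Stand-in for a database table; a real data set would issue SQL instead.
    protected final List<String[]> rows = new ArrayList<>();
    private boolean tablesCreated = false;

    /** Hypothetical hook: (re)create the backing table. Returns this for chaining. */
    public DataSet createTables() {
      rows.clear();
      tablesCreated = true;
      return this;
    }

    /** Hypothetical hook: populate the table with a known, fixed set of rows. */
    public abstract DataSet loadBasicData();

    protected void insert(String... values) {
      if (!tablesCreated) {
        throw new IllegalStateException("createTables() must be called first");
      }
      rows.add(values);
    }

    public List<String[]> rows() {
      return Collections.unmodifiableList(rows);
    }
  }

  /** Hypothetical concrete data set with two (id, city) rows. */
  public static class CitiesDataSet extends DataSet {
    @Override
    public DataSet loadBasicData() {
      insert("1", "San Francisco");
      insert("2", "Brno");
      return this;
    }
  }

  public static void main(String[] args) {
    DataSet ds = new CitiesDataSet().createTables().loadBasicData();
    System.out.println(ds.rows().size()); // 2
  }
}
```

Keeping the rows fixed inside each subclass means a test can assert on exact expected output after a Sqoop job reads or writes that table.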

 

...