Comment: Migrated to Confluence 5.3

 

JIRA: SQOOP-1680


This document serves as a guide to the public-facing Sqoop Repository API as of the 1.99.5 release.

Warning

This API may evolve in future releases; this document describes the state of the API in 1.99.5.

...

Summary

Sqoop2 supports a persistent store for Sqoop entities such as the Configurables (Connector and Driver), the Configs exposed by the Connectors, Jobs, and JobRuns/Submissions. This persistent store is commonly referred to as the repository. Sqoop also exposes REST APIs and shell commands to perform CRUD operations on these entities: connectors and drivers, the connector configs that hold link and job information, and Sqoop jobs and their configs. The repository therefore keeps a history of the Sqoop entity objects created and updated over time. To make the persistent store easy to access, Sqoop also exposes a simple Java-based repository API that different data stores can implement to store the Sqoop entity objects.

...

The rest of the document focuses on the main public-facing entities and repository APIs.

 

Sqoop Entities

Represents the Sqoop connector's link information. A link encapsulates the details required to connect to the data source that the connector represents. It has one associated config, MLinkConfig.

...

Top Level Entity

...

Represents a core entity that exposes config objects and is used in the Sqoop job lifecycle.

A Configurable has an associated version that acts as an identifier for connector config upgrades.

Code Block: MConfigurableType
/**
 * Represents the sqoop entities that can own configs
 */
public enum MConfigurableType {
  /** Connector as a owner of config keys */
  CONNECTOR,
  /** Driver as a owner of config keys */
  DRIVER;
}
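
As a rough illustration of how a configurable's version can act as the trigger for a config upgrade, the sketch below compares the version stored for a configurable with the version reported by a newly loaded one. Only the enum mirrors the code above; the `Configurable` class, the `needsUpgrade` method, and the rule that any version difference requires an upgrade are assumptions made for this example and are not part of Sqoop.

```java
class ConfigurableVersionSketch {
  /** Mirrors the enum shown above. */
  enum MConfigurableType { CONNECTOR, DRIVER }

  /** Hypothetical stand-in for a registered configurable; not a Sqoop class. */
  static final class Configurable {
    final MConfigurableType type;
    final String uniqueName;
    final String version;

    Configurable(MConfigurableType type, String uniqueName, String version) {
      this.type = type;
      this.uniqueName = uniqueName;
      this.version = version;
    }
  }

  /**
   * Assumption: any difference between the stored and the loaded version
   * string means the stored configs need an upgrade.
   */
  static boolean needsUpgrade(Configurable stored, Configurable loaded) {
    if (stored.type != loaded.type || !stored.uniqueName.equals(loaded.uniqueName)) {
      throw new IllegalArgumentException("Not the same configurable");
    }
    return !stored.version.equals(loaded.version);
  }

  public static void main(String[] args) {
    Configurable stored =
        new Configurable(MConfigurableType.CONNECTOR, "example-connector", "1.99.4");
    Configurable loaded =
        new Configurable(MConfigurableType.CONNECTOR, "example-connector", "1.99.5");
    System.out.println("upgrade needed: " + needsUpgrade(stored, loaded));
  }
}
```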

Refer to this wiki for details on the Sqoop entities. The rest of this document assumes familiarity with them.

Sqoop Repository API 

Details and Javadocs are available in Repository.java (on the sqoop2 trunk). Here are the high-level details of the APIs.

Entity Related APIs


Entity | APIs | Notes

CONNECTOR

public abstract MConnector registerConnector(MConnector mConnector, boolean autoUpgrade);

public abstract MConnector findConnector(String shortName);
public abstract List<MConnector> findConnectors();
 

READ ONLY APIs.

 

DRIVER

public abstract MDriver registerDriver(MDriver mDriverConfig, boolean autoUpgrade);
public abstract MDriver findDriver(String shortName);
READ ONLY APIs.

LINK

public abstract void createLink(MLink link);
public abstract void updateLink(MLink link);
public abstract void updateLink(final MLink link, RepositoryTransaction tx);
public abstract void enableLink(long id, boolean enabled);
public abstract void deleteLink(long id);
public abstract MLink findLink(long id);

public abstract MLink findLink(String name);
public abstract List<MLink> findLinksForConnector(long connectorId);
public abstract List<MLink> findLinks();
CRUD APIs

JOB

public abstract void createJob(MJob job);
public abstract void updateJob(MJob job);
public abstract void updateJob(MJob job, RepositoryTransaction tx);
public abstract void enableJob(long id, boolean enabled);
public abstract void deleteJob(long id);
public abstract MJob findJob(long id);
public abstract MJob findJob(String name);
public abstract List<MJob> findJobs();
public abstract List<MJob> findJobsForConnector(long connectorId);
CRUD APIs

CONFIG

None

 

INPUT (VALUES)
public abstract void deleteJobInputs(long jobId, RepositoryTransaction tx);
public abstract void deleteLinkInputs(long linkId, RepositoryTransaction tx);

No public API for users yet.

See SQOOP-1516 for more details; the 1.99.5 changes added read and update (RU) support for inputs.

Input deletion can happen as part of the connector/driver upgrade path.

SUBMISSION
public abstract void createSubmission(MSubmission submission);
public abstract void updateSubmission(MSubmission submission);
public abstract void purgeSubmissions(Date threshold);
public abstract List<MSubmission> findUnfinishedSubmissions();
public abstract List<MSubmission> findSubmissions();
public abstract List<MSubmission> findSubmissionsForJob(long jobId);
public abstract MSubmission findLastSubmissionForJob(long jobId);

Create, update, and delete (CUD) APIs are for internal Sqoop use only.

Read-only APIs are available to users.
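
To make the shape of the LINK APIs listed above more concrete, here is a minimal in-memory sketch of a few of them. It is not how Sqoop stores links: the `Link` class is a hypothetical, heavily simplified stand-in for MLink, IDs are assumed to be assigned on creation, and only the method names follow the table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LinkRepositorySketch {
  /** Hypothetical, heavily simplified stand-in for MLink. */
  static final class Link {
    long id = -1; // assumption: assigned by createLink
    final long connectorId;
    final String name;
    boolean enabled = true;

    Link(long connectorId, String name) {
      this.connectorId = connectorId;
      this.name = name;
    }
  }

  private final Map<Long, Link> links = new LinkedHashMap<>();
  private long nextId = 1;

  void createLink(Link link) {
    link.id = nextId++;
    links.put(link.id, link);
  }

  void enableLink(long id, boolean enabled) {
    existing(id).enabled = enabled;
  }

  void deleteLink(long id) {
    existing(id); // fail loudly if the link is unknown
    links.remove(id);
  }

  Link findLink(long id) {
    return links.get(id);
  }

  Link findLink(String name) {
    for (Link link : links.values()) {
      if (link.name.equals(name)) {
        return link;
      }
    }
    return null;
  }

  List<Link> findLinksForConnector(long connectorId) {
    List<Link> result = new ArrayList<>();
    for (Link link : links.values()) {
      if (link.connectorId == connectorId) {
        result.add(link);
      }
    }
    return result;
  }

  private Link existing(long id) {
    Link link = links.get(id);
    if (link == null) {
      throw new IllegalArgumentException("No link with id " + id);
    }
    return link;
  }

  public static void main(String[] args) {
    LinkRepositorySketch repo = new LinkRepositorySketch();
    Link link = new Link(1L, "example-link");
    repo.createLink(link);
    repo.enableLink(link.id, false);
    System.out.println("enabled: " + repo.findLink("example-link").enabled);
    repo.deleteLink(link.id);
    System.out.println("links left: " + repo.findLinksForConnector(1L).size());
  }
}
```

The JOB APIs follow the same pattern with MJob in place of MLink.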

Repository Upgrade related APIs

Code Block
  /**
   * Create or update the repository schema structures.
   *
   * This method will be called from the Sqoop server if enabled via a config
   * {@link RepoConfigurationConstants#SYSCFG_REPO_SCHEMA_IMMUTABLE} to enforce
   * changing the repository schema structure or explicitly via the
   * {@link UpgradeTool} Repository should not change its schema structure
   * outside of this method. This method must be no-op in case that the schema
   * structure do not need any upgrade.
   */
  public abstract void createOrUpgradeRepository();
  /**
   * Return true if internal repository structures exists and are suitable for use.
   * This method should return false in case that the structures do exists, but
   * are not suitable to use i.e corrupted as part of the upgrade
   *
   * @return Boolean values if internal structures are suitable for use
   */
  public abstract boolean isRepositorySuitableForUse();
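
The contract described in these Javadocs (upgrade only inside `createOrUpgradeRepository`, do nothing when the schema is already current, and report unusable structures through `isRepositorySuitableForUse`) can be sketched as follows. The integer schema version, the `EXPECTED_VERSION` constant, and the `markCorrupted` helper are assumptions made for this example; real implementations track schema state in their own way.

```java
class SchemaUpgradeSketch {
  /** Hypothetical schema version that this build expects. */
  static final int EXPECTED_VERSION = 3;

  private int installedVersion; // 0 means no schema has been created yet
  private boolean corrupted;
  private int stepsApplied;

  SchemaUpgradeSketch(int installedVersion) {
    this.installedVersion = installedVersion;
  }

  /** Creates or upgrades the schema; a no-op when it is already current. */
  void createOrUpgradeRepository() {
    while (installedVersion < EXPECTED_VERSION) {
      installedVersion++; // stands in for running one migration step
      stepsApplied++;
    }
  }

  /** False when structures are missing, outdated, or marked as corrupted. */
  boolean isRepositorySuitableForUse() {
    return !corrupted && installedVersion == EXPECTED_VERSION;
  }

  /** Hypothetical hook that simulates a failed upgrade. */
  void markCorrupted() {
    corrupted = true;
  }

  int stepsApplied() {
    return stepsApplied;
  }

  public static void main(String[] args) {
    SchemaUpgradeSketch repo = new SchemaUpgradeSketch(1);
    System.out.println("suitable before upgrade: " + repo.isRepositorySuitableForUse());
    repo.createOrUpgradeRepository();
    System.out.println("suitable after upgrade: " + repo.isRepositorySuitableForUse());
  }
}
```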

 

Configurable Upgrade related APIs

(NOTE: The following APIs could have formed their own independent API, but they live in the repository because the configurables' config and input objects reside there.)

Connector Upgrade API

upgradeConnector has a default implementation provided in Repository.java.
Code Block
  /**
   * Upgrade the connector with the same {@linkplain MConnector#uniqueName}
   * in the repository with values from <code>newConnector</code>.
   * <p/>
   * All links and jobs associated with this connector will be upgraded
   * automatically.
   *
   * @param oldConnector The old connector that should be upgraded.
   * @param newConnector New properties for the Connector that should be
   *                     upgraded.
   */
  public final void upgradeConnector(MConnector oldConnector, MConnector newConnector) {
  ..}

  /**
   * Update the connector with the new data supplied in the
   * <tt>newConnector</tt>. Also Update all configs associated with this
   * connector in the repository with the configs specified in
   * <tt>mConnector</tt>. <tt>mConnector </tt> must
   * minimally have the configurableID and all required configs (including ones
   * which may not have changed). After this operation the repository is
   * guaranteed to only have the new configs specified in this object.
   *
   * @param newConnector The new data to be inserted into the repository for
   *                     this connector.
   * @param tx The repository transaction to use to push the data to the
   *           repository. If this is null, a new transaction will be created.
   *           method will not call begin, commit,
   *           rollback or close on this transaction.
   */
  protected abstract void upgradeConnectorAndConfigs(MConnector newConnector, RepositoryTransaction tx);
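
The split between a final `upgradeConnector` and an abstract `upgradeConnectorAndConfigs` resembles the template-method pattern: the base class fixes the workflow and the transaction lifecycle, while the store-specific subclass only writes the data. The sketch below shows that division under stated assumptions. `Tx` and `Connector` are hypothetical stand-ins for RepositoryTransaction and MConnector, the body of `upgradeConnector` is a guess at the general shape rather than the code in Repository.java, and the upgrade of associated links and jobs is left out.

```java
import java.util.ArrayList;
import java.util.List;

class ConnectorUpgradeSketch {
  /** Hypothetical stand-in for RepositoryTransaction that records calls. */
  static final class Tx {
    final List<String> calls = new ArrayList<>();
    void begin() { calls.add("begin"); }
    void commit() { calls.add("commit"); }
    void rollback() { calls.add("rollback"); }
    void close() { calls.add("close"); }
  }

  /** Hypothetical, simplified stand-in for MConnector. */
  static final class Connector {
    final String uniqueName;
    final String version;

    Connector(String uniqueName, String version) {
      this.uniqueName = uniqueName;
      this.version = version;
    }
  }

  abstract static class Repo {
    final Tx tx = new Tx(); // kept as a field so the demo can inspect it

    /** Fixed workflow that owns the transaction lifecycle. */
    final void upgradeConnector(Connector oldConnector, Connector newConnector) {
      if (!oldConnector.uniqueName.equals(newConnector.uniqueName)) {
        throw new IllegalArgumentException("Connector names differ");
      }
      try {
        tx.begin();
        upgradeConnectorAndConfigs(newConnector, tx);
        tx.commit();
      } catch (RuntimeException e) {
        tx.rollback();
        throw e;
      } finally {
        tx.close();
      }
    }

    /** Store-specific step; must not begin, commit, roll back, or close tx. */
    protected abstract void upgradeConnectorAndConfigs(Connector newConnector, Tx tx);
  }

  /** Runs the workflow once and returns the recorded transaction calls. */
  static List<String> run(boolean failInStore) {
    Repo repo = new Repo() {
      @Override
      protected void upgradeConnectorAndConfigs(Connector newConnector, Tx tx) {
        if (failInStore) {
          throw new IllegalStateException("simulated store failure");
        }
        tx.calls.add("store:" + newConnector.version);
      }
    };
    try {
      repo.upgradeConnector(
          new Connector("example-connector", "1"), new Connector("example-connector", "2"));
    } catch (IllegalStateException expected) {
      // The failure path is part of the demonstration.
    }
    return repo.tx.calls;
  }

  public static void main(String[] args) {
    System.out.println("success path: " + run(false));
    System.out.println("failure path: " + run(true));
  }
}
```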
  

 

Driver Upgrade API

upgradeDriver has a default implementation provided in Repository.java.
Code Block
 
  public final void upgradeDriver(MDriver driver) {
  ..}

  /**
   * Upgrade the driver with the new data supplied in the
   * <tt>mDriver</tt>. Also Update all configs associated with the driver
   * in the repository with the configs specified in
   * <tt>mDriver</tt>. <tt>mDriver </tt> must
   * minimally have the configurableID and all required configs (including ones
   * which may not have changed). After this operation the repository is
   * guaranteed to only have the new configs specified in this object.
   *
   * @param newDriver The new data to be inserted into the repository for
   *                     the driverConfig.
   * @param tx The repository transaction to use to push the data to the
   *           repository. If this is null, a new transaction will be created.
   *           method will not call begin, commit,
   *           rollback or close on this transaction.
   */
  protected abstract void upgradeDriverAndConfigs(MDriver newDriver, RepositoryTransaction tx);

 

 

Sqoop Repository Concrete Implementations

Warning
JdbcRepository extends Repository API
 
JdbcRepositoryHandler.java mirrors the Repository.java class, except that its API methods additionally take a java.sql.Connection parameter.

As of 1.99.5, there are Derby and PostgreSQL implementations of the repository.

Refer to DerbyRepositoryHandler and PostgresqlRepositoryHandler for details. They are concrete implementations of JdbcRepositoryHandler.
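
The relationship described above, where a repository front obtains a connection and a handler performs the same operation with that connection as an extra parameter, might look roughly like the sketch below. The `Handler` interface, the `Repo` class, and the connection supplier are hypothetical names for this example and do not reproduce JdbcRepository or JdbcRepositoryHandler. The fake `Connection` is built with `java.lang.reflect.Proxy` so that the example runs without a database.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class JdbcDelegationSketch {
  /** Hypothetical handler: a repository-style operation plus a Connection. */
  interface Handler {
    void deleteLink(long linkId, Connection conn);
  }

  /** Hypothetical repository front: obtains a connection and delegates. */
  static final class Repo {
    private final Handler handler;
    private final Supplier<Connection> connections;

    Repo(Handler handler, Supplier<Connection> connections) {
      this.handler = handler;
      this.connections = connections;
    }

    void deleteLink(long linkId) {
      try (Connection conn = connections.get()) {
        try {
          handler.deleteLink(linkId, conn);
          conn.commit();
        } catch (RuntimeException e) {
          conn.rollback();
          throw e;
        }
      } catch (SQLException e) {
        throw new IllegalStateException("Repository operation failed", e);
      }
    }
  }

  /** Fake Connection that only records the names of the methods called on it. */
  static Connection recordingConnection(List<String> log) {
    return (Connection) Proxy.newProxyInstance(
        JdbcDelegationSketch.class.getClassLoader(),
        new Class<?>[] {Connection.class},
        (proxy, method, args) -> {
          log.add(method.getName());
          return null; // sufficient for the void methods used in this sketch
        });
  }

  /** Runs one delegated call and returns what was recorded. */
  static List<String> run() {
    List<String> log = new ArrayList<>();
    Repo repo = new Repo(
        (id, conn) -> log.add("handler.deleteLink(" + id + ")"),
        () -> recordingConnection(log));
    repo.deleteLink(7L);
    return log;
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```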

 

 

...

  • HAS 1-n CONFIG objects

...

is a type of configurable

There can be many connectors registered to the sqoop server

...

  • HAS 1-n CONFIG objects

...

is a type of configurable

There is only one Driver object representing sqoop in the system

...

MConfigType lists the supported config types as of 1.99.5.

Code Block: MConfigType
public enum MConfigType {
  /** Unknown config type */
  OTHER,
  @Deprecated
  // NOTE: only exists to support the connector data upgrade path
  CONNECTION,
  /** Link config type (arguably should have been called connector config type) */
  LINK,
  /** Job config type */
  JOB;
}

...

MInput.java (an abstract class) and the @Input annotation

Concrete classes for each supported type:

MIntegerInput.java

MStringInput.java

...

  • Associated with a CONFIG object

...

Represents the key-value pairs for a given config.

The supported MInputType values are:

Code Block: MInputType
public enum MInputType {
  /** Unknown input type */
  OTHER,
  /** String input type */
  STRING,
  /** Map input type */
  MAP,
  /** Integer input type */
  INTEGER,
  /** Boolean input type */
  BOOLEAN,
  /** String based input that can contain only predefined values **/
  ENUM,
  ;
}
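
The idea of an abstract input with one concrete class per supported type can be sketched as below. This is a simplified analogue and not the MInput class hierarchy itself: `Input`, `StringInput`, `IntegerInput`, and the `restoreFromText` method are hypothetical names chosen for the example, and only the enum mirrors the code above.

```java
class InputSketch {
  /** Mirrors the enum shown above. */
  enum MInputType { OTHER, STRING, MAP, INTEGER, BOOLEAN, ENUM }

  /** Hypothetical simplified analogue of an abstract input: a named, typed value. */
  abstract static class Input<T> {
    private final String name;
    private T value;

    Input(String name) {
      this.name = name;
    }

    String getName() {
      return name;
    }

    T getValue() {
      return value;
    }

    void setValue(T value) {
      this.value = value;
    }

    abstract MInputType getType();

    /** Restores the value from its stored text form (hypothetical method). */
    abstract void restoreFromText(String text);
  }

  static final class StringInput extends Input<String> {
    StringInput(String name) {
      super(name);
    }

    @Override
    MInputType getType() {
      return MInputType.STRING;
    }

    @Override
    void restoreFromText(String text) {
      setValue(text);
    }
  }

  static final class IntegerInput extends Input<Integer> {
    IntegerInput(String name) {
      super(name);
    }

    @Override
    MInputType getType() {
      return MInputType.INTEGER;
    }

    @Override
    void restoreFromText(String text) {
      setValue(Integer.valueOf(text));
    }
  }

  public static void main(String[] args) {
    IntegerInput port = new IntegerInput("example.port");
    port.restoreFromText("5432");
    System.out.println(port.getName() + " (" + port.getType() + ") = " + port.getValue());
  }
}
```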

...

MLink.java

...

  • Associated with a CONNECTOR
  • HAS a CONFIG-INPUT object

...

MJob.java

MFromConfig.java

MToConfig.java

MDriverConfig.java

...

  • HAS 3 CONFIG-INPUT objects
  • HAS 1-n SUBMISSIONS

...

Represents the sqoop job. It encapsulates all the required configs to run the sqoop job.

A Sqoop job has three main components: FROM, TO, and DRIVER.

FROM and its related MFromConfig represent the config-inputs-values required to extract data from the source.

TO and its related MToConfig represent the config-inputs-values required to load data to the destination.

DRIVER and its related MDriverConfig represent the config-inputs-values required by the execution engine to run the Sqoop job optimally.
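
The three-part structure of a job can be pictured with a small sketch. `Config` and `Job` here are hypothetical stand-ins for the MFromConfig, MToConfig, MDriverConfig, and MJob classes, and the input names and values are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class JobSketch {
  /** Hypothetical stand-in for a config: input names mapped to values. */
  static final class Config {
    final Map<String, String> inputs = new LinkedHashMap<>();

    Config with(String name, String value) {
      inputs.put(name, value);
      return this;
    }
  }

  /** Hypothetical stand-in for a job that bundles FROM, TO, and DRIVER configs. */
  static final class Job {
    final Config from;
    final Config to;
    final Config driver;

    Job(Config from, Config to, Config driver) {
      this.from = from;
      this.to = to;
      this.driver = driver;
    }

    /** Total number of input values the job carries across all three configs. */
    int inputCount() {
      return from.inputs.size() + to.inputs.size() + driver.inputs.size();
    }
  }

  public static void main(String[] args) {
    // All input names and values below are invented for illustration.
    Job job = new Job(
        new Config().with("example.sourceTable", "orders"),
        new Config().with("example.outputDirectory", "/tmp/orders"),
        new Config().with("example.parallelism", "4"));
    System.out.println("FROM: " + job.from.inputs);
    System.out.println("TO: " + job.to.inputs);
    System.out.println("DRIVER: " + job.driver.inputs);
  }
}
```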

 

...

Represents the job run details, including the job status, job counters, and metrics from the job execution engine.

...