Title: Sqoop Config Input as a Top Level Entity

JIRA: https://issues.apache.org/jira/browse/SQOOP-1516

Summary

Configs are exposed in code by connectors and drivers (the two CONFIGURABLES supported). Each configurable annotates its config classes with the "@Config" annotation, which is how Sqoop registers these entities in the repository during server startup. If a connector already exists in the Sqoop repository (during server startup or when the UpgradeTool is invoked), the connector's upgrade API is invoked to update the attributes of the config object.

The current SQ_CONFIG table stores the top-level config entries per configurable.

     +-------------------------------------+
     | SQ_CONFIG                           |
     +-------------------------------------+
     | SQ_CFG_ID: BIGINT PK AUTO-GEN       |
     | SQ_CONFIGURABLE: BIGINT             |FK SQ_CONFIGURABLE(SQC_ID)
     | SQ_CFG_NAME: VARCHAR(64)            |
     | SQ_CFG_TYPE: VARCHAR(32)            |"LINK"|"JOB"
     | SQ_CFG_INDEX: SMALLINT              |
     +-------------------------------------+
 

Currently we support two types of configs: LINK and JOB. The MConfigType enum encapsulates this information, and its value is stored in "SQ_CFG_TYPE" when a config is registered.

@InterfaceAudience.Private
@InterfaceStability.Unstable
public enum MConfigType {
  /** Unknown config type */
  OTHER,
  @Deprecated
  // NOTE: only exists to support the connector data upgrade path
  CONNECTION,
  /** link config type */
  LINK,
  /** Job config type */
  JOB;
}

Each class annotated with @Config exposes a list of inputs via the "@Input" annotation and its attributes. The @Input-annotated fields are stored in another table, SQ_INPUT, along with the supported attributes and their values. SQ_INPUT stores only the input keys and the attribute values. The actual values of the inputs depend on the JOB or LINK they belong to (refer to this wiki to understand Sqoop entities), so they are stored in two additional tables, SQ_JOB_INPUT and SQ_LINK_INPUT.

     +----------------------------+
     | SQ_INPUT                   |
     +----------------------------+
     | SQI_ID: BIGINT PK AUTO-GEN |
     | SQI_NAME: VARCHAR(64)      |
     | SQI_CONFIG: BIGINT         |FK SQ_CONFIG(SQ_CFG_ID)
     | SQI_INDEX: SMALLINT        |
     | SQI_TYPE: VARCHAR(32)      |"STRING"|"MAP"
     | SQI_STRMASK: BOOLEAN       |
     | SQI_STRLENGTH: SMALLINT    |
     | SQI_ENUMVALS: VARCHAR(100) |
     +----------------------------+
 
     +----------------------------+
     | SQ_LINK_INPUT              |
     +----------------------------+
     | SQ_LNKI_LINK: BIGINT PK    | FK SQ_LINK(SQ_LNK_ID)
     | SQ_LNKI_INPUT: BIGINT PK   | FK SQ_INPUT(SQI_ID)
     | SQ_LNKI_VALUE: LONG VARCHAR|
     +----------------------------+

     +----------------------------+
     | SQ_JOB_INPUT               |
     +----------------------------+
     | SQBI_JOB: BIGINT PK        | FK SQ_JOB(SQB_ID)
     | SQBI_INPUT: BIGINT PK      | FK SQ_INPUT(SQI_ID)
     | SQBI_VALUE: LONG VARCHAR   |
     +----------------------------+
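
To make the registration step concrete, here is a minimal sketch of how annotated fields could be discovered by reflection and turned into SQ_INPUT-style rows. The `Input` annotation below is a simplified stand-in defined only for this example (its `size` and `sensitive` attributes, and their rough correspondence to SQI_STRLENGTH and SQI_STRMASK, are assumptions), and `ExampleLinkConfig` and `InputScanner` are hypothetical names, not Sqoop classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for an input annotation; attribute names are assumptions. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Input {
  short size() default -1;           // assumed to map to SQI_STRLENGTH
  boolean sensitive() default false; // assumed to map to SQI_STRMASK
}

/** Hypothetical config class that exposes two inputs. */
class ExampleLinkConfig {
  @Input(size = 128) String connectionString;
  @Input(size = 40, sensitive = true) String password;
  String notAnInput; // not annotated, so it is not reported as an input
}

final class InputScanner {
  private InputScanner() {}

  /** Returns one "name:size:sensitive" entry per @Input-annotated field. */
  static List<String> describeInputs(Class<?> configClass) {
    List<String> rows = new ArrayList<>();
    for (Field field : configClass.getDeclaredFields()) {
      Input input = field.getAnnotation(Input.class);
      if (input != null) {
        rows.add(field.getName() + ":" + input.size() + ":" + input.sensitive());
      }
    }
    return rows;
  }
}
```

Calling `InputScanner.describeInputs(ExampleLinkConfig.class)` yields one entry per annotated field. The values a user later supplies for those inputs would live in SQ_LINK_INPUT or SQ_JOB_INPUT, not in the entries produced here.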


 

 

The following table lists the type and, in parentheses, the number of configs exposed by each configurable. Each config object is represented as a list, so a connector can, for example, expose a FROM-CONFIG with more than one config object in it.

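     +--------------+-------------------+---------------------+
     | CONFIGURABLE | LINK-CONFIG       | JOB-CONFIG          |
     +--------------+-------------------+---------------------+
     | CONNECTOR    | (1)               | (2)                 |
     |              | LINK-CONFIG       | FROM-CONFIG         |
     |              |   MLinkConfigList |   MFromConfigList   |
     |              |                   | TO-CONFIG           |
     |              |                   |   MToConfigList     |
     +--------------+-------------------+---------------------+
     | DRIVER       | NONE              | (1)                 |
     |              |                   | DRIVER-CONFIG       |
     |              |                   |   MDriverConfigList |
     +--------------+-------------------+---------------------+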

 

The current proposal enhances the existing functionality (command line and REST APIs) to support reuse of config objects by providing hooks to perform RU (Read and Update) operations on the config input objects independently.


Requirements
  • Read and update the config inputs by type and by job or submission (since SQOOP-2025 we may be able to have configs by submissionId).
  • Support this in both the shell commands and the REST API.
  • Only inputs with the attribute "USER-ONLY" or "ANY", as per SQOOP-1804, will be editable.
  • Once the input values are edited, the new values will be used in the next job run, unless we maintain history as per SQOOP-2025.
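
The editability rule above can be sketched as a small guard. This is only an illustration: the `Editable` enum is a stand-in written for this example, its third value `CONNECTOR_ONLY` is an assumption about what a non-editable input might be marked as, and `EditPolicy` is a hypothetical helper name.

```java
/** Stand-in for the editable attribute referenced via SQOOP-1804; values are assumed. */
enum Editable { ANY, USER_ONLY, CONNECTOR_ONLY }

final class EditPolicy {
  private EditPolicy() {}

  /** True only for inputs the requirements list as user-editable. */
  static boolean isUserEditable(Editable editable) {
    return editable == Editable.ANY || editable == Editable.USER_ONLY;
  }

  /** Rejects an edit request up front instead of silently ignoring it. */
  static void checkEditable(String inputName, Editable editable) {
    if (!isUserEditable(editable)) {
      throw new IllegalStateException("Input '" + inputName + "' is not editable by users");
    }
  }
}
```

Failing fast in `checkEditable` is one possible design choice; an implementation could equally skip non-editable inputs and report them back to the caller.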

Non Goals

  • Supporting CD (create/delete) of config inputs via REST or the command line. This is allowed only via the configurable code and the supported annotations on the classes today, and it should remain so.

Design and Implementation Details

Shell Commands

A new enum will be added to list the config types:

 from, to, link, driver
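
A minimal sketch of what such an enum could look like follows. The name `ConfigType` and the helpers `fromShellValue` and `shellValue` are hypothetical; the proposal only fixes the four values.

```java
import java.util.Locale;

/** Hypothetical enum backing the --type shell argument. */
enum ConfigType {
  FROM, TO, LINK, DRIVER;

  /** Parses a shell value such as "from"; matching is case-insensitive. */
  static ConfigType fromShellValue(String value) {
    if (value == null) {
      throw new IllegalArgumentException("Config type must not be null");
    }
    try {
      return valueOf(value.trim().toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException(
          "Unsupported config type '" + value + "'; expected one of from, to, link, driver");
    }
  }

  /** Lower-case form as it would be typed on the command line. */
  String shellValue() {
    return name().toLowerCase(Locale.ROOT);
  }
}
```

Parsing through a single helper keeps the list of accepted values in one place, so the shell and the REST layer could share the same validation.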

Read Config By Type and Job or Submission 

 

// Supported type values: from, to, link, driver

1. show config --type from --jid 1  (shows the config values for the last run of job 1)
2. show config --type from --sid 1  (shows the config values for submission 1)

 

Edit Config By Type and Job (previous submissions cannot be edited, so editing is restricted to the last job run)

1. edit config --type from --jid 1  (edits the config values of the last run of job 1; values of earlier runs cannot be edited)

REST API changes

Read Config By Type and Job or Submission

GET
v1/config?jId=?&type= 

Edit Config By Type and Job

POST
v1/config/link?jId=?&type=
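
As an illustration of the read endpoint, the following sketch builds the request URI from a job id and a type. The `ConfigUris` class and its method name are hypothetical, and the base URL passed in is an assumption about where the server is reachable; only the `v1/config?jId=...&type=...` shape comes from the proposal.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds the proposed read URI. */
final class ConfigUris {
  private ConfigUris() {}

  /** Builds <baseUrl>/v1/config?jId=<jobId>&type=<type>. */
  static URI readByJob(String baseUrl, long jobId, String type) {
    // Normalise the base so exactly one slash separates it from the path.
    String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
    String query = "jId=" + jobId
        + "&type=" + URLEncoder.encode(type, StandardCharsets.UTF_8);
    return URI.create(base + "v1/config?" + query);
  }
}
```

The resulting `URI` can be handed to any HTTP client to issue the GET request.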

Repository API changes

  • Add new API to get config inputs by submissionId and type ( read-only)
  • Add new API to get config inputs by jobId and type
  • Add new API to edit/post config inputs by jobId and type
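
The three proposed calls could take roughly the following shape. This is a sketch under stated assumptions, not the actual repository API: the interface name, method names, and the use of `Map<String, String>` for input values are all hypothetical, and the in-memory class exists only to make the example runnable. The submission snapshot models the read-only history idea and assumes values are copied when a submission is recorded.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical shape of the three proposed repository calls. */
interface ConfigInputRepository {
  Map<String, String> findInputsBySubmission(long submissionId, String type); // read-only
  Map<String, String> findInputsByJob(long jobId, String type);
  void updateInputsByJob(long jobId, String type, Map<String, String> newValues);
}

/** In-memory stand-in used only to illustrate the intended behaviour. */
final class InMemoryConfigInputRepository implements ConfigInputRepository {
  private final Map<String, Map<String, String>> byJob = new HashMap<>();
  private final Map<String, Map<String, String>> bySubmission = new HashMap<>();

  private static String key(long id, String type) {
    return id + "/" + type;
  }

  /** Assumption: a submission keeps a copy of the job's values at submit time. */
  void recordSubmission(long submissionId, long jobId, String type) {
    bySubmission.put(key(submissionId, type),
        new LinkedHashMap<>(findInputsByJob(jobId, type)));
  }

  @Override
  public Map<String, String> findInputsBySubmission(long submissionId, String type) {
    return Collections.unmodifiableMap(
        bySubmission.getOrDefault(key(submissionId, type), Collections.emptyMap()));
  }

  @Override
  public Map<String, String> findInputsByJob(long jobId, String type) {
    return Collections.unmodifiableMap(
        byJob.getOrDefault(key(jobId, type), Collections.emptyMap()));
  }

  @Override
  public void updateInputsByJob(long jobId, String type, Map<String, String> newValues) {
    byJob.computeIfAbsent(key(jobId, type), k -> new LinkedHashMap<>()).putAll(newValues);
  }
}
```

Returning unmodifiable maps from the read methods keeps all writes going through `updateInputsByJob`, which is where an editability check would naturally sit.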

Testing

  • Unit tests are almost nonexistent for the shell code, so we will rely on basic manual testing.
  • The REST APIs can be tested via integration tests, which will be part of the proposed work.

 

 

 
