Title : Sqoop Config Input as a Top Level Entity 

JIRA: https://issues.apache.org/jira/browse/SQOOP-1516

Summary

Configs are exposed in code by the Connectors and Drivers (the two configurables supported). Each annotates its config classes with the "@Config" annotation, which is how Sqoop registers these entities in the repository during server startup. If a connector already exists in the Sqoop repository (for example, on the upgrade path), the connector's upgrade API is invoked to update the attributes of the config object.

The current SQ_CONFIG table stores the top-level config entries per configurable.

     +-------------------------------------+
     | SQ_CONFIG                           |
     +-------------------------------------+
     | SQ_CFG_ID: BIGINT PK AUTO-GEN       |
     | SQ_CONFIGURABLE: BIGINT             |FK SQ_CONFIGURABLE(SQC_ID)
     | SQ_CFG_NAME: VARCHAR(64)            |
     | SQ_CFG_TYPE: VARCHAR(32)            |"LINK"|"JOB"
     | SQ_CFG_INDEX: SMALLINT              |
     +-------------------------------------+
 

Currently we support two types of configs: LINK and JOB. The MConfigType enum encapsulates this information; its value is what gets stored in "SQ_CFG_TYPE" when a config is registered.

@InterfaceAudience.Private
@InterfaceStability.Unstable
public enum MConfigType {
  /** Unknown config type */
  OTHER,
  @Deprecated
  // NOTE: only exists to support the connector data upgrade path
  CONNECTION,
  /** link config type */
  LINK,
  /** Job config type */
  JOB;
}
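
As a rough illustration of how a stored "SQ_CFG_TYPE" string could be turned back into an enum constant, here is a minimal sketch. The class name ConfigTypeLookupSketch, the local ConfigType copy of the enum, and the fallback to OTHER for unknown values are all assumptions made for this example, not Sqoop's actual repository code.

```java
import java.util.Locale;

class ConfigTypeLookupSketch {
  // Local copy of the constants listed above so the sketch compiles on its own.
  enum ConfigType { OTHER, CONNECTION, LINK, JOB }

  // Maps a stored SQ_CFG_TYPE string to a constant.
  // Assumption of this sketch: null or unknown values fall back to OTHER.
  static ConfigType fromColumn(String stored) {
    if (stored == null) {
      return ConfigType.OTHER;
    }
    try {
      return ConfigType.valueOf(stored.trim().toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      return ConfigType.OTHER;
    }
  }

  public static void main(String[] args) {
    System.out.println(fromColumn("LINK"));
    System.out.println(fromColumn("JOB"));
    System.out.println(fromColumn("something-else"));
  }
}
```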

Each class annotated with @Config exposes a list of inputs via the "@Input" annotation and its attributes. The @Input-annotated fields are stored in another table, SQ_INPUT, along with the supported attributes and their values. SQ_INPUT stores only the input keys and the attribute values. The actual values of the inputs depend on the JOB or LINK they belong to (refer to this wiki to understand Sqoop entities), so they are stored in two additional tables, SQ_JOB_INPUT and SQ_LINK_INPUT.

     +----------------------------+
     | SQ_INPUT                   |
     +----------------------------+
     | SQI_ID: BIGINT PK AUTO-GEN |
     | SQI_NAME: VARCHAR(64)      |
     | SQI_CONFIG: BIGINT         |FK SQ_CONFIG(SQ_CFG_ID)
     | SQI_INDEX: SMALLINT        |
     | SQI_TYPE: VARCHAR(32)      |"STRING"|"MAP"
     | SQI_STRMASK: BOOLEAN       |
     | SQI_STRLENGTH: SMALLINT    |
     | SQI_ENUMVALS: VARCHAR(100) |
     +----------------------------+
 
     +----------------------------+
     | SQ_LINK_INPUT              |
     +----------------------------+
     | SQ_LNKI_LINK: BIGINT PK    | FK SQ_LINK(SQ_LNK_ID)
     | SQ_LNKI_INPUT: BIGINT PK   | FK SQ_INPUT(SQI_ID)
     | SQ_LNKI_VALUE: LONG VARCHAR|
     +----------------------------+
     +----------------------------+
     | SQ_JOB_INPUT               |
     +----------------------------+
     | SQBI_JOB: BIGINT PK        | FK SQ_JOB(SQB_ID)
     | SQBI_INPUT: BIGINT PK      | FK SQ_INPUT(SQI_ID)
     | SQBI_VALUE: LONG VARCHAR   |
     +----------------------------+
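
To make the annotation-driven registration described above more concrete, the following sketch scans a config class for annotated fields using reflection. The Config and Input annotations here are stand-ins declared locally so the example is self-contained; they are not Sqoop's own annotation classes. SampleLinkConfig, its fields, and the mapping of the sensitive and size attributes to SQI_STRMASK and SQI_STRLENGTH are likewise assumptions for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class InputScanSketch {
  // Stand-in annotations for illustration only.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface Config {}

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface Input {
    boolean sensitive() default false; // assumed to correspond to SQI_STRMASK
    int size() default -1;             // assumed to correspond to SQI_STRLENGTH
  }

  // Hypothetical link config exposing two inputs.
  @Config
  static class SampleLinkConfig {
    @Input(size = 128) public String connectionString;
    @Input(size = 40, sensitive = true) public String password;
    public String notAnInput; // not annotated, so it is never picked up
  }

  // Collects "<config>.<field>" for every @Input field of a @Config class,
  // similar in spirit to the keys that would be stored in SQI_NAME.
  static List<String> inputNames(Class<?> configClass) {
    if (!configClass.isAnnotationPresent(Config.class)) {
      throw new IllegalArgumentException(configClass + " is not annotated with @Config");
    }
    List<String> names = new ArrayList<>();
    for (Field field : configClass.getDeclaredFields()) {
      if (field.isAnnotationPresent(Input.class)) {
        names.add(configClass.getSimpleName() + "." + field.getName());
      }
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println(inputNames(SampleLinkConfig.class));
  }
}
```

Note that the scan only yields input definitions (what SQ_INPUT holds). The per-link and per-job values would be kept separately, as in SQ_LINK_INPUT and SQ_JOB_INPUT.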


 

 

The following table lists the types of configs exposed by each configurable, with the number of configs in parentheses. Each config object is represented as a list, so a connector can, for example, expose a FROM-CONFIG with more than one config object in it.

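     +--------------+-------------------+---------------------+
     | CONFIGURABLE | LINK-CONFIG       | JOB-CONFIG          |
     +--------------+-------------------+---------------------+
     | CONNECTOR    | (1)               | (2)                 |
     |              | LINK-CONFIG       | FROM-CONFIG         |
     |              | MLinkConfigList   | MFromConfigList     |
     |              |                   | TO-CONFIG           |
     |              |                   | MToConfigList       |
     +--------------+-------------------+---------------------+
     | DRIVER       | NONE              | (1)                 |
     |              |                   | DRIVER-CONFIG       |
     |              |                   | MDriverConfigList   |
     +--------------+-------------------+---------------------+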

 

The current proposal enhances existing functionality (command line and REST APIs) to support reuse of config objects by providing hooks to perform RU (read and update) operations on the config input objects independently.


Requirements

  • Read and Update the Config Inputs by Type and By Job

 

Non Goals

  • Supporting CD (create / delete) of config inputs via REST or the command line. Creating and deleting inputs is only allowed through the configurable code and the supported annotations on its classes.

Design

Shell Commands

Read Config By Type and Job or Submission 

show config --type from --jid 1 - will provide the config value for the last job run
show config --type from --sid 1




 

 

Edit Config By Type and Job (previous submissions cannot be edited)

edit config --type from --jid 1 - will edit the config values for the last job run
// we can only edit the last job run values

 

Supported type values (from, to, link, driver)
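
The option handling for the proposed commands could be validated along these lines. This is a minimal sketch, not the Sqoop shell's actual parser: the class name ShowConfigArgsSketch, the simple "--option value" parsing, and the rule that exactly one of --jid or --sid must be given are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShowConfigArgsSketch {
  // Type values taken from the list above.
  static final List<String> TYPES = Arrays.asList("from", "to", "link", "driver");

  // Parses "--option value" pairs and validates them.
  static Map<String, String> parse(String... args) {
    Map<String, String> options = new HashMap<>();
    for (int i = 0; i < args.length; i += 2) {
      if (!args[i].startsWith("--") || i + 1 >= args.length) {
        throw new IllegalArgumentException("Expected --option value, got: " + args[i]);
      }
      options.put(args[i].substring(2), args[i + 1]);
    }
    String type = options.get("type");
    if (type == null || !TYPES.contains(type)) {
      throw new IllegalArgumentException("--type must be one of " + TYPES);
    }
    // Assumption of this sketch: a config is looked up either by job or by submission.
    if (options.containsKey("jid") == options.containsKey("sid")) {
      throw new IllegalArgumentException("Specify exactly one of --jid or --sid");
    }
    return options;
  }

  public static void main(String[] args) {
    // Mirrors: show config --type from --jid 1
    System.out.println(parse("--type", "from", "--jid", "1"));
  }
}
```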

Rest API 

Read Config By Type and Job or Submission

GET
v1/config?jId=?&type= 

Edit Config By Type and Job

POST

v1/config/link?jId=?&type=
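
For illustration, the two request URIs above could be assembled as follows. The base URL (http://localhost:12000/sqoop/) is an assumption about where the server runs, and the class and method names are hypothetical. The request and response bodies are not specified in this proposal, so the sketch stops at building the URIs.

```java
import java.net.URI;

class ConfigRestUriSketch {
  // Assumed server location; adjust to the actual deployment.
  static final String BASE = "http://localhost:12000/sqoop/";

  // URI for: GET v1/config?jId=<job id>&type=<type>
  static URI readConfigUri(long jobId, String type) {
    return URI.create(BASE + "v1/config?jId=" + jobId + "&type=" + type);
  }

  // URI for: POST v1/config/link?jId=<job id>&type=<type>
  static URI editConfigUri(long jobId, String type) {
    return URI.create(BASE + "v1/config/link?jId=" + jobId + "&type=" + type);
  }

  public static void main(String[] args) {
    // The type values (from, to, link, driver) are plain tokens, so no escaping is needed here.
    System.out.println(readConfigUri(1, "from"));
    System.out.println(editConfigUri(1, "from"));
  }
}
```

The resulting URIs can be handed to any HTTP client, using GET for reads and POST for edits as listed above.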

Testing

  • Unit tests are almost nonexistent for the shell code, so we will rely on basic manual testing
  • REST APIs can be tested via integration tests, which will be part of the proposed work

 

 

 
