Title : Sqoop Config Input as a Top Level Entity
JIRA: https://issues.apache.org/jira/browse/SQOOP-1516
Summary
Configs are exposed in code by connectors and drivers, the two supported CONFIGURABLES. They annotate their config classes with the "@Config" annotation, which is how Sqoop registers these entities in the repository during server startup. If a connector already exists in the Sqoop repository (or during the upgrade path), the connector's upgrade API is invoked to update the attributes of the config object.
The current SQ_CONFIG table stores the top-level config entries per configurable.
+-------------------------------------+
| SQ_CONFIG                           |
+-------------------------------------+
| SQ_CFG_ID: BIGINT PK AUTO-GEN       |
| SQ_CONFIGURABLE: BIGINT             | FK SQ_CONFIGURABLE(SQC_ID)
| SQ_CFG_NAME: VARCHAR(64)            |
| SQ_CFG_TYPE: VARCHAR(32)            | "LINK"|"JOB"
| SQ_CFG_INDEX: SMALLINT              |
+-------------------------------------+
Currently we support two types of configs: LINK and JOB. The MConfigType enum encapsulates this information, and its value is what is stored in "SQ_CFG_TYPE" when a config is registered.
@InterfaceAudience.Private
@InterfaceStability.Unstable
public enum MConfigType {
  /** Unknown config type */
  OTHER,

  @Deprecated
  // NOTE: only exists to support the connector data upgrade path
  CONNECTION,

  /** link config type */
  LINK,

  /** Job config type */
  JOB;
}
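As an illustration of how the "SQ_CFG_TYPE" column value relates to the enum, the sketch below parses a stored string back into a config type. It uses a local copy of the enum so that it compiles on its own; the helper class ConfigTypes and its method fromRepositoryValue are hypothetical names, and falling back to OTHER for unrecognized values is an assumption, not documented Sqoop behavior.

```java
import java.util.Locale;

// Local copy of the enum shown above, without the Sqoop classification annotations.
enum MConfigType {
  OTHER,
  @Deprecated
  CONNECTION,
  LINK,
  JOB
}

// Hypothetical helper, not part of Sqoop.
final class ConfigTypes {
  private ConfigTypes() {}

  /**
   * Converts a stored SQ_CFG_TYPE string into an MConfigType.
   * Assumption: null or unrecognized values map to OTHER.
   */
  static MConfigType fromRepositoryValue(String raw) {
    if (raw == null) {
      return MConfigType.OTHER;
    }
    try {
      return MConfigType.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      return MConfigType.OTHER;
    }
  }
}
```

For example, ConfigTypes.fromRepositoryValue("JOB") yields MConfigType.JOB, while a value that matches no constant yields MConfigType.OTHER.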
Each class annotated with @Config exposes a list of inputs via the "@Input" annotation and its attributes. The @Input-annotated fields are stored in another table, SQ_INPUT, along with the supported attributes and their values. SQ_INPUT stores only the input keys and the attribute values. The actual values of the inputs depend on the JOB and the LINK (refer to this wiki to understand Sqoop entities), and hence there are two additional tables, SQ_JOB_INPUT and SQ_LINK_INPUT, where we store them.
+----------------------------+
| SQ_INPUT                   |
+----------------------------+
| SQI_ID: BIGINT PK AUTO-GEN |
| SQI_NAME: VARCHAR(64)      |
| SQI_CONFIG: BIGINT         | FK SQ_CONFIG(SQ_CFG_ID)
| SQI_INDEX: SMALLINT        |
| SQI_TYPE: VARCHAR(32)      | "STRING"|"MAP"
| SQI_STRMASK: BOOLEAN       |
| SQI_STRLENGTH: SMALLINT    |
| SQI_ENUMVALS: VARCHAR(100) |
+----------------------------+

+----------------------------+
| SQ_LINK_INPUT              |
+----------------------------+
| SQ_LNKI_LINK: BIGINT PK    | FK SQ_LINK(SQ_LNK_ID)
| SQ_LNKI_INPUT: BIGINT PK   | FK SQ_INPUT(SQI_ID)
| SQ_LNKI_VALUE: LONG VARCHAR|
+----------------------------+

+----------------------------+
| SQ_JOB_INPUT               |
+----------------------------+
| SQBI_JOB: BIGINT PK        | FK SQ_JOB(SQB_ID)
| SQBI_INPUT: BIGINT PK      | FK SQ_INPUT(SQI_ID)
| SQBI_VALUE: LONG VARCHAR   |
+----------------------------+
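To make the relationship between annotated classes and SQ_INPUT rows concrete, here is a minimal sketch that scans a config class and derives one row per @Input field. The Config and Input annotations declared here are hypothetical stand-ins so the example compiles without the Sqoop jars; their attributes (size, sensitive), the ExampleLinkConfig class, and the InputScanner helper are all assumptions for illustration and are not the actual registration code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the annotations described above.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Config {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Input {
  short size() default -1;            // would feed SQI_STRLENGTH
  boolean sensitive() default false;  // would feed SQI_STRMASK
}

// Hypothetical config class exposed by a connector.
@Config
class ExampleLinkConfig {
  @Input(size = 128) String connectionString;
  @Input(size = 40, sensitive = true) String password;
  @Input Map<String, String> properties;
  String notAnInput; // not annotated, so it is not registered
}

// In-memory picture of one SQ_INPUT row (keys and attributes only, no values).
final class InputRow {
  final String name;
  final short index;
  final String type;
  final boolean masked;
  final short length;

  InputRow(String name, short index, String type, boolean masked, short length) {
    this.name = name;
    this.index = index;
    this.type = type;
    this.masked = masked;
    this.length = length;
  }
}

final class InputScanner {
  private InputScanner() {}

  /** Builds one row per @Input field of a @Config class. */
  static List<InputRow> scan(Class<?> configClass) {
    if (!configClass.isAnnotationPresent(Config.class)) {
      throw new IllegalArgumentException(configClass.getName() + " is not annotated with @Config");
    }
    List<InputRow> rows = new ArrayList<>();
    short index = 0;
    // Note: the Java specification does not guarantee the order of getDeclaredFields().
    for (Field field : configClass.getDeclaredFields()) {
      Input input = field.getAnnotation(Input.class);
      if (input == null) {
        continue;
      }
      // Only the two types listed for SQI_TYPE are handled in this sketch.
      String type = Map.class.isAssignableFrom(field.getType()) ? "MAP" : "STRING";
      rows.add(new InputRow(field.getName(), index++, type, input.sensitive(), input.size()));
    }
    return rows;
  }
}
```

Calling InputScanner.scan(ExampleLinkConfig.class) produces three rows. The per-link or per-job values are deliberately absent, mirroring the split between SQ_INPUT and the SQ_LINK_INPUT / SQ_JOB_INPUT tables.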
The following table lists the types of configs exposed by each configurable, with the number of each in parentheses. Each config object is represented as a list, so a connector can expose a FROM-CONFIG with more than one config object in it.
CONFIGURABLE | LINK-CONFIG | JOB-CONFIG |
---|---|---|
CONNECTOR | (1) LINK-CONFIG | (2) FROM-CONFIG, TO-CONFIG |
DRIVER | NONE | (1) DRIVER-CONFIG |
This design proposes an enhancement to the existing functionality (command line and REST APIs) to support reuse of config objects by providing hooks to perform RU (Read and Update) operations on the config input objects independently.
Requirements
- Read and Update the Config Inputs by Type and By Job
Non Goals
- Supporting CD (create / delete) of Config Inputs via REST or the command line. Creation and deletion are only allowed via the configurable code and the supported annotations on the classes.
Design
Shell Commands
Read Config By Type and Job
show config --type from --jid 1
Edit Config By Type and Job
edit config --type from --jid 1
REST API
Read Config By Type and Job
GET
v1/config/link?configurableId=?&type= (get all the config details for the given configurable)
Edit Config By Type and Job
POST
v1/config/link?configurableId=?&type= (post data for the link config object)
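A client for the two proposed calls could be sketched as follows, using the JDK's java.net.http API (Java 11+). Only the path v1/config/link and the query parameter names configurableId and type come from the proposal above. The base URL, the example parameter values, the headers, and the JSON payload are assumptions for illustration, since the proposal does not define the request or response bodies; ConfigRequests is a hypothetical class name.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper, not part of Sqoop.
final class ConfigRequests {
  // Assumption: where the Sqoop server is reachable.
  static final String BASE = "http://localhost:12000/sqoop/";

  private ConfigRequests() {}

  /** Builds the URI from the path and parameter names given in the proposal. */
  static URI configUri(long configurableId, String type) {
    String query = "configurableId=" + configurableId
        + "&type=" + URLEncoder.encode(type, StandardCharsets.UTF_8);
    return URI.create(BASE + "v1/config/link?" + query);
  }

  /** GET: read the config details for the given configurable and type. */
  static HttpRequest read(long configurableId, String type) {
    return HttpRequest.newBuilder(configUri(configurableId, type))
        .header("Accept", "application/json")
        .GET()
        .build();
  }

  /** POST: send updated config data; the body format is a placeholder. */
  static HttpRequest update(long configurableId, String type, String jsonBody) {
    return HttpRequest.newBuilder(configUri(configurableId, type))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
        .build();
  }
}
```

The requests are only built here, not sent. Passing one to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) would execute it against a running server.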
Testing
- Unit tests are almost nonexistent for the shell code, so we will rely on basic manual testing.
- REST APIs can be tested via integration tests, which will be part of the proposed work.