Title : Sqoop Config Input as a Top Level Entity
JIRA: https://issues.apache.org/jira/browse/SQOOP-1516
Summary
Configs are exposed in code via the Connectors and Drivers (the two CONFIGURABLES supported). They annotate their config classes with the "@Config" annotation, which is how Sqoop registers these entities in the repository during server startup. If a connector already exists in the Sqoop repository (or during the upgrade path), the connector's upgrade API is invoked to update the attributes of the config object.
The current SQ_CONFIG table stores the top-level config entries per configurable.
+-------------------------------------+
| SQ_CONFIG                           |
+-------------------------------------+
| SQ_CFG_ID: BIGINT PK AUTO-GEN       |
| SQ_CONFIGURABLE: BIGINT             | FK SQ_CONFIGURABLE(SQC_ID)
| SQ_CFG_NAME: VARCHAR(64)            |
| SQ_CFG_TYPE: VARCHAR(32)            | "LINK"|"JOB"
| SQ_CFG_INDEX: SMALLINT              |
+-------------------------------------+
Currently we support two types of configs: LINK and JOB. The MConfigType enum encapsulates this information, and its value is what gets stored in "SQ_CFG_TYPE" when a config is registered.
@InterfaceAudience.Private
@InterfaceStability.Unstable
public enum MConfigType {
  /** Unknown config type */
  OTHER,

  @Deprecated
  // NOTE: only exists to support the connector data upgrade path
  CONNECTION,

  /** link config type */
  LINK,

  /** Job config type */
  JOB;
}
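As an illustration of how the "SQ_CFG_TYPE" column value relates to the enum, the following is a minimal sketch, not Sqoop's actual repository code. The class `ConfigTypeDemo` and the method `fromColumn` are hypothetical names, the enum is a local copy without the audience annotations, and folding the deprecated CONNECTION value into LINK is an assumption about how the upgrade path might be handled.

```java
import java.util.Locale;

// Hypothetical helper; not part of the Sqoop code base.
class ConfigTypeDemo {

  // Local copy of the enum from the text so the sketch is self-contained.
  enum MConfigType { OTHER, @Deprecated CONNECTION, LINK, JOB }

  /** Maps a raw SQ_CFG_TYPE column value to an MConfigType. */
  static MConfigType fromColumn(String raw) {
    if (raw == null) {
      return MConfigType.OTHER;
    }
    try {
      MConfigType type = MConfigType.valueOf(raw.trim().toUpperCase(Locale.ROOT));
      // Assumption: legacy CONNECTION rows are treated as LINK configs.
      return type == MConfigType.CONNECTION ? MConfigType.LINK : type;
    } catch (IllegalArgumentException e) {
      // Anything unrecognized falls back to the "unknown" type.
      return MConfigType.OTHER;
    }
  }

  public static void main(String[] args) {
    System.out.println(fromColumn("LINK"));
    System.out.println(fromColumn("CONNECTION"));
    System.out.println(fromColumn("something-else"));
  }
}
```

Falling back to OTHER instead of throwing keeps a single bad row from aborting a repository scan. Whether that is desirable is a design choice this sketch does not settle.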
Each class annotated with @Config exposes a list of inputs via the "@Input" annotation and its attributes. The @Input-annotated fields are stored in another table, SQ_INPUT, along with the supported attributes and their values. SQ_INPUT stores only the input keys and the attribute values. The actual values of the inputs are specific to each JOB and each LINK (refer to this wiki to understand sqoop entities), and hence they are stored in two additional tables, SQ_JOB_INPUT and SQ_LINK_INPUT.
+----------------------------+
| SQ_INPUT                   |
+----------------------------+
| SQI_ID: BIGINT PK AUTO-GEN |
| SQI_NAME: VARCHAR(64)      |
| SQI_CONFIG: BIGINT         | FK SQ_CONFIG(SQ_CFG_ID)
| SQI_INDEX: SMALLINT        |
| SQI_TYPE: VARCHAR(32)      | "STRING"|"MAP"
| SQI_STRMASK: BOOLEAN       |
| SQI_STRLENGTH: SMALLINT    |
| SQI_ENUMVALS: VARCHAR(100) |
+----------------------------+

+----------------------------+
| SQ_LINK_INPUT              |
+----------------------------+
| SQ_LNKI_LINK: BIGINT PK    | FK SQ_LINK(SQ_LNK_ID)
| SQ_LNKI_INPUT: BIGINT PK   | FK SQ_INPUT(SQI_ID)
| SQ_LNKI_VALUE: LONG VARCHAR|
+----------------------------+

+----------------------------+
| SQ_JOB_INPUT               |
+----------------------------+
| SQBI_JOB: BIGINT PK        | FK SQ_JOB(SQB_ID)
| SQBI_INPUT: BIGINT PK      | FK SQ_INPUT(SQI_ID)
| SQBI_VALUE: LONG VARCHAR   |
+----------------------------+
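The split between input definitions (SQ_INPUT) and per-link values (SQ_LINK_INPUT) can be sketched in plain Java as follows. This is an in-memory illustration under stated assumptions, not the repository implementation. The `Config` and `Input` annotations here are local stand-ins for the ones named in the text, and `SampleLinkConfig`, `InputRow`, `scan`, and `valuesForLink` are hypothetical names. The `size` and `sensitive` attributes are assumed to correspond to SQI_STRLENGTH and SQI_STRMASK.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names below are illustrative only.
class InputModelSketch {

  // Stand-ins for the annotations named in the text.
  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
  @interface Config {}

  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
  @interface Input {
    short size() default -1;            // assumed to map to SQI_STRLENGTH
    boolean sensitive() default false;  // assumed to map to SQI_STRMASK
  }

  @Config
  static class SampleLinkConfig {
    @Input(size = 128) String uri;
    @Input(size = 40, sensitive = true) String password;
    String notAnInput;                  // no @Input, so never registered
  }

  /** One SQ_INPUT-like row: the definition of an input, without any value. */
  static final class InputRow {
    final long id; final String name; final boolean masked; final short length;
    InputRow(long id, String name, boolean masked, short length) {
      this.id = id; this.name = name; this.masked = masked; this.length = length;
    }
  }

  /** Collects the @Input fields of a @Config class as definition rows. */
  static List<InputRow> scan(Class<?> configClass) {
    if (!configClass.isAnnotationPresent(Config.class)) {
      throw new IllegalArgumentException(configClass + " is not annotated with @Config");
    }
    List<Field> fields = new ArrayList<>();
    for (Field f : configClass.getDeclaredFields()) {
      if (f.isAnnotationPresent(Input.class)) fields.add(f);
    }
    // Reflection does not guarantee field order, so sort for determinism.
    fields.sort(Comparator.comparing(Field::getName));
    List<InputRow> rows = new ArrayList<>();
    long nextId = 1;                    // plays the role of the AUTO-GEN key
    for (Field f : fields) {
      Input in = f.getAnnotation(Input.class);
      rows.add(new InputRow(nextId++, f.getName(), in.sensitive(), in.size()));
    }
    return rows;
  }

  /** Joins definitions with one link's values (input id to value). */
  static Map<String, String> valuesForLink(List<InputRow> rows, Map<Long, String> linkValues) {
    Map<String, String> resolved = new LinkedHashMap<>();
    for (InputRow row : rows) {
      resolved.put(row.name, linkValues.get(row.id)); // null when the link has no value
    }
    return resolved;
  }
}
```

Because the definitions carry no values, the same `InputRow` list can be joined against any number of links or jobs, which is the property the SQ_LINK_INPUT and SQ_JOB_INPUT tables provide.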
The following table lists the types of configs exposed by each configurable, with the number of each shown in parentheses. Each config object is represented as a list, so a connector can expose a FROM-CONFIG with more than one config object in it.
CONFIGURABLE | LINK-CONFIG | JOB-CONFIG |
---|---|---|
CONNECTOR | (1) LINK-CONFIG | (2) FROM-CONFIG, TO-CONFIG |
DRIVER | NONE | (1) DRIVER-CONFIG |
This design proposes an enhancement to the existing functionality (command line and REST APIs) to support reuse of config objects by providing hooks to perform RU (Read and Update) operations on the config input objects independently.
- Read and Update the Config Inputs by Type and By Job
Non Goals
- Config
Design
Shell Commands
Rest API
GET
v1/config/link?configurableId=?&direction ( get all the config details for the given configurable )
POST
v1/config/link?configurableId=?&direction ( post data for the link config object)
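A client-side helper for composing the request path above might look like the following minimal sketch. The class `ConfigEndpoint` and the method `linkConfigPath` are hypothetical, and the proposal does not state which values the `direction` parameter accepts, so the "from" value used in `main` is an assumption.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the Sqoop client.
class ConfigEndpoint {

  /**
   * Builds the relative path for reading or updating link config inputs.
   * The direction parameter is omitted when it is null or empty.
   */
  static String linkConfigPath(long configurableId, String direction) {
    StringBuilder path = new StringBuilder("v1/config/link?configurableId=")
        .append(configurableId);
    if (direction != null && !direction.isEmpty()) {
      // Encode the value so unexpected characters cannot break the query string.
      path.append("&direction=")
          .append(URLEncoder.encode(direction, StandardCharsets.UTF_8));
    }
    return path.toString();
  }

  public static void main(String[] args) {
    // "from" is an assumed direction value, used only for illustration.
    System.out.println(linkConfigPath(1L, "from"));
  }
}
```

The same path would serve both the GET and the POST request. Only the HTTP method and the request body differ.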
Testing