Title : Sqoop Config Input as a Top Level Entity
JIRA: https://issues.apache.org/jira/browse/SQOOP-1516
Summary
Configs are exposed in code by the connectors and the driver (the two configurables currently supported). They annotate their config classes with the "@Config" annotation, which is how Sqoop registers these entities in the repository during server startup. If a connector already exists in the Sqoop repository (or during the upgrade path), the connector's upgrade API is invoked to update the attributes of the config object.
The current SQ_CONFIG table stores the top-level config entries per configurable.
Code Block |
---|
+-------------------------------------+
| SQ_CONFIG |
+-------------------------------------+
| SQ_CFG_ID: BIGINT PK AUTO-GEN |
| SQ_CONFIGURABLE: BIGINT |FK SQ_CONFIGURABLE(SQC_ID)
| SQ_CFG_NAME: VARCHAR(64) |
| SQ_CFG_TYPE: VARCHAR(32) |"LINK"|"JOB"
| SQ_CFG_INDEX: SMALLINT |
+-------------------------------------+
|
Currently we support two types of configs: LINK and JOB. The MConfigType enum encapsulates this information, and its value is what is stored in "SQ_CFG_TYPE" when a config is registered.
Code Block |
---|
@InterfaceAudience.Private
@InterfaceStability.Unstable
public enum MConfigType {
/** Unknown config type */
OTHER,
@Deprecated
// NOTE: only exists to support the connector data upgrade path
CONNECTION,
/** link config type */
LINK,
/** Job config type */
JOB;
} |
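As a minimal sketch of how the enum constant could round-trip through the "SQ_CFG_TYPE" column, the following self-contained example uses a local copy of the enum. The class name `ConfigTypeColumnSketch` and the helpers `toColumn` and `fromColumn` are hypothetical, and falling back to OTHER for unrecognised column values is an assumption made for illustration, not documented Sqoop behaviour.

```java
public class ConfigTypeColumnSketch {

  // Local copy of the enum shown above so the sketch compiles on its own.
  enum MConfigType { OTHER, @Deprecated CONNECTION, LINK, JOB }

  // Hypothetical helper: the constant name is the string stored in SQ_CFG_TYPE.
  static String toColumn(MConfigType type) {
    return type.name();
  }

  // Hypothetical helper: parse a stored column value back into the enum.
  // Assumption: unknown or null values map to OTHER instead of failing.
  static MConfigType fromColumn(String value) {
    try {
      return MConfigType.valueOf(value);
    } catch (IllegalArgumentException | NullPointerException e) {
      return MConfigType.OTHER;
    }
  }

  public static void main(String[] args) {
    System.out.println(toColumn(MConfigType.LINK));
    System.out.println(fromColumn("JOB"));
    System.out.println(fromColumn("SOMETHING_ELSE"));
  }
}
```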
Each class annotated with @Config exposes a list of inputs via the "@Input" annotation and its attributes. The @Input-annotated fields are stored in another table, SQ_INPUT, along with the supported attributes and their values. SQ_INPUT stores only the input keys and the attribute values. The actual values of the inputs depend on the JOB or LINK they belong to (refer to this wiki to understand Sqoop entities), so they are stored in two additional tables, SQ_JOB_INPUT and SQ_LINK_INPUT.
Code Block |
---|
+----------------------------+
| SQ_INPUT |
+----------------------------+
| SQI_ID: BIGINT PK AUTO-GEN |
| SQI_NAME: VARCHAR(64) |
| SQI_CONFIG: BIGINT |FK SQ_CONFIG(SQ_CFG_ID)
| SQI_INDEX: SMALLINT |
| SQI_TYPE: VARCHAR(32) |"STRING"|"MAP"
| SQI_STRMASK: BOOLEAN |
| SQI_STRLENGTH: SMALLINT |
| SQI_ENUMVALS: VARCHAR(100) |
+----------------------------+
+----------------------------+
| SQ_LINK_INPUT |
+----------------------------+
| SQ_LNKI_LINK: BIGINT PK | FK SQ_LINK(SQ_LNK_ID)
| SQ_LNKI_INPUT: BIGINT PK | FK SQ_INPUT(SQI_ID)
| SQ_LNKI_VALUE: LONG VARCHAR|
+----------------------------+
+----------------------------+
| SQ_JOB_INPUT |
+----------------------------+
| SQBI_JOB: BIGINT PK | FK SQ_JOB(SQB_ID)
| SQBI_INPUT: BIGINT PK | FK SQ_INPUT(SQI_ID)
| SQBI_VALUE: LONG VARCHAR |
+----------------------------+
|
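To illustrate how annotated fields might be turned into SQ_INPUT rows, here is a rough, self-contained sketch. The `Config` and `Input` annotations below are local stand-ins declared only for this example and are not Sqoop's own annotation classes. `ExampleLinkConfig`, `InputRow`, `scan`, and the `sensitive` and `size` attributes are likewise hypothetical names chosen to mirror the SQI_STRMASK and SQI_STRLENGTH columns.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class InputScanSketch {

  // Stand-in annotations for this sketch only (assumption, not Sqoop classes).
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface Config {}

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface Input {
    boolean sensitive() default false; // would feed SQI_STRMASK
    short size() default -1;           // would feed SQI_STRLENGTH
  }

  // Hypothetical config class exposing three inputs.
  @Config
  static class ExampleLinkConfig {
    @Input(size = 128) public String connectionString;
    @Input(size = 40, sensitive = true) public String password;
    @Input public Map<String, String> properties;
    public String notAnInput; // no @Input, so it is skipped
  }

  // One row as it might be written to SQ_INPUT (key and attributes, no value).
  static final class InputRow {
    final String name;
    final String type;
    final boolean masked;
    final short length;

    InputRow(String name, String type, boolean masked, short length) {
      this.name = name;
      this.type = type;
      this.masked = masked;
      this.length = length;
    }
  }

  // Collects one InputRow per @Input field. SQI_INDEX is left out because
  // the order returned by getDeclaredFields() is not guaranteed.
  static List<InputRow> scan(Class<?> configClass) {
    if (!configClass.isAnnotationPresent(Config.class)) {
      throw new IllegalArgumentException(configClass.getName() + " is not annotated with @Config");
    }
    List<InputRow> rows = new ArrayList<>();
    for (Field field : configClass.getDeclaredFields()) {
      Input input = field.getAnnotation(Input.class);
      if (input == null) {
        continue;
      }
      String type;
      if (field.getType() == String.class) {
        type = "STRING";
      } else if (Map.class.isAssignableFrom(field.getType())) {
        type = "MAP";
      } else {
        throw new IllegalArgumentException("Unsupported input type: " + field.getType());
      }
      rows.add(new InputRow(field.getName(), type, input.sensitive(), input.size()));
    }
    return rows;
  }
}
```

Note that the rows carry no values: in the schema above, the per-link and per-job values live in SQ_LINK_INPUT and SQ_JOB_INPUT and reference the input by its id.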
The following table lists the types of configs exposed by each configurable, with the number of each in parentheses. Each config type is represented as a list of config objects, so a connector can expose a FROM-CONFIG with more than one config object in it.
CONFIGURABLE | LINK-CONFIG | JOB-CONFIG |
---|---|---|
CONNECTOR | (1) LINK-CONFIG | (2) FROM-CONFIG, TO-CONFIG |
DRIVER | NONE | (1) DRIVER-CONFIG |
This is an enhancement proposal to the existing functionality (command line and REST APIs) to support reuse of config objects by providing hooks to perform RU (Read and Update) operations on the config input objects independently.
Background
Sqoop has a configurable entity that provides config objects. Connector and Driver are the two examples.
Requirements
- Read and update the config inputs by type and by job
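To make the read and update requirement concrete, the following in-memory sketch models the SQ_JOB_INPUT rows keyed by (job, input) and offers only read and update operations. It is not the API this proposal defines. The names `JobInputValuesSketch`, `Key`, `seed`, `read`, and `update` are hypothetical, and the rule that an update never creates a missing row is an assumption derived from the "RU" wording. Filtering by config type is left out of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class JobInputValuesSketch {

  // Composite key mirroring the (SQBI_JOB, SQBI_INPUT) primary key.
  static final class Key {
    final long jobId;
    final long inputId;

    Key(long jobId, long inputId) {
      this.jobId = jobId;
      this.inputId = inputId;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Key)) {
        return false;
      }
      Key other = (Key) o;
      return jobId == other.jobId && inputId == other.inputId;
    }

    @Override
    public int hashCode() {
      return Objects.hash(jobId, inputId);
    }
  }

  // Stands in for the SQ_JOB_INPUT table: key -> SQBI_VALUE.
  private final Map<Key, String> values = new LinkedHashMap<>();

  // Test helper: stands in for rows written elsewhere when a job is created.
  void seed(long jobId, long inputId, String value) {
    values.put(new Key(jobId, inputId), value);
  }

  // Read: all input values of one job, keyed by input id.
  Map<Long, String> read(long jobId) {
    Map<Long, String> result = new LinkedHashMap<>();
    for (Map.Entry<Key, String> entry : values.entrySet()) {
      if (entry.getKey().jobId == jobId) {
        result.put(entry.getKey().inputId, entry.getValue());
      }
    }
    return result;
  }

  // Update: changes an existing value; returns false instead of creating a row.
  boolean update(long jobId, long inputId, String newValue) {
    Key key = new Key(jobId, inputId);
    if (!values.containsKey(key)) {
      return false;
    }
    values.put(key, newValue);
    return true;
  }
}
```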
Non Goals
- Config
...
Design
Shell Commands
REST API
...