Title: Sqoop Config Input as a Top Level Entity
JIRA: https://issues.apache.org/jira/browse/SQOOP-1516
Summary
The current proposal enhances existing functionality (command line and REST APIs) to support RU (Read and Update) operations on the config input objects independently.
Configs are exposed in code via the Connectors and Drivers (the two CONFIGURABLES supported). They annotate the config classes with the "@Config" annotation, which is how Sqoop registers these entities in the repository during server startup. If a connector already exists in the Sqoop repository (during server startup or while invoking UpgradeTool), the connector's upgrade API is invoked to update the attributes of the config object.
The current SQ_CONFIG table stores the top-level config entries per configurable.
    +-------------------------------------+
    | SQ_CONFIG                           |
    +-------------------------------------+
    | SQ_CFG_ID: BIGINT PK AUTO-GEN       |
    | SQ_CONFIGURABLE: BIGINT             | FK SQ_CONFIGURABLE(SQC_ID)
    | SQ_CFG_NAME: VARCHAR(64)            |
    | SQ_CFG_TYPE: VARCHAR(32)            | "LINK"|"JOB"
    | SQ_CFG_INDEX: SMALLINT              |
    +-------------------------------------+
Currently we support two types of configs: LINK and JOB. The MConfigType enum encapsulates this information. Its value is what is stored in "SQ_CFG_TYPE" when a config is registered.
    @InterfaceAudience.Private
    @InterfaceStability.Unstable
    public enum MConfigType {
      /** Unknown config type */
      OTHER,

      @Deprecated // NOTE: only exists to support the connector data upgrade path
      CONNECTION,

      /** link config type */
      LINK,

      /** Job config type */
      JOB;
    }
Each class annotated with @Config exposes a list of inputs via the "@Input" annotation and its attributes. The @Input annotated fields are stored in another table, SQ_INPUT, along with the supported attributes and their values. SQ_INPUT only stores the input keys and the attribute values. The actual values of the inputs are specific to each JOB and each LINK (refer to this wiki to understand Sqoop entities), and hence they are stored in two additional tables, SQ_JOB_INPUT and SQ_LINK_INPUT.
    +----------------------------+
    | SQ_INPUT                   |
    +----------------------------+
    | SQI_ID: BIGINT PK AUTO-GEN |
    | SQI_NAME: VARCHAR(64)      |
    | SQI_CONFIG: BIGINT         | FK SQ_CONFIG(SQ_CFG_ID)
    | SQI_INDEX: SMALLINT        |
    | SQI_TYPE: VARCHAR(32)      | "STRING"|"MAP"
    | SQI_STRMASK: BOOLEAN       |
    | SQI_STRLENGTH: SMALLINT    |
    | SQI_ENUMVALS: VARCHAR(100) |
    +----------------------------+

    +----------------------------+
    | SQ_LINK_INPUT              |
    +----------------------------+
    | SQ_LNKI_LINK: BIGINT PK    | FK SQ_LINK(SQ_LNK_ID)
    | SQ_LNKI_INPUT: BIGINT PK   | FK SQ_INPUT(SQI_ID)
    | SQ_LNKI_VALUE: LONG VARCHAR|
    +----------------------------+

    +----------------------------+
    | SQ_JOB_INPUT               |
    +----------------------------+
    | SQBI_JOB: BIGINT PK        | FK SQ_JOB(SQB_ID)
    | SQBI_INPUT: BIGINT PK      | FK SQ_INPUT(SQI_ID)
    | SQBI_VALUE: LONG VARCHAR   |
    +----------------------------+
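To illustrate how the input definitions and the per-link values fit together, here is a minimal in-memory sketch of what joining SQ_INPUT with SQ_LINK_INPUT yields for one link. The class names, the method `valuesForLink` and the sample input names are hypothetical and only mirror the columns listed in the tables; the real repository code may look quite different.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LinkInputSketch {
    // Hypothetical stand-in for one SQ_INPUT row (only the columns used here).
    static final class InputRow {
        final long id;        // SQI_ID
        final String name;    // SQI_NAME
        final long configId;  // SQI_CONFIG

        InputRow(long id, String name, long configId) {
            this.id = id;
            this.name = name;
            this.configId = configId;
        }
    }

    // Hypothetical stand-in for one SQ_LINK_INPUT row.
    static final class LinkInputRow {
        final long linkId;   // SQ_LNKI_LINK
        final long inputId;  // SQ_LNKI_INPUT
        final String value;  // SQ_LNKI_VALUE

        LinkInputRow(long linkId, long inputId, String value) {
            this.linkId = linkId;
            this.inputId = inputId;
            this.value = value;
        }
    }

    /**
     * Returns input name -> value for one link and one config, roughly what a
     * left join of SQ_INPUT with SQ_LINK_INPUT would produce. Inputs that have
     * no stored value for the link map to null.
     */
    static Map<String, String> valuesForLink(long linkId, long configId,
            List<InputRow> inputs, List<LinkInputRow> linkInputs) {
        Map<String, String> result = new LinkedHashMap<>();
        for (InputRow input : inputs) {
            if (input.configId != configId) {
                continue; // input belongs to a different config
            }
            String value = null;
            for (LinkInputRow row : linkInputs) {
                if (row.linkId == linkId && row.inputId == input.id) {
                    value = row.value;
                    break;
                }
            }
            result.put(input.name, value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data is made up for illustration.
        List<InputRow> inputs = Arrays.asList(
                new InputRow(1L, "linkConfig.connectionString", 10L),
                new InputRow(2L, "linkConfig.username", 10L));
        List<LinkInputRow> values = Arrays.asList(
                new LinkInputRow(7L, 1L, "jdbc:h2:mem:test"));
        System.out.println(valuesForLink(7L, 10L, inputs, values));
    }
}
```

The same shape would apply to SQ_JOB_INPUT, with the job id taking the place of the link id.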
The following table lists the types of configs exposed by each configurable, with the number of each in parentheses. Each config type is represented as a list of config objects. Hence a connector can expose a FROM-CONFIG with more than one config object in it.
CONFIGURABLE | LINK-CONFIG | JOB-CONFIG |
---|---|---|
CONNECTOR | (1) LINK-CONFIG | (2) FROM-CONFIG, TO-CONFIG |
DRIVER | NONE | (1) DRIVER-CONFIG |
Goals
- Read the config inputs by type/subType and by job or submission (once SQOOP-2025 is done, we may be able to have configs by submissionId).
- Update the config inputs by type/subType for the latest submission of the job. Previous submissions must not be editable; they are read-only.
- Support this in both the shell commands and the REST API.
- Only the inputs with the attribute "USER-ONLY" or "ANY", as per SQOOP-1804, will be editable. Validate the editable condition and honor cascading changes depending on the overrides attribute of each input.
- Once the input values are edited, the new values will be used in the next job run, unless we maintain history as per SQOOP-2025.
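The editable rule from the goals above can be sketched as a small guard that the shell and REST update paths would call before persisting a new value. This is only an illustration: the enum `Editable`, its third value `CONNECTOR_ONLY`, and both method names are assumptions rather than the actual SQOOP-1804 types, and the cascading behavior driven by the overrides attribute is not covered here.

```java
import java.util.EnumSet;
import java.util.Set;

final class EditableCheckSketch {
    // Hypothetical mirror of the editable attribute; the names are assumptions.
    enum Editable { ANY, USER_ONLY, CONNECTOR_ONLY }

    // Values that allow an edit coming from the shell or the REST API.
    private static final Set<Editable> USER_EDITABLE =
            EnumSet.of(Editable.ANY, Editable.USER_ONLY);

    static boolean isUserEditable(Editable editable) {
        return USER_EDITABLE.contains(editable);
    }

    /** Rejects a user-initiated update when the input is not user-editable. */
    static void requireUserEditable(String inputName, Editable editable) {
        if (!isUserEditable(editable)) {
            throw new IllegalArgumentException(
                    "Input '" + inputName + "' cannot be edited by users");
        }
    }

    public static void main(String[] args) {
        System.out.println(isUserEditable(Editable.USER_ONLY));
        System.out.println(isUserEditable(Editable.CONNECTOR_ONLY));
    }
}
```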
Non Goals
- Supporting CD (create/delete) of config inputs via the REST API or the command line. Today this is only possible via the configurable code and the supported annotations on the classes, and it should remain so.
- Editing submission history
Design and Implementation Details
Shell Commands
Aliases have been added to the MConfigType enum to indicate the subtypes.
    import java.util.Arrays;
    import java.util.List;

    public enum MConfigType {
      /** Unknown config type */
      OTHER,

      @Deprecated // NOTE: only exists to support the connector data upgrade path
      CONNECTION,

      /** link config type */
      LINK("link"),

      /** Job config type */
      JOB("from", "to", "driver");

      // Sub type names accepted for this config type (empty for OTHER and CONNECTION)
      private final List<String> aliases;

      MConfigType(String... aliases) {
        this.aliases = Arrays.asList(aliases);
      }

      public static List<String> getAliasesByType(MConfigType type) {
        return type.aliases;
      }
    }
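As a hedged illustration of how the aliases could be used to validate the `--type` and `--subType` arguments, the sketch below resolves a subtype string back to its owning config type. `ConfigType` is a simplified stand-in for MConfigType (OTHER and CONNECTION are omitted), and `typeForSubType` and `isValidCombination` are hypothetical helpers that are not part of the proposal.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

final class SubTypeAliasSketch {
    // Simplified stand-in for the aliased MConfigType enum.
    enum ConfigType {
        LINK("link"),
        JOB("from", "to", "driver");

        private final List<String> aliases;

        ConfigType(String... aliases) {
            this.aliases = Arrays.asList(aliases);
        }

        // Case-insensitive membership test; null never matches.
        boolean hasAlias(String subType) {
            return subType != null
                    && aliases.contains(subType.toLowerCase(Locale.ROOT));
        }
    }

    /** Finds the config type that owns the given subType value, if any. */
    static Optional<ConfigType> typeForSubType(String subType) {
        return Arrays.stream(ConfigType.values())
                .filter(t -> t.hasAlias(subType))
                .findFirst();
    }

    /** True when the type/subType pair is consistent, e.g. JOB with "from". */
    static boolean isValidCombination(String type, String subType) {
        Optional<ConfigType> owner = typeForSubType(subType);
        return owner.isPresent() && owner.get().name().equalsIgnoreCase(type);
    }

    public static void main(String[] args) {
        System.out.println(typeForSubType("driver"));
        System.out.println(isValidCombination("JOB", "from"));
        System.out.println(isValidCombination("JOB", "link"));
    }
}
```

A check like this would let the shell reject a mismatched pair such as `--type JOB --subType link` before any repository call is made.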
Read Config By Type and Job or Submission
    // NOTE: all the job config input values are for the last job run only, since we do not store the config values for each submission yet
    1. show config foo --type JOB --subType from --id 1
    //*. show config "foo" --type JOB --subType "from" --sid 1 (since we are not doing SQOOP-2025, this will not be in the current patch)
    2. show config foo --type JOB --subType to --id 1
    3. show config foo --type JOB --subType driver --id 1
    4. show config foo --type LINK --subType link --id 1
    // planned, SQOOP-2046
    5. show input "foo" --config bar --type LINK --id 1 // id here refers to the link id
Edit Config By Type and Job (previous submissions cannot be edited, hence we restrict editing to the last job run only)
    1. edit config foo --type JOB --subType from --id 1 // id here refers to the job id
    // planned, SQOOP-2046
    2. edit input foo --config bar --type LINK --id 1 // id here refers to the link id
Rest API changes
Read Config By Type and Job or Submission
    GET v1/config/JOB?name=?&id=?&subType=
    or
    GET v1/config?type=JOB&name=?&id=?&subType=
Edit Config By Type and Job
    POST v1/config/LINK?name=?&id=?&subType=
    or
    POST v1/config?type=LINK&name=?&id=?&subType=
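For clients, the first URL form could be assembled roughly as follows. This is a sketch under assumptions: the helper `configUri` is hypothetical, the base URL `http://localhost:12000/sqoop/` is only an example of where a server might be listening, and the lowercase `id` parameter follows the spelling used in the POST form. No request is actually sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ConfigUriSketch {
    /**
     * Builds v1/config/{type}?name=..&id=..&subType=.. under the given base URL.
     * The type is expected to be a plain token such as JOB or LINK and is not encoded.
     */
    static URI configUri(String baseUrl, String type, String name, long id, String subType) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        String query = "name=" + encode(name)
                + "&id=" + id
                + "&subType=" + encode(subType);
        return URI.create(base + "v1/config/" + type + "?" + query);
    }

    // Form-style encoding for query parameter values (a space becomes '+').
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed example base URL; adjust to the actual server address.
        System.out.println(configUri("http://localhost:12000/sqoop/", "JOB", "foo", 1L, "from"));
    }
}
```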
Repository API changes
- Add new API to get config inputs by submissionId and type (read-only)
- Add new API to get config inputs by jobId and type
- Add new API to edit/post config inputs by jobId and type
    public abstract MConfig findFromJobConfig(long jobId, String name);
    public abstract MConfig findToJobConfig(long jobId, String name);
    public abstract MConfig findDriverJobConfig(long jobId, String name);
    public abstract MConfig findLinkConfig(long linkId, String name);

    public abstract void updateFromJobConfig(long jobId, MConfig config);
    public abstract void updateDriverJobConfig(long jobId, MConfig config);
    public abstract void updateToJobConfig(long jobId, MConfig config);
    public abstract void updateLinkConfig(long linkId, MConfig config);
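The read-then-update flow these methods enable can be sketched with a toy in-memory store. Everything here is an assumption made for illustration: a config is reduced to a map of input name to value instead of an MConfig, the config name is passed separately to the update method, and only the FROM direction is shown.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class InMemoryJobConfigStore {
    // Key: jobId/subType/configName; value: input name -> input value.
    private final Map<String, Map<String, String>> store = new HashMap<>();

    private static String key(long jobId, String subType, String name) {
        return jobId + "/" + subType + "/" + name;
    }

    /** Analogue of findFromJobConfig: returns a copy of the inputs, or null if absent. */
    Map<String, String> findFromJobConfig(long jobId, String name) {
        Map<String, String> inputs = store.get(key(jobId, "from", name));
        return inputs == null ? null : new LinkedHashMap<>(inputs);
    }

    /** Analogue of updateFromJobConfig: replaces the stored input values of the config. */
    void updateFromJobConfig(long jobId, String name, Map<String, String> inputs) {
        store.put(key(jobId, "from", name), new LinkedHashMap<>(inputs));
    }

    public static void main(String[] args) {
        InMemoryJobConfigStore repo = new InMemoryJobConfigStore();
        Map<String, String> initial = new LinkedHashMap<>();
        initial.put("tableName", "orders"); // made-up input name and value
        repo.updateFromJobConfig(1L, "fromJobConfig", initial);

        // Read, modify the copy, then write it back.
        Map<String, String> edited = repo.findFromJobConfig(1L, "fromJobConfig");
        edited.put("tableName", "customers");
        repo.updateFromJobConfig(1L, "fromJobConfig", edited);
        System.out.println(repo.findFromJobConfig(1L, "fromJobConfig"));
    }
}
```

Returning a copy from the find method keeps callers from changing stored values without going through the update call, which is where an editable check would naturally sit.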
Testing
- Unit tests are almost nonexistent for the shell code. Hence we will rely on basic manual testing.
- The REST APIs can be tested via integration tests, which will be part of the proposed work.