Title : Sqoop Config Input as a Top Level Entity 

JIRA: https://issues.apache.org/jira/browse/SQOOP-1516

Summary

This proposal enhances the existing functionality (command line and REST APIs) to support RU (Read and Update) operations on config input objects independently.

 

Background

Configs are exposed in code by connectors and drivers (the two CONFIGURABLES supported). Each annotates its config classes with the "@Config" annotation, which is how Sqoop registers these entities in the repository during server startup. If a connector already exists in the Sqoop repository (at server startup or when the UpgradeTool is invoked), the connector's upgrade API is invoked to update the attributes of the config object.

The current SQ_CONFIG stores the top level config entries per configurable. 

     +-------------------------------------+
     | SQ_CONFIG                           |
     +-------------------------------------+
     | SQ_CFG_ID: BIGINT PK AUTO-GEN       |
     | SQ_CONFIGURABLE: BIGINT             |FK SQ_CONFIGURABLE(SQC_ID)
     | SQ_CFG_NAME: VARCHAR(64)            |
     | SQ_CFG_TYPE: VARCHAR(32)            |"LINK"|"JOB"
     | SQ_CFG_INDEX: SMALLINT              |
     +-------------------------------------+
 

Currently we support two types of configs: LINK and JOB. The MConfigType enum encapsulates this information; it is the value stored in "SQ_CFG_TYPE" when a config is registered.

@InterfaceAudience.Private
@InterfaceStability.Unstable
public enum MConfigType {
  /** Unknown config type */
  OTHER,
  @Deprecated
  // NOTE: only exists to support the connector data upgrade path
  CONNECTION,
  /** link config type */
  LINK,
  /** Job config type */
  JOB;
}

Each class annotated with @Config exposes a list of inputs via the "@Input" annotation and its attributes. The @Input-annotated fields are stored in another table, SQ_INPUT, along with their supported attributes and the attribute values. SQ_INPUT stores only the input keys and the attribute values. The actual values of the inputs depend on the JOB or LINK they belong to (refer to this wiki to understand Sqoop entities), and hence they are stored in two additional tables, SQ_JOB_INPUT and SQ_LINK_INPUT.

     +----------------------------+
     | SQ_INPUT                   |
     +----------------------------+
     | SQI_ID: BIGINT PK AUTO-GEN |
     | SQI_NAME: VARCHAR(64)      |
     | SQI_CONFIG: BIGINT         |FK SQ_CONFIG(SQ_CFG_ID)
     | SQI_INDEX: SMALLINT        |
     | SQI_TYPE: VARCHAR(32)      |"STRING"|"MAP"
     | SQI_STRMASK: BOOLEAN       |
     | SQI_STRLENGTH: SMALLINT    |
     | SQI_ENUMVALS: VARCHAR(100) |
     +----------------------------+
 
     +----------------------------+
     | SQ_LINK_INPUT              |
     +----------------------------+
     | SQ_LNKI_LINK: BIGINT PK    | FK SQ_LINK(SQ_LNK_ID)
     | SQ_LNKI_INPUT: BIGINT PK   | FK SQ_INPUT(SQI_ID)
     | SQ_LNKI_VALUE: LONG VARCHAR|
     +----------------------------+
     +----------------------------+
     | SQ_JOB_INPUT               |
     +----------------------------+
     | SQBI_JOB: BIGINT PK        | FK SQ_JOB(SQB_ID)
     | SQBI_INPUT: BIGINT PK      | FK SQ_INPUT(SQI_ID)
     | SQBI_VALUE: LONG VARCHAR   |
     +----------------------------+
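
The relationship between a @Config class and its @Input fields can be illustrated with a minimal sketch. The annotations below are local stand-ins declared only so the example compiles on its own; they are not Sqoop's actual annotation definitions, and the attribute names (size, sensitive), the ExampleLinkConfig class and the InputScanner helper are assumptions made for illustration. The scan roughly mirrors what registration has to do: find the annotated fields whose names would become SQI_NAME entries.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Local stand-in for the @Config annotation (assumption, not Sqoop's class).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Config {}

// Local stand-in for the @Input annotation (assumption, not Sqoop's class).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Input {
  boolean sensitive() default false; // hypothetical; would correspond to SQI_STRMASK
  int size() default -1;             // hypothetical; would correspond to SQI_STRLENGTH
}

// Hypothetical config class exposing three inputs.
@Config
class ExampleLinkConfig {
  @Input(size = 128) public String connectionString;
  @Input(size = 40) public String username;
  @Input(size = 40, sensitive = true) public String password;
  public String notAnInput; // no @Input, so it is not registered
}

class InputScanner {
  /** Returns the names of all @Input fields of a @Config class, sorted by name. */
  static List<String> inputNames(Class<?> configClass) {
    if (!configClass.isAnnotationPresent(Config.class)) {
      throw new IllegalArgumentException(configClass.getName() + " is not annotated with @Config");
    }
    List<String> names = new ArrayList<>();
    for (Field field : configClass.getDeclaredFields()) {
      if (field.isAnnotationPresent(Input.class)) {
        names.add(field.getName());
      }
    }
    // Reflection does not guarantee field order, so sort for a stable result.
    Collections.sort(names);
    return names;
  }
}
```

Only the input keys and attributes come from the class; the per-link and per-job values live in SQ_LINK_INPUT and SQ_JOB_INPUT as described above.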


 

 

The following table lists the types of configs exposed by each configurable, with the number of config types shown in parentheses. Each config object is represented as a list. Hence a connector can expose a FROM-CONFIG with more than one config object in it.

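     +--------------+---------------------+---------------------+
     | CONFIGURABLE | LINK-CONFIG         | JOB-CONFIG          |
     +--------------+---------------------+---------------------+
     | CONNECTOR    | (1)                 | (2)                 |
     |              | LINK-CONFIG         | FROM-CONFIG         |
     |              | MLinkConfigList     | MFromConfigList     |
     |              |                     | TO-CONFIG           |
     |              |                     | MToConfigList       |
     +--------------+---------------------+---------------------+
     | DRIVER       | NONE                | (1)                 |
     |              |                     | DRIVER-CONFIG       |
     |              |                     | MDriverConfigList   |
     +--------------+---------------------+---------------------+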

 

Requirements
  • Read the config inputs by type/subType and by job/submission (since SQOOP-2025 we may be able to have configs by submissionId).
  • Update the config inputs by type/subType for the latest submission of the job. Previous submissions must not be editable; they are read-only.
  • Support this in both the shell command and the REST API.
  • Only the "inputs" with attribute "USER-ONLY" or "ANY", as per SQOOP-1804, will be editable. Validate the editable condition and adhere to cascading changes depending on the overrides attribute on each input.
  • Once the input values are edited, the new values will be used in the next job run, unless we maintain history as per SQOOP-2025.
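
The editability requirement above can be sketched as a simple guard that an update path would call before accepting a new value. The enum and class names here (InputEditable, EditableCheck) are assumptions chosen to mirror the "USER-ONLY" and "ANY" attribute values mentioned in SQOOP-1804; the third value, CONNECTOR_ONLY, is likewise assumed. The cascading behaviour driven by the overrides attribute is not modelled.

```java
import java.util.EnumSet;
import java.util.Set;

// Assumed set of editable attribute values; only USER-ONLY and ANY appear in this page.
enum InputEditable { ANY, USER_ONLY, CONNECTOR_ONLY }

class EditableCheck {
  // Values that permit edits coming from the shell or the REST API.
  private static final Set<InputEditable> USER_EDITABLE =
      EnumSet.of(InputEditable.ANY, InputEditable.USER_ONLY);

  /** Returns true when a user-initiated edit of the input is allowed. */
  static boolean isUserEditable(InputEditable editable) {
    return USER_EDITABLE.contains(editable);
  }

  /** Throws if the input must not be edited by a user; a hypothetical guard for the update path. */
  static void requireUserEditable(String inputName, InputEditable editable) {
    if (!isUserEditable(editable)) {
      throw new IllegalStateException("Input '" + inputName + "' is not editable by users");
    }
  }
}
```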

Non Goals

  • Supporting CD (create/delete) of config inputs via REST or the command line. This is only allowed via the configurable code and the supported annotations on the classes today, and it should remain so.
  • Editing submission history 

Design and Implementation Details

Shell Commands

Aliases have been added to the MConfigType enum to indicate the sub types:

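import java.util.Arrays;
import java.util.List;

public enum MConfigType {
  /** Unknown config type */
  OTHER,
  @Deprecated
  // NOTE: only exists to support the connector data upgrade path
  CONNECTION,
  /** link config type */
  LINK("link"),
  /** Job config type */
  JOB("from", "to", "driver");

  /** Sub type aliases accepted for this config type; empty when none apply */
  private final List<String> aliases;

  MConfigType(String... aliases) {
    this.aliases = Arrays.asList(aliases);
  }

  /** Returns the sub type aliases registered for the given config type */
  static List<String> getAliasesByType(MConfigType type) {
    return type.aliases;
  }
}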
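
As a rough illustration of how the shell or REST layer might use such aliases, the sketch below validates a --subType value against a --type value and resolves an alias back to its type. ConfigType is a simplified stand-in for MConfigType, and the accepts and fromAlias helpers are hypothetical; lowercasing the alias before matching is also an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Simplified stand-in for MConfigType, limited to the two supported types.
enum ConfigType {
  LINK("link"),
  JOB("from", "to", "driver");

  private final List<String> aliases;

  ConfigType(String... aliases) {
    this.aliases = Arrays.asList(aliases);
  }

  /** Returns true when the given sub type is a valid alias of this type. */
  boolean accepts(String subType) {
    return subType != null && aliases.contains(subType.toLowerCase(Locale.ROOT));
  }

  /** Resolves a sub type alias to the config type that owns it, if any. */
  static Optional<ConfigType> fromAlias(String subType) {
    return Arrays.stream(values())
        .filter(type -> type.accepts(subType))
        .findFirst();
  }
}
```

With this shape, a command such as "show config foo --type LINK --subType to" could be rejected early, because LINK does not accept the "to" alias.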

 

Read Config By Type and Job or Submission 

 

// NOTE: all job config input values are for the last job run only, since we do not yet store the config values for each submission
1. show config foo --type JOB --subType from --id 1
// *. show config "foo" --type JOB --subType "from" --sid 1 (not part of the current patch, since SQOOP-2025 is not being done)
2. show config foo --type JOB --subType to --id 1
3. show config foo --type JOB --subType driver --id 1
4. show config foo --type LINK --subType link --id 1
 
// planned, SQOOP-2046
5. show input "foo" --config bar --type LINK --id 1 // id here refers to the link id
 
 

 

Edit Config By Type and Job (previous submissions cannot be edited, hence editing is restricted to the last job run only)

1. edit config foo --type JOB --subType from --id 1 // id here refers to the job id
// planned, SQOOP-2046
2. edit input foo --config bar --type LINK --id 1 // id here refers to the link id

Rest API changes

Read Config By Type and Job or Submission

GET
v1/config/JOB?name=?&id=?&subType=
 
or
 
GET
v1/config?type=JOB&name=?&id=?&subType=


Edit Config By Type and Job

POST
v1/config/LINK?name=?&id=?&subType= 
 
or
POST
v1/config?type=LINK&name=?&id=?&subType=
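
The two URL shapes above can be built on the client side as in the following sketch. The ConfigUris class and its method names are hypothetical, the base URL used in the usage note is an assumed server address, and the id parameter is written in lowercase here. The sketch only constructs the URIs; it does not send a request.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class ConfigUris {
  /** Path style: v1/config/{type}?name=..&id=..&subType=.. */
  static URI pathStyle(String base, String type, String name, long id, String subType) {
    return URI.create(base + "v1/config/" + type
        + "?name=" + encode(name) + "&id=" + id + "&subType=" + encode(subType));
  }

  /** Query style: v1/config?type=..&name=..&id=..&subType=.. */
  static URI queryStyle(String base, String type, String name, long id, String subType) {
    return URI.create(base + "v1/config?type=" + encode(type)
        + "&name=" + encode(name) + "&id=" + id + "&subType=" + encode(subType));
  }

  // URLEncoder.encode(String, Charset) requires Java 10 or later.
  private static String encode(String value) {
    return URLEncoder.encode(value, StandardCharsets.UTF_8);
  }
}
```

For example, ConfigUris.pathStyle("http://localhost:12000/sqoop/", "JOB", "foo", 1, "from") yields a URI that a GET would use to read the FROM config named foo of job 1, and the same URI shape with POST would carry the edited values.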

Repository API changes

  • Add new API to get config inputs by submissionId and type (read-only)
  • Add new API to get config inputs by jobId and type
  • Add new API to edit/post config inputs by jobId and type
  public abstract MConfig findFromJobConfig(long jobId, String name);
  public abstract MConfig findToJobConfig(long jobId, String name);
  public abstract MConfig findDriverJobConfig(long jobId, String name);
  public abstract MConfig findLinkConfig(long linkId, String name);
  public abstract void updateFromJobConfig(long jobId, MConfig config);
  public abstract void updateDriverJobConfig(long jobId, MConfig config);
  public abstract void updateToJobConfig(long jobId, MConfig config);
  public abstract void updateLinkConfig(long linkId, MConfig config);
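
To make the read/update-only contract of these methods concrete, here is a minimal in-memory sketch covering the FROM job config pair. SketchConfig is a stand-in for MConfig, and InMemoryConfigRepository, its seed method and its key layout are all hypothetical; a real repository would read and write SQ_JOB_INPUT instead. Note that update refuses to create a config that does not already exist, in line with the non goals.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for MConfig: a named set of input values (assumption for illustration).
class SketchConfig {
  final String name;
  final Map<String, String> inputs = new HashMap<>();

  SketchConfig(String name) {
    this.name = name;
  }
}

class InMemoryConfigRepository {
  // Keyed by jobId, sub type and config name.
  private final Map<String, SketchConfig> jobConfigs = new HashMap<>();

  private static String key(long jobId, String subType, String name) {
    return jobId + "/" + subType + "/" + name;
  }

  /** Hypothetical helper standing in for what job creation would have persisted. */
  void seed(long jobId, String subType, SketchConfig config) {
    jobConfigs.put(key(jobId, subType, config.name), config);
  }

  /** Read: returns the FROM config of the job with the given name, or null if absent. */
  SketchConfig findFromJobConfig(long jobId, String name) {
    return jobConfigs.get(key(jobId, "from", name));
  }

  /** Update: replaces an existing FROM config; never creates a new one. */
  void updateFromJobConfig(long jobId, SketchConfig config) {
    String k = key(jobId, "from", config.name);
    if (!jobConfigs.containsKey(k)) {
      throw new IllegalArgumentException("No FROM config '" + config.name + "' for job " + jobId);
    }
    jobConfigs.put(k, config);
  }
}
```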

 

Testing

  • Unit tests are almost nonexistent for the shell code, so we will rely on basic manual testing.
  • REST APIs can be tested via integration tests, which will be part of the proposed work.

 

 

 
