Title : Sqoop Config Input as a Top Level Entity 

JIRA: https://issues.apache.org/jira/browse/SQOOP-1516

Summary

This proposal enhances existing functionality (the command line and REST APIs) to support RU (Read and Update) operations on config input objects independently.

 

Background

Configs are exposed in code by the Connectors and Drivers (the two CONFIGURABLES supported). They annotate their config classes with the "@Config" annotation, which is how Sqoop registers these entities in the repository during server startup. If a connector already exists in the Sqoop repository (at server startup or when the UpgradeTool is invoked), the connector's upgrade API is invoked to update the attributes of the config object.

The current SQ_CONFIG table stores the top-level config entries per configurable.

     +-------------------------------------+
     | SQ_CONFIG                           |
     +-------------------------------------+
     | SQ_CFG_ID: BIGINT PK AUTO-GEN       |
     | SQ_CONFIGURABLE: BIGINT             |FK SQ_CONFIGURABLE(SQC_ID)
     | SQ_CFG_NAME: VARCHAR(64)            |
     | SQ_CFG_TYPE: VARCHAR(32)            |"LINK"|"JOB"
     | SQ_CFG_INDEX: SMALLINT              |
     +-------------------------------------+
 

Currently we support two types of configs: LINK and JOB. The MConfigType enum encapsulates this information. Its value is stored in "SQ_CFG_TYPE" when a config is registered.

@InterfaceAudience.Private
@InterfaceStability.Unstable
public enum MConfigType {
  /** Unknown config type */
  OTHER,
  @Deprecated
  // NOTE: only exists to support the connector data upgrade path
  CONNECTION,
  /** link config type */
  LINK,
  /** Job config type */
  JOB;
}

Each class annotated with @Config exposes a list of inputs via the "@Input" annotation and its attributes. The @Input-annotated fields are stored in another table, SQ_INPUT, along with the supported attributes and their values. SQ_INPUT stores only the input keys and the attribute values. The actual values of the inputs are specific to each JOB and each LINK (refer to this wiki to understand Sqoop entities), so they are stored in two additional tables, SQ_JOB_INPUT and SQ_LINK_INPUT.

     +----------------------------+
     | SQ_INPUT                   |
     +----------------------------+
     | SQI_ID: BIGINT PK AUTO-GEN |
     | SQI_NAME: VARCHAR(64)      |
     | SQI_CONFIG: BIGINT         |FK SQ_CONFIG(SQ_CFG_ID)
     | SQI_INDEX: SMALLINT        |
     | SQI_TYPE: VARCHAR(32)      |"STRING"|"MAP"
     | SQI_STRMASK: BOOLEAN       |
     | SQI_STRLENGTH: SMALLINT    |
     | SQI_ENUMVALS: VARCHAR(100) |
     +----------------------------+
 
     +----------------------------+
     | SQ_LINK_INPUT              |
     +----------------------------+
     | SQ_LNKI_LINK: BIGINT PK    | FK SQ_LINK(SQ_LNK_ID)
     | SQ_LNKI_INPUT: BIGINT PK   | FK SQ_INPUT(SQI_ID)
     | SQ_LNKI_VALUE: LONG VARCHAR|
     +----------------------------+
     +----------------------------+
     | SQ_JOB_INPUT               |
     +----------------------------+
     | SQBI_JOB: BIGINT PK        | FK SQ_JOB(SQB_ID)
     | SQBI_INPUT: BIGINT PK      | FK SQ_INPUT(SQI_ID)
     | SQBI_VALUE: LONG VARCHAR   |
     +----------------------------+
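
As a rough illustration of the registration idea described above, the sketch below scans a config class for annotated fields using reflection. The `Config` and `Input` annotations here are local stand-ins declared only so the example compiles on its own; they are not the real Sqoop annotations, and the attribute names (`size`, `sensitive`), the `ExampleLinkConfig` class, and the `InputScanner` helper are all hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in annotations (assumption): declared locally, NOT the real Sqoop ones.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Config {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Input {
  int size() default -1;              // would correspond to SQI_STRLENGTH
  boolean sensitive() default false;  // would correspond to SQI_STRMASK
}

// Hypothetical config class exposing two inputs.
@Config
class ExampleLinkConfig {
  @Input(size = 128) String connectionString;
  @Input(size = 40, sensitive = true) String password;
  String notAnInput; // not annotated, so it would not be registered
}

class InputScanner {
  // Returns the names of all @Input fields of a @Config class, sorted by name
  // because reflection does not guarantee declaration order.
  static List<String> inputNames(Class<?> configClass) {
    if (!configClass.isAnnotationPresent(Config.class)) {
      throw new IllegalArgumentException(
          configClass.getName() + " is not annotated with @Config");
    }
    List<String> names = new ArrayList<>();
    for (Field field : configClass.getDeclaredFields()) {
      if (field.isAnnotationPresent(Input.class)) {
        names.add(field.getName());
      }
    }
    Collections.sort(names);
    return names;
  }
}
```

Each name returned by such a scan would correspond to one SQ_INPUT row. The per-link and per-job values live separately in SQ_LINK_INPUT and SQ_JOB_INPUT.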


 

 

The following table lists the types of configs exposed by each configurable, with the number of configs shown in parentheses. Each config object is represented as a list, so a connector can expose a FROM-CONFIG with more than one config object in it.

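     +--------------+-------------------+---------------------+
     | CONFIGURABLE | LINK-CONFIG       | JOB-CONFIG          |
     +--------------+-------------------+---------------------+
     | CONNECTOR    | (1)               | (2)                 |
     |              | LINK-CONFIG       | FROM-CONFIG         |
     |              | MLinkConfigList   | MFromConfigList     |
     |              |                   | TO-CONFIG           |
     |              |                   | MToConfigList       |
     +--------------+-------------------+---------------------+
     | DRIVER       | NONE              | (1)                 |
     |              |                   | DRIVER-CONFIG       |
     |              |                   | MDriverConfigList   |
     +--------------+-------------------+---------------------+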

 

Requirements
  • Read and update the config inputs by type and by job/submission (since SQOOP-2025 we may be able to have configs by submissionId).
  • Support this in both the shell commands and the REST API.
  • Only the inputs with attribute "USER-ONLY" or "ANY", as per SQOOP-1804, will be editable. Validate the editable condition and adhere to cascading changes depending on the overrides attribute on each input.
  • Once the input values are edited, the new values will be used in the next job run, unless we maintain history as per SQOOP-2025.
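
The editability requirement above could be enforced with a small guard such as the one sketched below. This is only an illustration under assumptions: the `InputEditable` enum and its `CONNECTOR_ONLY` value are assumed here (the requirement itself only names "USER-ONLY" and "ANY"), and `EditGuard` is a hypothetical helper, not part of the proposal. Cascading changes via the overrides attribute are not covered.

```java
import java.util.EnumSet;
import java.util.Set;

// Assumed set of editable attribute values; SQOOP-1804 defines the real ones.
enum InputEditable { USER_ONLY, CONNECTOR_ONLY, ANY }

// Hypothetical guard that rejects edits to inputs the user may not change.
class EditGuard {
  private static final Set<InputEditable> USER_EDITABLE =
      EnumSet.of(InputEditable.USER_ONLY, InputEditable.ANY);

  // True only for inputs marked USER_ONLY or ANY.
  static boolean isUserEditable(InputEditable editable) {
    return USER_EDITABLE.contains(editable);
  }

  // Throws if a shell or REST edit targets a non-editable input.
  static void checkEditable(String inputName, InputEditable editable) {
    if (!isUserEditable(editable)) {
      throw new IllegalStateException(
          "Input '" + inputName + "' is not editable by the user: " + editable);
    }
  }
}
```

A shell or REST handler would call `EditGuard.checkEditable(...)` for every input in the submitted config before persisting any value, so that a single rejected input aborts the whole update.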

Non Goals

  • Supporting CD (create/delete) of config inputs via the REST API or the command line. Today this is only possible through the configurable code and the supported annotations on the classes, and it should remain so.

Design and Implementation Details

Shell Commands

Aliases have been added to the MConfigType enum to indicate the sub types.

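import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public enum MConfigType {
  /** Unknown config type */
  OTHER,
  // NOTE: only exists to support the connector data upgrade path
  @Deprecated
  CONNECTION,
  /** Link config type */
  LINK("link"),
  /** Job config type */
  JOB("from", "to", "driver");

  /** Sub type names accepted for this type (empty for OTHER and CONNECTION) */
  private final List<String> aliases;

  MConfigType(String... aliases) {
    this.aliases = Collections.unmodifiableList(Arrays.asList(aliases));
  }

  /** Returns the sub type aliases of the given config type */
  public static List<String> getAliasesByType(MConfigType type) {
    return type.aliases;
  }
}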
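
One way the shell or REST layer might use these aliases is to check a user-supplied `--subType` value against the config type. The sketch below restates the enum so that it compiles on its own. The `getAliases`, `fromAlias`, and `accepts` helpers are hypothetical additions for illustration and are not part of the proposed patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

enum MConfigType {
  OTHER,
  CONNECTION,
  LINK("link"),
  JOB("from", "to", "driver");

  private final List<String> aliases;

  MConfigType(String... aliases) {
    this.aliases = Arrays.asList(aliases);
  }

  List<String> getAliases() {
    return aliases;
  }

  // Hypothetical helper: finds the config type that owns a sub type alias.
  static Optional<MConfigType> fromAlias(String subType) {
    if (subType == null) {
      return Optional.empty();
    }
    String key = subType.toLowerCase(Locale.ROOT);
    for (MConfigType type : values()) {
      if (type.aliases.contains(key)) {
        return Optional.of(type);
      }
    }
    return Optional.empty();
  }

  // Hypothetical helper: true if the sub type is valid for this config type.
  boolean accepts(String subType) {
    return fromAlias(subType).map(type -> type == this).orElse(false);
  }
}
```

With this, a command such as `show config foo --type JOB --subType link --id 1` could be rejected early, because `MConfigType.JOB.accepts("link")` is false.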

 

Read Config By Type and Job or Submission 

 

// NOTE: all job config input values are for the last job run only, since we do not store the config values for each submission yet
1. show config foo --type JOB --subType from --id 1 
//*. show config "foo" --type JOB --subType "from" --sid 1 (since we are not doing SQOOP-2025, this will not be in the current patch)
2. show config foo --type JOB --subType to --id 1
3. show config foo --type JOB --subType driver --id 1
4. show config foo --type LINK --subType link --id 1
 
// planned, SQOOP-2046
5. show input "foo" --config bar --type LINK --id 1 // id here refers to the link id
 
 

 

Edit Config By Type and Job (previous submissions cannot be edited, hence we restrict editing to the last job run only)

1. edit config foo --type JOB --subType from --id 1 // id here refers to the job id
// planned, SQOOP-2046

2. edit input foo --config bar --type LINK --id 1 // id here refers to the link id

Rest API changes

Read Config By Type and Job or Submission

GET
v1/config/JOB?name=?&id=?&subType=?

Edit Config By Type and Job

POST
v1/config/LINK?name=?&id=?&subType=?
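
To make the request shape concrete, the following sketch assembles a request path of the form shown above. It assumes the query parameters are spelled `name`, `id`, and `subType` and that values are URL-encoded. `ConfigRequestPath` is a hypothetical helper, not an existing Sqoop class.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that assembles the relative path for a config request.
class ConfigRequestPath {

  // type is "JOB" or "LINK"; subType is one of the aliases such as "from".
  static String build(String type, String name, long id, String subType) {
    return "v1/config/" + type
        + "?name=" + encode(name)
        + "&id=" + id
        + "&subType=" + encode(subType);
  }

  // URL-encodes a query parameter value as UTF-8.
  private static String encode(String value) {
    try {
      return URLEncoder.encode(value, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is always available on a compliant JVM.
      throw new IllegalStateException(e);
    }
  }
}
```

For example, `ConfigRequestPath.build("JOB", "foo", 1, "from")` yields `v1/config/JOB?name=foo&id=1&subType=from`, which a client would send with GET to read the config or POST to edit it.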

Repository API changes

  • Add new API to get config inputs by submissionId and type ( read-only)
  • Add new API to get config inputs by jobId and type
  • Add new API to edit/post config inputs by jobId and type
  public abstract MConfig findFromJobConfig(long jobId, String name);
  public abstract MConfig findToJobConfig(long jobId, String name);
  public abstract MConfig findDriverJobConfig(long jobId, String name);
  public abstract MConfig findLinkConfig(long linkId, String name);
  public abstract void updateFromJobConfig(long jobId, MConfig config);
  public abstract void updateDriverJobConfig(long jobId, MConfig config);
  public abstract void updateToJobConfig(long jobId, MConfig config);
  public abstract void updateLinkConfig(long linkId, MConfig config);
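
The sketch below shows one way a request handler might route a sub type to the finder methods listed above. It is a minimal illustration under assumptions: the first finder is taken to be the FROM variant and is named `findFromJobConfig` here, `MConfig` is reduced to a stub holding only a name, and `ConfigRepository`, `InMemoryRepository`, and `ConfigLookup` are hypothetical names used only for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Stub standing in for the real MConfig model class (assumption).
class MConfig {
  final String name;
  MConfig(String name) { this.name = name; }
}

// Hypothetical holder for the read half of the proposed repository API.
abstract class ConfigRepository {
  public abstract MConfig findFromJobConfig(long jobId, String name);
  public abstract MConfig findToJobConfig(long jobId, String name);
  public abstract MConfig findDriverJobConfig(long jobId, String name);
  public abstract MConfig findLinkConfig(long linkId, String name);
}

// In-memory implementation used only to exercise the dispatch logic.
class InMemoryRepository extends ConfigRepository {
  private final Map<String, MConfig> store = new HashMap<>();

  void put(String subType, long id, MConfig config) {
    store.put(key(subType, id, config.name), config);
  }

  private static String key(String subType, long id, String name) {
    return subType + "/" + id + "/" + name;
  }

  @Override public MConfig findFromJobConfig(long jobId, String name) {
    return store.get(key("from", jobId, name));
  }
  @Override public MConfig findToJobConfig(long jobId, String name) {
    return store.get(key("to", jobId, name));
  }
  @Override public MConfig findDriverJobConfig(long jobId, String name) {
    return store.get(key("driver", jobId, name));
  }
  @Override public MConfig findLinkConfig(long linkId, String name) {
    return store.get(key("link", linkId, name));
  }
}

class ConfigLookup {
  // Routes a sub type alias to the matching repository finder.
  static MConfig find(ConfigRepository repo, String subType, long id, String name) {
    switch (subType) {
      case "from":   return repo.findFromJobConfig(id, name);
      case "to":     return repo.findToJobConfig(id, name);
      case "driver": return repo.findDriverJobConfig(id, name);
      case "link":   return repo.findLinkConfig(id, name);
      default:
        throw new IllegalArgumentException("Unknown subType: " + subType);
    }
  }
}
```

The update methods could be dispatched the same way, with the editability check applied before the call.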

 

Testing

  • Unit tests are almost nonexistent for the shell code, so we will rely on basic manual testing.
  • The REST APIs can be tested via integration tests, which will be part of the proposed work.

 

 

 
