...

| CONFIGURABLE | LINK-CONFIG | JOB-CONFIG |
| CONNECTOR | (1) LINK-CONFIG: MLinkConfigList | (2) FROM-CONFIG: MFromConfigList, TO-CONFIG: MToConfigList |
| DRIVER | NONE | (1) DRIVER-CONFIG: MDriverConfigList |

 

Requirements

...

Terminology

Configuration: A class in Sqoop annotated with @ConfigurationClass. It represents a grouping of related configs exposed by the configurable that owns the class.

Here is a code example.

Code Block
@ConfigurationClass
public class FromJobConfiguration {
  @Config public FromJobConfig fromJobConfig;
  @Config public IncrementalExtractConfig incrementalExtractConfig;
  public FromJobConfiguration() {
    fromJobConfig = new FromJobConfig();
    incrementalExtractConfig = new IncrementalExtractConfig();
  }
}

 

Config or ConfigInputs: The two terms mean the same thing in this wiki and are used interchangeably. They refer to a class in Sqoop that is annotated with @ConfigClass and is used, via a field annotated with @Config, inside a class annotated with @ConfigurationClass.

Code Block
@ConfigClass
public class IncrementalExtractConfig {
  @Input(size = 50, editable = InputEditable.USER_ONLY)
  public String key;
  // Optional: tells the type of the key, e.g. long, int, date, timestamp
  @Input
  public String dataType;
  @Input(size = 50, editable = InputEditable.USER_ONLY, overrides = "lastReadValue")
  public String value;
  @Input(editable = InputEditable.CONNECTOR_ONLY)
  public long lastReadValue;
}

 

Inputs: They are represented by the @Input annotation on a field inside a class annotated with @ConfigClass.

Code Block
  @Input(size = 50, editable = InputEditable.USER_ONLY, overrides = "lastReadValue")
  public String value;
  @Input(editable = InputEditable.CONNECTOR_ONLY)
  public long lastReadValue;
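
To make the relationship between the three terms concrete, here is a minimal sketch of walking a configuration class down to its inputs with reflection. The annotations below are local stand-ins declared only for this example (runtime retention is an assumption), `tableName` is a hypothetical field, and `listInputs` is not an existing Sqoop method.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ConfigWalkSketch {
  // Local stand-ins for the Sqoop annotations (assumption: runtime retention).
  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
  @interface ConfigurationClass {}
  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
  @interface ConfigClass {}
  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
  @interface Config {}
  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
  @interface Input {}

  @ConfigClass
  static class FromJobConfig {
    @Input public String tableName; // hypothetical input, for illustration only
  }

  @ConfigClass
  static class IncrementalExtractConfig {
    @Input public String key;
    @Input public String value;
    public String notAnInput; // no @Input, so it is skipped
  }

  @ConfigurationClass
  static class FromJobConfiguration {
    @Config public FromJobConfig fromJobConfig = new FromJobConfig();
    @Config public IncrementalExtractConfig incrementalExtractConfig = new IncrementalExtractConfig();
  }

  /** Returns "configName.inputName" for every @Input reachable from a configuration class, sorted. */
  static List<String> listInputs(Class<?> configuration) {
    if (!configuration.isAnnotationPresent(ConfigurationClass.class)) {
      throw new IllegalArgumentException(configuration + " is not a @ConfigurationClass");
    }
    List<String> names = new ArrayList<>();
    for (Field config : configuration.getDeclaredFields()) {
      Class<?> configType = config.getType();
      if (!config.isAnnotationPresent(Config.class)
          || !configType.isAnnotationPresent(ConfigClass.class)) {
        continue;
      }
      for (Field input : configType.getDeclaredFields()) {
        if (input.isAnnotationPresent(Input.class)) {
          names.add(config.getName() + "." + input.getName());
        }
      }
    }
    Collections.sort(names); // reflection does not guarantee field order
    return names;
  }
}
```

Calling `ConfigWalkSketch.listInputs(ConfigWalkSketch.FromJobConfiguration.class)` lists each input under the name of the config that owns it, which is the naming a "config by name" lookup would rely on.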

 

Requirements

  • Configs should be treated as a top-level entity, as they become more sophisticated with SQOOP-1804 and, in the future, with the implementation of SQOOP-1643. Currently, reading and editing configs / config inputs can only happen via a job or link. This proposal aims to make config objects first-class citizens, so they can be read and edited by their unique name.
  • Config objects per MConfigType (i.e. JOB and LINK) are lists, so editing by "CONFIG" name is easier than editing the whole "CONFIG LIST" associated with the type. Users already see the names when they list the configs per connector; we can also add a shell command / REST API to list all the configs per configurable, to make it easier for users to reference configs by name.
  • Read the config inputs by type/subType and by job/submission (with SQOOP-2025 we may be able to have configs by submissionId).
  • Update the config inputs by type/subType for the latest submission in the job. Previous submissions must not be editable; they are read-only.
  • Only the inputs whose editable attribute is "USER-ONLY" or "ANY" as per SQOOP-1804 will be editable. Validate the editable condition and apply cascading changes according to the overrides attribute on each input.
  • Once the input values are edited, the new values will be used in the next job run, unless we maintain history as per SQOOP-2025.

NOTE: Support all of the above in both the shell commands and the REST API.
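
The editable check and the overrides cascade from the requirements above can be sketched as follows. This is only an illustration under stated assumptions: `InputValue` and `editAsUser` are hypothetical names, and the cascade rule (a user edit clears the value of every input it overrides) is an assumed reading of the overrides attribute, not confirmed behaviour from SQOOP-1804.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EditableInputSketch {
  /** Mirrors the InputEditable values named on this page. */
  enum InputEditable { USER_ONLY, CONNECTOR_ONLY, ANY }

  /** Hypothetical in-memory view of one config input (not a Sqoop class). */
  static final class InputValue {
    final String name;
    final InputEditable editable;
    final List<String> overrides;
    String value;

    InputValue(String name, InputEditable editable, String value, String... overrides) {
      this.name = name;
      this.editable = editable;
      this.value = value;
      this.overrides = Arrays.asList(overrides);
    }
  }

  /** Applies a user edit: rejects CONNECTOR_ONLY inputs, then cascades to overridden inputs. */
  static void editAsUser(Map<String, InputValue> inputs, String name, String newValue) {
    InputValue target = inputs.get(name);
    if (target == null) {
      throw new IllegalArgumentException("Unknown input: " + name);
    }
    if (target.editable == InputEditable.CONNECTOR_ONLY) {
      throw new IllegalStateException("Input is not editable by the user: " + name);
    }
    target.value = newValue;
    // Assumed cascade: the user-supplied value supersedes the inputs it overrides.
    for (String overridden : target.overrides) {
      InputValue other = inputs.get(overridden);
      if (other != null) {
        other.value = null;
      }
    }
  }

  /** Builds inputs shaped like IncrementalExtractConfig, with made-up starting values. */
  static Map<String, InputValue> incrementalExtractInputs() {
    Map<String, InputValue> inputs = new LinkedHashMap<>();
    inputs.put("key", new InputValue("key", InputEditable.USER_ONLY, "id"));
    inputs.put("value", new InputValue("value", InputEditable.USER_ONLY, null, "lastReadValue"));
    inputs.put("lastReadValue", new InputValue("lastReadValue", InputEditable.CONNECTOR_ONLY, "42"));
    return inputs;
  }
}
```

With these inputs, a user edit of `value` succeeds and clears `lastReadValue`, while a user edit of `lastReadValue` is rejected.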

 

Non Goals

  • Supporting CD (create / delete) of config inputs via REST or the command line. Today this is only possible via the configurable code and the supported annotations on the classes, and it should remain so.
  • Editing submission history

Design and Implementation Details

Here are some details I considered when coming up with the subType, and why it made sense.

 
At one point I considered adding direction as a parameter for the type JOB, but direction is not relevant to all configurables: for driver configs "direction" has no meaning, and for the type "LINK" there is no concept of direction.
Hence I went with subType, a second-level hierarchy for distinguishing the types of configs that are supported in Sqoop.
Alternatives are possible, but we have to bear in mind that configs / config inputs are not associated with jobs and links. They are associated with connectors and the driver (i.e. configurables).
It is the config input values that are associated with jobs and links. So when reading or editing config input values, we can either go through the JOB / LINK entities or, if we want to treat config as a first-class citizen, rely more on the "CONFIGTYPE" enum.

SubType for MConfigType

SubTypes have been added to the MConfigType Enum to indicate the sub types

Code Block
import java.util.Arrays;
import java.util.List;

/**
 * Represents the various config types supported by the system.
 */
@InterfaceAudience.Private
@InterfaceStability.Unstable
public enum MConfigType {
  /** Unknown config type */
  OTHER,
  // NOTE: only exists to support the connector data upgrade path
  @Deprecated
  CONNECTION,
  /** link config type */
  LINK("link"),
  /** Job config type */
  JOB("from", "to", "driver");

  // Sub types of this config type; empty for OTHER and CONNECTION
  private final List<String> subType;

  MConfigType(String... aliases) {
    this.subType = Arrays.asList(aliases);
  }

  List<String> getSubTypes(MConfigType type) {
    return type.subType;
  }
}
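
As an illustration of how the subType list could be used to validate a `--type` / `--subType` pair, here is a minimal sketch. The enum is a simplified copy without the Sqoop annotations, and `isValid` and `typeOf` are hypothetical helpers that are not part of the proposal.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class SubTypeLookupSketch {
  /** Simplified copy of MConfigType, without the Sqoop classification annotations. */
  enum MConfigType {
    OTHER,
    CONNECTION,
    LINK("link"),
    JOB("from", "to", "driver");

    private final List<String> subTypes;

    MConfigType(String... subTypes) {
      this.subTypes = Collections.unmodifiableList(Arrays.asList(subTypes));
    }

    List<String> getSubTypes() {
      return subTypes;
    }
  }

  /** True if the sub type is one of those declared for the given type (case-insensitive). */
  static boolean isValid(MConfigType type, String subType) {
    return subType != null
        && type.getSubTypes().contains(subType.toLowerCase(Locale.ROOT));
  }

  /** Finds the type that declares the sub type, or OTHER if no type declares it. */
  static MConfigType typeOf(String subType) {
    for (MConfigType type : MConfigType.values()) {
      if (isValid(type, subType)) {
        return type;
      }
    }
    return MConfigType.OTHER;
  }
}
```

Because every sub type belongs to exactly one type in this enum, a lookup like `typeOf("driver")` is unambiguous, which is what lets the sub type act as the second level of the hierarchy.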


 

Shell Commands

 

Read Config By Type and Job or Submission

NOTE: All the job config input values are for the last job run only, since we do not store the config values for each submission yet.

Code Block
1. show config foo --type JOB --subType from --id 1 or show config 1 --type JOB --subType from --id 1 ( NOTE: we can use either id or name )
  //*. show config "foo" --type JOB --subType "from" --sid 1 ( SINCE we are not doing SQOOP-2025, this will not be in the current patch)

2. show config foo --type JOB --subType to --id 1

3. show config foo --type JOB --subType driver --id 1

4. show config foo --type LINK --subType link --id 1

// planned, SQOOP-2046 as a sub ticket of SQOOP-1516
5. show input "foo" --config bar --type LINK --id 1 // id here refers to the link id

 

Edit Config By Type and Job

NOTE: Previous submissions cannot be edited, hence we restrict editing to the last job run only.

Code Block
1. edit config foo --type JOB --subType from --id 1 // id here refers to the job id

// planned, SQOOP-2046 as a sub ticket of SQOOP-1516
2. edit input "foo" --config bar --type LINK --id 1 // id here refers to the link id

Alternatively, it was suggested that the command line should not introduce config as a top-level entity (this was a preference of Gwen Shapira and Qian Xu). Hence the commands are:

Code Block
edit job --jid 1 --config foo --type from
show job --jid 1 --config foo --type from
edit job --jid 1 --config foo --type to
edit job --jid 1 --config foo --type driver

edit link --lid 1 --config foo --type link
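
To show how the job-centric command form decomposes, here is a small sketch that splits such a line into its action, entity, and options. It is a hand-rolled illustration only: `ShellArgsSketch` is a hypothetical name and this is not how the Sqoop shell actually parses its options.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ShellArgsSketch {
  /**
   * Splits "action entity --key value ..." into a map with the keys
   * "action", "entity", and one entry per "--key value" pair.
   */
  static Map<String, String> parse(String line) {
    String[] tokens = line.trim().split("\\s+");
    if (tokens.length < 2) {
      throw new IllegalArgumentException("Expected '<action> <entity> [--key value]...'");
    }
    Map<String, String> parsed = new LinkedHashMap<>();
    parsed.put("action", tokens[0]);
    parsed.put("entity", tokens[1]);
    for (int i = 2; i < tokens.length; i += 2) {
      // Every option must start with "--" and be followed by a value.
      if (!tokens[i].startsWith("--") || i + 1 >= tokens.length) {
        throw new IllegalArgumentException("Malformed option near: " + tokens[i]);
      }
      parsed.put(tokens[i].substring(2), tokens[i + 1]);
    }
    return parsed;
  }
}
```

For `edit job --jid 1 --config foo --type from`, the entity stays `job` and the config is addressed through the `config` and `type` options, which is the point of this alternative: config never appears as the top-level entity.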



 

 

Rest API changes

The resource returned from the APIs is a config object, MConfig.

Read Config By Type and Job or Submission

 

Code Block
1. GET v1/config/job?name=?&Id=?&type=

or

2. GET v1/config?type=JOB&name=?&Id=?&subType=

Edit Config By Type and Job


Code Block
1. POST v1/config/link?name=?&id=?&type=

or

2. POST v1/config?type=LINK&name=?&id=?&subType=


 

We chose option 1 in both cases.
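
As a rough sketch of what a client would build for option 1 of the read API, the snippet below assembles the request URI. The base URL used in the usage note is an assumed local server address, the lowercase `id` parameter spelling is an assumption (this page uses both `Id` and `id`), and `ConfigUrlSketch` is a hypothetical helper.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

final class ConfigUrlSketch {
  /** Builds "<base>/v1/config/job?name=..&id=..&type=.." with URL-encoded values. */
  static URI jobConfigUri(String baseUrl, String name, long id, String type) {
    try {
      String query = "name=" + URLEncoder.encode(name, "UTF-8")
          + "&id=" + id
          + "&type=" + URLEncoder.encode(type, "UTF-8");
      return URI.create(baseUrl + "/v1/config/job?" + query);
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is always available on the JVM, so this should never happen.
      throw new IllegalStateException(e);
    }
  }
}
```

For example, `ConfigUrlSketch.jobConfigUri("http://localhost:12000/sqoop", "foo", 1, "from")` addresses the "from" config named `foo` of job 1; the link variant would differ only in the `v1/config/link` path segment.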


Repository API changes

...

Add a new API to get config inputs by jobId, type, and name.

Add a new API to edit/post config inputs by jobId, type, and name.

 

Code Block
  public abstract MConfig findFromJobConfig(long jobId, String name);
  public abstract MConfig findToJobConfig(long jobId, String name);
  public abstract MConfig findDriverJobConfig(long jobId, String name);
  public abstract MConfig findLinkConfig(long linkId, String name);
  public abstract void updateFromJobConfig(long jobId, MConfig config);
  public abstract void updateDriverJobConfig(long jobId, MConfig config);
  // the type exists to check who is editing the configs: the connector code via upgrade, or the user (from REST / command line)
  public abstract void updateJobConfig(long jobId, MConfig config, MConfigUpdateEntity type);
  public abstract void updateLinkConfig(long linkId, MConfig config, MConfigUpdateEntity type);
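
A minimal in-memory sketch of the find/update pair may help show the intended contract. Everything here is an assumption made for illustration: the `USER` and `CONNECTOR` values of `MConfigUpdateEntity`, the simplified `MConfig` holder, the `registerFromJobConfig` helper, and the choice to cover only "from" configs. It also assumes that updating a config that was never registered is rejected, in line with the non-goal of creating configs outside the configurable code.

```java
import java.util.HashMap;
import java.util.Map;

final class InMemoryConfigRepositorySketch {
  /** Who performs the edit. These values are hypothetical. */
  enum MConfigUpdateEntity { USER, CONNECTOR }

  /** Simplified stand-in for MConfig: a name plus input values. */
  static final class MConfig {
    final String name;
    final Map<String, String> inputValues = new HashMap<>();
    MConfigUpdateEntity lastUpdatedBy;

    MConfig(String name) {
      this.name = name;
    }
  }

  // jobId -> (config name -> config); only "from" configs are modelled here.
  private final Map<Long, Map<String, MConfig>> fromConfigsByJob = new HashMap<>();

  /** Simulates the configurable code registering a config for a job. */
  void registerFromJobConfig(long jobId, MConfig config) {
    fromConfigsByJob.computeIfAbsent(jobId, id -> new HashMap<>()).put(config.name, config);
  }

  /** Returns the named "from" config of the job, or null if there is none. */
  MConfig findFromJobConfig(long jobId, String name) {
    Map<String, MConfig> configs = fromConfigsByJob.get(jobId);
    return configs == null ? null : configs.get(name);
  }

  /** Merges new input values into an existing config and records who edited it. */
  void updateJobConfig(long jobId, MConfig config, MConfigUpdateEntity type) {
    MConfig existing = findFromJobConfig(jobId, config.name);
    if (existing == null) {
      // Creating configs through this path is a non-goal, so reject unknown names.
      throw new IllegalArgumentException(
          "No config named '" + config.name + "' for job " + jobId);
    }
    existing.inputValues.putAll(config.inputValues);
    existing.lastUpdatedBy = type;
  }
}
```

Recording the updater alongside the values is one possible use of the `type` argument; a real implementation could instead use it to decide which inputs the caller is allowed to change.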

 

Testing

Unit tests are almost non-existent for the shell code, so we will rely on basic manual testing there.

The REST APIs can be tested via integration tests, and those tests will be part of the proposed work.