
This document describes the internals of the Sqoop server.

Warning

This document applies to release 1.99.5. Later releases may change the behavior described here.


Sqoop Tomcat Server

  • Sqoop-server runs on a very bare-bones Tomcat web server.
  • The main entry point is the TomcatToolRunner. It bootstraps Tomcat and loads all the Sqoop-related classes into its class path. It is invoked from the bash script.

    Code Block
    /sqoop.sh server start
  • The main hook for starting the Sqoop server is this entry in web.xml. Tomcat invokes its callbacks as it bootstraps, and the contextInitialized callback is used to initialize all the related code.

Code Block
<!-- Listeners -->
<listener>
  <listener-class>org.apache.sqoop.server.ServerInitializer</listener-class>
</listener>
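
The listener lifecycle can be illustrated with a small dependency-free model. This is only a sketch: ContextListener stands in for the servlet API's listener interface so the code compiles without a servlet jar, and SqoopServerModel, ServerInitializerModel and LifecycleDemo are hypothetical names, not the real Sqoop classes. Wiring the shutdown through a contextDestroyed callback is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the servlet listener interface, reduced to the
// two lifecycle callbacks so the sketch needs no servlet API jar.
interface ContextListener {
    void contextInitialized();
    void contextDestroyed();
}

// Stand-in for SqoopServer: records calls instead of starting real subsystems.
class SqoopServerModel {
    static final List<String> EVENTS = new ArrayList<>();

    static void initialize() { EVENTS.add("initialize"); }

    static void destroy() { EVENTS.add("destroy"); }
}

// Plays the role of ServerInitializer: forwards container callbacks to the server.
class ServerInitializerModel implements ContextListener {
    @Override
    public void contextInitialized() { SqoopServerModel.initialize(); }

    @Override
    public void contextDestroyed() { SqoopServerModel.destroy(); }
}

class LifecycleDemo {
    // Plays the part of the container: start the web app, then shut it down.
    static List<String> run() {
        SqoopServerModel.EVENTS.clear();
        ContextListener listener = new ServerInitializerModel();
        listener.contextInitialized();
        listener.contextDestroyed();
        return new ArrayList<>(SqoopServerModel.EVENTS);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In the real server the container, not application code, decides when these callbacks fire. The model only shows the order of the calls.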
 

 

Sqoop Server

  • The Sqoop server is represented by the Java class SqoopServer.java.
  • SqoopServer.initialize() is the main entry point. It is called from the ServerInitializer.
  • SqoopServer.destroy() is called when the Tomcat server is shut down.

Sqoop Servlets

  • A number of servlets are declared in web.xml: https://github.com/apache/sqoop/blob/sqoop2/server/src/main/webapp/WEB-INF/web.xml. They receive the requests and process them.

Code Block
  <!-- Version servlet -->
  <servlet>
    <servlet-name>VersionServlet</servlet-name>
    <servlet-class>org.apache.sqoop.server.VersionServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>VersionServlet</servlet-name>
    <url-pattern>/version</url-pattern>
  </servlet-mapping>
  <!-- Generic Configurable servlet -->
  <servlet>
    <servlet-name>v1.ConfigurableServlet</servlet-name>
    <servlet-class>org.apache.sqoop.server.v1.ConfigurableServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>v1.ConfigurableServlet</servlet-name>
    <url-pattern>/v1/configurable/*</url-pattern>
  </servlet-mapping>
  <!-- Connector servlet -->
  <servlet>
    <servlet-name>v1.ConnectorServlet</servlet-name>
    <servlet-class>org.apache.sqoop.server.v1.ConnectorServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>v1.ConnectorServlet</servlet-name>
    <url-pattern>/v1/connector/*</url-pattern>
  </servlet-mapping>
  <!-- Connectors servlet -->
  <servlet>
    <servlet-name>v1.ConnectorsServlet</servlet-name>
    <servlet-class>org.apache.sqoop.server.v1.ConnectorsServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>v1.ConnectorsServlet</servlet-name>
    <url-pattern>/v1/connectors/*</url-pattern>
  </servlet-mapping>
  <!-- Driver servlet -->
  <servlet>
    <servlet-name>v1.DriverServlet</servlet-name>
    <servlet-class>org.apache.sqoop.server.v1.DriverServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
  </servlet>
  ......
  <!-- Job servlet -->
  <servlet>
    <servlet-name>v1.JobServlet</servlet-name>
    <servlet-class>org.apache.sqoop.server.v1.JobServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>v1.JobServlet</servlet-name>
    <url-pattern>/v1/job/*</url-pattern>
  </servlet-mapping>
  <!-- Jobs servlet -->
  <servlet>
    <servlet-name>v1.JobsServlet</servlet-name>
    <servlet-class>org.apache.sqoop.server.v1.JobsServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>v1.JobsServlet</servlet-name>
    <url-pattern>/v1/jobs/*</url-pattern>
  </servlet-mapping>
  <!-- Submissions servlet -->
  <servlet>
    <servlet-name>v1.SubmissionsServlet</servlet-name>
    <servlet-class>org.apache.sqoop.server.v1.SubmissionsServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>v1.SubmissionsServlet</servlet-name>
    <url-pattern>/v1/submissions/*</url-pattern>
  </servlet-mapping>

</web-app>
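
How a request path is matched to one of these servlets can be sketched as follows. This is a simplified model, assuming only exact and path-prefix patterns (no extension or default mappings); the class ServletMappingDemo and the sample request paths are made up for illustration, while the patterns and servlet names are the ones visible in the excerpt above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ServletMappingDemo {
    // URL patterns and servlet names copied from the web.xml excerpt.
    static final Map<String, String> MAPPINGS = new LinkedHashMap<>();
    static {
        MAPPINGS.put("/version", "VersionServlet");
        MAPPINGS.put("/v1/configurable/*", "v1.ConfigurableServlet");
        MAPPINGS.put("/v1/connector/*", "v1.ConnectorServlet");
        MAPPINGS.put("/v1/connectors/*", "v1.ConnectorsServlet");
        MAPPINGS.put("/v1/job/*", "v1.JobServlet");
        MAPPINGS.put("/v1/jobs/*", "v1.JobsServlet");
        MAPPINGS.put("/v1/submissions/*", "v1.SubmissionsServlet");
    }

    // Returns the servlet name for a path relative to the web app,
    // or null when no pattern matches.
    static String resolve(String path) {
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, String> entry : MAPPINGS.entrySet()) {
            String pattern = entry.getKey();
            if (pattern.endsWith("/*")) {
                // Prefix patterns match whole path segments, so
                // "/v1/connector/*" does not match "/v1/connectors".
                String prefix = pattern.substring(0, pattern.length() - 2);
                boolean matches = path.equals(prefix) || path.startsWith(prefix + "/");
                if (matches && prefix.length() > bestLength) {
                    best = entry.getValue();
                    bestLength = prefix.length();
                }
            } else if (pattern.equals(path)) {
                return entry.getValue(); // an exact match takes precedence
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(resolve("/version"));
        System.out.println(resolve("/v1/connectors"));
        System.out.println(resolve("/v1/job/1"));
    }
}
```

Matching on whole segments is what keeps the singular and plural servlets (connector and connectors, job and jobs) from shadowing each other.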

 

  • There is an authentication filter that authenticates all requests.

 

Code Block
<!-- Filter -->
<filter>
  <filter-name>authFilter</filter-name>
  <filter-class>org.apache.sqoop.filter.SqoopAuthenticationFilter</filter-class>
</filter>

 

  • Two authentication modes are supported: simple and Kerberos. The mode is set in sqoop.properties.

 

Code Block
#
# Authentication configuration
#
org.apache.sqoop.security.authentication.type=SIMPLE
org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.Authentication.SimpleAuthenticationHandler
org.apache.sqoop.security.authentication.anonymous=true

#org.apache.sqoop.security.authentication.type=KERBEROS
#org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.Authentication.KerberosAuthenticationHandler
#org.apache.sqoop.security.authentication.kerberos.principal=sqoop/_HOST@NOVALOCAL
#org.apache.sqoop.security.authentication.kerberos.keytab=/home/kerberos/sqoop.keytab
#org.apache.sqoop.security.authentication.kerberos.http.principal=HTTP/_HOST@NOVALOCAL
#org.apache.sqoop.security.authentication.kerberos.http.keytab=/home/kerberos/sqoop.keytab
#org.apache.sqoop.security.authentication.enable.doAs=true
#org.apache.sqoop.security.authentication.proxyuser.#USER#.users=*
#org.apache.sqoop.security.authentication.proxyuser.#USER#.groups=*
#org.apache.sqoop.security.authentication.proxyuser.#USER#.hosts=*
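
As a rough illustration of how such a file is interpreted, the sketch below loads a few of these keys with java.util.Properties and checks which mode is active. The class AuthConfigDemo and its methods are hypothetical, and falling back to SIMPLE when the type key is missing is an assumption of the sketch, not documented Sqoop behavior.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class AuthConfigDemo {
    // Property keys taken from the sqoop.properties excerpt.
    static final String TYPE_KEY = "org.apache.sqoop.security.authentication.type";
    static final String ANONYMOUS_KEY = "org.apache.sqoop.security.authentication.anonymous";

    // Parses properties text; lines starting with '#' are treated as comments.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Assumption of this sketch: a missing type key is treated as SIMPLE.
    static boolean isKerberos(Properties props) {
        return "KERBEROS".equalsIgnoreCase(props.getProperty(TYPE_KEY, "SIMPLE").trim());
    }

    static boolean allowsAnonymous(Properties props) {
        return Boolean.parseBoolean(props.getProperty(ANONYMOUS_KEY, "false").trim());
    }

    public static void main(String[] args) {
        String sample =
            "org.apache.sqoop.security.authentication.type=SIMPLE\n"
            + "org.apache.sqoop.security.authentication.anonymous=true\n"
            + "#org.apache.sqoop.security.authentication.type=KERBEROS\n";
        Properties props = parse(sample);
        System.out.println("kerberos=" + isKerberos(props));
        System.out.println("anonymous=" + allowsAnonymous(props));
    }
}
```

Because the Kerberos lines are commented out in the excerpt, switching modes means commenting out the SIMPLE block and uncommenting the KERBEROS block.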

 

Sqoop Request Handlers

 

Each Sqoop servlet has a corresponding handler class that handles the requests for that servlet. The handler then calls into the Sqoop core/common code.

All the supported REST APIs are documented here: http://sqoop.apache.org/docs/1.99.4/RESTAPI.html#id1

There is also the Sqoop client, which used to invoke the Sqoop server methods via the Jersey REST client.

Code Block
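public interface RequestHandler {
  // Query parameter names (interface fields are implicitly public static final)
  static final String CONNECTOR_NAME_QUERY_PARAM = "cname";
  static final String JOB_NAME_QUERY_PARAM = "jname";

  // Handles one request and returns the response as a JsonBean
  JsonBean handleEvent(RequestContext ctx);
}

// One handler implementation per servlet; the body is elided here
public class ConnectorRequestHandler implements RequestHandler {
...
}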
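
The servlet-to-handler delegation described in this section can be sketched with reduced stand-in types. Everything below is hypothetical: the *Model names, the fallback value "all", and the JSON shape are invented for illustration and do not reflect the real Sqoop classes or response format. Only the "cname" query parameter comes from the interface above.

```java
import java.util.Map;

// Hypothetical, reduced stand-in for JsonBean.
interface JsonBeanModel {
    String toJson();
}

// Hypothetical, reduced stand-in for RequestContext: only query parameters.
class RequestContextModel {
    final Map<String, String> params;

    RequestContextModel(Map<String, String> params) {
        this.params = params;
    }
}

interface RequestHandlerModel {
    String CONNECTOR_NAME_QUERY_PARAM = "cname";

    JsonBeanModel handleEvent(RequestContextModel ctx);
}

// Handler: reads the query parameter and builds a (made-up) response bean.
class ConnectorHandlerModel implements RequestHandlerModel {
    @Override
    public JsonBeanModel handleEvent(RequestContextModel ctx) {
        String name = ctx.params.getOrDefault(CONNECTOR_NAME_QUERY_PARAM, "all");
        return () -> "{\"connector\":\"" + name + "\"}";
    }
}

// Servlet: owns its handler and delegates each request to it.
class ConnectorServletModel {
    private final RequestHandlerModel handler = new ConnectorHandlerModel();

    String handleGet(RequestContextModel ctx) {
        return handler.handleEvent(ctx).toJson();
    }
}

class HandlerDemo {
    public static void main(String[] args) {
        RequestContextModel ctx = new RequestContextModel(
            java.util.Collections.singletonMap("cname", "example-connector"));
        System.out.println(new ConnectorServletModel().handleGet(ctx));
    }
}
```

Keeping the servlet thin and putting the logic in the handler means the same handler can be reused by more than one servlet.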

Sqoop Client

  • The Sqoop client is represented by the Java class SqoopClient.java.
  • It has wrapper ResourceRequest classes for each Sqoop entity. They encapsulate the request/postBody parameters to be sent in the request. Refer to Sqoop 2 (1.99.4) Entity Nomenclature and Relationships for more details on the supported Sqoop entities.
  • It uses the bare-bones HttpURLConnection object to make requests to the Sqoop server.

    Code Block
    HttpURLConnection conn = new DelegationTokenAuthenticatedURL().openConnection(url, authToken);
    Note

     SqoopClient used to use the Jersey REST client for making Tomcat requests. It was recently switched to Hadoop-auth/SPNEGO to add Kerberos support, which is documented here: https...
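
For readers unfamiliar with HttpURLConnection, a request of this kind can be sketched with the JDK alone. The sketch uses a plain connection rather than DelegationTokenAuthenticatedURL, so it carries no authentication token. The class VersionRequestDemo is hypothetical, and the server URL http://localhost:12000/sqoop/ is an assumed example value; the /version path comes from the servlet mapping in web.xml.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class VersionRequestDemo {
    // Builds the URL of the version resource from the server base URL.
    static URL versionUrl(String serverUrl) {
        String base = serverUrl.endsWith("/") ? serverUrl : serverUrl + "/";
        try {
            return URI.create(base + "version").toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad server URL: " + serverUrl, e);
        }
    }

    // Sends a GET request and returns the HTTP status code.
    // Needs a running server, so main() below does not call it.
    static int fetchStatus(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("GET");
            conn.setRequestProperty("Accept", "application/json");
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Assumed example server URL; adjust to the actual deployment.
        System.out.println(versionUrl("http://localhost:12000/sqoop/"));
    }
}
```

In Kerberos mode a plain connection like this would be rejected, which is why the client opens its connection through the Hadoop-auth class shown in the code block above.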

 

Code Block
/sqoop.sh client

 

  • In Kerberos authentication mode, kinit is required to set up the Kerberos environment.

 

Code Block
kinit sqoop/server-fqdn@HADOOP.COM