JIRA : SQOOP-1525 and its sub tickets

Security Guide On Sqoop 2
 

Most Hadoop components, such as HDFS, YARN, and Hive, have security frameworks that support Simple, Kerberos, and LDAP authentication. Currently, Sqoop 2 provides two types of authentication: simple and Kerberos. The authentication module is pluggable, so more authentication types can be added.

Simple Authentication

Configuration

Modify the Sqoop configuration file, normally located at <Sqoop Folder>/server/config/sqoop.properties.

...


  • Simple authentication is used by default. If the authentication configuration is commented out, Sqoop falls back to simple authentication.

Run command

Start the Sqoop server as usual.

...

Code Block
<Sqoop Folder>/bin/sqoop.sh client


Kerberos Authentication

Kerberos is a computer network authentication protocol that uses 'tickets' to allow nodes communicating over a non-secure network to prove their identity to one another in a secure manner. It was designed primarily for a client-server model and provides mutual authentication: both the user and the server verify each other's identity. Kerberos protocol messages are protected against eavesdropping and replay attacks.

Dependency

Set up a KDC server. Skip this step if a KDC server already exists. It is difficult to cover every way Kerberos can be set up (e.g., there are cross-realm setups and multi-trust environments). This section describes how to set up the Sqoop principals with a local deployment of MIT Kerberos.

...

  • <FQDN> should be replaced by the FQDN of the server, which can be found by running “hostname -f” on the command line.
  • <REALM> should be replaced by the realm name in the krb5.conf file generated when the KDC server was installed in the previous step.
  • The principal HTTP/<FQDN>@<REALM> is used for communication between the Sqoop client and the Sqoop server. Because the Sqoop server is an HTTP server, the HTTP principal is required during the SPNEGO process, and its name is case sensitive.
  • HTTP requests can also be sent from other clients with SPNEGO support, such as a browser, wget, or curl.
  • The principal sqoop/<FQDN>@<REALM> is used for communication between the Sqoop server and HDFS/YARN, as the credential of the Sqoop server.
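As a rough illustration of the naming scheme above, the two principals can be assembled from the server's FQDN and the realm. The helper name sqoop_principals and the realm EXAMPLE.COM are placeholders invented for this sketch; they are not part of Sqoop.

```shell
# Hypothetical helper (not part of Sqoop): print the two principals
# described above for a given FQDN and realm.
sqoop_principals() {
    fqdn=$1
    realm=$2
    # The service name HTTP must stay upper case: it is case sensitive.
    printf 'HTTP/%s@%s\n' "$fqdn" "$realm"
    # Credential the Sqoop server uses when talking to HDFS/YARN.
    printf 'sqoop/%s@%s\n' "$fqdn" "$realm"
}

# Example call with this host's FQDN and a placeholder realm:
# sqoop_principals "$(hostname -f)" EXAMPLE.COM
```

Once a ticket has been obtained with kinit, a SPNEGO-capable client such as curl can usually authenticate with "curl --negotiate -u :" followed by the server URL, which depends on the deployment.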

Configuration

Modify the Sqoop configuration file, normally located at <Sqoop Folder>/server/config/sqoop.properties.

...

  • When _HOST is used as the FQDN in a principal, it is replaced by the real FQDN. See https://issues.apache.org/jira/browse/HADOOP-6632
  • If the proxyuser parameter is set to true, the Sqoop server uses proxy user mode (the sqoop user acts on behalf of the real client user) to run YARN jobs. If it is set to false, the Sqoop server runs YARN jobs as the sqoop user.
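The _HOST substitution can be pictured with a small sketch. This is only an illustration of the effect, not the code Hadoop or Sqoop runs; the function name resolve_principal and the sample values are assumptions made for the example.

```shell
# Illustration only: mimic the _HOST substitution with a hypothetical helper.
resolve_principal() {
    principal=$1
    fqdn=$2
    # Replace the _HOST instance component (between "/" and "@")
    # with the given FQDN; other principals pass through unchanged.
    printf '%s\n' "$principal" | sed "s|/_HOST@|/${fqdn}@|"
}

# Example call with a placeholder realm:
# resolve_principal 'sqoop/_HOST@EXAMPLE.COM' "$(hostname -f)"
```

Using _HOST this way lets the same sqoop.properties file be shared across hosts, since each server fills in its own FQDN.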

Run command

Set SQOOP2_HOST to the FQDN of the server.

Code Block
export SQOOP2_HOST=$(hostname -f)

...

Code Block
<Sqoop Folder>/bin/sqoop.sh client

Verify

If the Sqoop server has started successfully with Kerberos authentication, the following lines will appear in <@LOGDIR>/sqoop.log:

...

Code Block
Refreshing Kerberos configuration
Acquire TGT from Cache
Principal is HTTP/<FQDN>@HADOOP.COM
null credentials from Ticket Cache
principal is HTTP/<FQDN>@HADOOP.COM
Will use keytab
Commit Succeeded
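The check can also be scripted. The following is a minimal sketch that looks for two of the lines shown above; the function name kerberos_login_ok is hypothetical, and the path to sqoop.log has to be supplied by the caller.

```shell
# Hypothetical check: succeed only if the log shows that a keytab
# login was attempted and that the login was committed.
kerberos_login_ok() {
    log_file=$1
    grep -q 'Will use keytab' "$log_file" &&
        grep -q 'Commit Succeeded' "$log_file"
}

# Example call (replace the argument with the real log location):
# kerberos_login_ok /path/to/sqoop.log && echo "Kerberos login found"
```

The function only reports whether both markers are present anywhere in the file, so an old log from a previous successful start can produce a false positive.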


Customized Authentication


Users can create their own authentication modules by performing the following steps:

...

  • Set the property org.apache.sqoop.authentication.handler in <Sqoop Folder>/server/config/sqoop.properties to the class name of the customized authentication handler.
  • Restart the Sqoop server.
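Before restarting, it can help to confirm which handler the configuration file actually selects. The sketch below reads an uncommented property from a properties file; the helper name get_property and the class name com.example.MyAuthenticationHandler are made up for illustration, and the helper assumes plain key=value lines with no spaces around the equals sign.

```shell
# Hypothetical helper: print the value of an uncommented property.
# Assumes plain "key=value" lines; the last matching line wins.
get_property() {
    file=$1
    key=$2
    # A commented line starts with "#", so its first field never equals the key.
    awk -F= -v k="$key" '
        $1 == k { sub(/^[^=]*=/, ""); v = $0 }
        END { if (v != "") print v }
    ' "$file"
}

# Example call (adjust the path to the actual Sqoop folder):
# get_property /path/to/sqoop.properties org.apache.sqoop.authentication.handler
```

If the call prints nothing, the property is missing or commented out.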

High level design

 

SQOOP-1525

Design Details

https://issues.apache.org/jira/secure/attachment/12671414/SQOOP-1525.pdf

See the comments in the JIRA and its sub-tickets for details and more up-to-date information.