Release: none (trunk) as of 2012-05-18
Flume 1.x supports securely communicating with Hadoop using Kerberos.
FLUME-1196 added support for secure impersonation of Hadoop users. The implementation is similar to how Oozie implements secure user impersonation.
Setting up secure impersonation from Flume to Hadoop takes a few steps. The steps below assume you are using Kerberos. Impersonation also works on clusters that are not secured with Kerberos; in that case, omit the Kerberos-specific parts.
1. Hadoop must be configured to allow impersonation.
   - For setting up impersonation in Hadoop, please see the Hadoop documentation for Secure Impersonation.
2. Set up a Kerberos keytab for the Kerberos principal and host from which Flume connects to HDFS. This user must match the Hadoop configuration in Step 1 above.
   - Instructions for creating a keytab file can be found in online guides to configuring Hadoop security.
3. Configure the HDFS sink with the following configuration options:
   - hdfs.kerberosPrincipal - fully-qualified principal. Note: _HOST is replaced by the hostname of the local machine, but only when it appears between the / and @ characters.
   - hdfs.kerberosKeytab - location on the local machine of the keytab containing the user and host keys for the above principal
   - hdfs.proxyUser - "proxy" user to impersonate
Example snippet (the majority of the HDFS sink configuration options have been omitted here):
agent.sinks.sink-1.type = hdfs
agent.sinks.sink-1.hdfs.kerberosPrincipal = flume/_HOST@EXAMPLE-REALM.COM
agent.sinks.sink-1.hdfs.kerberosKeytab = /home/mpercy/flume.keytab
agent.sinks.sink-1.hdfs.proxyUser = will
In the above example, the flume user impersonates the user will. Impersonation is allowed only if the KDC authenticates the principal and the NameNode authorizes impersonation of the specified proxy user by the provided principal.
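For reference, the Hadoop-side authorization is typically expressed through the hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups properties in core-site.xml. The fragment below is a minimal sketch of what might allow the flume user in this example to impersonate will. The host name flume-agent.example.com and the group name users are assumptions made for illustration and do not come from this page; consult the Hadoop Secure Impersonation documentation for the authoritative settings.

```xml
<!-- core-site.xml fragment on the Hadoop side. All values are illustrative assumptions. -->
<configuration>
  <!-- Hosts from which the flume user may submit impersonated requests.
       flume-agent.example.com is a hypothetical Flume agent host. -->
  <property>
    <name>hadoop.proxyuser.flume.hosts</name>
    <value>flume-agent.example.com</value>
  </property>
  <!-- Groups whose members the flume user may impersonate.
       "users" is a hypothetical group that the proxy user will belongs to. -->
  <property>
    <name>hadoop.proxyuser.flume.groups</name>
    <value>users</value>
  </property>
</configuration>
```

With settings along these lines, a request from any other host, or for a proxy user outside the listed groups, would be rejected by the NameNode even if the Kerberos authentication itself succeeds.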