Applies to release: none (trunk) as of 2012-05-18

Overview

Flume 1.x supports writing to Hadoop clusters secured with Kerberos.

Features and limitations:

  • A single agent may have multiple sinks, each of which may write to HDFS as a different user
  • In a multiple-user setup, all sinks must share a single Kerberos principal, and that principal must be configured in Hadoop to allow impersonation of the "proxy" users

Storing as several users in the same agent

In FLUME-1196, support was added for secure impersonation of Hadoop users. The implementation is similar to how Oozie implements secure user impersonation.

Setting up secure impersonation from Flume to Hadoop takes a few steps. The steps below assume you are using Kerberos. Impersonation also works on clusters that are not secured with Kerberos; in that case, omit the Kerberos-specific parts.

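  1. Hadoop must be configured to allow impersonation. In stock Hadoop this is normally done with the hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups properties in core-site.xml, where <user> is the user doing the impersonating.
  2. Set up a Kerberos keytab for the principal (user and host) that Flume uses to connect to HDFS. The user in this principal must match the user that Hadoop allows to impersonate in Step 1 above.
    • Instructions for configuring Hadoop security, available online, explain how to create a keytab file.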
  3. Configure the HDFS sink with the following configuration options:
    • hdfs.kerberosPrincipal - fully-qualified principal. Note: _HOST is replaced by the hostname of the local machine, but only where it appears between the / and @ characters
    • hdfs.kerberosKeytab - location on the local machine of the keytab containing user and host keys for the above principal
    • hdfs.proxyUser - "proxy" user to impersonate

Example snippet (the majority of the HDFS sink configuration options have been omitted here):

agent.sinks.sink-1.type = hdfs
agent.sinks.sink-1.hdfs.kerberosPrincipal = flume/_HOST@EXAMPLE-REALM.COM
agent.sinks.sink-1.hdfs.kerberosKeytab = /home/mpercy/flume.keytab
agent.sinks.sink-1.hdfs.proxyUser = will

In the above example, the flume user impersonates the user will. This is allowed only if the KDC authenticates the principal and the NameNode authorizes that principal to impersonate the specified proxy user.
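The multiple-user setup listed under "Features and limitations" can be sketched by extending the example above with a second sink. This is only an illustration: the sink name sink-2 and the proxy user alice are hypothetical placeholders, and most HDFS sink options are again omitted. Both sinks reuse the same principal and keytab and differ only in hdfs.proxyUser.

```properties
# Two HDFS sinks in one agent, each writing as a different user.
agent.sinks = sink-1 sink-2

# sink-1: authenticate as the flume principal, write as user "will".
agent.sinks.sink-1.type = hdfs
agent.sinks.sink-1.hdfs.kerberosPrincipal = flume/_HOST@EXAMPLE-REALM.COM
agent.sinks.sink-1.hdfs.kerberosKeytab = /home/mpercy/flume.keytab
agent.sinks.sink-1.hdfs.proxyUser = will

# sink-2 (hypothetical): same principal and keytab, different proxy user.
agent.sinks.sink-2.type = hdfs
agent.sinks.sink-2.hdfs.kerberosPrincipal = flume/_HOST@EXAMPLE-REALM.COM
agent.sinks.sink-2.hdfs.kerberosKeytab = /home/mpercy/flume.keytab
agent.sinks.sink-2.hdfs.proxyUser = alice
```

Each proxy user named here must be one that the Hadoop impersonation configuration from Step 1 permits the flume user to impersonate.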

Directly accessing Hadoop as the Kerberos principal

If only one user is needed, the hdfs.proxyUser configuration option may be omitted. In this case, the user indicated by the Kerberos principal is used to access Hadoop directly.
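A minimal sketch of this case, reusing the principal and keytab path from the earlier example (most HDFS sink options are omitted). With no hdfs.proxyUser set, data should be written as the flume user itself.

```properties
# Single-user setup: no hdfs.proxyUser, so HDFS is accessed
# directly as the user named in the Kerberos principal (flume).
agent.sinks.sink-1.type = hdfs
agent.sinks.sink-1.hdfs.kerberosPrincipal = flume/_HOST@EXAMPLE-REALM.COM
agent.sinks.sink-1.hdfs.kerberosKeytab = /home/mpercy/flume.keytab
```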
