Blog from June, 2017

Hadoop Group Lookup Provider

Introduction

Prior to the addition of the Hadoop Group Lookup Provider, group lookup was left to the authentication or federation provider that established the user identity.

As a result, the available group lookup mechanisms were limited to whatever that provider supported.

As part of the KIP-1 improvements and release 0.10.0, the Knox community has introduced an identity assertion provider that integrates the Hadoop Groups Mapping capability from Hadoop Common.

This allows us to compose topologies that combine any authentication/federation provider with the Hadoop Group Lookup Provider acting as the identity assertion provider.

This removes the previous limitation on choices and enables exactly the same group mapping capabilities that are used throughout the cluster.

The result is greater flexibility and consistency, along with more choices for performance and for complex lookup approaches.

Hadoop Group Lookup Provider

The Hadoop Group Lookup Provider is an identity assertion provider that looks up the ‘group membership’ of authenticated users using Hadoop’s group mapping service (GroupMappingServiceProvider).

This allows existing investments in the Hadoop mechanism to be leveraged within Knox and used in access control policy enforcement at the perimeter.

The ‘role’ for this provider is ‘identity-assertion’ and its ‘name’ is ‘HadoopGroupProvider’.

    <provider>
        <role>identity-assertion</role>
        <name>HadoopGroupProvider</name>
        <enabled>true</enabled>
        <param> ... </param>
    </provider>
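To make the composition described in the introduction concrete, the sketch below shows how a topology might pair an authentication provider with the HadoopGroupProvider. It assumes the ShiroProvider for authentication and a WEBHDFS service at a hypothetical NameNode address modeled on the sandbox topology; the Shiro realm params and the group mapping params are deliberately omitted, so this is an outline of the structure rather than a complete working file.

```xml
<topology>
    <gateway>
        <!-- Authentication: establishes the user identity (ShiroProvider assumed here) -->
        <provider>
            <role>authentication</role>
            <name>ShiroProvider</name>
            <enabled>true</enabled>
            <!-- Shiro realm params omitted -->
        </provider>
        <!-- Identity assertion: looks up groups for the already authenticated user -->
        <provider>
            <role>identity-assertion</role>
            <name>HadoopGroupProvider</name>
            <enabled>true</enabled>
            <!-- hadoop.security.group.mapping* params go here -->
        </provider>
    </gateway>
    <service>
        <role>WEBHDFS</role>
        <!-- Hypothetical NameNode address; use your cluster's WebHDFS URL -->
        <url>http://localhost:50070/webhdfs</url>
    </service>
</topology>
```

Because group lookup lives in its own identity-assertion provider, the authentication provider can be swapped without changing how groups are resolved.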

Configuration

All configuration for ‘HadoopGroupProvider’ resides in the provider section of a gateway topology file. The ‘hadoop.security.group.mapping’ property determines which implementation is used. Some of the valid implementations are as follows:

org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback

This is the default implementation and will be picked up if ‘hadoop.security.group.mapping’ is not specified. This implementation will determine if the Java Native Interface (JNI) is available. If JNI is available, the implementation will use the API within Hadoop to resolve a list of groups for a user. If JNI is not available then the shell implementation, org.apache.hadoop.security.ShellBasedUnixGroupsMapping, is used, which shells out with the ‘bash -c groups’ command (for a Linux/Unix environment) or the ‘net group’ command (for a Windows environment) to resolve a list of groups for a user.
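As a minimal sketch, the provider below sets ‘hadoop.security.group.mapping’ explicitly so that the shell-based implementation is always used instead of the JNI check; leaving the param out entirely gives the JniBasedUnixGroupsMappingWithFallback default described above. Either way, the groups are resolved on the Knox gateway host, so they reflect that host’s view of user and group membership.

```xml
<provider>
    <role>identity-assertion</role>
    <name>HadoopGroupProvider</name>
    <enabled>true</enabled>
    <!-- Always shell out for group lookup instead of trying JNI first -->
    <param>
        <name>hadoop.security.group.mapping</name>
        <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
    </param>
</provider>
```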

org.apache.hadoop.security.LdapGroupsMapping

This implementation connects directly to an LDAP server to resolve the list of groups. It should only be used if the required groups reside exclusively in LDAP and are not materialized on the Unix servers.

For more information on the implementations and their properties, refer to the Hadoop Groups Mapping documentation.

Example

The following example snippet works with the demo LDAP server that ships with Apache Knox. Replace the existing ‘Default’ identity-assertion provider in the topology with the HadoopGroupProvider configuration below.

    <provider>
        <role>identity-assertion</role>
        <name>HadoopGroupProvider</name>
        <enabled>true</enabled>
        <param>
            <name>hadoop.security.group.mapping</name>
            <value>org.apache.hadoop.security.LdapGroupsMapping</value>
        </param>
        <param>
            <name>hadoop.security.group.mapping.ldap.bind.user</name>
            <value>uid=tom,ou=people,dc=hadoop,dc=apache,dc=org</value>
        </param>
        <param>
            <name>hadoop.security.group.mapping.ldap.bind.password</name>
            <value>tom-password</value>
        </param>
        <param>
            <name>hadoop.security.group.mapping.ldap.url</name>
            <value>ldap://localhost:33389</value>
        </param>
        <param>
            <name>hadoop.security.group.mapping.ldap.base</name>
            <value></value>
        </param>
        <param>
            <name>hadoop.security.group.mapping.ldap.search.filter.user</name>
            <value>(&amp;(|(objectclass=person)(objectclass=applicationProcess))(cn={0}))</value>
        </param>
        <param>
            <name>hadoop.security.group.mapping.ldap.search.filter.group</name>
            <value>(objectclass=groupOfNames)</value>
        </param>
        <param>
            <name>hadoop.security.group.mapping.ldap.search.attr.member</name>
            <value>member</value>
        </param>
        <param>
            <name>hadoop.security.group.mapping.ldap.search.attr.group.name</name>
            <value>cn</value>
        </param>
    </provider>

Here, we are working with the demo LDAP server running at ‘ldap://localhost:33389’, which is populated with some dummy test users that we will use in this example. The user ‘tom’ is used to bind to LDAP. If your LDAP/AD settings differ, you will have to update the properties accordingly.
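As a rough sketch of such an update for Active Directory, the params below use the user and group filters that Hadoop’s LdapGroupsMapping documents as its defaults (‘sAMAccountName’ for users, ‘objectClass=group’ for groups). The host ‘ad.example.com’, the base ‘dc=example,dc=com’, the bind account, and the password are all hypothetical placeholders. The member and group name attributes are left at Hadoop’s defaults (‘member’ and ‘cn’), so those two params are omitted.

```xml
<provider>
    <role>identity-assertion</role>
    <name>HadoopGroupProvider</name>
    <enabled>true</enabled>
    <param>
        <name>hadoop.security.group.mapping</name>
        <value>org.apache.hadoop.security.LdapGroupsMapping</value>
    </param>
    <param>
        <!-- Hypothetical bind account -->
        <name>hadoop.security.group.mapping.ldap.bind.user</name>
        <value>CN=knox-bind,OU=Service Accounts,DC=example,DC=com</value>
    </param>
    <param>
        <!-- Hypothetical placeholder password -->
        <name>hadoop.security.group.mapping.ldap.bind.password</name>
        <value>bind-password</value>
    </param>
    <param>
        <!-- Hypothetical AD host -->
        <name>hadoop.security.group.mapping.ldap.url</name>
        <value>ldap://ad.example.com:389</value>
    </param>
    <param>
        <!-- Hypothetical search base -->
        <name>hadoop.security.group.mapping.ldap.base</name>
        <value>dc=example,dc=com</value>
    </param>
    <param>
        <!-- Match the login name against sAMAccountName; '&' must be escaped in XML -->
        <name>hadoop.security.group.mapping.ldap.search.filter.user</name>
        <value>(&amp;(objectClass=user)(sAMAccountName={0}))</value>
    </param>
    <param>
        <name>hadoop.security.group.mapping.ldap.search.filter.group</name>
        <value>(objectClass=group)</value>
    </param>
</provider>
```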

Let’s test the setup with the following command (assuming the gateway is started and listening on localhost:8443). Note that the command uses the credentials of the user ‘sam’.

    curl -i -k -u sam:sam-password -X GET 'https://localhost:8443/gateway/sandbox/webhdfs/v1/?op=LISTSTATUS' 

The command should execute successfully, and gateway-audit.log should show the groups ‘scientist’ and ‘analyst’ to which the user ‘sam’ belongs, for example:

    ||a99aa0ab-fc06-48f2-8df3-36e6fe37c230|audit|WEBHDFS|sam|||identity-mapping|principal|sam|success|Groups: [scientist, analyst]