Blog from January, 2019

Topology Based Federation


Jira: KNOX-1247

KIP: KIP-11 Cloud Usecases

Introduction

This feature, which ships with Knox 1.2.0, allows federation from one Knox instance to another. It is implemented using the Header Based Pre-Auth authentication provider. It is typically useful in a hybrid on-prem/cloud deployment, where the on-prem Knox instance federates requests to the cloud instance, for example:

  • WebHDFS calls to the on-prem Knox instance are re-dispatched to the cloud instance(s), so files are written to or read from HDFS in the cloud.
  • Spark jobs submitted to Livy through the on-prem instance are re-dispatched and submitted as cloud workloads.
  • MapReduce jobs submitted to the YARN ResourceManager through Knox are submitted as workloads in the cloud.

The downside of this approach is that it adds an extra hop to every federated request, which can increase latency in some cases. Because Header Based Pre-Auth authentication is not secure by itself, it is critical to enable two-way SSL and to establish trust between the on-prem and cloud Knox instances by provisioning certificates. Perimeter security around the cloud Knox instance is also a must, e.g. a VPC and IP whitelisting of the on-prem Knox instance(s).

Setup

The following diagram describes the federated request flow.


We need to provision the certificate of the cloud Knox instance into the on-prem Knox instance and vice versa, and enable two-way SSL, as shown below.
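As a rough sketch of the cloud side of two-way SSL, the cloud Knox instance can be configured in gateway-site.xml to require client certificates and to trust the on-prem certificate. The property names below are taken from the mutual authentication section of the Knox user guide and should be checked against your Knox version; the truststore path is a hypothetical placeholder.

```xml
<!-- Cloud Knox gateway-site.xml (sketch): require TLS client certificates -->
<property>
    <name>gateway.client.auth.needed</name>
    <value>true</value>
    <description>Require clients, such as the on-prem Knox instance, to present a certificate.</description>
</property>
<!-- Truststore holding the on-prem Knox certificate; the path is a hypothetical placeholder -->
<property>
    <name>gateway.truststore.path</name>
    <value>/path/to/onprem-truststore.jks</value>
</property>
<property>
    <name>gateway.truststore.type</name>
    <value>JKS</value>
</property>
```

On the on-prem side, the use-two-way-ssl dispatch parameter in the service definition is what is expected to make the dispatch present the on-prem gateway's certificate when calling the cloud instance.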

Next, we look at the settings for the on-prem and cloud Knox topologies.

On-prem

Authentication provider (topology): any

Dispatch (topology): org.apache.knox.gateway.dispatch.HeaderPreAuthFederationDispatch

Federation header name (gateway-site.xml): gateway.custom.federation.header.name

Since users authenticate locally on the on-prem instance, we can use any authentication provider we choose, e.g. a local LDAP.
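For instance, a minimal sketch of a local LDAP setup using the ShiroProvider could look like the following. The DN template and LDAP URL are placeholders that match the Knox demo LDAP defaults; replace them with the values for your own directory.

```xml
<!-- On-prem topology (sketch): authenticate users locally against LDAP -->
<provider>
    <role>authentication</role>
    <name>ShiroProvider</name>
    <enabled>true</enabled>
    <param>
        <name>main.ldapRealm</name>
        <value>org.apache.knox.gateway.shirorealm.KnoxLdapRealm</value>
    </param>
    <!-- Placeholder DN template and URL (Knox demo LDAP defaults) -->
    <param>
        <name>main.ldapRealm.userDnTemplate</name>
        <value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value>
    </param>
    <param>
        <name>main.ldapRealm.contextFactory.url</name>
        <value>ldap://localhost:33389</value>
    </param>
    <param>
        <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
        <value>simple</value>
    </param>
    <!-- Protect all URLs with HTTP Basic authentication -->
    <param>
        <name>urls./**</name>
        <value>authcBasic</value>
    </param>
</provider>
```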

Next, update the dispatch for the service that needs to be federated (WEBHDFS in the following example). The dispatch for a service can be overridden in the topology itself, e.g.:

    <service>
        <role>WEBHDFS</role>
        <url>https://my.cloudurl.com:8443/gateway/aws/webhdfs</url>
        <dispatch>
            <classname>org.apache.knox.gateway.dispatch.HeaderPreAuthFederationDispatch</classname>
            <use-two-way-ssl>true</use-two-way-ssl>
        </dispatch>
    </service>
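The same override can, in principle, be repeated for other federated services. The sketch below assumes the cloud topology is also named aws and exposes the YARN ResourceManager REST API under the RESOURCEMANAGER role; the URL path is an assumption modeled on the WEBHDFS example and should be checked against the cloud topology's actual endpoints.

```xml
<!-- On-prem topology (sketch): federate YARN ResourceManager REST calls to the cloud Knox instance -->
<service>
    <role>RESOURCEMANAGER</role>
    <!-- Assumed endpoint: cloud Knox "aws" topology, resourcemanager path -->
    <url>https://my.cloudurl.com:8443/gateway/aws/resourcemanager</url>
    <dispatch>
        <classname>org.apache.knox.gateway.dispatch.HeaderPreAuthFederationDispatch</classname>
        <use-two-way-ssl>true</use-two-way-ssl>
    </dispatch>
</service>
```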

The gateway.custom.federation.header.name property in gateway-site.xml sets the name of the header used for federated requests. Its default value is "SM_USER".

This value must be the same as the preauth.custom.header parameter of the HeaderPreAuth authentication provider in the cloud topology.

For example:

    <property>
        <name>gateway.custom.federation.header.name</name>
        <value>aws_header</value>
        <description>Custom header name to be used for federated requests.</description>
    </property>


Cloud

Authentication provider (topology): HeaderPreAuth

For the cloud Knox instance, we need to use the HeaderPreAuth authentication provider and specify the "preauth.custom.header" parameter. Its value must be exactly the same as the value of the "gateway.custom.federation.header.name" property defined in the on-prem gateway-site.xml (aws_header in our example above).


The following is the relevant topology snippet:

    <provider>
        <role>federation</role>
        <name>HeaderPreAuth</name>
        <enabled>true</enabled>
        <param>
            <name>preauth.custom.header</name>
            <value>aws_header</value>
        </param>
    </provider>
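Because Header Based Pre-Auth trusts whatever header arrives, the HeaderPreAuth provider can additionally be restricted to requests coming from the on-prem Knox address(es), complementing the IP whitelisting recommended earlier. The sketch below shows a possible cloud topology named aws. The preauth.validation.method and preauth.ip.addresses parameters come from the HeaderPreAuth section of the Knox user guide; the IP address, NameNode host, and port are hypothetical placeholders.

```xml
<!-- Cloud topology "aws" (sketch) -->
<topology>
    <gateway>
        <provider>
            <role>federation</role>
            <name>HeaderPreAuth</name>
            <enabled>true</enabled>
            <!-- Must match gateway.custom.federation.header.name on the on-prem instance -->
            <param>
                <name>preauth.custom.header</name>
                <value>aws_header</value>
            </param>
            <!-- Only accept pre-authenticated requests from the listed source IPs -->
            <param>
                <name>preauth.validation.method</name>
                <value>preauth.ip.validation</value>
            </param>
            <!-- Hypothetical on-prem Knox address -->
            <param>
                <name>preauth.ip.addresses</name>
                <value>203.0.113.10</value>
            </param>
        </provider>
        <provider>
            <role>identity-assertion</role>
            <name>Default</name>
            <enabled>true</enabled>
        </provider>
    </gateway>
    <service>
        <role>WEBHDFS</role>
        <!-- Hypothetical cloud NameNode WebHDFS endpoint -->
        <url>http://cloud-namenode.example.com:50070/webhdfs</url>
    </service>
</topology>
```

If a load balancer or proxy sits in front of the cloud Knox instance, the source address Knox sees may be that of the proxy rather than the on-prem instance, so the address list would need to reflect that.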

That's all there is to it; your topology based federation should now be ready.