Status

Current state: Complete

Discussion thread

JIRA: KNOX-1006

...

  • Service discovery type
    An identifier indicating which type of discovery to apply (e.g., Ambari, etc.)
  • Service discovery address
    The associated service registry address
  • Credentials for interacting with the discovery source
  • A provider configuration reference
    A unique name mapped to a set of provider configurations (see item #3 from the Motivation section)
  • A list of services to be exposed through Knox (with optional service parameters and URL values)
  • A list of UIs to be proxied by Knox (per KIP-9)

...

Proposed YAML:
# Discovery info source
discovery-type: AMBARI
discovery-address: http://sandbox.hortonworks.com:8080
discovery-user: maria_dev
discovery-pwd-alias: ambari.discovery.password

# Provider config reference, the contents of which will be
# included in (or referenced from) the resulting topology descriptor.
# The contents of this reference has a <gateway/> root, and
# contains <provider/> configurations.
provider-config-ref : sandbox-providers.xml

# The cluster for which the service details should be discovered
cluster: Sandbox

# The services to declare in the resulting topology descriptor,
# whose URLs will be discovered (unless a value is specified)
services:
    - name: NAMENODE
    - name: JOBTRACKER
    - name: WEBHDFS
    - name: WEBHCAT
    - name: OOZIE
    - name: WEBHBASE
    - name: HIVE
    - name: RESOURCEMANAGER
    - name: KNOXSSO
      params:
          knoxsso.cookie.secure.only: true
          knoxsso.token.ttl: 100000
    - name: AMBARI
      urls:
        - http://sandbox.hortonworks.com:8080
    - name: AMBARIUI
      urls:
        - http://sandbox.hortonworks.com:8080

# UIs to be proxied through the resulting Knox topology (see KIP-9)
#uis:
#   - name: AMBARIUI
#     url: http://sandbox.hortonworks.com:8080

 

While JSON is not really a format for configuration, it is certainly appropriate as a wire format, and will be used for API interactions.

Proposed JSON:
{
  "discovery-type":"AMBARI",
  "discovery-address":"http://sandbox.hortonworks.com:8080",
  "discovery-user":"maria_dev",
  "discovery-pwd-alias":"ambari.discovery.password",
  "provider-config-ref":"sandbox-providers.xml",
  "cluster":"Sandbox",
  "services":[
     {"name":"NAMENODE"},
     {"name":"JOBTRACKER"},
     {"name":"WEBHDFS"},
     {"name":"WEBHCAT"},
     {"name":"OOZIE"},
     {"name":"WEBHBASE"},
     {"name":"HIVE"},
     {"name":"RESOURCEMANAGER"},
     {"name":"KNOXSSO",
      "params":{
          "knoxsso.cookie.secure.only":"true",
          "knoxsso.token.ttl":"100000"
      }
     },
     {"name":"AMBARI", "urls":["http://sandbox.hortonworks.com:8080"]}
  ],
  "uis":[
     {"name":"AMBARIUI", "urls":["http://sandbox.hortonworks.com:8080"]}
  ]
}
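Since this JSON form is intended as a wire format for API interactions, a consumer's first step would be parsing and sanity-checking it. Below is a minimal sketch (not Knox code) that does so with Python's standard json module; the field names mirror the example above, but treating discovery-type, discovery-address, and cluster as required is an assumption made only for illustration.

```python
import json

# Assumption for this sketch: these three descriptor fields are required.
REQUIRED_FIELDS = ("discovery-type", "discovery-address", "cluster")

def validate_descriptor(text):
    """Parse a simple descriptor in the proposed JSON wire format and
    return (descriptor, problems)."""
    desc = json.loads(text)
    problems = [f for f in REQUIRED_FIELDS if f not in desc]
    # Every declared service needs at least a name; urls/params are optional.
    for svc in desc.get("services", []):
        if "name" not in svc:
            problems.append("service missing name")
    return desc, problems

descriptor_json = """
{
  "discovery-type":"AMBARI",
  "discovery-address":"http://sandbox.hortonworks.com:8080",
  "discovery-user":"maria_dev",
  "discovery-pwd-alias":"ambari.discovery.password",
  "provider-config-ref":"sandbox-providers.xml",
  "cluster":"Sandbox",
  "services":[
     {"name":"WEBHDFS"},
     {"name":"KNOXSSO", "params":{"knoxsso.token.ttl":"100000"}},
     {"name":"AMBARI", "urls":["http://sandbox.hortonworks.com:8080"]}
  ]
}
"""
desc, problems = validate_descriptor(descriptor_json)
```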

 

3. Topology Generation

Given that we will have a Service Discovery service that can integrate with Ambari as well as other sources of the needed metadata, we should be able to start with a simplified topology descriptor.
Once the deployment machinery notices this descriptor, it can pull in the referenced provider configuration, iterate over the declared services, UIs, and applications, and look up the details for each.
With the provider configuration and service details, we can then generate a fully baked topology.
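The generation step described above can be sketched roughly as follows. This is only an illustration, not the actual Knox implementation: the DISCOVERED map stands in for the real service-discovery lookup, and the referenced provider configuration is inlined as a <gateway/> fragment.

```python
import xml.etree.ElementTree as ET

# Stand-in for the real discovery source (e.g., Ambari); in Knox this lookup
# would be performed by the Service Discovery service.
DISCOVERED = {
    "WEBHDFS": "http://sandbox.hortonworks.com:50070/webhdfs",
    "OOZIE": "http://sandbox.hortonworks.com:11000/oozie",
}

def generate_topology(provider_config_xml, services):
    """Build a full topology document from a referenced provider config
    (a <gateway/> fragment) and the descriptor's service list."""
    topology = ET.Element("topology")
    topology.append(ET.fromstring(provider_config_xml))  # inline the <gateway/>
    for svc in services:
        service = ET.SubElement(topology, "service")
        ET.SubElement(service, "role").text = svc["name"]
        # Use the descriptor's URL(s) if specified, otherwise discover them.
        for url in svc.get("urls") or [DISCOVERED[svc["name"]]]:
            ET.SubElement(service, "url").text = url
    return ET.tostring(topology, encoding="unicode")

providers = ("<gateway><provider><role>authentication</role>"
             "<name>ShiroProvider</name><enabled>true</enabled></provider></gateway>")
services = [{"name": "WEBHDFS"}, {"name": "OOZIE"}]
topology_xml = generate_topology(providers, services)
```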

...

Sample Topology File:
<?xml version="1.0" encoding="UTF-8"?>
<topology>
    <gateway>
        <provider>
            <role>authentication</role>
            <name>ShiroProvider</name>
            <enabled>true</enabled>
            <param>
                <!--
                session timeout in minutes; this is really an idle timeout.
                It defaults to 30 minutes if the property value is not defined.
                The current client authentication will expire if the client idles continuously for longer than this value.
                -->
                <name>sessionTimeout</name>
                <value>30</value>
            </param>
            <param>
                <name>main.ldapRealm</name>
                <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
            </param>
            <param>
                <name>main.ldapContextFactory</name>
                <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory</value>
            </param>
            <param>
                <name>main.ldapRealm.contextFactory</name>
                <value>$ldapContextFactory</value>
            </param>
            <param>
                <name>main.ldapRealm.userDnTemplate</name>
                <value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value>
            </param>
            <param>
                <name>main.ldapRealm.contextFactory.url</name>
                <value>ldap://localhost:33389</value>
            </param>
            <param>
                <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
                <value>simple</value>
            </param>
            <param>
                <name>urls./**</name>
                <value>authcBasic</value>
            </param>
        </provider>
        <provider>
            <role>identity-assertion</role>
            <name>Default</name>
            <enabled>true</enabled>
        </provider>
        <!--
        Defines rules for mapping host names internal to a Hadoop cluster to externally accessible host names.
        For example, a Hadoop service running in AWS may return a response that includes URLs containing an
        AWS-internal host name.  If the client needs to make a subsequent request to the host identified
        in those URLs, they need to be mapped to external host names that the client can use to connect.
        If the external and internal host names are the same, turn off this provider by setting the value of
        the enabled parameter to false.
        The name parameter specifies the external host names in a comma-separated list.
        The value parameter specifies the corresponding internal host names in a comma-separated list.
        Note that when you are using Sandbox, the external hostname needs to be localhost, as seen in the
        out-of-the-box sandbox.xml.  This is because Sandbox uses port mapping to allow clients to connect to the
        Hadoop services using localhost.  In real clusters, external host names would almost never be localhost.
        -->
        <provider>
            <role>hostmap</role>
            <name>static</name>
            <enabled>true</enabled>
            <param><name>localhost</name><value>sandbox,sandbox.hortonworks.com</value></param>
        </provider>
    </gateway>
 
    <service>
        <role>AMBARIUI</role>
        <url>http://c6401.ambari.apache.org:8080</url>
    </service>
    <service>
        <role>HIVE</role>
        <url>http://c6402.ambari.apache.org:10001/cliservice</url>
    </service>
    <service>
        <role>WEBHCAT</role>
        <url>http://c6402.ambari.apache.org:50111/templeton</url>
    </service>
    <service>
        <role>AMBARI</role>
        <url>http://c6401.ambari.apache.org:8080</url>
    </service>
    <service>
        <role>OOZIE</role>
        <url>http://c6402.ambari.apache.org:11000/oozie</url>
    </service>
    <service>
        <role>JOBTRACKER</role>
        <url>rpc://c6402.ambari.apache.org:8050</url>
    </service>
    <service>
        <role>NAMENODE</role>
        <url>hdfs://c6401.ambari.apache.org:8020</url>
    </service>
    <service>
        <role>WEBHBASE</role>
        <url>http://c6401.ambari.apache.org:60080</url>
    </service>
    <service>
        <role>WEBHDFS</role>
        <url>http://c6401.ambari.apache.org:50070/webhdfs</url>
    </service>
    <service>
        <role>RESOURCEMANAGER</role>
        <url>http://c6402.ambari.apache.org:8088/ws</url>
    </service>


    <service>
        <role>KNOXSSO</role>
        <param>
            <name>knoxsso.cookie.secure.only</name>
            <value>true</value>
        </param>
        <param>
            <name>knoxsso.token.ttl</name>
            <value>100000</value>
        </param>
    </service>
</topology>


3.1 Simple Descriptor Discovery

We should also consider how we will discover simple descriptors and I think that we may want to have multiple ways.

3.1.1 Local

Just as is currently done for topology files, Knox can monitor a local directory for new or changed descriptors, and trigger topology generation and deployment upon such events.
This is great for development and small cluster deployments.

The Knox Topology Service will monitor two additional directories:

  • conf/shared-providers
    • Referenced provider configurations will go in this directory; these configurations are the <gateway/> elements found in topology files.
    • When a file is modified (create/update) in this directory, any descriptors that reference it are updated to trigger topology regeneration to reflect any provider configuration changes.
    • Attempts to delete a file from this directory via the admin API will be prevented if it is referenced by any descriptors in conf/descriptors.

 

  • conf/descriptors
    • Simple descriptors will go in this directory.
    • When a file is modified (create/update) in this directory, a topology file is (re)generated in the conf/topologies directory.
    • When a file is deleted from this directory, the associated topology file in conf/topologies is also deleted, and that topology is undeployed.
    • When a file is deleted from the conf/topologies directory, the associated descriptor in conf/descriptors is also deleted (if it exists), to prevent unintentional regeneration/redeployment of the topology.
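The create/update/delete rules for conf/descriptors above can be sketched as a pure function over two snapshots of the directory (filename -> modification time). This is only an illustration of the rules; the actual monitor in the Knox Topology Service is event-driven rather than snapshot-based.

```python
def descriptor_actions(before, after):
    """Given two snapshots of conf/descriptors (name -> mtime), return the
    topology actions implied by the rules above."""
    actions = []
    for name in sorted(after):
        if name not in before:
            actions.append(("generate", name))    # new descriptor
        elif after[name] != before[name]:
            actions.append(("regenerate", name))  # updated descriptor
    for name in sorted(before):
        if name not in after:
            # Deleting a descriptor deletes and undeploys its topology.
            actions.append(("undeploy", name))
    return actions

actions = descriptor_actions(
    {"sandbox.yml": 100, "dev.yml": 100},
    {"sandbox.yml": 150, "prod.yml": 100},
)
```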

 

3.1.2 Remote

For production and larger deployments, we need to better accommodate multiple instances of Knox. One proposal for such cases is a ZooKeeper-based discovery mechanism.
All Knox instances will pick up changes from ZooKeeper, as the central source of truth, and perform the necessary generation and deployment of the corresponding topology.

The location of these descriptors and their dependencies (e.g., referenced provider configs) in ZooKeeper must be defined.

It would also be helpful to provide a means (e.g., Ambari, the Knox admin UI, the CLI, etc.) by which these descriptors can be easily published to the correct location in a znode.
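One way this could look: each Knox instance registers a watch on the descriptor znodes and reacts to changes. The sketch below keeps the reaction logic as a plain callback so it can run standalone; the znode paths, the commented kazoo wiring, and the regenerate hook are all assumptions for illustration, not a settled design.

```python
# Hypothetical znode layout for shared Knox configuration (an assumption):
DESCRIPTORS_PATH = "/knox/config/descriptors"
PROVIDERS_PATH = "/knox/config/shared-providers"

def on_descriptor_event(name, data, deployed, regenerate):
    """React to a descriptor znode change: (re)generate on create/update,
    undeploy on delete.  Returns the action taken."""
    if data is None:            # znode deleted
        deployed.discard(name)
        return ("undeploy", name)
    regenerate(name, data)      # pull providers, discover services, generate
    deployed.add(name)
    return ("deploy", name)

# With a real ZooKeeper client, this callback would be wired up roughly as:
#   from kazoo.client import KazooClient
#   zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
#   zk.start()
#   @zk.DataWatch(DESCRIPTORS_PATH + "/sandbox.yml")
#   def watch(data, stat):
#       on_descriptor_event("sandbox", data, deployed, regenerate)

deployed = set()
generated = []
action = on_descriptor_event("sandbox", b"discovery-type: AMBARI",
                             deployed, lambda n, d: generated.append(n))
removed = on_descriptor_event("sandbox", None, deployed,
                              lambda n, d: generated.append(n))
```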

...

Since the service URLs for a cluster will be discovered, Knox has the opportunity to respond dynamically to subsequent topology changes. For a Knox topology that has been generated and deployed, it's possible that the URL for a given service could change at some point afterward.
The host name could change. The scheme and/or port could change (e.g., http --> https). The likelihood and frequency of such changes certainly vary among deployments.
We should consider providing the option for Knox to detect topology changes for a cluster, and respond by updating its corresponding topology.

For example, Ambari provides the ability to request the active configuration versions for all the service components in a cluster. There could be a thread that checks this set, notices one or more version changes, and initiates the re-generation/deployment of that topology.
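Such a polling thread mainly needs a way to decide whether anything changed between checks. Ambari can report the desired configuration tags for a cluster (e.g., via /api/v1/clusters/&lt;name&gt;?fields=Clusters/desired_configs); the sketch below assumes each response has already been reduced to a {config-type: tag} map, and shows only the comparison step.

```python
def changed_config_types(previous, current):
    """Compare two {config-type: tag} maps from successive polls and return
    the types whose tags changed (or newly appeared); a non-empty result
    means the cluster's topology should be regenerated."""
    return sorted(
        t for t, tag in current.items() if previous.get(t) != tag
    )

# First poll vs. second poll: hdfs-site was reconfigured, yarn-site is new.
before = {"core-site": "TOPOLOGY_RESOLVED", "hdfs-site": "version1"}
after = {"core-site": "TOPOLOGY_RESOLVED", "hdfs-site": "version2",
         "yarn-site": "version1"}
stale = changed_config_types(before, after)
needs_regeneration = bool(stale)
```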

Another associated benefit is the capability for Knox to interoperate with Ambari instances that are unaware of the Knox instance. Knox no longer MUST be managed by Ambari.

5. Provider Configurations

...

  1. Provision the alias mapping using the knoxcli.sh script

    bin/knoxcli.sh create-alias ambari.discovery.user --value ambariuser

  2. Specify the discovery-user property in a descriptor (This can be useful if a Knox instance will proxy services in clusters managed by multiple Ambari instances)

    "discovery-user":"ambariuser"

...

  1. Provision the password mapped to the default alias, ambari.discovery.password

    bin/knoxcli.sh create-alias ambari.discovery.password --value ambaripasswd

  2. Provision a different alias, and specify it in the descriptor (This can be useful if a Knox instance will proxy services in clusters managed by multiple Ambari instances)

    "discovery-pwd-alias":"my.ambari.discovery.password.alias"

 

Related Links