

Rewrite rules in Apache Knox can be difficult to follow if you are just starting to use Apache Knox. This blog covers the basics of Apache Knox rewrite rules and then goes in depth into more advanced rules and how to use them. This blog builds upon Adding a service to Apache Knox by Kevin Minder.

...

Code Block
<service role="WEATHER" name="weather" version="0.0.1">
  <routes>
    <route path="/weather/**"/>
  </routes>
</service>

...


The service.xml file defines the high-level URL pattern that will be exposed by the gateway for a service.

Code Block
<service role="WEATHER">
  • The role/implementation/version triad is used throughout Knox for integration plugins.
  • Think of the role as an interface in Java.
  • This attribute declares what role this service “implements”.
  • This will need to match the topology file’s <topology><service><role> for this service.

Code Block
<service name="weather">
  • In the role/implementation/version triad this is the implementation.
  • Think of this as a Java implementation class name relative to an interface.
  • As a matter of convention this should match the directory beneath <GATEWAY_HOME>/data/services
  • The topology file can optionally contain <topology><service><name> but usually doesn’t. This would be used to select a specific implementation of a role if there were multiple.

Code Block
<service version="0.0.1">
  • As a matter of convention this should match the directory beneath the service implementation name.
  • The topology file can optionally contain <topology><service><version> but usually doesn’t. This would be used to select a specific version of an implementation if there were multiple. This can be important if the protocols for a service evolve over time.
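
To tie the role/implementation/version triad back to the topology file, the following is a minimal sketch of a matching topology entry. The URL is a hypothetical placeholder, and the commented-out name and version elements are only needed when several implementations or versions of the role exist.

Code Block
<topology>
    <service>
        <!-- Must match the role attribute in service.xml -->
        <role>WEATHER</role>
        <!-- Optional: selects a specific implementation of the role -->
        <!-- <name>weather</name> -->
        <!-- Optional: selects a specific version of the implementation -->
        <!-- <version>0.0.1</version> -->
        <!-- Hypothetical backend URL, used here for illustration only -->
        <url>http://weather.example.com:80</url>
    </service>
</topology>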

 


Code Block
<service><routes><route path="/weather/**"/></routes></service>
  • This tells the gateway that all requests starting with /weather/ are handled by this service.
  • Due to a limitation this will not include requests to /weather (i.e. no trailing /).
  • The ** means zero or more path segments, similar to Ant path patterns.
  • The scheme, host, port, gateway and topology components are not included (e.g. https://localhost:8443/gateway/sandbox)
  • Routes can, but typically don’t, take query parameters into account.
  • In this simple form there is no direct relationship between the route path and the rewrite rules!

Simple rewrite rules

...

 



Code Block
<rules><rule pattern="*://*:*/**/weather/{path=**}?{**}"/></rules>
  • Defines the URL pattern for which this rule will apply.
  • The * matches exactly one segment of the URL.
  • The ** matches zero or more segments of the URL.
  • The {path=**} matches zero or more path segments and provides access to them as a parameter named 'path'.
  • The {**} matches zero or more query parameters and provides access to them by name.
  • The values from matched {…} segments are “consumed” by the rewrite template below.

Code Block
<rules><rule><rewrite template="{$serviceUrl[WEATHER]}/{path=**}?{**}"/></rule></rules>
  • Defines how the URL matched by the rule will be rewritten.
  • The {$serviceUrl[WEATHER]} looks up the <service><url> for the <service><role>WEATHER. This is implemented as a rewrite function and is another custom extension point.
  • The {path=**} extracts zero or more values for the 'path' parameter from the matched URL.
  • The {**} extracts any “unused” parameters and uses them as query parameters.
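
Putting the pattern and the template together, a complete rewrite.xml for the weather service might look like the sketch below. The rule name and the dir attribute are illustrative choices rather than required values.

Code Block
<rules>
    <!-- The rule name is a hypothetical example; dir="IN" applies the rule to incoming requests -->
    <rule dir="IN" name="WEATHER/weather/inbound" pattern="*://*:*/**/weather/{path=**}?{**}">
        <!-- Path segments and query parameters captured by the pattern are reused in the template -->
        <rewrite template="{$serviceUrl[WEATHER]}/{path=**}?{**}"/>
    </rule>
</rules>

Assuming the topology declares a WEATHER service, a request such as https://localhost:8443/gateway/sandbox/weather/data/2.5/weather?q=test would be expected to be rewritten to the service URL followed by /data/2.5/weather?q=test.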

Scope

Rewrite rules can be global or local to the service they are defined in. After Apache Knox 0.6.0 all rewrite rules are local unless they are explicitly defined as global.

To define global rules, use the property 'gateway.global.rules.services' in 'gateway-site.xml', which takes a list of services whose rewrite rules are made global, e.g.

Code Block
    <property>
        <name>gateway.global.rules.services</name>
        <value>"NAMENODE","JOBTRACKER", "WEBHDFS", "WEBHCAT", "OOZIE", "WEBHBASE", "HIVE", "RESOURCEMANAGER"</value>
    </property>

Note: Rewrite rules for these services "NAMENODE","JOBTRACKER", "WEBHDFS", "WEBHCAT", "OOZIE", "WEBHBASE", "HIVE", "RESOURCEMANAGER" are global by default.

If you want to limit a single rule within a set of global rewrite rules to one service, you can do so by using the 'scope' attribute, e.g.

Code Block
    <!-- Limit the scope of this rule just to WEBHDFS service -->
    <rule dir="OUT" scope="WEBHDFS" name="WEBHDFS/webhdfs/outbound" pattern="hdfs://*:*/{path=**}?{**}">
        <rewrite template="{$frontend[url]}/webhdfs/v1/{path=**}?{**}"/>
    </rule>


Direction

Rewrite rules can be applied to inbound requests (going to the Gateway from a browser, curl, etc.) or to outbound responses (going from the Gateway towards the browser). The direction is indicated by the "dir" attribute.

Code Block
<rule dir="IN">

The possible values are IN and OUT for inbound and outbound requests.
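
As a sketch of how the two directions work together, the pair below handles one service in both directions. The outbound rule is the WEBHDFS rule shown in the Scope section; the inbound rule name and pattern are illustrative assumptions modeled on the weather example.

Code Block
<rules>
    <!-- Inbound: rewrite the gateway URL of a request to the backend service URL (name and pattern are hypothetical) -->
    <rule dir="IN" name="WEBHDFS/webhdfs/inbound" pattern="*://*:*/**/webhdfs/v1/{path=**}?{**}">
        <rewrite template="{$serviceUrl[WEBHDFS]}/{path=**}?{**}"/>
    </rule>
    <!-- Outbound: rewrite backend URLs found in a response back to gateway URLs -->
    <rule dir="OUT" scope="WEBHDFS" name="WEBHDFS/webhdfs/outbound" pattern="hdfs://*:*/{path=**}?{**}">
        <rewrite template="{$frontend[url]}/webhdfs/v1/{path=**}?{**}"/>
    </rule>
</rules>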


Flow

Flows are the logical AND, OR, ALL operators on the rules. So, a rewrite rule could match a pattern A OR pattern B, a rule could match a pattern A AND pattern B, a rule could match ALL the given patterns.

Valid flow values are:

  • OR
  • AND
  • ALL

For example, a rule using the OR flow:

Code Block
<rule name="test-rule-with-complex-flow" flow="OR">
    <match pattern="*://*:*/~/{path=**}?{**}">
        <rewrite template="test-scheme-output://test-host-output:777/test-path-output/test-home/{path}?{**}"/>
    </match>
    <match pattern="*://*:*/{path=**}?{**}">
        <rewrite template="test-scheme-output://test-host-output:42/test-path-output/{path}?{**}"/>
    </match>
</rule>


Rewrite Variables

These variables can be used with the rewrite function.

$username

Username of authenticated user

Code Block
    <rule name="OOZIE/oozie/user-name">
        <rewrite template="{$username}"/>
    </rule>


$inboundurl



Code Block
  <rule dir="OUT" name="NODEUI/node/static" pattern="/static/{**}">
    <rewrite template="{$frontend[url]}/node/static/{**}?host={$inboundurl[host]}"/>
  </rule>


$serviceAddr


Code Block
    <rule name="hdfs-addr">
        <rewrite template="hdfs://{$serviceAddr[NAMENODE]}"/>
    </rule>


$serviceHost


Code Block
    <rule name="nn-host">
        <rewrite template="{$serviceHost[NAMENODE]}"/>
    </rule>


$serviceMappedAddr


Code Block
    <rule name="OOZIE/oozie/name-node-url">
        <rewrite template="hdfs://{$serviceMappedAddr[NAMENODE]}"/>
    </rule>


$serviceMappedHost


Code Block


$serviceMappedUrl


Code Block
    <match pattern="{path=**}">
        <rewrite template="{$serviceMappedUrl[NAMENODE]}/{path=**}"/>
    </match>


$servicePath


Code Block
    <rule name="nn-path">
        <rewrite template="{$servicePath[NAMENODE]}"/>
    </rule>


$servicePort


Code Block
    <rule name="hdfs-path">
        <match pattern="{path=**}"/>
        <rewrite template="hdfs://{$serviceHost[NAMENODE]}:{$servicePort[NAMENODE]}/{path=**}"/>
    </rule>


$serviceScheme


Code Block
<rule dir="IN" name="NODEUI/logs" pattern="*://*:*/**/node/logs/?{host}?{port}">
    <rewrite template="{$serviceScheme[NODEUI]}://{host}:{port}/logs/"/>
</rule>

$serviceUrl

  • $serviceUrl[SERVICE_NAME]  - looks up the <service><url> for the <service><role>SERVICE_NAME

...

  • $import - This function enhances the $frontend function by adding '@import' prefix to the $frontend path. e.g.

    Code Block
    <rewrite template="{$import[&quot;, url]}/stylesheets/pretty.css&quot;;"/>

    It takes the following parameters as options:

$username

  • $username - This variable is used when we need to get the impersonated principal name (primary principal in case impersonated principal is absent).

    Code Block
    <rewrite template="test-output-scheme://{host}:{port}/test-output-path/{path=**}?user.name={$username}?{**}?test-query-output-name=test-query-output-value"/>


$prefix

  • $prefix - This function enhances the $frontend function just like $import but gives the ability to choose a prefix (unlike a constant @import in case of $import) e.g.

    Code Block
    <rewrite template="{$prefix[&#39;,url]}/zeppelin/components/{**}?{**}"/>
    
    
    • $prefix[PREFIX, url] - Adds a supplied PREFIX to the frontend url, e.g. in the above case the rewritten url would be 'https://localhost:8443/zeppelin/components/navbar/navbar.html?v=1498928142479' (mind the single tick ' )

$postfix

  • $postfix - Just like prefix, postfix function is used to append a character or string to the gateway url (including topology path)

  • usage - {$postfix[url,<customString>]}

    Code Block
    <rewrite template="{scheme}://{host}:{port}/{gateway}/{knoxsso}/{api}/{v1}/{websso}?originalUrl={$postfix[url,/sparkhistory/]}"/>



$infix

  • $infix - This function is used to add both a custom prefix and a custom postfix
  • usage - {$infix[<customString>,url,<customString>]}

    Code Block
    <rewrite template="{scheme}://{host}:{port}/{gateway}/{sandbox}/?query={$infix[&#39;,url,/sparkhistory/&#39;]}"/>


$hostmap

The purpose of the Hostmap provider is to handle situations where hosts are known by one name within the cluster and another name externally. This frequently occurs when virtual machines are used, and in particular when using cloud hosting services. Currently, the Hostmap provider is configured as part of the topology file.

For more information, see the Knox User Guide.


Rewrite rule example:


Code Block
  <rewrite template="{gateway.url}/hdfs/logs?{scheme}?host={$hostmap(host)}?{port}?{**}"/>

Topology declaration example

Code Block
<topology>
    <gateway>
        ...
        <provider>
            <role>hostmap</role>
            <name>static</name>
            <enabled>true</enabled>
            <param><name>external-host-name</name><value>internal-host-name</value></param>
        </provider>
        ...
    </gateway>
    ...
</topology>
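
As a concrete illustration of the declaration above, the sketch below maps one external host name to one internal host name. Both host names are hypothetical placeholders; substitute the names used in your environment.

Code Block
<provider>
    <role>hostmap</role>
    <name>static</name>
    <enabled>true</enabled>
    <!-- name = external host name, value = internal host name (both hypothetical) -->
    <param>
        <name>gateway.example.com</name>
        <value>node1.cluster.internal</value>
    </param>
</provider>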


$inboundurl

Only used by outbound rules

Code Block
<rewrite template="{gateway.url}/datanode/static/{**}?host={$inboundurl[host]}"/>


Rules Filter

Sometimes you want the ability to rewrite *.js, *.css and other non-HTML pages. Filters are a way to rewrite these non-HTML files. Filters are based on the Content-Type of the page.

Apache Knox supports several types of filters, listed by Content-Type later in this section.

There are three declarations needed for filters:

  1. Filter declaration, the Content-Type and the pattern to apply the filter to - rewrite.xml
  2. Rewrite rule to apply to the matched pattern - rewrite.xml
  3. Path to apply the filter to, and whether it is applied to the response or request body - service.xml

This is an example of filters used in proxying the Zeppelin UI. The relevant code snippets in the rewrite.xml and service.xml files are:

Code Block: rewrite.xml
  <!-- Filters -->
  <rule dir="OUT" name="ZEPPELINUI/zeppelin/outbound/javascript/filter/app/home" >
    <rewrite template="{$frontend[path]}/zeppelin/app/home/home.html"/>
  </rule>
  
  <rule dir="OUT" name="ZEPPELINUI/zeppelin/outbound/javascript/filter/app/notebook" >
    <rewrite template="{$frontend[path]}/zeppelin/app/notebook/notebook.html"/>
  </rule>
  
  <rule dir="OUT" name="ZEPPELINUI/zeppelin/outbound/javascript/filter/app/jobmanager" >
    <rewrite template="{$frontend[path]}/zeppelin/app/jobmanager/jobmanager.html"/>
  </rule>
 
  <filter name="ZEPPELINUI/zeppelin/outbound/javascript/filter">
          <content type="application/javascript">
              <apply path="app/home/home.html" rule="ZEPPELINUI/zeppelin/outbound/javascript/filter/app/home"/>
              <apply path="app/notebook/notebook.html" rule="ZEPPELINUI/zeppelin/outbound/javascript/filter/app/notebook"/>
              <apply path="app/jobmanager/jobmanager.html" rule="ZEPPELINUI/zeppelin/outbound/javascript/filter/app/jobmanager"/>
          </content>
  </filter>


Code Block: service.xml
    <!-- Filter -->
    <route path="/zeppelin/scripts/**">
      <rewrite apply="ZEPPELINUI/zeppelin/outbound/javascript/filter" to="response.body"/>
    </route>

A good example of how to use the filters is Proxying a UI using Knox.

The following Content-Types are supported by Apache Knox.

Form URL Rewrite Filter

Uses Content-Type "application/x-www-form-urlencoded", "*/x-www-form-urlencoded"

HTML URL Rewrite Filter

Uses Content-Type "application/html", "text/html", "*/html"

JavaScript URL Rewrite Filter

Uses Content-Type "application/javascript", "text/javascript", "*/javascript", "application/x-javascript", "text/x-javascript", "*/x-javascript"

JSON URL Rewrite Filter

Uses Content-Type "application/json", "text/json", "*/json"

XML URL Rewrite Filter

Uses Content-Type "application/xml", "text/xml", "*/xml"


Pattern Matching

Pattern matching in Knox unfortunately does not follow the standard regex format. The following shows how pattern matching works in some cases:

...

      query     = $7

      fragment  = $9

 


JSON Parsing

Knox uses JSONPath to parse JSON documents.
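
As an illustration, a filter for JSON responses can address the field to rewrite with a JSONPath expression in the apply path. This is a minimal sketch: the filter name, the rule name, and the $.url field are hypothetical, and the structure mirrors the JavaScript filter shown in the Rules Filter section.

Code Block
<!-- Hypothetical outbound rule that rewrites a backend URL to a gateway URL -->
<rule dir="OUT" name="WEATHER/weather/outbound/url" pattern="*://*:*/{path=**}?{**}">
    <rewrite template="{$frontend[url]}/weather/{path=**}?{**}"/>
</rule>

<!-- Hypothetical filter: applies the rule above to the "url" field of JSON response bodies -->
<filter name="WEATHER/weather/outbound/json">
    <content type="application/json">
        <apply path="$.url" rule="WEATHER/weather/outbound/url"/>
    </content>
</filter>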

...

http://www.ics.uci.edu/pub/ietf/uri/#Related