Apache GitHub Pull Request

https://github.com/apache/knox/pull/365

Performance test tool design

The general goal is an extensible performance test framework that drives the Knox Gateway through different, configurable use cases.

Some initial requirements of this framework:

  • easy to extend with new use cases
  • able to execute long-running jobs
  • automatically generates meaningful, easy-to-read reports of the collected metrics (at configurable intervals)

In the first phase, the task was to codify and run scalability and performance benchmarks with concurrent clients, using long-running jobs that stress the backend store of the token state server:

  • use different token state server implementations
  • turn the token state service mechanism on or off
  • use concurrent clients that work with tokens (making sure they actually use the tokens and periodically renew them)

The tool is driven by its own configuration file, located at $YOUR_KNOX_PROJECT_ROOT/gateway-performance-test/src/test/resources/performance.test.configuration.properties:

Code Block
# Gateway connection related properties
perf.test.gateway.url.protocol=https
perf.test.gateway.url.host=localhost
perf.test.gateway.url.port=8443
perf.test.gateway.jmx.port=8888

# report generation related properties
perf.test.report.generation.periodInSecs=30
perf.test.report.generation.json.enabled=true
perf.test.report.generation.yaml.enabled=true

# Knox Token use case related properties
perf.test.usecase.knoxtoken.enabled=true
perf.test.usecase.knoxtoken.topology.gateway=sandbox
perf.test.usecase.knoxtoken.topology.tokenbased=tokenbased
perf.test.usecase.knoxtoken.numOfThreads=3
perf.test.usecase.knoxtoken.testDurationInSecs=60
perf.test.usecase.knoxtoken.requestDelayLowerBoundInSecs=5
perf.test.usecase.knoxtoken.requestDelayUpperBoundInSecs=10
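To illustrate how these keys fit together, here is a minimal sketch that loads them with java.util.Properties and combines the connection properties with a topology name into gateway URLs. The class PerfTestConfig and its methods are hypothetical and not taken from the gateway-performance-test module; the /gateway/{topology} path layout and the /knoxtoken/api/v1/token suffix are the ones used elsewhere on this page.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

/**
 * Hypothetical helper (not part of the gateway-performance-test module):
 * loads the performance test properties and derives gateway URLs from them.
 */
final class PerfTestConfig {
  private final Properties props;

  private PerfTestConfig(Properties props) {
    this.props = props;
  }

  /** Loads key=value pairs, e.g. from a reader over performance.test.configuration.properties. */
  static PerfTestConfig load(Reader reader) {
    Properties props = new Properties();
    try {
      props.load(reader);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new PerfTestConfig(props);
  }

  /** Returns a required property, failing fast if it is missing. */
  String get(String key) {
    String value = props.getProperty(key);
    if (value == null) {
      throw new IllegalArgumentException("Missing property: " + key);
    }
    return value.trim();
  }

  int getInt(String key) {
    return Integer.parseInt(get(key));
  }

  boolean getBoolean(String key) {
    return Boolean.parseBoolean(get(key));
  }

  /** Base URL of a topology, e.g. https://localhost:8443/gateway/sandbox. */
  String topologyUrl(String topologyKey) {
    return get("perf.test.gateway.url.protocol") + "://"
        + get("perf.test.gateway.url.host") + ":"
        + getInt("perf.test.gateway.url.port")
        + "/gateway/" + get(topologyKey);
  }

  /** Token API endpoint in the topology that hosts the KNOXTOKEN service. */
  String tokenEndpoint() {
    return topologyUrl("perf.test.usecase.knoxtoken.topology.gateway") + "/knoxtoken/api/v1/token";
  }
}
```

With the sample values above, tokenEndpoint() yields https://localhost:8443/gateway/sandbox/knoxtoken/api/v1/token. A real caller would typically pass Files.newBufferedReader(path) to load().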

As of today (17 Aug 2020), only one use-case implementation exists: it addresses the acquire/renew/use Knox delegation token (DT) case described above. The related resources (Java classes, properties files) are located in the gateway-performance-test Maven module. Here is the list of the most relevant resources:

  • src/main/java/org.apache.knox.gateway.performance.test

    • PerformanceTestRunner - this is the entry point of the tool. This class comes with a main method that reads the given configuration file and executes all enabled use-case runners
    • ResponseTimeCache - this class holds the response times and is shared between the worker threads (which write into the cache) and the report generation threads (which read from it)
    • reporting.GatewayMetricsReporter - this class generates the human-readable reports in JSON and YAML format on a fixed schedule defined by perf.test.report.generation.periodInSecs
    • knoxtoken.KnoxTokenUseCaseRunner - this class is responsible for
      • starting N worker threads that acquire Knox DTs in parallel (N is set by perf.test.usecase.knoxtoken.numOfThreads)
      • starting 2 more threads to
        • renew an already acquired Knox DT
        • run an HDFS ls command using an already acquired Knox DT
    • knoxtoken.KnoxTokenWorkerThread - this represents the job that actually acquires/renews/uses Knox DTs. The renew and use actions each run on a single thread, and they wait twice as long between two subsequent calls as the acquire action does. In other words, by default, each acquiring worker thread obtains a Knox DT every 5 to 10 seconds (on N threads), whereas the threads that renew or use a previously acquired Knox DT wait 10 to 20 seconds between calls.

    • knoxtoken.KnoxTokenCache - stores the already acquired Knox DTs (the cache is cleared automatically once the number of DTs reaches 500)
  • src/test/resources
    • performance.test.configuration.properties - contains the above-described configuration file
    • performanceTest-log4j.properties - the Log4j configuration of the tool. By default, it prints log messages to STDOUT and also writes them to target/logs/performanceTest.log
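To make the timing and caching rules above concrete, here is a minimal sketch under stated assumptions. TokenWorkloadSketch and its members are hypothetical names, not the actual KnoxTokenWorkerThread or KnoxTokenCache code. It assumes delays are whole seconds drawn uniformly from the configured bounds (doubled for the renew/use threads), and it reads "cleaned automatically" as clearing the whole cache once it already holds 500 tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Illustrative sketch of the timing and caching rules described above.
 * All names are hypothetical; this is not the KnoxTokenWorkerThread or
 * KnoxTokenCache source.
 */
final class TokenWorkloadSketch {

  /** Threshold described for KnoxTokenCache. */
  static final int MAX_CACHED_TOKENS = 500;

  /** Thread-safe token cache shared by the acquire, renew and use threads. */
  static final class BoundedTokenCache {
    private final List<String> tokens = new ArrayList<>();

    synchronized void add(String token) {
      if (tokens.size() >= MAX_CACHED_TOKENS) {
        tokens.clear(); // limit reached: drop everything, then keep the new token
      }
      tokens.add(token);
    }

    /** A random cached token, or null if nothing has been acquired yet. */
    synchronized String randomToken() {
      return tokens.isEmpty() ? null : tokens.get(ThreadLocalRandom.current().nextInt(tokens.size()));
    }

    synchronized int size() {
      return tokens.size();
    }
  }

  /**
   * Random delay in milliseconds: whole seconds drawn uniformly from
   * [lowerSecs, upperSecs], times the multiplier (1 = acquire, 2 = renew/use).
   */
  static long delayMillis(int lowerSecs, int upperSecs, int multiplier) {
    int secs = ThreadLocalRandom.current().nextInt(lowerSecs, upperSecs + 1);
    return secs * multiplier * 1000L;
  }

  /** Runs the action until the deadline, sleeping a random delay after each call. */
  static void runUntil(long deadlineMillis, int lowerSecs, int upperSecs, int multiplier, Runnable action) {
    try {
      while (System.currentTimeMillis() < deadlineMillis) {
        action.run();
        Thread.sleep(delayMillis(lowerSecs, upperSecs, multiplier));
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the owning executor
    }
  }
}
```

With the default bounds, an acquire thread would call runUntil(deadline, 5, 10, 1, ...) and the renew or use thread runUntil(deadline, 5, 10, 2, ...). Clearing the whole list is the simplest reading of the 500-token rule; an LRU eviction would be a gentler alternative.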

Knox gateway prerequisites

The performance test tool connects to a Knox instance to:

  • acquire/renew/use Knox delegation tokens using its token API (/knoxtoken/api/v1/token)
  • fetch useful metrics via JMX

To enable the JMX part, configure the Knox gateway you are testing against as follows before starting it:

  1. Set the KNOX_GATEWAY_DBG_OPTS environment variable as follows (the port must match perf.test.gateway.jmx.port):
Code Block
export KNOX_GATEWAY_DBG_OPTS="$KNOX_GATEWAY_DBG_OPTS -Dcom.sun.management.jmxremote.port=8888 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"

  2. Enable JMX reporting in gateway-site.xml:

Code Block
    <property>
        <name>gateway.metrics.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>gateway.jmx.metrics.reporting.enabled</name>
        <value>true</value>
    </property>
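With these settings any standard JMX client can read the gateway metrics. The sketch below (hypothetical class GatewayJmxProbe, JDK javax.management only) shows one way to connect and read a single gauge. It uses the object-name form that appears in the sample reports on this page, e.g. metrics:name=heap.used,type=gauges; treating "Value" as the gauge attribute is an assumption based on those reports. Because the options above disable JMX authentication and SSL, no credentials are passed.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical JMX reader for the Knox gateway metrics (not the tool's own code). */
final class GatewayJmxProbe {

  /** RMI connector URL exposed by -Dcom.sun.management.jmxremote.port=PORT. */
  static JMXServiceURL serviceUrl(String host, int port) {
    try {
      return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException("Invalid JMX address: " + host + ":" + port, e);
    }
  }

  /** Object name in the form used by the sample reports, e.g. metrics:name=heap.used,type=gauges. */
  static ObjectName gaugeName(String metric) {
    try {
      return new ObjectName("metrics:name=" + metric + ",type=gauges");
    } catch (MalformedObjectNameException e) {
      throw new IllegalArgumentException("Invalid metric name: " + metric, e);
    }
  }

  /** Reads the (assumed) "Value" attribute of one gauge; the connection is closed afterwards. */
  static Object readGauge(String host, int port, String metric) throws IOException, JMException {
    try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl(host, port))) {
      MBeanServerConnection connection = connector.getMBeanServerConnection();
      return connection.getAttribute(gaugeName(metric), "Value");
    }
  }
}
```

For example, GatewayJmxProbe.readGauge("localhost", 8888, "heap.used") would return the current heap usage of a gateway started with the options above.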

Once the JMX port is open, you also need to make sure you have at least one topology with the KNOXTOKEN service. During my tests, I extended the sandbox topology with the following service configuration:

Code Block
   <service>
      <role>KNOXTOKEN</role>
      <param>
         <name>knox.token.ttl</name>
         <value>36000000</value>
      </param>
      <param>
         <name>knox.token.audiences</name>
         <value>tokenbased</value>
      </param>
      <param>
         <name>knox.token.target.url</name>
         <value>https://localhost:8443/gateway/tokenbased</value>
      </param>
      <param>
         <name>knox.token.exp.server-managed</name>
         <value>true</value>
      </param>
      <param>
         <name>knox.token.renewer.whitelist</name>
         <value>guest</value>
      </param>
   </service>

If you plan to create a new topology for this purpose, please change the perf.test.usecase.knoxtoken.topology.gateway configuration accordingly.

As you can see, the newly added service references another topology called tokenbased. As its name suggests, that particular topology uses JWT authentication and is configured as follows:

Code Block
<?xml version="1.0" encoding="UTF-8"?>
<topology>
   <name>tokenbased</name>
   <gateway>
      <provider>
         <role>federation</role>
         <name>JWTProvider</name>
         <enabled>true</enabled>
         <param>
            <name>knox.token.audiences</name>
            <value>tokenbased</value>
         </param>
      </provider>
   </gateway>
   <service>
      <role>WEBHDFS</role>
      <url>http://YOUR_HDFS_SERVICE_HOST:20101/webhdfs</url>
   </service>
</topology>

Since the KnoxToken use case uses an already acquired Knox DT to run an action, I kept that action as simple as possible: using KnoxShell, we issue an HDFS ls command in a KnoxShell session that authenticates with a Knox DT. For this reason, the tokenbased topology must include the WEBHDFS service.

If you want to give this topology a different name (or you already have one that uses JWT authentication and exposes WEBHDFS), please update the perf.test.usecase.knoxtoken.topology.tokenbased configuration accordingly.
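For readers who want to reproduce the acquire/renew/use cycle without KnoxShell, here is a sketch that only builds the three HTTP requests with the JDK 11+ java.net.http API. Everything in it is an assumption rather than the tool's implementation: the class KnoxTokenCalls is hypothetical, HTTP Basic authentication on the sandbox topology assumes its default ShiroProvider setup, and the renew endpoint (POST .../knoxtoken/api/v1/token/renew with the raw token as the body) is assumed. The ls step is expressed as a WebHDFS LISTSTATUS call on the tokenbased topology, with the DT sent as a Bearer credential.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/**
 * Hypothetical helper that builds (without sending) the three HTTP calls of the
 * KnoxToken use case. The renew endpoint and Basic auth on the sandbox topology
 * are assumptions, not taken from the tool's source.
 */
final class KnoxTokenCalls {

  /** RFC 7617 HTTP Basic credentials. */
  static String basicAuth(String user, String password) {
    byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
    return "Basic " + Base64.getEncoder().encodeToString(raw);
  }

  /** Acquire: GET {gatewayTopologyUrl}/knoxtoken/api/v1/token with Basic auth. */
  static HttpRequest acquire(String gatewayTopologyUrl, String user, String password) {
    return HttpRequest.newBuilder(URI.create(gatewayTopologyUrl + "/knoxtoken/api/v1/token"))
        .header("Authorization", basicAuth(user, password))
        .GET()
        .build();
  }

  /**
   * Renew (ASSUMED endpoint): POST {gatewayTopologyUrl}/knoxtoken/api/v1/token/renew
   * with the token as the request body; the user must be listed in
   * knox.token.renewer.whitelist.
   */
  static HttpRequest renew(String gatewayTopologyUrl, String user, String password, String token) {
    return HttpRequest.newBuilder(URI.create(gatewayTopologyUrl + "/knoxtoken/api/v1/token/renew"))
        .header("Authorization", basicAuth(user, password))
        .POST(HttpRequest.BodyPublishers.ofString(token))
        .build();
  }

  /** Use: WebHDFS LISTSTATUS ("ls") via the JWT-protected topology, token as a Bearer credential. */
  static HttpRequest listStatus(String tokenbasedTopologyUrl, String hdfsPath, String token) {
    String path = hdfsPath.startsWith("/") ? hdfsPath : "/" + hdfsPath;
    return HttpRequest.newBuilder(URI.create(tokenbasedTopologyUrl + "/webhdfs/v1" + path + "?op=LISTSTATUS"))
        .header("Authorization", "Bearer " + token)
        .GET()
        .build();
  }
}
```

The requests can be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). The client JVM must trust the gateway's TLS certificate, and the HDFS path must already be URI-safe, because URI.create rejects characters such as spaces.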

How to run

Running the performance tool is as simple as running the following Maven command in the project root:


Code Block
mvn -DskipTests -Dcheckstyle.skip=true -Dfindbugs.skip=true -Dpmd.skip=true -Drat.skip -Pgateway-performance-test package -am -pl gateway-performance-test


The tool picks up the above-mentioned configuration file and executes all enabled use-case runners (currently there is only one implementation). You can adjust that properties file to your requirements before each performance test round. For instance, to increase the number of parallel threads to 10 and the test duration to 6 hours (21600 seconds), update:

Code Block
perf.test.usecase.knoxtoken.numOfThreads=10
perf.test.usecase.knoxtoken.testDurationInSecs=21600

Test results

The JSON/YAML test results are generated under target/testResults/[json|yaml]:

  • heapGauges.YYYY-MM-DD.[json|yaml]. Sample:

    Code Block
    ---
    metrics:name=heap.init,type=gauges:
      Number: 268435456
      Value: 268435456
    metrics:name=heap.usage,type=gauges:
      Number: 0.04556474316352607
      Value: 0.04556474316352607
    metrics:name=heap.committed,type=gauges:
      Number: 537919488
      Value: 537919488
    metrics:name=heap.max,type=gauges:
      Number: 3817865216
      Value: 3817865216
    metrics:name=heap.used,type=gauges:
      Number: 174097040
      Value: 174094952


  • responseTimes.YYYY-MM-DD.[json|yaml]. Sample:

    Code Block
    ---
    acquireResponseTimes:
      _data:
      - 204
      - 204
      - 204
      - 18
      - 19
      - 21
      - 19
      - 16
      - 13
      - 11
      - 10
      - 14
      - 13
      - 11
      - 12
      - 11
      - 14
      - 10
      mode:
      - 11.0
      - 204.0
      min: 10.0
      max: 204.0
      mean: 45.777777777777786
      geometricMean: 21.52096561302238
    renewResponseTimes:
      _data:
      - 449
      mode:
      - 449.0
      min: 449.0
      max: 449.0
      mean: 449.0
      geometricMean: 449.0000000000001
    
    


  • timers.YYYY-MM-DD.[json|yaml]. Sample:

    Code Block
    ---
    metrics:name=client./gateway/sandbox/knoxtoken/api/.POST-requests,type=timers:
      Mean: 402.24274066722455
      StdDev: 91.5058294023066
      "75thPercentile": 418.193352
      "98thPercentile": 628.076518
      RateUnit: "events/second"
      "95thPercentile": 628.076518
      "99thPercentile": 628.076518
      Max: 628.076518
      Count: 9
      FiveMinuteRate: 0.00943518979245522
      "50thPercentile": 404.177556
      MeanRate: 0.0036809848722434983
      Min: 115.866694
      OneMinuteRate: 0.01908228359419756
      DurationUnit: "milliseconds"
      "999thPercentile": 628.076518
      FifteenMinuteRate: 0.017525836234274554
    metrics:name=client./gateway/sandbox/knoxtoken/api/.GET-requests,type=timers:
      Mean: 11.77009795297069
      StdDev: 8.295945542138911
      "75thPercentile": 12.517118
      "98thPercentile": 39.362919999999995
      RateUnit: "events/second"
      "95thPercentile": 31.40917
      "99thPercentile": 39.362919999999995
      Max: 1183.7718479999999
      Count: 77
      FiveMinuteRate: 0.11133648462048056
      "50thPercentile": 8.21257
      MeanRate: 0.03134296740421139
      Min: 6.442307
      OneMinuteRate: 0.2513294281670072
      DurationUnit: "milliseconds"
      "999thPercentile": 39.362919999999995
      FifteenMinuteRate: 0.12650178822825908
    metrics:name=client./gateway/tokenbased/webhdfs/v1.GET-requests,type=timers:
      Mean: 529.0935038293519
      StdDev: 37.26100859914753
      "75thPercentile": 540.324314
      "98thPercentile": 682.462197
      RateUnit: "events/second"
      "95thPercentile": 540.324314
      "99thPercentile": 682.462197
      Max: 4007.184354
      Count: 8
      FiveMinuteRate: 0.012584932473001562
      "50thPercentile": 520.5765289999999
      MeanRate: 0.003278659499845765
      Min: 462.182908
      OneMinuteRate: 0.033547665513349575
      DurationUnit: "milliseconds"
      "999thPercentile": 682.462197
      FifteenMinuteRate: 0.018536358285136553


  • tokenStateStatistics.YYYY-MM-DD.[json|yaml]. Sample:

    Code Block
    ---
    metrics:name=TokenStateService,type=Statistics:
      KeystoreInteractions:
        removeAlias: 11
        saveAlias: 25
        getAlias: 41
      GatewayCredentialsFileSize: 89299
      NumberOfTokensAdded: 89
      NumberOfTokensRenewed: 11
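The responseTimes report lists the raw samples (_data) together with summary statistics. To show how those fields relate, here is a plain-JDK sketch (hypothetical class ResponseTimeStats) that recomputes them from the samples: mode lists every most-frequent value in ascending order, which is why the acquire sample shows both 11.0 and 204.0 (each occurs three times), and the geometric mean is exp of the mean of ln(x). The field names match methods of Apache Commons Math's StatUtils, but whether the tool uses that library is not stated here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical recomputation of the summary fields in responseTimes.*.[json|yaml]. */
final class ResponseTimeStats {
  final double min;
  final double max;
  final double mean;
  final double geometricMean; // exp(mean of ln(x)); assumes every sample is > 0
  final List<Double> modes;   // all most frequent values, ascending

  ResponseTimeStats(long[] samplesMillis) {
    if (samplesMillis.length == 0) {
      throw new IllegalArgumentException("No samples");
    }
    double lo = Double.POSITIVE_INFINITY;
    double hi = Double.NEGATIVE_INFINITY;
    double sum = 0;
    double sumOfLogs = 0;
    Map<Long, Integer> counts = new TreeMap<>(); // sorted keys give ascending modes
    for (long sample : samplesMillis) {
      lo = Math.min(lo, sample);
      hi = Math.max(hi, sample);
      sum += sample;
      sumOfLogs += Math.log(sample);
      counts.merge(sample, 1, Integer::sum);
    }
    int highestCount = 0;
    for (int count : counts.values()) {
      highestCount = Math.max(highestCount, count);
    }
    List<Double> mostFrequent = new ArrayList<>();
    for (Map.Entry<Long, Integer> entry : counts.entrySet()) {
      if (entry.getValue() == highestCount) {
        mostFrequent.add(entry.getKey().doubleValue());
      }
    }
    this.min = lo;
    this.max = hi;
    this.mean = sum / samplesMillis.length;
    this.geometricMean = Math.exp(sumOfLogs / samplesMillis.length);
    this.modes = List.copyOf(mostFrequent);
  }
}
```

Fed with the acquire samples above, this sketch reproduces min 10, max 204, mean ≈ 45.78 and geometric mean ≈ 21.52. The gap between the two means comes from the three 204 ms samples, which the geometric mean damps.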