1. Background

1.1 Problems

    As IoTDB grows more complex, some of its operating metrics need to be monitored to improve the operability and robustness of the system. IoTDB currently lacks a reasonably complete metric collector for these operating metrics, so a metric acquisition system needs to be designed.

    Several widely used metric acquisition libraries exist in the open source community, such as Dropwizard Metrics, Micrometer, and Dubbo Metrics, and each could be adapted. However, IoTDB is a real-time online system, and earlier attempts at collecting metrics caused a significant drop in performance, so these libraries may not meet IoTDB's performance requirements on their own.

    Therefore, we define our own set of metric acquisition interfaces and adapt them to mature acquisition libraries, which makes it possible to switch implementations flexibly and to optimize specific hot spots.

    Several pull requests address this topic. Yue Su proposed a circular array for some metrics. Julian first introduced Micrometer for monitoring; Chao Wang then improved the design, introduced Dropwizard Metrics, and wrote the first version of this doc. After reviewing that design, we found parts of it overly complex, so this doc (1) simplifies the design and (2) compares the performance of Micrometer and Dropwizard Metrics.

1.2 Targets

Provide a set of metric collector interfaces.

Provide a set of adaptation implementations based on Micrometer.

Provide a set of adaptation implementations based on Dropwizard.

Test metric creation and query performance on Micrometer and Dropwizard.

2. Overall Design

2.1 Acquisition system

The acquisition system consists of the following four parts.

2.1.1 Metrics

Provides tools for collecting metrics in different scenarios, including Counter, Gauge, Rate, Histogram, and Timer, each with tags.

2.1.2 MetricManager

a. Provides functions such as creating, finding, updating, and deleting metrics.

b. Provides management of reporters, including starting and stopping them.

c. Provides the ability to introduce default metrics (Known Metric).

d. Provides its own start and stop methods.

2.1.3 CompositeReporter

Pushes the collected data to other systems, such as Prometheus, JMX, and IoTDB.

2.1.4 MetricService

Provides start-up, acquisition, and shutdown functions for MetricManager; it can be registered as an IService in the future.

2.2 Class diagram

2.3 IMetric

IMetric is the parent interface of all collectors.

public interface IMetric {}

2.3.1 Counter

Counter is a cumulative counter.

public interface Counter extends IMetric {
  /**
   * Counter add 1
   */
  void inc();

  /**
   * Counter add n
   * @param n
   */
  void inc(long n);

  /**
   * Get value of counter
   * @return
   */
  long count();
}
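
As an illustration of how the Counter interface could be backed, the following is a minimal sketch that assumes a java.util.concurrent.atomic.LongAdder as storage. The class names LongAdderCounter and CounterSketch are hypothetical and are not part of this design; the interfaces are trimmed copies so the sketch compiles on its own.

```java
import java.util.concurrent.atomic.LongAdder;

// Trimmed copies of the interfaces above so that the sketch is self-contained.
interface IMetric {}

interface Counter extends IMetric {
  void inc();

  void inc(long n);

  long count();
}

// Hypothetical implementation: LongAdder keeps concurrent increments cheap.
class LongAdderCounter implements Counter {
  private final LongAdder adder = new LongAdder();

  @Override
  public void inc() {
    adder.increment();
  }

  @Override
  public void inc(long n) {
    adder.add(n);
  }

  @Override
  public long count() {
    // sum() is not an atomic snapshot under concurrent updates,
    // which is usually acceptable for monitoring.
    return adder.sum();
  }
}

class CounterSketch {
  public static void main(String[] args) {
    Counter counter = new LongAdderCounter();
    counter.inc();
    counter.inc(4);
    System.out.println(counter.count()); // prints 5
  }
}
```

LongAdder is chosen here because it favors write throughput over exact reads, which matches the concern about collection overhead stated in the background.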

2.3.2 Gauge

Gauge holds a single value that can be set and read at any time.

public interface Gauge extends IMetric {
  /**
   * Set value to gauge
   * @param value
   */
  void set(long value);

  /**
   * Get value stored in gauge
   * @return
   */
  long value();
}
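
A Gauge can be sketched in the same spirit. The example below assumes an AtomicLong as storage; the class name AtomicGauge is hypothetical, and the interfaces are trimmed copies of the ones above.

```java
import java.util.concurrent.atomic.AtomicLong;

// Trimmed copies of the interfaces above so that the sketch is self-contained.
interface IMetric {}

interface Gauge extends IMetric {
  void set(long value);

  long value();
}

// Hypothetical implementation: the latest value written wins.
class AtomicGauge implements Gauge {
  private final AtomicLong current = new AtomicLong();

  @Override
  public void set(long value) {
    current.set(value);
  }

  @Override
  public long value() {
    return current.get();
  }
}

class GaugeSketch {
  public static void main(String[] args) {
    Gauge gauge = new AtomicGauge();
    gauge.set(7);
    gauge.set(3);
    System.out.println(gauge.value()); // prints 3
  }
}
```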

2.3.3 Rate

Rate calculates the mean rate of a value and its rate over the last 1, 5, and 15 minutes.

public interface Rate extends IMetric {
  /**
   * Get the number of events marked in the rate
   * @return
   */
  long getCount();

  /**
   * Get one minute rate
   * @return
   */
  double getOneMinuteRate();

  /**
   * Get mean rate
   * @return
   */
  double getMeanRate();

  /**
   * Get five minute rate
   * @return
   */
  double getFiveMinuteRate();

  /**
   * Get fifteen minute rate
   * @return
   */
  double getFifteenMinuteRate();

  /**
   * mark in rate
   */
  void mark();

  /**
   * mark n in rate
   * @param n
   */
  void mark(long n);
}
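
One common way to compute rates over the last 1, 5, and 15 minutes is an exponentially weighted moving average (EWMA) that is updated on a fixed tick. The sketch below only illustrates that idea under stated assumptions: the class name EwmaSketch, the explicit tick() call, and the interval parameter are hypothetical, the class is not thread-safe, and the adapters in this design may delegate the calculation to the underlying library instead.

```java
// Hypothetical, single-threaded sketch of an exponentially weighted moving average.
class EwmaSketch {
  private final double alpha;           // smoothing factor derived from the window
  private final double intervalSeconds; // time between two tick() calls
  private long uncounted;               // events marked since the last tick
  private double ratePerSecond;
  private boolean initialized;

  EwmaSketch(double windowMinutes, double intervalSeconds) {
    this.intervalSeconds = intervalSeconds;
    this.alpha = 1 - Math.exp(-intervalSeconds / (windowMinutes * 60.0));
  }

  // Corresponds to Rate.mark(long n): record n events.
  void mark(long n) {
    uncounted += n;
  }

  // Must be called once per interval, for example by a scheduler.
  void tick() {
    double instantRate = uncounted / intervalSeconds;
    uncounted = 0;
    if (initialized) {
      ratePerSecond += alpha * (instantRate - ratePerSecond);
    } else {
      ratePerSecond = instantRate;
      initialized = true;
    }
  }

  double getRate() {
    return ratePerSecond;
  }

  public static void main(String[] args) {
    EwmaSketch oneMinute = new EwmaSketch(1.0, 5.0);
    oneMinute.mark(50);
    oneMinute.tick();
    System.out.println(oneMinute.getRate()); // 50 events in 5 s: prints 10.0
    oneMinute.tick();                        // an idle interval lets the rate decay
    System.out.println(oneMinute.getRate());
  }
}
```

Three such instances with windows of 1, 5, and 15 minutes would back getOneMinuteRate(), getFiveMinuteRate(), and getFifteenMinuteRate(), while getMeanRate() is simply the total count divided by the elapsed time.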

2.3.4 Histogram and HistogramSnapshot

HistogramSnapshot holds the data of a histogram at the moment the snapshot is taken; it provides quantile (percentile) values and the list of recorded values.

public interface Histogram extends IMetric {
  /**
   * update histogram by value
   * @param value
   */
  void update(int value);

  /**
   * update histogram by value
   * @param value
   */
  void update(long value);

  /**
   * get the number of values recorded in the histogram
   * @return
   */
  long count();

  /**
   * take snapshot of histogram
   * @return
   */
  HistogramSnapshot takeSnapshot();
}




public interface HistogramSnapshot {

  /**
   * Get value by quantile
   * @param quantile
   * @return
   */
  public abstract double getValue(double quantile);

  /**
   * Get values in snapshot
   * @return
   */
  public abstract long[] getValues();

  /**
   * Get size of value in snapshot
   * @return
   */
  public abstract int size();

  /**
   * Get median of values
   * @return
   */
  public abstract double getMedian();

  /**
   * Get min of values
   * @return
   */
  public abstract long getMin();

  /**
   * Get mean of values
   * @return
   */
  public abstract double getMean();

  /**
   * Get max of values
   * @return
   */
  public abstract long getMax();

  /**
   * Writes the values of the snapshot to the given stream.
   *
   * @param output an output stream
   */
  public abstract void dump(OutputStream output);
}
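
To make getValue(quantile) concrete, the following sketch computes quantiles over a sorted copy of the values using the nearest-rank method. This is only one possible definition of a quantile; the class name SortedSnapshotSketch is hypothetical, and the underlying libraries may use interpolation or sampling reservoirs instead.

```java
import java.util.Arrays;

// Hypothetical snapshot over a fixed set of values, using nearest-rank quantiles.
class SortedSnapshotSketch {
  private final long[] sorted;

  SortedSnapshotSketch(long[] values) {
    this.sorted = values.clone();
    Arrays.sort(this.sorted);
  }

  // quantile must lie in [0, 1]; 0.5 is the median, 0.99 the 99th percentile.
  double getValue(double quantile) {
    if (Double.isNaN(quantile) || quantile < 0.0 || quantile > 1.0) {
      throw new IllegalArgumentException(quantile + " is not in [0, 1]");
    }
    if (sorted.length == 0) {
      return 0.0;
    }
    int rank = (int) Math.ceil(quantile * sorted.length);
    int index = Math.max(0, Math.min(sorted.length - 1, rank - 1));
    return sorted[index];
  }

  double getMedian() {
    return getValue(0.5);
  }

  long getMin() {
    return sorted.length == 0 ? 0 : sorted[0];
  }

  long getMax() {
    return sorted.length == 0 ? 0 : sorted[sorted.length - 1];
  }

  double getMean() {
    return Arrays.stream(sorted).average().orElse(0.0);
  }

  int size() {
    return sorted.length;
  }

  public static void main(String[] args) {
    SortedSnapshotSketch snapshot = new SortedSnapshotSketch(new long[] {5, 1, 4, 2, 3});
    System.out.println(snapshot.getMedian());   // prints 3.0
    System.out.println(snapshot.getValue(0.9)); // prints 5.0
  }
}
```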

2.3.5 Timer

Timer records both a histogram of durations and the rate at which events occur (it combines a Rate and a Histogram).

public interface Timer extends IMetric {

  /**
   * update time of timer
   * @param duration
   * @param unit
   */
  void update(long duration, TimeUnit unit);

  /**
   * update timer by millisecond
   * @param durationMillis
   */
  default void updateMillis(long durationMillis) {
    update(durationMillis, TimeUnit.MILLISECONDS);
  }

  /**
   * update timer by microseconds
   * @param durationMicros
   */
  default void updateMicros(long durationMicros) {
    update(durationMicros, TimeUnit.MICROSECONDS);
  }

  /**
   * update timer by nanoseconds
   * @param durationNanos
   */
  default void updateNanos(long durationNanos) {
    update(durationNanos, TimeUnit.NANOSECONDS);
  }

  /**
   * take snapshot of timer
   * @return
   */
  HistogramSnapshot takeSnapshot();

  /**
   * Get the rate related to this timer. The returned rate is meant to be read only;
   * it is not safe to call its update (mark) methods.
   *
   * @return the rate related to this timer
   */
  Rate getImmutableRate();
}
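
A typical way to feed a Timer is to measure a piece of work with System.nanoTime() and report the elapsed time. The sketch below assumes a trimmed Timer interface and a hypothetical TotalTimeTimer that only keeps the call count and the total duration; the snapshot and rate parts of the real interface are left out for brevity.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Trimmed copy of the Timer interface above (snapshot and rate omitted).
interface Timer {
  void update(long duration, TimeUnit unit);

  default void updateNanos(long durationNanos) {
    update(durationNanos, TimeUnit.NANOSECONDS);
  }
}

// Hypothetical implementation that only accumulates count and total time.
class TotalTimeTimer implements Timer {
  private final LongAdder count = new LongAdder();
  private final LongAdder totalNanos = new LongAdder();

  @Override
  public void update(long duration, TimeUnit unit) {
    count.increment();
    totalNanos.add(unit.toNanos(duration));
  }

  long getCount() {
    return count.sum();
  }

  long getTotalNanos() {
    return totalNanos.sum();
  }
}

class TimerUsageSketch {
  // Runs the task and records its duration, even if the task throws.
  static void time(Timer timer, Runnable task) {
    long start = System.nanoTime();
    try {
      task.run();
    } finally {
      timer.updateNanos(System.nanoTime() - start);
    }
  }

  public static void main(String[] args) {
    TotalTimeTimer timer = new TotalTimeTimer();
    time(timer, () -> Math.sqrt(42.0));
    System.out.println(timer.getCount()); // prints 1
  }
}
```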


2.4 MetricManager

  MetricManager provides interfaces for creating, deleting, updating, and querying CompositeReporter and Metric instances, as well as the exposed switches for data acquisition.

public interface MetricManager {
  /**
   * Get Counter
   * If exists, then return
   * or create one to return
   * @param metric
   * @param tags string appear in pairs, like sg="ln" will be "sg", "ln"
   * @return
   */
  Counter getOrCreateCounter(String metric, String... tags);

  /**
   * Get Gauge
   * If exists, then return
   * or create one to return
   * @param metric
   * @param tags string appear in pairs, like sg="ln" will be "sg", "ln"
   * @return
   */
  Gauge getOrCreatGauge(String metric, String... tags);

  /**
   * Get Rate
   * If exists, then return
   * or create one to return
   * @param metric
   * @param tags string appear in pairs, like sg="ln" will be "sg", "ln"
   * @return
   */
  Rate getOrCreatRate(String metric, String... tags);

  /**
   * Get Histogram
   * If exists, then return
   * or create one to return
   * @param metric
   * @param tags string appear in pairs, like sg="ln" will be "sg", "ln"
   * @return
   */
  Histogram getOrCreateHistogram(String metric, String... tags);

  /**
   * Get Timer
   * If exists, then return
   * or create one to return
   * @param metric
   * @param tags string appear in pairs, like sg="ln" will be "sg", "ln"
   * @return
   */
  Timer getOrCreateTimer(String metric, String... tags);

  /**
   * Update Counter
   * @param delta
   * @param metric
   * @param tags
   */
  void count(int delta, String metric, String... tags);

  /**
   * Update Counter
   * @param delta
   * @param metric
   * @param tags
   */
  void count(long delta, String metric, String... tags);

  /**
   * update Gauge
   * @param value
   * @param metric
   * @param tags
   */
  void gauge(int value, String metric, String... tags);

  /**
   * update Gauge
   * @param value
   * @param metric
   * @param tags
   */
  void gauge(long value, String metric, String... tags);

  /**
   * update Rate
   * @param value
   * @param metric
   * @param tags
   */
  void rate(int value, String metric, String... tags);

  /**
   * update Rate
   * @param value
   * @param metric
   * @param tags
   */
  void rate(long value, String metric, String... tags);

  /**
   * update Histogram
   * @param value
   * @param metric
   * @param tags
   */
  void histogram(int value, String metric, String... tags);

  /**
   * update Histogram
   * @param value
   * @param metric
   * @param tags
   */
  void histogram(long value, String metric, String... tags);

  /**
   * update Timer
   * @param delta
   * @param timeUnit
   * @param metric
   * @param tags
   */
  void timer(long delta, TimeUnit timeUnit, String metric, String... tags);

  /**
   * remove counter
   * @param metric
   * @param tags
   */
  void removeCounter(String metric, String... tags);

  /**
   * remove gauge
   * @param metric
   * @param tags
   */
  void removeGauge(String metric, String... tags);

  /**
   * remove rate
   * @param metric
   * @param tags
   */
  void removeRate(String metric, String... tags);

  /**
   * remove histogram
   * @param metric
   * @param tags
   */
  void removeHistogram(String metric, String... tags);

  /**
   * remove timer
   * @param metric
   * @param tags
   */
  void removeTimer(String metric, String... tags);

  /**
   * get all metric keys.
   *
   * @return all metric keys; each key is a string array holding the metric name followed by its tags.
   */
  List<String[]> getAllMetricKeys();

  /**
   * Get all counters
   * @return [name, tags...] -> counter
   */
  Map<String[], Counter> getAllCounters();

  /**
   * Get all gauges
   * @return [name, tags...] -> gauge
   */
  Map<String[], Gauge> getAllGauges();

  /**
   * Get all rates
   * @return [name, tags...] -> rate
   */
  Map<String[], Rate> getAllRates();

  /**
   * Get all histogram
   * @return [name, tags...] -> histogram
   */
  Map<String[], Histogram> getAllHistograms();

  /**
   * Get all timers
   * @return [name, tags...] -> timer
   */
  Map<String[], Timer> getAllTimers();

  /**
   * whether monitoring is enabled
   * @return
   */
  boolean isEnable();

  /**
   * enable pre-defined metric set.
   *
   * @param metric which metric set we want to collect
   */
  void enablePredefinedMetric(PredefinedMetric metric);

  /**
   * init something.
   *
   * @return whether success
   */
  boolean init();

  /**
   * stop everything and clear
   *
   * @return
   */
  boolean stop();

  /**
   * Get name of manager
   * @return
   */
  String getName();
}
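
The get-or-create semantics and the paired tags can be sketched with a concurrent map. The example below is an assumption-laden illustration and not the actual manager: the class name CounterRegistrySketch, the helper value(), and the metric name points_written are hypothetical, and counters are stored as plain LongAdder instances to keep the block short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical registry covering only the counter part of MetricManager.
class CounterRegistrySketch {
  // Key: the metric name followed by the flattened tag pairs, e.g. [name, "sg", "ln"].
  private final Map<List<String>, LongAdder> counters = new ConcurrentHashMap<>();

  LongAdder getOrCreateCounter(String metric, String... tags) {
    if (tags.length % 2 != 0) {
      throw new IllegalArgumentException("tags must appear in key/value pairs");
    }
    // computeIfAbsent creates the counter at most once per key, even under contention.
    return counters.computeIfAbsent(key(metric, tags), k -> new LongAdder());
  }

  void count(long delta, String metric, String... tags) {
    getOrCreateCounter(metric, tags).add(delta);
  }

  long value(String metric, String... tags) {
    LongAdder counter = counters.get(key(metric, tags));
    return counter == null ? 0 : counter.sum();
  }

  void removeCounter(String metric, String... tags) {
    counters.remove(key(metric, tags));
  }

  int size() {
    return counters.size();
  }

  private static List<String> key(String metric, String... tags) {
    List<String> key = new ArrayList<>(tags.length + 1);
    key.add(metric);
    key.addAll(Arrays.asList(tags));
    return key;
  }

  public static void main(String[] args) {
    CounterRegistrySketch registry = new CounterRegistrySketch();
    registry.count(3, "points_written", "sg", "ln"); // tag sg="ln"
    registry.count(2, "points_written", "sg", "ln");
    System.out.println(registry.value("points_written", "sg", "ln")); // prints 5
  }
}
```

The sketch uses a List<String> as the map key because Java arrays compare by identity: two String[] instances with the same content are different keys in a HashMap. Callers of methods that return Map<String[], ...> should therefore iterate over the entries rather than look keys up with a newly built array.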

2.5 CompositeReporter

CompositeReporter is a data push interface.

public interface CompositeReporter {

  /**
   * Start all reporters
   * @return
   */
  boolean start();

  /**
   * Start reporter by name
   * name is one of: jmx, prometheus, iotdb, internal
   * @param reporter
   * @return
   */
  boolean start(String reporter);

  /**
   * Stop all reporters
   * @return
   */
  boolean stop();

  /**
   * Stop reporter by name
   * name is one of: jmx, prometheus, iotdb, internal
   * @param reporter
   * @return
   */
  boolean stop(String reporter);

  /**
   * set manager to reporter
   * @param metricManager
   */
  void setMetricManager(MetricManager metricManager);

  /**
   * Get name of CompositeReporter
   * @return
   */
  String getName();
}
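
The start and stop methods of a composite can be sketched as a fan-out over named reporters. Everything in the block below is an assumption for illustration: the Reporter interface, CompositeReporterSketch, RecordingReporter, and the register() method are hypothetical and are not defined by this design.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-target reporter (for example jmx or prometheus).
interface Reporter {
  boolean start();

  boolean stop();
}

// Hypothetical composite that fans out to reporters registered by name.
class CompositeReporterSketch {
  private final Map<String, Reporter> reporters = new LinkedHashMap<>();

  void register(String name, Reporter reporter) {
    reporters.put(name, reporter);
  }

  // Tries every reporter; returns true only if all of them started.
  boolean start() {
    boolean allStarted = true;
    for (Reporter reporter : reporters.values()) {
      allStarted &= reporter.start();
    }
    return allStarted;
  }

  boolean start(String name) {
    Reporter reporter = reporters.get(name);
    return reporter != null && reporter.start();
  }

  // Tries every reporter; returns true only if all of them stopped.
  boolean stop() {
    boolean allStopped = true;
    for (Reporter reporter : reporters.values()) {
      allStopped &= reporter.stop();
    }
    return allStopped;
  }

  boolean stop(String name) {
    Reporter reporter = reporters.get(name);
    return reporter != null && reporter.stop();
  }
}

// Test double that remembers whether it is running.
class RecordingReporter implements Reporter {
  private final boolean canStart;
  private boolean running;

  RecordingReporter(boolean canStart) {
    this.canStart = canStart;
  }

  @Override
  public boolean start() {
    running = canStart;
    return canStart;
  }

  @Override
  public boolean stop() {
    running = false;
    return true;
  }

  boolean isRunning() {
    return running;
  }
}
```

The non-short-circuit `&=` is deliberate: one reporter that fails to start should not prevent the remaining reporters from being started.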

3. Test Report

We implemented the monitoring framework with Dropwizard and Micrometer respectively; the test setup and results are as follows:

3.1 Test Environment

Processor: Intel(R) Core(TM) i7-1065G7 CPU

RAM: 32 GB

3.2 Test Metrics

  We use a single thread to create counters and run the test cases separately on the two frameworks, Micrometer and Dropwizard. The test metrics are as follows:

  1. memory : Memory usage, in MB.
  2. create : The time required to create the metrics, in ms.
  3. searchInorder : The time required for sequential queries, in ms.
  4. searchDisorder : The time required for random queries, in ms.

3.3 Test parameters

  1. metric : The metric under test.
  2. name : The name of the test metric; all names are padded to the same length.
  3. tag : The tag of the test metric; all tags are padded to the same length.
  4. metricNumberTotal : The number of metrics tested.
  5. tagSingleNumber : The number of tags on each test metric.
  6. tagTotalNumber : The size of the tag pool, 1000 by default; all tags are drawn from this pool.
  7. searchNumber : The number of queries, 1000000 by default.
  8. loop : The number of query loops, 10 by default.

3.4 Test Result

3.5 Test Script

3.5.1 Test

Test holds a MetricManager and is responsible for completing specific testing.

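import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class Test {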
    private Integer metricNumberTotal;
    private Integer metricNameNumberLength;
    private Integer tagTotalNumber;
    private Integer tagSingleNumber;
    private Integer searchNumber;
    private String[] TAGS;
    private static Random random = new Random(43);
    private static MetricManager metricManager = MetricService.getMetricManager();
    private static Map<String, String[]> name2Tags = new HashMap<>();

    /**
     *
     * @param metricNumber
     * @param tagTotalNumber
     * @param tagSingleNumber
     * @param searchNumber
     */
    Test(Integer metricNumber, Integer tagTotalNumber, Integer tagSingleNumber
            , Integer searchNumber){
        this.metricNumberTotal = metricNumber;
        this.metricNameNumberLength = String.valueOf(metricNumberTotal).length();
        this.tagTotalNumber = tagTotalNumber;
        this.tagSingleNumber = tagSingleNumber;
        this.searchNumber = searchNumber;
        TAGS = new String[tagTotalNumber];
        for(int i = 0; i < tagTotalNumber; i++){
            TAGS[i] = initTag(i);
        }
    }

    /**
     * generate tags for metric
     * @param number
     * @return
     */
    private String initTag(Integer number){
        StringBuilder stringBuilder = new StringBuilder(String.valueOf(number));
        while(stringBuilder.length() < 3){
            stringBuilder.insert(0, '0');
        }
        stringBuilder.insert(0, "Tag");
        return stringBuilder.toString();
    }

    /**
     * generate name for metric
     * @param number
     * @return
     */
    private String generateName(Integer number){
        StringBuilder stringBuilder = new StringBuilder(String.valueOf(number));
        while(stringBuilder.length() < metricNameNumberLength){
            stringBuilder.insert(0, '0');
        }
        stringBuilder.insert(0, "counter");
        return stringBuilder.toString();
    }

    /**
     * generate tags of a metric
     * @return
     */
    private String[] generateTags(){
        List<Integer> targets = new ArrayList<>();
        while(targets.size() < tagSingleNumber){
            Integer target = generateRandom(tagTotalNumber);
            if(!targets.contains(target)){
                targets.add(target);
            }
        }
        String[] tags = new String[tagSingleNumber];
        for(int i = 0; i < tagSingleNumber; i++){
            tags[i] = TAGS[targets.get(i)];
        }
        return tags;
    }

    /**
     * generate next int
     * @param max
     * @return
     */
    private Integer generateRandom(Integer max){
        return random.nextInt(max);
    }

    /**
     * create metric in order
     * @return
     */
    public long createMetricInorder(){
        long total = 0;
        for(int i = 0; i < metricNumberTotal; i++){
            String name = generateName(i);
            String[] tags = generateTags();
            long start = System.currentTimeMillis();
            metricManager.getOrCreateCounter(name, tags);
            long stop = System.currentTimeMillis();
            total += (stop - start);
            name2Tags.put(name, tags);
        }
        return total;
    }

    /**
     * search metric in order
     * @return
     */
    public long searchMetricInorder(){
        long total = 0;
        for(int i = 0; i < searchNumber; i++){
            total += searchOne(i);
        }
        return total;
    }

    /**
     * search metric in random way
     * @return
     */
    public long searchMetricDisorder(){
        long total = 0;
        for(int i = 0; i < searchNumber; i++){
            // nextInt(bound) excludes the bound, so pass the total to cover every metric
            total += searchOne(generateRandom(metricNumberTotal));
        }
        return total;
    }

    private long searchOne(Integer target) {
        String name = generateName(target % metricNumberTotal);
        String[] tags = name2Tags.get(name);
        long start = System.currentTimeMillis();
        metricManager.getOrCreateCounter(name, tags);
        long stop = System.currentTimeMillis();
        return stop - start;
    }

    @Override
    public String toString() {
        return metricNumberTotal +
                "," + tagTotalNumber +
                "," + tagSingleNumber +
                "," + searchNumber;
    }

    public void stop(){
        name2Tags.clear();
        metricManager.stop();
    }
}

3.5.2 TestPlan

TestPlan sets up specific test plans to complete testing and statistics.

public class TestPlan {
    private static final Integer[] TAG_NUMBERS = {2, 4, 6, 8, 10};
    private static final Integer[] METRIC_NUMBERS = {1000, 10000, 50000, 100000, 500000, 1000000};
    private static final Integer LOOP = 10;
    private static final Integer tagTotalNumber = 1000;
    private static final Integer searchNumber = 100000;

    private static void test(Integer metric, Integer tag){
        Long[] times = {0L, 0L, 0L};
        Test test = new Test(metric, tagTotalNumber, tag, searchNumber);
        times[0] += test.createMetricInorder();
        for(int i = 0; i < LOOP; i ++){
            times[1] += test.searchMetricInorder();
            times[2] += test.searchMetricDisorder();
        }
        test.stop();
        System.out.println(metric + "," + tagTotalNumber + "," + tag + "," +
                searchNumber + "," + (times[0]) + "," +
                (times[1] * 1.0 / LOOP) + "," + (times[2] * 1.0 / LOOP));
    }

    public static void main(String[] args) {
        System.setProperty("METRIC_CONF", "path of yml");
        for(Integer metric: METRIC_NUMBERS){
            for(Integer tag: TAG_NUMBERS){
                test(metric, tag);
            }
        }
    }
}

4. Dropwizard Unit Test Results

To ensure the reliability of the features, we unit tested DropwizardMetricManager, covering its main functions. To reproduce the test, modify the path of the yml configuration file in the init() method (the file is stored under conf in the statistical directory). The final result of the test is shown in the figure below.

5. Dropwizard connects to Prometheus via PushGateway

5.1 Experimental process

This test was done using the PrometheusRunTest script, shown below.

import java.util.concurrent.TimeUnit;

public class PrometheusRunTest {
  public MetricManager metricManager = MetricService.getMetricManager();

  public static void main(String[] args) throws InterruptedException {
    System.setProperty("line.separator", "\n");
    System.setProperty("METRIC_CONF", "path of yml");
    PrometheusRunTest prometheusRunTest = new PrometheusRunTest();
    Counter counter = prometheusRunTest.metricManager.getOrCreateCounter("counter");
    while (true) {
      counter.inc();
      TimeUnit.SECONDS.sleep(1);
    }
  }
}

The configuration of the parameters for Prometheus is completed in the configuration file (yml file) used by the script, as follows:

prometheusReporterConfig:
    prometheusExporterUrl: http://localhost 
    prometheusExporterPort: 9091 

With this script, Dropwizard monitors a counter that increases by 1 every second, and updates to all metrics are pushed to the specified PushGateway, where Prometheus can scrape them.
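
To show what a push to PushGateway looks like on the wire, the sketch below builds a request by hand with the Java 11 HTTP client. It is not the reporter used in this experiment: the class name PushGatewaySketch and the job name iotdb_demo are hypothetical, and the URL and port are taken from the configuration above. The body uses the Prometheus text exposition format, and the path follows the PushGateway convention /metrics/job/<job>.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical hand-written push; a real reporter would do this periodically.
class PushGatewaySketch {
  // Renders one counter sample in the Prometheus text exposition format.
  static String exposition(String name, long value) {
    return "# TYPE " + name + " counter\n" + name + " " + value + "\n";
  }

  // Builds a POST to <baseUrl>:<port>/metrics/job/<job> carrying the given body.
  static HttpRequest buildRequest(String baseUrl, int port, String job, String body) {
    URI uri = URI.create(baseUrl + ":" + port + "/metrics/job/" + job);
    return HttpRequest.newBuilder(uri)
        .header("Content-Type", "text/plain; version=0.0.4")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
  }

  public static void main(String[] args) throws Exception {
    // Requires a PushGateway listening on localhost:9091.
    HttpRequest request =
        buildRequest("http://localhost", 9091, "iotdb_demo", exposition("counter", 42));
    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode());
  }
}
```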

5.2 Experimental environment

Grafana runs on port 8081

Prometheus runs on port 9090

PushGateway runs on port 9091

5.3 Experimental result

For a clearer presentation, this experiment uses Grafana to read and display the data from Prometheus. The steady growth of the counter is successfully captured and visualized.

