1. Background

1.1 Problem

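As IoTDB grows more complex, key runtime metrics need to be monitored to make the system easier to operate and more robust.

IoTDB currently has no complete metrics collector for these runtime metrics, so a metrics collection system needs to be designed.

Several widely used open-source metrics libraries exist, such as Dropwizard Metrics, Micrometer, and Dubbo Metrics, and adapting one of them is an option.

However, IoTDB is a real-time online system, and earlier attempts at collecting metrics caused a significant performance drop, so these libraries may not meet IoTDB's performance requirements.

We therefore define our own metrics collection interface, with adapters for mature third-party libraries as well as a native implementation. This allows switching between implementations flexibly and optimizing specifically for IoTDB where needed.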

1.2 Goals

  1. Provide a set of metrics collector interfaces
  2. Provide an adapter implementation based on Micrometer

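2. Overall Design

As shown in the figure below, the collection system consists of four parts:

  1. Metrics (collectors): tools for collecting metrics in different scenarios, including Counter, Gauge, Meter, Histogram, and Timer. Every metric can carry tags.
  2. Metric Registry: creates, retrieves, and categorizes metrics. There can be several registries, each managing one category of metrics.
  3. Metric Reporter: pushes collected data to other systems, for example Prometheus.
  4. Metric Manager: manages the metric registries, for example by grouping them by type, and provides an interface for reading the currently collected data.

The relationships between the classes are shown below.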


2.1 Metrics

2.1.1 Metric: the parent interface of all metrics; it exposes the unique ID of the metric

public interface Metric {
  MetricId getId();
}
 
public class MetricId {
  private final String name;
  private final Tags tags;
  private final Constants.Type type;

  private final MetricId syntheticAssociation;  // ID of the parent metric, set when this metric is contained in another metric
  private final String description;
}

2.1.2 Counter: a cumulative counter

public interface Counter extends Metric {
  void inc();
  void inc(long n);
  void dec();
  void dec(long n);

  long count();
}
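
As a rough illustration of how this interface might be implemented with low contention on hot paths, the sketch below backs the counter with `java.util.concurrent.atomic.LongAdder`. The class name `LongAdderCounter` is hypothetical, and the interface is re-declared without `extends Metric` only to keep the example self-contained.

```java
import java.util.concurrent.atomic.LongAdder;

// Trimmed copy of the Counter interface (Metric/getId omitted for brevity).
interface Counter {
  void inc();
  void inc(long n);
  void dec();
  void dec(long n);
  long count();
}

// Hypothetical implementation. LongAdder spreads updates over several cells,
// which usually reduces contention compared with a single AtomicLong.
class LongAdderCounter implements Counter {
  private final LongAdder adder = new LongAdder();

  @Override public void inc() { adder.increment(); }
  @Override public void inc(long n) { adder.add(n); }
  @Override public void dec() { adder.decrement(); }
  @Override public void dec(long n) { adder.add(-n); }

  // sum() is not an atomic snapshot while updates are in flight,
  // which is normally acceptable for monitoring data.
  @Override public long count() { return adder.sum(); }
}
```

The trade-off is that reads are slightly more expensive than writes, which suits counters that are updated far more often than they are reported.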


2.1.3 Gauge: holds the current value of some quantity

public interface Gauge<T> extends Metric {
  T value();
}
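
One possible way to implement a gauge is to read the value lazily from a callback each time it is requested, so nothing is computed between reports. The sketch below assumes this callback style; `SupplierGauge` is a hypothetical name, and the interface is re-declared without `extends Metric` to keep the example self-contained.

```java
import java.util.function.Supplier;

// Trimmed copy of the Gauge interface (Metric/getId omitted for brevity).
interface Gauge<T> {
  T value();
}

// Hypothetical implementation: the value is read from the supplier on demand.
class SupplierGauge<T> implements Gauge<T> {
  private final Supplier<T> supplier;

  SupplierGauge(Supplier<T> supplier) {
    this.supplier = supplier;
  }

  @Override
  public T value() {
    return supplier.get();
  }
}
```

For example, `new SupplierGauge<>(queue::size)` would report the current size of a collection named `queue` whenever a reporter asks for it.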

2.1.4 Meter: computes the rate of events over the past 1, 5, and 15 minutes

public interface Meter extends Metric {
  long getCount();
  double getOneMinuteRate();
  double getMeanRate();
  double getFiveMinuteRate();
  double getFifteenMinuteRate();

  void mark();
  void mark(long n);
}
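
The interface does not say how the 1, 5, and 15 minute rates are computed. A common choice, used for example by Dropwizard Metrics, is an exponentially weighted moving average (EWMA) that is ticked at a fixed interval. The sketch below illustrates that technique only; the class name `Ewma`, the 5 second tick, and the idea that something external calls `tick()` on schedule are all assumptions.

```java
import java.util.concurrent.atomic.LongAdder;

// Exponentially weighted moving average of an event rate (events per second).
class Ewma {
  static final double TICK_SECONDS = 5.0; // assumed tick interval

  private final double alpha;
  private final LongAdder uncounted = new LongAdder();
  private boolean initialized = false;
  private double ratePerSecond = 0.0;

  // windowMinutes is 1, 5, or 15 for the three rates of a Meter.
  Ewma(double windowMinutes) {
    this.alpha = 1.0 - Math.exp(-TICK_SECONDS / (60.0 * windowMinutes));
  }

  // Called from mark(n): cheap, no locking on the hot path.
  void update(long n) {
    uncounted.add(n);
  }

  // Must be invoked once every TICK_SECONDS, e.g. by a scheduler (not shown).
  synchronized void tick() {
    double instantRate = uncounted.sumThenReset() / TICK_SECONDS;
    if (initialized) {
      ratePerSecond += alpha * (instantRate - ratePerSecond);
    } else {
      ratePerSecond = instantRate;
      initialized = true;
    }
  }

  synchronized double ratePerSecond() {
    return ratePerSecond;
  }
}
```

A Meter built this way would hold three `Ewma` instances (windows of 1, 5, and 15 minutes) and forward every `mark(n)` to each of them.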


2.1.5 Histogram

Snapshot is the class that carries the data. It provides percentile values and a list of sample counts per bucket.

public interface Histogram extends Metric {
  void update(int value);
  void update(long value);
  long count();

  Snapshot takeSnapshot();
}
 
public interface Snapshot {
  long count();

  double total();

  double max();

  double mean();

  ValueAtPercentile[] percentileValues();

  CountAtBucket[] histogramCounts();

  void dump(OutputStream output);
}
 
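public final class ValueAtPercentile {

  private final double percentile;
  private final double value;

  public ValueAtPercentile(double percentile, double value) {
    this.percentile = percentile;
    this.value = value;
  }
}
 
public final class CountAtBucket {

  private final double bucket;
  private final double count;

  public CountAtBucket(double bucket, double count) {
    this.bucket = bucket;
    this.count = count;
  }
}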
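
To make the two kinds of Snapshot data concrete, the sketch below computes percentile values and bucket counts from a fixed array of samples. It is only an illustration: the class `SnapshotSketch` is hypothetical, percentiles use the nearest-rank method, and bucket counts are assumed to be cumulative (samples less than or equal to the bucket's upper bound). A production implementation would more likely use a bounded reservoir or pre-allocated buckets instead of keeping every sample.

```java
import java.util.Arrays;

// Immutable view over recorded samples; keeps a sorted copy of the input.
class SnapshotSketch {
  private final long[] sorted;

  SnapshotSketch(long[] values) {
    this.sorted = values.clone();
    Arrays.sort(this.sorted);
  }

  long count() {
    return sorted.length;
  }

  double max() {
    return sorted.length == 0 ? 0.0 : sorted[sorted.length - 1];
  }

  double mean() {
    return Arrays.stream(sorted).average().orElse(0.0);
  }

  // Nearest-rank percentile for p in (0, 1]: the smallest sample such that
  // at least a fraction p of all samples is less than or equal to it.
  double valueAtPercentile(double p) {
    if (sorted.length == 0) {
      return 0.0;
    }
    int rank = (int) Math.ceil(p * sorted.length);
    rank = Math.min(Math.max(rank, 1), sorted.length);
    return sorted[rank - 1];
  }

  // Cumulative bucket count: number of samples <= upperBound.
  long countAtBucket(double upperBound) {
    return Arrays.stream(sorted).filter(v -> v <= upperBound).count();
  }
}
```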


2.1.6 Timer: a histogram of recorded durations plus the call rate (Meter + Histogram)

public interface Timer extends Metric {
  void update(long duration, TimeUnit unit);

  default void updateMillis(long durationMillis) {
    update(durationMillis, TimeUnit.MILLISECONDS);
  }

  default void updateMicros(long durationMicros) {
    update(durationMicros, TimeUnit.MICROSECONDS);
  }

  default void updateNanos(long durationNanos) {
    update(durationNanos, TimeUnit.NANOSECONDS);
  }

  Snapshot takeSnapshot();

  Meter getMeter();
}
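
A minimal sketch of how a timer could normalize durations, assuming everything is stored in nanoseconds internally. `NanoTimerSketch` and its `time(Runnable)` helper are hypothetical and not part of the interface; the histogram and meter parts are left out to keep the example short.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Records how many durations were observed and their total, in nanoseconds.
class NanoTimerSketch {
  private final LongAdder count = new LongAdder();
  private final LongAdder totalNanos = new LongAdder();

  // Converts the duration from the caller's unit to nanoseconds before storing.
  void update(long duration, TimeUnit unit) {
    count.increment();
    totalNanos.add(unit.toNanos(duration));
  }

  void updateMillis(long durationMillis) {
    update(durationMillis, TimeUnit.MILLISECONDS);
  }

  // Hypothetical convenience method: times a task with System.nanoTime()
  // and records the elapsed time even if the task throws.
  void time(Runnable task) {
    long start = System.nanoTime();
    try {
      task.run();
    } finally {
      update(System.nanoTime() - start, TimeUnit.NANOSECONDS);
    }
  }

  long count() {
    return count.sum();
  }

  long totalNanos() {
    return totalNanos.sum();
  }
}
```

Storing a single unit internally means the reporter can convert to whatever unit the target system expects at export time.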


2.1.7 MetricSet: a group of related metrics, for example GC metrics, which consist of several measurements

public interface MetricSet extends Metric {
  Map<MetricId, Metric> getMetrics();
}


2.2 Metric Registry

2.2.1 MetricRegistry: manages the creation and retrieval of metrics and notifies listeners when metrics are added or removed

public interface MetricRegistry {
  Counter newOrGetCounter(MetricId id);
  <T> Gauge<T> newOrGetGauge(MetricId id);
  Meter newOrGetMeter(MetricId id);
  Histogram newOrGetHistogram(MetricId id);
  Timer newOrGetTimer(MetricId id);

  Metric register(MetricId id, Metric metric);
  void registerAll(MetricSet metricSet);

  void addListener(MetricRegistryListener listener);
  void removeListener(MetricRegistryListener listener);

  Metric remove(MetricId id);

  Map<MetricId, Metric> getAllMetrics();
  Map<MetricId, Metric> getAllMetrics(MetricFilter metricFilter);

  MetricRegistryInfo getMetricRegistryInfo();
}
 
 
public interface MetricRegistryListener extends EventListener {
  void onMetricAdded(Metric metric);
  void onMetricRemoved(Metric metric);
}
 
public class MetricRegistryInfo {
  protected final String metricsName;
  protected final String metricsDescription;
  protected final String metricsContext;
  protected final String metricsJmxContext;
  protected final boolean existingSource;
}
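
The "new or get" semantics can be sketched with a concurrent map, as below. This is an illustration under simplifying assumptions: a plain `String` stands in for `MetricId`, `AtomicLong` stands in for `Counter`, a `Consumer<String>` stands in for `MetricRegistryListener`, and the class name `RegistrySketch` is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

class RegistrySketch {
  private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();
  private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

  void addListener(Consumer<String> onMetricAdded) {
    listeners.add(onMetricAdded);
  }

  // Returns the existing counter for the id, or creates and registers one.
  AtomicLong newOrGetCounter(String id) {
    AtomicLong existing = counters.get(id); // fast path: no allocation
    if (existing != null) {
      return existing;
    }
    AtomicLong created = new AtomicLong();
    AtomicLong raced = counters.putIfAbsent(id, created);
    if (raced != null) {
      return raced; // another thread registered it first
    }
    // Notify outside of any map operation, exactly once per new metric.
    for (Consumer<String> listener : listeners) {
      listener.accept(id);
    }
    return created;
  }
}
```

Callers on a hot path would typically look the metric up once and keep the reference, rather than calling `newOrGetCounter` for every update.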


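2.3 Metric Reporter

2.3.1 MetricReporter: pushes data to other systems or serves it over HTTP. When reading data from the MetricManager, it can apply a MetricFilter to fetch only the metrics of interest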

public interface MetricReporter extends Cloneable {
  void start();
  void report();
  void stop();
}
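
For a push-style reporter, the `start`/`report`/`stop` lifecycle could be driven by a scheduled executor, as in the sketch below. Everything here is an assumption made for illustration: the class name `ScheduledReporterSketch`, the `Supplier` that stands in for reading from the metric manager, the `Consumer<String>` that stands in for the target system, and the simple `name value` line format.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

class ScheduledReporterSketch {
  private final Supplier<Map<String, Long>> source; // stand-in for the manager
  private final Consumer<String> sink;              // stand-in for the target
  private final long periodMillis;
  private ScheduledExecutorService scheduler;

  ScheduledReporterSketch(
      Supplier<Map<String, Long>> source, Consumer<String> sink, long periodMillis) {
    this.source = source;
    this.sink = sink;
    this.periodMillis = periodMillis;
  }

  // Starts periodic reporting on a daemon thread; calling it twice is a no-op.
  synchronized void start() {
    if (scheduler != null) {
      return;
    }
    scheduler = Executors.newSingleThreadScheduledExecutor(runnable -> {
      Thread thread = new Thread(runnable, "metric-reporter");
      thread.setDaemon(true);
      return thread;
    });
    // Note: if report() throws, scheduleAtFixedRate cancels later runs.
    scheduler.scheduleAtFixedRate(
        this::report, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
  }

  // Pushes one line per metric to the sink.
  void report() {
    source.get().forEach((name, value) -> sink.accept(name + " " + value));
  }

  synchronized void stop() {
    if (scheduler != null) {
      scheduler.shutdownNow();
      scheduler = null;
    }
  }
}
```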

public interface MetricFilter {

  default Constants.MetricFilterReply accept(MetricId id) {
    return Constants.MetricFilterReply.DENY;
  }
}
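
The sketch below shows how a filter could be applied when a reporter reads metrics. The values of `Constants.MetricFilterReply` other than `DENY` are not listed in this document, so the example declares its own hypothetical `Reply` enum with `ACCEPT` and `DENY`, and uses metric names (`String`) in place of `MetricId`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for Constants.MetricFilterReply.
enum Reply { ACCEPT, DENY }

// Simplified filter that decides on the metric name only.
interface NameFilter {
  Reply accept(String metricName);
}

class FilterSketch {
  // Returns only the entries whose name the filter accepts, in input order.
  static <V> Map<String, V> select(Map<String, V> all, NameFilter filter) {
    Map<String, V> selected = new LinkedHashMap<>();
    for (Map.Entry<String, V> entry : all.entrySet()) {
      if (filter.accept(entry.getKey()) == Reply.ACCEPT) {
        selected.put(entry.getKey(), entry.getValue());
      }
    }
    return selected;
  }
}
```

A reporter interested only in JVM metrics could then pass `name -> name.startsWith("jvm.") ? Reply.ACCEPT : Reply.DENY`.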


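2.4 Metric Manager

2.4.1 IMetricManager: the entry point for metrics collection. It provides interfaces for creating and retrieving MetricRegistry instances and metrics, plus a switch controlling whether collected data is exposed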

public interface IMetricManager {
  Counter getCounter(String group, MetricId id);
  <T> Gauge<T> getGauge(String group, MetricId id);
  Meter getMeter(String group, MetricId id);
  Histogram getHistogram(String group, MetricId id);
  Timer getTimer(String group, MetricId id);

  List<String> listMetricGroups();
  Map<String, Set<MetricId>> listMetricNamesByGroup();

  Map<MetricId, Metric> getMetrics(String group);
  Map<MetricId, Metric> getMetrics(String group, MetricFilter metricFilter);

  MetricRegistry getMetricRegistryByGroup(String group);
  void removeMetricRegistry(String group);

  boolean isEnabled();
  void setEnabled(boolean enabled);
  void clear();
}
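
A possible shape for the manager is a map from group name to a per-group registry, as sketched below. The sketch makes one assumption that goes beyond the text: when the manager is disabled it hands out a shared no-op counter, so that instrumented code pays almost nothing. The names `ManagerSketch`, `SimpleCounter`, and `AdderCounter` are hypothetical, and metric names (`String`) stand in for `MetricId`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

interface SimpleCounter {
  void inc();
  long count();
}

class AdderCounter implements SimpleCounter {
  private final LongAdder adder = new LongAdder();
  @Override public void inc() { adder.increment(); }
  @Override public long count() { return adder.sum(); }
}

class ManagerSketch {
  // Shared counter that ignores all updates (assumed behavior when disabled).
  private static final SimpleCounter NO_OP = new SimpleCounter() {
    @Override public void inc() { }
    @Override public long count() { return 0L; }
  };

  // group name -> (metric name -> counter); one inner map per group.
  private final Map<String, Map<String, SimpleCounter>> groups = new ConcurrentHashMap<>();
  private volatile boolean enabled = true;

  boolean isEnabled() { return enabled; }
  void setEnabled(boolean enabled) { this.enabled = enabled; }

  SimpleCounter getCounter(String group, String name) {
    if (!enabled) {
      return NO_OP;
    }
    return groups
        .computeIfAbsent(group, g -> new ConcurrentHashMap<>())
        .computeIfAbsent(name, n -> new AdderCounter());
  }

  List<String> listMetricGroups() {
    return new ArrayList<>(groups.keySet());
  }
}
```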


2.5 Other Utility Classes

2.5.1 MetricRegistries

Manages the MetricRegistry instances created via reflection

public abstract class MetricRegistries {

  private static final class LazyHolder {
    private static final MetricRegistries GLOBAL = MetricRegistriesLoader.load();
  }

  public static MetricRegistries global() {
    return LazyHolder.GLOBAL;
  }

  public abstract void clear();

  public abstract MetricRegistry create(MetricRegistryInfo info);
  public abstract boolean remove(MetricRegistryInfo key);
  public abstract Optional<MetricRegistry> get(MetricRegistryInfo info);
  public abstract Set<MetricRegistryInfo> getMetricRegistryInfos();
  public abstract Collection<MetricRegistry> getMetricRegistries();
}


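2.5.2 MetricRegistriesLoader

Loads the MetricRegistries implementation through the SPI mechanism (ServiceLoader) and falls back to instantiating a default class via reflection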

import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MetricRegistriesLoader {
  // Assumption: SLF4J is on the classpath; substitute the project's logging facade if not.
  private static final Logger LOGGER = LoggerFactory.getLogger(MetricRegistriesLoader.class);

  private static final String DEFAULT_CLASS
    = "org.apache.iotdb.metrics.impl.MetricRegistriesImpl";

  private MetricRegistriesLoader() {
  }

  // Loads the MetricRegistries implementation discovered through ServiceLoader.
  public static MetricRegistries load() {
    List<MetricRegistries> availableImplementations = getDefinedImplementations();
    return load(availableImplementations);
  }

  @SuppressWarnings("unchecked")
  static MetricRegistries load(List<MetricRegistries> availableImplementations) {

    if (availableImplementations.size() == 1) {
      // One and only one instance -- what we want/expect
      return availableImplementations.get(0);
    } else if (availableImplementations.isEmpty()) {
      // Nothing registered through SPI: fall back to the default implementation.
      try {
        return ReflectionUtils.newInstance((Class<MetricRegistries>) Class.forName(DEFAULT_CLASS));
      } catch (ClassNotFoundException e) {
        throw new RuntimeException(e);
      }
    } else {
      // Tell the user they're doing something wrong, and choose the first impl.
      StringBuilder sb = new StringBuilder();
      for (MetricRegistries factory : availableImplementations) {
        if (sb.length() > 0) {
          sb.append(", ");
        }
        sb.append(factory.getClass());
      }
      LOGGER.warn("Found multiple MetricRegistries implementations: {}. Using the first one: {}",
          sb, availableImplementations.get(0).getClass());
      return availableImplementations.get(0);
    }
  }

  // Collects every MetricRegistries implementation registered through SPI.
  private static List<MetricRegistries> getDefinedImplementations() {
    ServiceLoader<MetricRegistries> loader = ServiceLoader.load(MetricRegistries.class);
    List<MetricRegistries> availableFactories = new ArrayList<>();
    for (MetricRegistries impl : loader) {
      availableFactories.add(impl);
    }
    return availableFactories;
  }
}