1. Background
1.1 Problem
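As IoTDB grows more complex, it needs to monitor system runtime metrics to improve operability and robustness.
However, IoTDB currently lacks a complete metrics collector for gathering these runtime metrics, so a metrics collection system needs to be designed.
Several widely used open-source metrics libraries exist, such as Dropwizard Metrics, Micrometer, and Dubbo Metrics, and adapting one of them is an option.
However, IoTDB is a real-time online system, and past experience showed that metrics collection caused a significant performance drop, so these libraries may not meet IoTDB's performance requirements.
We therefore develop a metrics collection interface that supports both adapters for mature third-party libraries and an in-house implementation, making it easy to switch between them and to apply targeted optimizations.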
1.2 Goals
- Provide a metrics collector interface
- Provide an adapter implementation based on Micrometer
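2. Overall Design
As shown in the figure below, the collection system consists of four parts:
- Metrics (collectors): tools for collecting metrics in different scenarios, including Counter, Gauge, Meter, Histogram, and Timer. Every metric can carry tags.
- Metric Registry: creates, retrieves, and categorizes metrics. There can be multiple registries, each managing one category of metrics.
- Metric Reporter: pushes metric data to other systems, such as Prometheus.
- Metric Manager: manages the metric registries (for example, grouping them by type) and provides an interface for retrieving the currently collected data.
The relationships between the classes are shown below.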
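2.1 Metrics
2.1.1 Metric: the parent interface of all metrics, providing a method that returns the metric's unique ID
```java
public interface Metric {
  MetricId getId();
}

public class MetricId {
  private final String name;
  private final Tags tags;
  private final Constants.Type type;
  // Set when this metric is contained in another metric: the id of the parent metric
  private final MetricId syntheticAssociation;
  private final String description;
  // constructor and getters omitted
}
```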
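Because `MetricId` is used as the key of every `Map<MetricId, Metric>` in the registry and manager interfaces, it needs value-based `equals`/`hashCode`. The sketch below is a simplified stand-in under stated assumptions: the `Tags` type is modeled as a sorted `String` map, `type` and `syntheticAssociation` are left out, and identity is assumed to be name plus tags (the description is informational only). `SimpleMetricId` is a hypothetical name.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Simplified stand-in for MetricId; Tags is modeled as a sorted String map (assumption).
final class SimpleMetricId {
  private final String name;
  private final Map<String, String> tags;
  private final String description; // informational only, excluded from identity

  SimpleMetricId(String name, Map<String, String> tags, String description) {
    this.name = Objects.requireNonNull(name, "name");
    // Copy into a TreeMap so tag insertion order affects neither equality nor toString.
    this.tags = Collections.unmodifiableMap(new TreeMap<>(tags));
    this.description = description;
  }

  String getName() { return name; }
  Map<String, String> getTags() { return tags; }
  String getDescription() { return description; }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof SimpleMetricId)) return false;
    SimpleMetricId other = (SimpleMetricId) o;
    return name.equals(other.name) && tags.equals(other.tags);
  }

  @Override
  public int hashCode() { return Objects.hash(name, tags); }

  @Override
  public String toString() { return name + tags; }
}
```

Sorting the tags up front means two ids built from the same tags in different orders land in the same map slot.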
2.1.2 Counter: a cumulative counter
```java
public interface Counter extends Metric {
  void inc();
  void inc(long n);
  void dec();
  void dec(long n);
  long count();
}
```
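Given the performance concern raised in 1.1, a counter on a hot write path should avoid a single contended atomic. A minimal sketch, assuming `java.util.concurrent.atomic.LongAdder` as the backing store; `SimpleCounter` is a reduced version of the `Counter` contract (the `getId()` method is omitted) and `LongAdderCounter` is a hypothetical name.

```java
import java.util.concurrent.atomic.LongAdder;

// Reduced Counter contract from section 2.1.2 (MetricId / getId() omitted for brevity).
interface SimpleCounter {
  void inc();
  void inc(long n);
  void dec();
  void dec(long n);
  long count();
}

// LongAdder spreads contended updates across cells, so writer threads do not
// all retry a CAS on one shared variable as they would with a single AtomicLong.
final class LongAdderCounter implements SimpleCounter {
  private final LongAdder adder = new LongAdder();

  @Override public void inc() { adder.increment(); }
  @Override public void inc(long n) { adder.add(n); }
  @Override public void dec() { adder.decrement(); }
  @Override public void dec(long n) { adder.add(-n); }

  // sum() is not an atomic snapshot under concurrent updates, which is acceptable for monitoring.
  @Override public long count() { return adder.sum(); }
}
```

The trade-off is that reads cost more than writes, which suits metrics that are updated far more often than they are reported.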
2.1.3 Gauge: holds the current value of some quantity
```java
public interface Gauge<T> extends Metric {
  T value();
}
```
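One common way to implement a gauge is to sample a value source on every read instead of storing a copy. The sketch below assumes a `java.util.function.Supplier` as that source; `SimpleGauge` (the `Gauge` contract without `getId()`) and `SupplierGauge` are hypothetical names.

```java
import java.util.function.Supplier;

// Reduced Gauge contract from section 2.1.3 (getId() omitted).
interface SimpleGauge<T> {
  T value();
}

// The gauge keeps no state of its own; each read calls the supplier.
final class SupplierGauge<T> implements SimpleGauge<T> {
  private final Supplier<T> source;

  SupplierGauge(Supplier<T> source) { this.source = source; }

  @Override public T value() { return source.get(); }
}
```

For example, `new SupplierGauge<>(() -> Runtime.getRuntime().freeMemory())` reports free heap each time it is read. With this approach the value source has to be provided when the gauge is created.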
2.1.4 Meter: measures the rate of events over the past 1, 5, and 15 minutes, as well as the mean rate
```java
public interface Meter extends Metric {
  long getCount();
  double getOneMinuteRate();
  double getMeanRate();
  double getFiveMinuteRate();
  double getFifteenMinuteRate();
  void mark();
  void mark(long n);
}
```
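The 1-, 5-, and 15-minute rates are typically computed as exponentially weighted moving averages (EWMA), the approach Dropwizard Metrics uses. A minimal sketch under these assumptions: a 5-second tick driven externally (e.g. by a scheduler thread), smoothing factor `alpha = 1 - exp(-tick / window)`, and the first tick seeding the average with the instantaneous rate. `Ewma` and `EwmaMeter` are hypothetical names and do not implement the `Meter` interface directly.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Exponentially weighted moving average of an event rate, updated once per tick.
final class Ewma {
  private final double alpha;
  private final double intervalSeconds;
  private final LongAdder uncounted = new LongAdder();
  private volatile boolean initialized = false;
  private volatile double ratePerSecond = 0.0;

  Ewma(double windowSeconds, double intervalSeconds) {
    this.intervalSeconds = intervalSeconds;
    this.alpha = 1.0 - Math.exp(-intervalSeconds / windowSeconds);
  }

  void update(long n) { uncounted.add(n); }

  // Must be called by a single thread every intervalSeconds.
  void tick() {
    double instantRate = uncounted.sumThenReset() / intervalSeconds;
    if (initialized) {
      ratePerSecond += alpha * (instantRate - ratePerSecond);
    } else {
      ratePerSecond = instantRate; // first tick seeds the average
      initialized = true;
    }
  }

  double ratePerSecond() { return ratePerSecond; }
}

final class EwmaMeter {
  static final double TICK_SECONDS = 5.0;
  private final LongAdder count = new LongAdder();
  private final long startNanos = System.nanoTime();
  private final Ewma m1 = new Ewma(60, TICK_SECONDS);
  private final Ewma m5 = new Ewma(300, TICK_SECONDS);
  private final Ewma m15 = new Ewma(900, TICK_SECONDS);

  void mark() { mark(1); }

  void mark(long n) {
    count.add(n);
    m1.update(n);
    m5.update(n);
    m15.update(n);
  }

  // Driven every TICK_SECONDS by an external scheduler.
  void tick() {
    m1.tick();
    m5.tick();
    m15.tick();
  }

  long getCount() { return count.sum(); }
  double getOneMinuteRate() { return m1.ratePerSecond(); }
  double getFiveMinuteRate() { return m5.ratePerSecond(); }
  double getFifteenMinuteRate() { return m15.ratePerSecond(); }

  // Mean rate over the meter's whole lifetime, in events per second.
  double getMeanRate() {
    double elapsedSeconds = (System.nanoTime() - startNanos) / (double) TimeUnit.SECONDS.toNanos(1);
    return elapsedSeconds <= 0 ? 0.0 : getCount() / elapsedSeconds;
  }
}
```

Marking only adds to `LongAdder`s; the exponential math runs once per tick, which keeps the per-event cost low.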
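2.1.5 Histogram
Snapshot holds the collected data; it provides percentile values and a list of counts per bucket.
```java
import java.io.OutputStream;

public interface Histogram extends Metric {
  void update(int value);
  void update(long value);
  long count();
  Snapshot takeSnapshot();
}

public interface Snapshot {
  long count();
  double total();
  double max();
  double mean();
  ValueAtPercentile[] percentileValues();
  CountAtBucket[] histogramCounts();
  void dump(OutputStream output);
}

public final class ValueAtPercentile {
  private final double percentile;
  private final double value;

  public ValueAtPercentile(double percentile, double value) {
    this.percentile = percentile;
    this.value = value;
  }

  public double percentile() { return percentile; }
  public double value() { return value; }
}

public final class CountAtBucket {
  private final double bucket;
  private final double count;

  public CountAtBucket(double bucket, double count) {
    this.bucket = bucket;
    this.count = count;
  }

  public double bucket() { return bucket; }
  public double count() { return count; }
}
```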
2.1.6 Timer: a histogram of recorded durations plus the call rate (Meter + Histogram)
```java
import java.util.concurrent.TimeUnit;

public interface Timer extends Metric {
  void update(long duration, TimeUnit unit);

  default void updateMillis(long durationMillis) {
    update(durationMillis, TimeUnit.MILLISECONDS);
  }

  default void updateMicros(long durationMicros) {
    update(durationMicros, TimeUnit.MICROSECONDS);
  }

  default void updateNanos(long durationNanos) {
    update(durationNanos, TimeUnit.NANOSECONDS);
  }

  Snapshot takeSnapshot();

  Meter getMeter();
}
```
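The sketch below exercises the unit-conversion defaults and shows how a call site might time a task with the monotonic clock. It keeps only a count and a total in nanoseconds; a full `Timer` would also feed a `Histogram` and a `Meter`. `SimpleTimer`, `TotalTimer`, and the `time(Supplier)` helper are hypothetical and not part of the design.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

interface SimpleTimer {
  void update(long duration, TimeUnit unit);

  // Each helper passes the unit that matches its parameter name.
  default void updateMillis(long durationMillis) { update(durationMillis, TimeUnit.MILLISECONDS); }
  default void updateMicros(long durationMicros) { update(durationMicros, TimeUnit.MICROSECONDS); }
  default void updateNanos(long durationNanos) { update(durationNanos, TimeUnit.NANOSECONDS); }

  // Hypothetical convenience helper: times a task using System.nanoTime().
  default <T> T time(Supplier<T> task) {
    long start = System.nanoTime();
    try {
      return task.get();
    } finally {
      updateNanos(System.nanoTime() - start);
    }
  }
}

// Stores only the number of recordings and the total duration in nanoseconds.
final class TotalTimer implements SimpleTimer {
  private final LongAdder count = new LongAdder();
  private final LongAdder totalNanos = new LongAdder();

  @Override
  public void update(long duration, TimeUnit unit) {
    if (duration < 0) return; // ignore negative durations
    count.increment();
    totalNanos.add(unit.toNanos(duration));
  }

  long count() { return count.sum(); }
  long totalNanos() { return totalNanos.sum(); }
}
```

Normalizing everything to nanoseconds at record time means snapshots never have to reconcile mixed units.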
2.1.7 MetricSet: a group of related metrics, e.g. GC metrics, which consist of several measurements
```java
import java.util.Map;

public interface MetricSet extends Metric {
  Map<MetricId, Metric> getMetrics();
}
```
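As an illustration of the GC example, the JDK exposes per-collector statistics through `java.lang.management.GarbageCollectorMXBean`. The sketch maps each collector to a count and a cumulative-time entry; to stay self-contained it uses `String` keys and `Supplier<Long>` values instead of `MetricId` and `Gauge`, and the key format (`gc.<collector>.count`, `gc.<collector>.time_ms`) is an assumption. `GcMetricSet` is a hypothetical name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// One pair of entries per collector, e.g. "gc.G1_Young_Generation.count".
final class GcMetricSet {
  Map<String, Supplier<Long>> getMetrics() {
    Map<String, Supplier<Long>> metrics = new LinkedHashMap<>();
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      // Collector names contain spaces; replace them so names stay usable as metric keys.
      String base = "gc." + gc.getName().replace(' ', '_');
      // Both values are read lazily; the JVM returns -1 if a statistic is undefined.
      metrics.put(base + ".count", gc::getCollectionCount);
      metrics.put(base + ".time_ms", gc::getCollectionTime);
    }
    return metrics;
  }
}
```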
2.2 Metric Registry
2.2.1 MetricRegistry: manages the creation and retrieval of metrics and notifies listeners when metrics are added or removed
```java
import java.util.EventListener;
import java.util.Map;

public interface MetricRegistry {
  Counter newOrGetCounter(MetricId id);
  <T> Gauge<T> newOrGetGauge(MetricId id);
  Meter newOrGetMeter(MetricId id);
  Histogram newOrGetHistogram(MetricId id);
  Timer newOrGetTimer(MetricId id);
  Metric register(MetricId id, Metric metric);
  void registerAll(MetricSet metricSet);
  void addListener(MetricRegistryListener listener);
  void removeListener(MetricRegistryListener listener);
  Metric remove(MetricId id);
  Map<MetricId, Metric> getAllMetrics();
  Map<MetricId, Metric> getAllMetrics(MetricFilter metricFilter);
  MetricRegistryInfo getMetricRegistryInfo();
}

public interface MetricRegistryListener extends EventListener {
  void onMetricAdded(Metric metric);
  void onMetricRemoved(Metric metric);
}

public class MetricRegistryInfo {
  protected final String metricsName;
  protected final String metricsDescription;
  protected final String metricsContext;
  protected final String metricsJmxContext;
  protected final boolean existingSource;
  // constructor and getters omitted
}
```
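A common way to implement the `newOrGet*` methods is `ConcurrentHashMap.computeIfAbsent`, which runs the factory at most once per key so that concurrent callers share one instance. The sketch below shows this for counters only, with `String` ids, a type check for ids already registered as a different kind of metric, and listener notification after the map update. All names (`SimpleMetricRegistry`, `RegistryCounter`, `RegistryListener`) are hypothetical simplifications of the interfaces above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

interface Metric {}

final class RegistryCounter implements Metric {
  private final LongAdder value = new LongAdder();
  void inc() { value.increment(); }
  long count() { return value.sum(); }
}

interface RegistryListener {
  void onMetricAdded(String id, Metric metric);
  void onMetricRemoved(String id, Metric metric);
}

final class SimpleMetricRegistry {
  private final Map<String, Metric> metrics = new ConcurrentHashMap<>();
  private final List<RegistryListener> listeners = new CopyOnWriteArrayList<>();

  RegistryCounter newOrGetCounter(String id) {
    return newOrGet(id, RegistryCounter.class, RegistryCounter::new);
  }

  private <M extends Metric> M newOrGet(String id, Class<M> type, Supplier<M> factory) {
    boolean[] created = {false};
    // The mapping function runs only if the id is absent, and at most once.
    Metric metric = metrics.computeIfAbsent(id, k -> {
      created[0] = true;
      return factory.get();
    });
    if (!type.isInstance(metric)) {
      throw new IllegalArgumentException(id + " is already registered as " + metric.getClass().getSimpleName());
    }
    // Notify outside computeIfAbsent so listener code never runs while the map entry is locked.
    if (created[0]) listeners.forEach(l -> l.onMetricAdded(id, metric));
    return type.cast(metric);
  }

  Metric remove(String id) {
    Metric removed = metrics.remove(id);
    if (removed != null) listeners.forEach(l -> l.onMetricRemoved(id, removed));
    return removed;
  }

  void addListener(RegistryListener l) { listeners.add(l); }
  void removeListener(RegistryListener l) { listeners.remove(l); }
  Map<String, Metric> getAllMetrics() { return Collections.unmodifiableMap(new HashMap<>(metrics)); }
}
```

`CopyOnWriteArrayList` suits listeners because they are registered rarely but iterated on every add or remove.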
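2.3 Metric Reporter
2.3.1 MetricReporter: pushes data to external systems or serves it over HTTP. When fetching data from the MetricManager, it can use a MetricFilter to retrieve only the metrics of interest.
```java
public interface MetricReporter extends Cloneable {
  void start();
  void report();
  void stop();
}

public interface MetricFilter {
  // Rejects every metric by default; implementations override this to accept the ids they need.
  default Constants.MetricFilterReply accept(MetricId id) {
    return Constants.MetricFilterReply.DENY;
  }
}
```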
2.4 Registry Manager
2.4.1 IMetricManager: the entry point for metrics collection. It provides methods to create and obtain MetricRegistry instances and metrics, and a switch that controls whether collected data is exposed.
```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public interface IMetricManager {
  Counter getCounter(String group, MetricId id);
  <T> Gauge<T> getGauge(String group, MetricId id);
  Meter getMeter(String group, MetricId id);
  Histogram getHistogram(String group, MetricId id);
  Timer getTimer(String group, MetricId id);
  List<String> listMetricGroups();
  Map<String, Set<MetricId>> listMetricNamesByGroup();
  Map<MetricId, Metric> getMetrics(String group);
  Map<MetricId, Metric> getMetrics(String group, MetricFilter metricFilter);
  MetricRegistry getMetricRegistryByGroup(String group);
  void removeMetricRegistry(String group);
  boolean isEnabled();
  void setEnabled(boolean enabled);
  void clear();
}
```
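One possible reading of the group-to-registry structure and the enable switch is sketched below, for counters only. Each group lazily gets its own map (standing in for a `MetricRegistry`), and while collection is disabled callers receive a shared no-op counter, so call sites need no null checks and the disabled path costs one volatile read. `SimpleMetricManager`, `GroupCounter`, and `AdderCounter` are hypothetical names, and using String metric names instead of `MetricId` is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

interface GroupCounter {
  void inc();
  long count();
}

final class AdderCounter implements GroupCounter {
  private final LongAdder value = new LongAdder();
  public void inc() { value.increment(); }
  public long count() { return value.sum(); }
}

final class SimpleMetricManager {
  // Shared instance handed out while collection is disabled; it records nothing.
  private static final GroupCounter NOOP = new GroupCounter() {
    public void inc() { }
    public long count() { return 0; }
  };

  // group name -> (metric name -> counter); each inner map plays the role of one MetricRegistry.
  private final Map<String, Map<String, GroupCounter>> registries = new ConcurrentHashMap<>();
  private volatile boolean enabled = true;

  GroupCounter getCounter(String group, String name) {
    if (!enabled) return NOOP;
    return registries
        .computeIfAbsent(group, g -> new ConcurrentHashMap<>())
        .computeIfAbsent(name, n -> new AdderCounter());
  }

  List<String> listMetricGroups() { return new ArrayList<>(registries.keySet()); }
  void removeMetricRegistry(String group) { registries.remove(group); }
  boolean isEnabled() { return enabled; }
  void setEnabled(boolean enabled) { this.enabled = enabled; }
  void clear() { registries.clear(); }
}
```

In this sketch, counters obtained before the switch is turned off keep recording; only new lookups return the no-op instance.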
2.5 Other Utility Classes
2.5.1 MetricRegistries
Manages MetricRegistry instances; the concrete MetricRegistries implementation is loaded at runtime (see MetricRegistriesLoader).
```java
import java.util.Collection;
import java.util.Optional;
import java.util.Set;

public abstract class MetricRegistries {
  // Initialization-on-demand holder: the implementation is loaded on the first call to global()
  private static final class LazyHolder {
    private static final MetricRegistries GLOBAL = MetricRegistriesLoader.load();
  }

  public static MetricRegistries global() {
    return LazyHolder.GLOBAL;
  }

  public abstract void clear();

  public abstract MetricRegistry create(MetricRegistryInfo info);

  public abstract boolean remove(MetricRegistryInfo key);

  public abstract Optional<MetricRegistry> get(MetricRegistryInfo info);

  public abstract Set<MetricRegistryInfo> getMetricRegistryInfos();

  public abstract Collection<MetricRegistry> getMetricRegistries();
}
```
2.5.2 MetricRegistriesLoader
Uses the Java SPI mechanism (ServiceLoader) to load the MetricRegistries implementation; if no provider is registered, it instantiates a default class via reflection.
```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MetricRegistriesLoader {
  private static final Logger LOGGER = LoggerFactory.getLogger(MetricRegistriesLoader.class);
  private static final String DEFAULT_CLASS = "org.apache.iotdb.metrics.impl.MetricRegistriesImpl";

  private MetricRegistriesLoader() {}

  public static MetricRegistries load() {
    return load(getDefinedImplementations());
  }

  static MetricRegistries load(List<MetricRegistries> availableImplementations) {
    if (availableImplementations.size() == 1) {
      // One and only one instance -- what we want/expect
      return availableImplementations.get(0);
    } else if (availableImplementations.isEmpty()) {
      // No SPI provider registered: fall back to the default implementation via reflection
      try {
        return Class.forName(DEFAULT_CLASS)
            .asSubclass(MetricRegistries.class)
            .getDeclaredConstructor()
            .newInstance();
      } catch (ReflectiveOperationException e) {
        throw new RuntimeException("Cannot instantiate " + DEFAULT_CLASS, e);
      }
    } else {
      // Tell the user they're doing something wrong, and choose the first impl.
      StringBuilder sb = new StringBuilder();
      for (MetricRegistries factory : availableImplementations) {
        if (sb.length() > 0) {
          sb.append(", ");
        }
        sb.append(factory.getClass());
      }
      LOGGER.warn("Found multiple MetricRegistries implementations: {}. Using the first one.", sb);
      return availableImplementations.get(0);
    }
  }

  private static List<MetricRegistries> getDefinedImplementations() {
    ServiceLoader<MetricRegistries> loader = ServiceLoader.load(MetricRegistries.class);
    List<MetricRegistries> availableFactories = new ArrayList<>();
    for (MetricRegistries impl : loader) {
      availableFactories.add(impl);
    }
    return availableFactories;
  }
}
```