...

For each check below: what information to gather, how to get that information, and how to identify a red flag.
1. Is AMS able to handle raw writes*?

How to check: Look for log lines such as 'AsyncProcess:1597 - #1, waiting for 13948 actions to finish' in the AMS log.

Red flag: The number of actions waiting to finish keeps increasing and AMS eventually shuts down. This could mean AMS is not able to handle raw writes.
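A minimal sketch of check 1, assuming the backlog lines look like the example above. The hypothetical helper async_backlog_trend pulls the 'waiting for N actions to finish' counts out of a log file in order and reports whether the last count is larger than the first. It is a rough trend heuristic, not an AMS tool.

```shell
# Hypothetical helper: print the pending-action counts from HBase client
# AsyncProcess lines ("waiting for N actions to finish") in log order, then
# report whether the backlog at the end of the log is larger than at the start.
async_backlog_trend() {
  grep -o 'waiting for [0-9][0-9]* actions to finish' "$1" |
    awk '{ n = $3 + 0; if (NR == 1) first = n; last = n; print n }
         END {
           if (NR == 0) msg = "no AsyncProcess backlog lines found"
           else if (last > first) msg = "GROWING: " first " -> " last
           else msg = "not growing: " first " -> " last
           print msg
         }'
}
```

For example: async_backlog_trend /var/log/ambari-metrics-collector/ambari-metrics-collector.log | tail -n 1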

2. How long does it take for the 2-minute cluster aggregator to finish?

How to check: grep "TimelineClusterAggregatorSecond" /var/log/ambari-metrics-collector/ambari-metrics-collector.log | less

Look for the time between the 'Start aggregation cycle....' and 'Saving ## metric aggregates' messages.

Red flag: aggregation takes more than 2 minutes.


3. How long does it take for the 5-minute host aggregator to finish?

How to check: grep -E "TimelineHostAggregatorMinute|TimelineMetricHostAggregatorMinute" /var/log/ambari-metrics-collector/ambari-metrics-collector.log | less

Look for the time between the 'Start aggregation cycle....' and 'Saving ## metric aggregates' messages.

Red flag: aggregation takes more than 5 minutes.
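For checks 2 and 3, the cycle duration can be computed rather than read by eye. A minimal sketch, assuming each collector log line starts with a 'YYYY-MM-DD HH:MM:SS,mmm' timestamp and that every 'Start aggregation cycle' line is followed by a matching 'Saving N metric aggregates' line from the same aggregator. aggregation_times is a hypothetical helper name.

```shell
# Hypothetical helper: report how long each aggregation cycle of one aggregator
# took, in whole seconds, and flag cycles longer than a threshold.
# Usage: aggregation_times <log file> <aggregator name regex> <threshold seconds>
# ASSUMPTION: log lines start with "YYYY-MM-DD HH:MM:SS,mmm".
aggregation_times() {
  grep -E "$2" "$1" |
    awk -v limit="$3" '
      # Seconds since midnight from an "HH:MM:SS,mmm" field (milliseconds dropped).
      function secs(t,    p) { split(t, p, /[:,]/); return p[1] * 3600 + p[2] * 60 + p[3] }
      /Start aggregation cycle/ { start = secs($2); in_cycle = 1; next }
      /Saving [0-9]+ metric aggregates/ && in_cycle {
        d = secs($2) - start
        if (d < 0) d += 86400   # the cycle crossed midnight
        printf "%s %s took %ds%s\n", $1, $2, d, (d > limit + 0 ? "  <-- RED FLAG" : "")
        in_cycle = 0
      }'
}
```

For example, check 2 is aggregation_times /var/log/ambari-metrics-collector/ambari-metrics-collector.log TimelineClusterAggregatorSecond 120, and check 3 is the same call with 'TimelineHostAggregatorMinute|TimelineMetricHostAggregatorMinute' and 300.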

4. How many metrics are being collected?

How to check: curl http://<ams-host>:6188/ws/v1/timeline/metrics/metadata -o /tmp/metrics_metadata.txt

The number of metrics is the output of 'grep -o "metricname" /tmp/metrics_metadata.txt | wc -l'.

Red flag: more than 15000 metrics. If so, find out which component is sending the most metrics.
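A minimal sketch for check 4. count_metrics reproduces the grep count above; metrics_per_app is a hypothetical helper that breaks the total down by component, assuming each metadata entry carries an "appid" field next to "metricname". Verify that field against your own metadata dump before relying on it.

```shell
# Total number of metrics in the metadata dump (same count as the grep above).
count_metrics() {
  grep -o 'metricname' "$1" | wc -l | tr -d ' '
}

# Hypothetical helper: metrics per component, largest first, as "<appid> <count>".
# ASSUMPTION: each metadata entry carries an "appid" field next to "metricname",
# e.g. {"metricname":"cpu_user","appid":"HOST",...}.
metrics_per_app() {
  grep -o '"appid" *: *"[^"]*"' "$1" |
    sed 's/.*"\([^"]*\)"$/\1/' |
    sort | uniq -c | sort -rn |
    awk '{ print $2, $1 }'
}
```

For example: [ "$(count_metrics /tmp/metrics_metadata.txt)" -gt 15000 ] && metrics_per_app /tmp/metrics_metadata.txt | head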

5. What is the number of regions and store files in AMS HBase?

How to check: the AMS HBase Master UI at http://<METRICS_COLLECTOR_HOST>:61310

Red flag: more than 150 regions, or more than 2000 store files.
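The thresholds in check 5 can be encoded in a small script for repeated checks. A minimal sketch: check_hbase_layout is a hypothetical helper that takes the region and store-file counts you read off the Master UI; it does not query HBase itself.

```shell
# Hypothetical helper: compare the region and store-file counts read from the
# AMS HBase Master UI against the thresholds in check 5.
# Prints one line per red flag (or an OK line) and returns 1 if any flag fired.
check_hbase_layout() {
  local regions=$1 storefiles=$2 status=0
  if [ "$regions" -gt 150 ]; then
    echo "RED FLAG: $regions regions (> 150)"; status=1
  fi
  if [ "$storefiles" -gt 2000 ]; then
    echo "RED FLAG: $storefiles store files (> 2000)"; status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "OK: $regions regions, $storefiles store files"
  fi
  return "$status"
}
```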

6. How fast is AMS HBase flushing, and how much data is being flushed?

How to check: use the HBase Master log in embedded mode, or the RegionServer log in distributed mode.

grep "memstore flush" /var/log/metric_collector/hbase-ams-<>.log | less

Check how often METRIC_RECORD flushes happen and how much data each flush writes.

Red flag: more than 10 flushes in a minute could be a problem. The flush size should be approximately equal to the flush size configured in ams-hbase-site (hbase.hregion.memstore.flush.size).
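A minimal sketch for counting flushes per minute in check 6. It assumes each completed flush is logged once with the phrase "Finished memstore flush" (so start and end messages are not double-counted) and that log lines start with a 'YYYY-MM-DD HH:MM:SS,mmm' timestamp. flushes_per_minute is a hypothetical helper name.

```shell
# Hypothetical helper: count METRIC_RECORD memstore flushes per minute in an
# AMS HBase log and flag minutes with more than 10 flushes.
# ASSUMPTIONS: each completed flush is logged once with the phrase
# "Finished memstore flush", and log lines start with "YYYY-MM-DD HH:MM:SS,mmm",
# so the first 16 characters identify the minute.
flushes_per_minute() {
  grep 'Finished memstore flush' "$1" | grep 'METRIC_RECORD' |
    awk '{ n[substr($0, 1, 16)]++ }
         END { for (m in n) printf "%s %d%s\n", m, n[m], (n[m] > 10 ? "  <-- RED FLAG" : "") }' |
    LC_ALL=C sort
}
```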

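7. If AMS is in distributed mode, is there a local DataNode?

How to check: from the cluster layout, for example whether a DataNode runs on the same host as the Metrics Collector.

Red flag: no local DataNode. In distributed mode, a local DataNode lets AMS HBase use the HDFS short-circuit read feature (http://hbase.apache.org/0.94/book/perf.hdfs.html).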
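A minimal sketch for check 7, run on the Metrics Collector host. has_local_datanode is a hypothetical helper that reads a process listing on stdin; it assumes the DataNode JVM was started with -Dproc_datanode, as the stock Hadoop launcher scripts do.

```shell
# Hypothetical helper: succeed if a process listing read from stdin (one
# command line per line) contains a DataNode JVM.
# ASSUMPTION: the DataNode was started with -Dproc_datanode, as the stock
# Hadoop launcher scripts do. The bracket keeps this grep from matching
# its own command line when the listing comes from ps.
has_local_datanode() {
  grep -q -- '[-]Dproc_datanode'
}
```

For example: ps -eo args | has_local_datanode && echo "local DataNode found" || echo "RED FLAG: no local DataNode"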

...