Get the current state of the system

#	What information to gather?	How to get that information?	How to identify a red flag?
1	Is AMS able to handle raw writes*?	Look for log lines like 'AsyncProcess:1597 - #1, waiting for 13948 actions to finish' in the log.	If the number of actions to finish keeps increasing and AMS eventually shuts down, AMS may not be able to handle raw writes.
2	How long does the 2 min cluster aggregator take to finish?	Run 'grep "TimelineClusterAggregatorSecond" /var/log/ambari-metrics-collector/ambari-metrics-collector.log | less' and look at the time taken between 'Start aggregation cycle....' and 'Saving ## metric aggregates'.	Aggregation takes more than 2 minutes
3	How long does the 5 min host aggregator take to finish?	Run 'grep "TimelineHostAggregatorMinute" /var/log/ambari-metrics-collector/ambari-metrics-collector.log | less' and look at the time taken between 'Start aggregation cycle....' and 'Saving ## metric aggregates'.	Aggregation takes more than 5 minutes
4	How many metrics are being collected?	Run 'curl http://<ams-host>:6188/ws/v1/timeline/metrics/metadata -o /tmp/metrics_metadata.txt'. The number of metrics is the output of 'grep -o "metricname" /tmp/metrics_metadata.txt | wc -l'. Also find out which component is sending the most metrics.	> 15000 metrics
5	How many regions and store files are in AMS HBase?	Check the AMS HBase Master UI: http://<METRICS_COLLECTOR_HOST>:61310	> 150 regions or > 2000 store files
6	How fast is AMS HBase flushing, and how much data is being flushed?	Check the master log in embedded mode and the RegionServer log in distributed mode: 'grep "memstore flush" /var/log/metric_collector/hbase-ams-<>.log | less'. Check how often METRIC_RECORD flushes happen and how much data is flushed.	More than 10 flushes in a minute could be a problem. The flush size should be approximately equal to the flush size configured in ams-hbase-site.
7	If AMS is in distributed mode, is there a local DataNode?	From the cluster.	No local DataNode. In distributed mode, a local DataNode helps with the HBase short-circuit read feature. (http://hbase.apache.org/0.94/book/perf.hdfs.html)
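
Checks 1 and 4 in the table above can be scripted. The following is a minimal sketch, not part of AMS: the function names pending_actions and metric_count_check are hypothetical, and it assumes the log line format and the "metricname" key shown in the table.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checks 1 and 4; names are illustrative only.

# Check 1: print the pending-action count from every
# "waiting for N actions to finish" line, in log order.
# A steadily growing sequence is the red flag described in the table.
pending_actions() {
  local log_file="$1"
  grep -o 'waiting for [0-9][0-9]* actions to finish' "$log_file" | awk '{ print $3 }'
}

# Check 4: count "metricname" occurrences in a saved metadata dump and
# compare against a threshold (defaults to the 15000 from the table).
metric_count_check() {
  local metadata_file="$1"
  local threshold="${2:-15000}"
  local count
  # tr strips the padding that some wc implementations add.
  count=$(grep -o 'metricname' "$metadata_file" | wc -l | tr -d ' ')
  if [ "$count" -gt "$threshold" ]; then
    echo "RED FLAG: $count metrics (threshold $threshold)"
  else
    echo "OK: $count metrics (threshold $threshold)"
  fi
}
```

For example, after sourcing the file, 'pending_actions /var/log/ambari-metrics-collector/ambari-metrics-collector.log | tail' shows the most recent counts (assuming the AsyncProcess lines are written to the collector log), and 'metric_count_check /tmp/metrics_metadata.txt' evaluates the dump saved by the curl command in row 4.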
