
Suggested Memory settings

Heap and xmn values below are in MB. Each column maps to the following configuration property:

  • Collector Heapsize: ams-env : metrics_collector_heapsize
  • HBase Master Heapsize: ams-hbase-env : hbase_master_heapsize
  • HBase RS Heapsize: ams-hbase-env : hbase_regionserver_heapsize
  • HBase Master xmn size: ams-hbase-env : hbase_master_xmn_size
  • HBase RS xmn size: ams-hbase-env : regionserver_xmn_size

| Cluster Size (hosts) | Recommended Mode | Collector Heapsize | HBase Master Heapsize | HBase RS Heapsize | HBase Master xmn size | HBase RS xmn size |
| 1 - 10     | Embedded    | 512   | 1408  | 512   | 192  | -    |
| 11 - 20    | Embedded    | 1024  | 1920  | 512   | 256  | -    |
| 21 - 100   | Embedded    | 1664  | 5120  | 512   | 768  | -    |
| 100 - 300  | Embedded    | 4352  | 13056 | 512   | 2048 | -    |
| 300 - 500  | Distributed | 4352  | 512   | 13056 | 102  | 2048 |
| 500 - 800  | Distributed | 7040  | 512   | 21120 | 102  | 3072 |
| 800 - 1000 | Distributed | 11008 | 512   | 32768 | 102  | 5120 |
| 1000+      | Distributed | 13696 | 512   | 32768 | 102  | 5120 |
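As a reference, the sketch below shows one way the values from the 300 - 500 host row could be applied from the Ambari Server host using the configs.sh helper script that ships with Ambari. This is a minimal sketch, not the only procedure: the script path, credentials, Ambari host, and cluster name are assumptions to adjust for your environment, and AMS needs a restart after the change.

  # Assumed location of the Ambari config helper script (verify on your Ambari Server host).
  CONFIGS=/var/lib/ambari-server/resources/scripts/configs.sh
  AMBARI_HOST=ambari.example.com   # hypothetical Ambari Server host
  CLUSTER=MyCluster                # hypothetical cluster name

  # Collector heap (ams-env) and HBase heap / xmn sizes (ams-hbase-env); values are in MB.
  $CONFIGS -u admin -p admin set $AMBARI_HOST $CLUSTER ams-env metrics_collector_heapsize 4352
  $CONFIGS -u admin -p admin set $AMBARI_HOST $CLUSTER ams-hbase-env hbase_master_heapsize 512
  $CONFIGS -u admin -p admin set $AMBARI_HOST $CLUSTER ams-hbase-env hbase_regionserver_heapsize 13056
  $CONFIGS -u admin -p admin set $AMBARI_HOST $CLUSTER ams-hbase-env hbase_master_xmn_size 102
  $CONFIGS -u admin -p admin set $AMBARI_HOST $CLUSTER ams-hbase-env regionserver_xmn_size 2048

  # Moving from Embedded to Distributed mode additionally requires the operation mode switch:
  $CONFIGS -u admin -p admin set $AMBARI_HOST $CLUSTER ams-site timeline.metrics.service.operation.mode distributed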

Identifying and tackling scale problems in AMS through configs

How do we find out if AMS is experiencing scale problems?

One or more of the following consequences can be seen on the cluster.

  • The Metrics Collector shuts down intermittently. Since auto restart is enabled for the Metrics Collector by default, this shows up as an alert stating 'Metrics collector has been auto restarted # times the last 1 hour'.
  • Partial metrics data is seen.
    • All non-aggregated host metrics are seen (HDFS NameNode metrics / Host summary page on Ambari / System - Servers Grafana dashboard).
    • Aggregated data is not seen (AMS Summary page / System - Home Grafana dashboard / HBase - Home Grafana dashboard).
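One quick way to tell these two symptoms apart is to query the Metrics Collector API directly. The sketch below is illustrative only: the metric name, host name, and time range are placeholders, and it assumes the default collector port 6188 used elsewhere on this page.

  # Per-host (non-aggregated) series: should return datapoints if host-level collection works.
  curl "http://<ams-host>:6188/ws/v1/timeline/metrics?metricNames=cpu_user&appId=HOST&hostname=<some-host>&startTime=<start-epoch>&endTime=<end-epoch>"

  # Cluster-aggregated series for the same metric: omit the hostname parameter.
  # Empty results here, while the per-host query returns data, point at the aggregators.
  curl "http://<ams-host>:6188/ws/v1/timeline/metrics?metricNames=cpu_user&appId=HOST&startTime=<start-epoch>&endTime=<end-epoch>"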

Get the current state of the system

Each check below lists what to get, how to get it, and the red flag that indicates a scale problem (a consolidated command sketch follows the list).

  • How long does the 2 min cluster aggregator take to finish?
    • How: grep "TimelineClusterAggregatorSecond" /var/log/ambari-metrics-collector/ambari-metrics-collector.log | less, and look at the time taken between 'Start aggregation cycle....' and 'Saving ## metric aggregates'.
    • Red flag: aggregation takes longer than 2 minutes.
  • How long does the 5 min host aggregator take to finish?
    • How: grep "TimelineHostAggregatorMinute" /var/log/ambari-metrics-collector/ambari-metrics-collector.log | less, and look at the time taken between 'Start aggregation cycle....' and 'Saving ## metric aggregates'.
    • Red flag: aggregation takes longer than 5 minutes.
  • How many metrics are being collected?
    • How: curl http://<ams-host>:6188/ws/v1/timeline/metrics/metadata -o /tmp/metrics_metadata.txt, then count the metrics with grep -o "metricname" /tmp/metrics_metadata.txt | wc -l.
    • Red flag: more than 15000 metrics.
  • What is the number of regions and store files in AMS HBase?
    • How: from the AMS HBase Master UI at http://<METRICS_COLLECTOR_HOST>:61310.
    • Red flags: more than 150 regions, or more than 2000 store files.
  • How fast is AMS HBase flushing, and how much data is being flushed?
    • How: check the HBase Master log in embedded mode and the RegionServer log in distributed mode: grep "memstore flush" /var/log/metric_collector/hbase-ams-<>.log | less. Note how often METRIC_RECORD flushes happen and how much data is flushed each time.
    • Red flags: 2-3 or more flushes every second; flushes whose size is not approximately equal to the memstore flush size configured in ams-hbase-site.
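The consolidated sketch below runs the checks above from the Metrics Collector host. It only strings together the commands already listed; the collector host name is a placeholder, and the HBase log path is the one quoted above (adjust it to your HBase log directory).

  LOG=/var/log/ambari-metrics-collector/ambari-metrics-collector.log

  # 1. Aggregator timings: pair each 'Start aggregation cycle' line with the following 'Saving ... metric aggregates' line.
  grep -E "TimelineClusterAggregatorSecond|TimelineHostAggregatorMinute" $LOG | \
    grep -E "Start aggregation cycle|Saving" | tail -40

  # 2. Number of metrics being collected (red flag: > 15000).
  curl -s http://<ams-host>:6188/ws/v1/timeline/metrics/metadata -o /tmp/metrics_metadata.txt
  grep -o "metricname" /tmp/metrics_metadata.txt | wc -l

  # 3. Memstore flush frequency and size (red flag: 2-3+ flushes per second).
  grep "memstore flush" /var/log/metric_collector/hbase-ams-<>.log | tail -20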

Fixing / Recovering from the problem


Advanced Configurations

| Configuration | Property | Description | Recommended Values |
| ams-site | phoenix.query.maxGlobalMemoryPercentage | Percentage of total heap memory used by Phoenix threads in the Metrics Collector API/Aggregator daemon. | 20 - 30, based on available memory. Default = 25. |
| ams-site | phoenix.spool.directory | Directory for Phoenix spill files (client side). | Set this to a different disk from hbase.rootdir if possible. |
| ams-hbase-site | phoenix.spool.directory | Directory for Phoenix spill files (server side). | Set this to a different disk from hbase.rootdir if possible. |
| ams-hbase-site | phoenix.query.spoolThresholdBytes | Threshold size, in bytes, after which results of parallelly executed queries are spooled to disk. | Set this to a higher value based on available memory. Default is 12 MB. |
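For reference, the same configs.sh approach sketched earlier could be used to apply these properties. The values and spool directory below are purely illustrative (20971520 bytes = 20 MB, above the 12 MB default), and the script path, credentials, host, and cluster name are again assumptions.

  CONFIGS=/var/lib/ambari-server/resources/scripts/configs.sh   # assumed path
  AMBARI_HOST=ambari.example.com   # hypothetical Ambari Server host
  CLUSTER=MyCluster                # hypothetical cluster name

  # Give Phoenix a larger share of the collector heap (default is 25 percent).
  $CONFIGS -u admin -p admin set $AMBARI_HOST $CLUSTER ams-site phoenix.query.maxGlobalMemoryPercentage 30

  # Put spill files on a disk other than the one backing hbase.rootdir (placeholder path).
  $CONFIGS -u admin -p admin set $AMBARI_HOST $CLUSTER ams-site phoenix.spool.directory /grid/1/ams/phoenix-spool
  $CONFIGS -u admin -p admin set $AMBARI_HOST $CLUSTER ams-hbase-site phoenix.spool.directory /grid/1/ams/phoenix-spool

  # Raise the spool threshold above the 12 MB default when memory allows (illustrative 20 MB).
  $CONFIGS -u admin -p admin set $AMBARI_HOST $CLUSTER ams-hbase-site phoenix.query.spoolThresholdBytes 20971520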
