This page documents common problems found with the Ambari Metrics Service (AMS). It lists what to look for on the affected hosts and the resolutions that are already known.
Important facts to collect from the system:
Problems with Metric Collector host
- Total available System memory, output of : "free -g"
- Total available disk space and available partitions, output of : "df -h "
- Total number of hosts in the cluster
- Services deployed in the cluster (This is purely to estimate the amount of metric data generated)
- Collector configs: /etc/ams-hbase/conf/hbase-env.sh, /etc/ams-hbase/conf/hbase-site.xml, /etc/ambari-metrics-collector/conf/ams-env.sh, /etc/ambari-metrics-collector/conf/ams-site.xml
- Collector logs: /var/log/ambari-metrics-collector/ambari-metrics-collector.log, /var/log/ambari-metrics-collector/hbase-ams-master-<host>.log, /var/log/ambari-metrics-collector/hbase-ams-master-<host>.out
Note: If distributed mode is enabled, also collect /var/log/ambari-metrics-collector/hbase-ams-zookeeper-<host>.log and /var/log/ambari-metrics-collector/hbase-ams-regionserver-<host>.log
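
The Collector checklist above can be gathered in one pass. The following is a minimal sketch, not an Ambari tool: `collect_ams_facts` is a hypothetical helper name, and the output directory is an assumption. It runs the same `free -g` and `df -h` commands and copies whichever of the listed config and log files exist, noting the ones that are missing.

```shell
#!/bin/sh
# Sketch only: collect_ams_facts is a hypothetical helper, not part of Ambari.
# Usage: collect_ams_facts <output-dir> <file> [<file> ...]
collect_ams_facts() {
    out_dir="$1"
    mkdir -p "$out_dir" || return 1
    shift

    # Memory and disk snapshots, using the same commands as the checklist.
    free -g > "$out_dir/free.txt" 2>&1 || echo "free -g failed" >> "$out_dir/free.txt"
    df -h > "$out_dir/df.txt" 2>&1 || echo "df -h failed" >> "$out_dir/df.txt"

    # Copy each config or log file that exists; record the ones that do not.
    for f in "$@"; do
        if [ -f "$f" ]; then
            cp "$f" "$out_dir/"
        else
            echo "$f" >> "$out_dir/missing.txt"
        fi
    done
    return 0
}

# Example call (the output path is an assumption; substitute the real <host>):
# collect_ams_facts /tmp/ams-facts \
#     /etc/ams-hbase/conf/hbase-env.sh \
#     /etc/ams-hbase/conf/hbase-site.xml \
#     /etc/ambari-metrics-collector/conf/ams-env.sh \
#     /etc/ambari-metrics-collector/conf/ams-site.xml \
#     /var/log/ambari-metrics-collector/ambari-metrics-collector.log
```

The number of hosts and the list of deployed services are not captured by this sketch and still need to be noted separately.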
Problems with Metric Monitor host
- Monitor log file: /var/log/ambari-metrics-monitor/ambari-metrics-monitor.out
Issue 1: AMS HBase process slow disk writes
The symptoms and resolutions below address the embedded mode of AMS only.
Symptoms:
Behavior | How to detect |
---|---|
High CPU usage | HBase process on Collector host taking up more than 100% of 1 core |
HBase Log: Compaction times | grep "Finished memstore flush" hbase-ams-master-<host>.log This reports the MB written in X milliseconds; 128 MBps and above is a typical speed unless the disk is contended. The same search also shows how many flushes ran per minute. A value greater than 6 or 8 is a warning that data is being written faster than HBase can hold it in memory |
Collector Log : "waiting for some tasks to finish" | The ambari-metrics-collector log shows messages indicating that AsyncProcess writes are queued up |
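
The per-minute flush count described in the table can be tallied directly from the master log. This is a rough sketch under one assumption: each log line begins with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp, so the first 16 characters identify the minute. The function names `flushes_per_minute` and `busy_minutes` are hypothetical.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical.
# Assumes every log line starts with a "YYYY-MM-DD HH:MM:SS,mmm" timestamp.

# Print "<count> <date> <HH:MM>" for each minute that saw at least one flush.
flushes_per_minute() {
    grep "Finished memstore flush" "$1" | cut -c1-16 | sort | uniq -c
}

# Print only the minutes whose flush count exceeds the given threshold.
busy_minutes() {
    flushes_per_minute "$1" | awk -v limit="$2" '$1 > limit'
}

# Example calls (run from the Collector log directory; substitute <host>):
# flushes_per_minute hbase-ams-master-<host>.log
# busy_minutes hbase-ams-master-<host>.log 6
```

Minutes reported by `busy_minutes` with a threshold of 6 correspond to the warning level mentioned in the table.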
Resolutions:
Configuration Change | Description |
---|---|
ams-hbase-site :: hbase.rootdir | Change this path to a disk mount that is not heavily contended. |
ams-hbase-site :: hbase.tmp.dir | Change this path to a location different from hbase.rootdir |
ams-hbase-env :: hbase_master_heapsize ams-hbase-site :: hbase.hregion.memstore.flush.size | Increase the heap size so that more data is held in memory, which offsets slow I/O. If the heap size is increased and resident memory usage does not go up, hbase.hregion.memstore.flush.size can be changed to control how much data is stored in a memstore per Region. The default is 128 MB. Modify this value with care and generally keep it between 64 MB and 512 MB, since more data held in memory means a longer write to disk during a Flush operation. |
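
HBase reads hbase.hregion.memstore.flush.size as a byte count, so a size chosen in MB has to be converted before it is entered. The sketch below does that conversion and rejects values outside the 64 MB to 512 MB range suggested in the table. `flush_size_bytes` is a hypothetical helper name, not an Ambari or HBase command.

```shell
#!/bin/sh
# Sketch only: flush_size_bytes is a hypothetical helper.
# Converts a size in MB to the byte value used by
# hbase.hregion.memstore.flush.size, enforcing the 64-512 MB guidance above.
flush_size_bytes() {
    mb="$1"
    # Accept whole numbers only.
    case "$mb" in
        ''|*[!0-9]*)
            echo "not a whole number of MB: $mb" >&2
            return 1
            ;;
    esac
    if [ "$mb" -lt 64 ] || [ "$mb" -gt 512 ]; then
        echo "outside the suggested 64-512 MB range: $mb" >&2
        return 1
    fi
    echo $((mb * 1024 * 1024))
}

# Example: the 128 MB default corresponds to 134217728 bytes.
# flush_size_bytes 128
```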
Issue 2: