...

  • Metrics Collector shuts down intermittently. Since Auto Restart is enabled for Metrics Collector by default, this shows up as an alert stating 'Metrics collector has been auto restarted # times the last 1 hour'.
  • Partial data is seen.
    • All non-aggregated host metrics are seen (HDFS Namenode metrics / Host summary page on Ambari / System - Servers Grafana dashboard).
    • Aggregated data is not seen (AMS Summary page / System - Home Grafana dashboard / HBase - Home Grafana dashboard).
    • Aggregations take too long (if they complete at all).
      • Time

Systematically Troubleshooting the scale issue

  • Get the current state of the system
What to get? | How to get?
Question to ask | How do we find the answer? | Fix / Workaround for this issue
How many metrics are being collected? 
  • curl http://<ams-host>:6188/ws/v1/timeline/metrics/metadata -o /tmp/metrics_metadata.txt
  • The number of metrics is the output of the command 'grep -o "metricname" /tmp/metrics_metadata.txt | wc -l'
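
To see which component contributes most of the metrics, the saved metadata file can also be broken down per application. The following is a minimal sketch, not part of AMS itself. It assumes the metadata response is a JSON object whose top-level keys are application IDs, each mapping to a list of entries that carry a "metricname" field. The helper names count_metrics and load_counts are hypothetical.

```python
import json
from collections import Counter


def count_metrics(metadata):
    """Count entries that have a "metricname" key, grouped by top-level key.

    Assumption: top-level keys are application IDs and each value is a list
    of metric metadata objects. Values that are not lists are skipped.
    """
    counts = Counter()
    for app_id, entries in metadata.items():
        if not isinstance(entries, list):
            continue
        counts[app_id] = sum(
            1 for entry in entries
            if isinstance(entry, dict) and "metricname" in entry
        )
    return counts


def load_counts(path="/tmp/metrics_metadata.txt"):
    """Load the file written by the curl command and count metrics per app."""
    with open(path) as f:
        return count_metrics(json.load(f))
```

Calling load_counts() after the curl command has written /tmp/metrics_metadata.txt returns a Counter. Under the assumed response shape, sum(counts.values()) should match the grep count, and counts.most_common() lists the heaviest applications first.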
 
What is the number of regions and store files in AMS HBase?  
Is the memory recommendation valid?  

This can be obtained from the AMS HBase Master UI.

http://<METRICS_COLLECTOR_HOST>:61310

How long does it take to aggregate?
  
  




Advanced Configurations

...