Navigation

Ambari SCOM

Use the Ambari SCOM main navigation tree to browse cluster, HDFS and MapReduce performance metrics.

Cluster Summary

This scenario checks the health state of a cluster. Select a cluster by clicking its Cluster Name to see visualizations of:

  • Cluster Services
  • Participating Hosts
  • Live vs. Dead Nodes
  • Space Utilization

After you select a Cluster Service, Participating Hosts populates automatically.

Cluster Diagram

See a layout of Services and Components across your cluster hosts.

HDFS Service Summary

This scenario checks the health state of the HDFS service in a cluster. Select the cluster by clicking its Parent Cluster Name to see visualizations of:

  • Files Summary metrics
  • Block Summary metrics
  • I/O Summary metrics
  • Capacity Remaining

HDFS NameNode

This scenario checks the health state of the NameNode host component. Select the cluster by clicking its Parent Cluster Name to see visualizations of:

  • Memory Heap Utilization
  • Thread Status
  • Garbage Collection Time (ms)
  • Average RPC Wait Time

MapReduce Service Summary

This scenario checks the health state of the MapReduce service in a cluster. Select the cluster by clicking its Parent Cluster Name to see visualizations of:

  • Jobs Summary
  • TaskTrackers Summary
  • Slots Utilization
  • Maps vs. Reducers

MapReduce JobTracker

This scenario checks the health state of the JobTracker host component. Select the cluster by clicking its Parent Cluster Name to see visualizations of:

  • Memory Heap Utilization
  • Thread Status
  • Garbage Collection Time (ms)
  • Average RPC Wait Time

Alerts

| Name | Alert Message | Description |
|------|---------------|-------------|
| Capacity Remaining | There is little or no space capacity remaining in HDFS. | Raises a warning alert if the percentage of available space across all HDFS nodes is less than the upper threshold, and a critical alert if it is less than the lower threshold. |
| Corrupted Blocks | There are corrupted file blocks in HDFS. | Raises a critical alert if the number of corrupted blocks is more than the threshold. |
| DataNodes Down | A significant number of DataNodes are down in the cluster. | Raises a warning alert if the percentage of dead HDFS DataNodes in the cluster is more than the lower threshold, and a critical alert if it is more than the upper threshold. |
| Failed Jobs | MapReduce jobs are failing too frequently. | Raises a warning alert if the percentage of failed MapReduce jobs is more than the lower threshold, and a critical alert if it is more than the upper threshold. |
| Hive Metastore State | Hive Metastore server is not running. | Raises a critical alert if the Hive Metastore service is unavailable. |
| HiveServer State | HiveServer service is not running. | Raises a critical alert if the HiveServer service is unavailable. |
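
The two-threshold alerts above can be read as a simple classification rule. The following is a minimal sketch of that rule in Python, not the Ambari SCOM implementation. The function names, the `Severity` type, and the threshold values are all hypothetical and chosen only for illustration. It assumes that a "more than" alert (such as DataNodes Down) becomes a warning above the lower threshold and critical above the upper one, and that a "less than" alert (such as Capacity Remaining) becomes a warning below the upper threshold and critical below the lower one.

```python
from enum import Enum


class Severity(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


def percent(part: float, total: float) -> float:
    """Return part as a percentage of total (0.0 when total is zero)."""
    return 100.0 * part / total if total else 0.0


def classify_above(value: float, lower: float, upper: float) -> Severity:
    """'More than' alerts: warning above lower, critical above upper."""
    if value > upper:
        return Severity.CRITICAL
    if value > lower:
        return Severity.WARNING
    return Severity.OK


def classify_below(value: float, lower: float, upper: float) -> Severity:
    """'Less than' alerts: warning below upper, critical below lower."""
    if value < lower:
        return Severity.CRITICAL
    if value < upper:
        return Severity.WARNING
    return Severity.OK


# Hypothetical thresholds, for illustration only.
dead_nodes_pct = percent(3, 20)  # 3 of 20 DataNodes are dead
datanodes_down = classify_above(dead_nodes_pct, lower=10.0, upper=20.0)
capacity_remaining = classify_below(5.0, lower=10.0, upper=30.0)
```

With these assumed values, the dead-node percentage falls between the two thresholds and is classified as a warning, while 5% remaining capacity falls below the lower threshold and is classified as critical.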
