...
Alerts
The following Alerts are configured by Ambari SCOM:
Name | Alert Message | Description | Threshold
---|---|---|---
Capacity Remaining | There is little or no space capacity remaining in HDFS. | Gives warning/critical alert if percentage of available space on all HDFS nodes together is less than upper/lower threshold. | 30-Warning
Under-Replicated Blocks | Number of under-replicated blocks in the HDFS is too high. | Gives warning/critical alert if percentage of under-replicated blocks is more than lower/upper threshold. | 1-Warning
Corrupted Blocks | There are corrupted file blocks in HDFS. | Gives critical alert if number of corrupted blocks is more than threshold. | 1
DataNodes Down | A significant number of DataNodes are down in the cluster. | Gives warning/critical alert if percentage of dead HDFS data nodes in cluster is more than lower/upper threshold. | 10-Warning
Failed Jobs | MapReduce jobs are failing too frequently. | Gives warning/critical alert if percentage of map-reduce failed jobs is more than lower/upper threshold. | 10-Warning
Invalid TaskTrackers | There are TaskTracker nodes which are in the invalid state. | Gives critical alert if there is at least one blacklisted task-tracker. | 1
Memory Heap Usage | JobTracker is working under high memory pressure. | Gives warning/critical alert if percentage of used job-tracker memory heap is more than lower/upper threshold. | 80-Warning
Memory Heap Usage | NameNode is working under high memory pressure. | Gives warning/critical alert if percentage of used NameNode memory heap is more than lower/upper threshold. | 80-Warning
TaskTrackers Down | A significant number of TaskTrackers are down in the cluster. | Gives warning/critical alert if percentage of map reduce dead task-trackers is more than lower/upper threshold. | 10-Warning
TaskTracker Service State | TaskTracker component is not running. | Turns TaskTracker service to warning state if the TaskTracker service is unavailable. | N/A
NameNode Service State | NameNode component is not running. | Gives critical alert if a NameNode service is unavailable. | N/A
Secondary NameNode Service State | Secondary NameNode component is not running. | Gives warning alert if a Secondary NameNode service is unavailable. | N/A
JobTracker Service State | JobTracker component is not running. | Gives critical alert if a JobTracker service is unavailable. | N/A
Oozie Server Service State | Oozie Server component is not running. | Gives critical alert if an Oozie Server service is unavailable. | N/A
Hive Metastore State | Hive Metastore component is not running. | Gives critical alert if a Hive Metastore service is unavailable. | N/A
HiveServer State | HiveServer component is not running. | Gives critical alert if a Hive Server service is unavailable. | N/A
WebHCat Server Service State | WebHCat Server component is not running. | Gives critical alert if a WebHCat Server service is unavailable. | N/A
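Several of the alerts in the table compare a percentage against a lower and an upper threshold. The Python sketch below illustrates that two-threshold logic only; it is not how Ambari SCOM implements its monitors. The names `Severity`, `classify`, and `low_is_bad` are hypothetical, and the critical values in the usage lines (20 and 10) are placeholders chosen for illustration, since the table lists only the warning values.

```python
from enum import Enum


class Severity(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


def classify(value, warning, critical, low_is_bad=False):
    """Map a metric value to a severity using two thresholds.

    By default a larger value is worse (e.g. percentage of DataNodes down).
    With low_is_bad=True a smaller value is worse (e.g. capacity remaining).
    """
    if low_is_bad:
        # Negate everything so the "more than threshold" checks below
        # become "less than threshold" checks on the original values.
        value, warning, critical = -value, -warning, -critical
    if value > critical:
        return Severity.CRITICAL
    if value > warning:
        return Severity.WARNING
    return Severity.OK


# 15% of DataNodes down, warning at 10, hypothetical critical at 20.
print(classify(15, warning=10, critical=20))

# 25% capacity remaining, warning at 30, hypothetical critical at 10.
print(classify(25, warning=30, critical=10, low_is_bad=True))
```

The comparisons are strict, matching the "more than" and "less than" wording in the descriptions, so a value exactly equal to a threshold does not raise that alert level in this sketch.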
Viewing
...