...
...
Alerts
The following Alerts are configured by Ambari SCOM:
Name | Alert Message | Description | Threshold
---|---|---|---
Capacity Remaining | There is little or no space capacity remaining in HDFS. | Gives warning/critical alert if percentage of available space on all HDFS nodes together is less than upper/lower threshold. | 30-Warning
Under-Replicated Blocks | Number of under-replicated blocks in the HDFS is too high. | Gives warning/critical alert if percentage of under-replicated blocks is more than lower/upper threshold. | 1-Warning
Corrupted Blocks | There are corrupted file blocks in HDFS. | Gives critical alert if number of corrupted blocks is more than threshold. | 1
DataNodes Down | A significant number of DataNodes are down in the cluster. | Gives warning/critical alert if percentage of dead HDFS data nodes in cluster is more than lower/upper threshold. | 10-Warning
Failed Jobs | MapReduce jobs are failing too frequently. | Gives warning/critical alert if percentage of failed MapReduce jobs is more than lower/upper threshold. | 10-Warning
Invalid TaskTrackers | There are TaskTracker nodes which are in the invalid state. | Gives critical alert if there is at least one blacklisted TaskTracker. | 1
Memory Heap Usage | JobTracker is working under high memory pressure. | Gives warning/critical alert if percentage of used JobTracker memory heap is more than lower/upper threshold. | 80-Warning
Memory Heap Usage | NameNode is working under high memory pressure. | Gives warning/critical alert if percentage of used NameNode memory heap is more than lower/upper threshold. | 80-Warning
TaskTrackers Down | A significant number of TaskTrackers are down in the cluster. | Gives warning/critical alert if percentage of dead MapReduce TaskTrackers is more than lower/upper threshold. | 10-Warning
TaskTracker Service State | TaskTracker component is not running. | Turns TaskTracker service to warning state if the TaskTracker service is unavailable. | N/A
NameNode Service State | NameNode component is not running. | Gives critical alert if a NameNode service is unavailable. | N/A
Secondary NameNode Service State | Secondary NameNode component is not running. | Gives warning alert if a Secondary NameNode service is unavailable. | N/A
JobTracker Service State | JobTracker component is not running. | Gives critical alert if a JobTracker service is unavailable. | N/A
Oozie Server Service State | Oozie Server component is not running. | Gives critical alert if an Oozie Server service is unavailable. | N/A
Hive Metastore State | Hive Metastore component is not running. | Gives critical alert if a Hive Metastore service is unavailable. | N/A
HiveServer State | HiveServer component is not running. | Gives critical alert if a Hive Server service is unavailable. | N/A
WebHCat Server Service State | WebHCat Server component is not running. | Gives critical alert if a WebHCat Server service is unavailable. | N/A
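
The lower/upper threshold wording in the table can be read as a two-level comparison. The Python sketch below only illustrates that reading; it is not how Ambari SCOM implements its monitors. The names `Severity`, `classify_high_is_bad`, and `classify_low_is_bad` are hypothetical. The warning values in the usage lines (10 and 30) come from the table, while the critical values are assumed because the table does not list them.

```python
from enum import Enum


class Severity(Enum):
    HEALTHY = "healthy"
    WARNING = "warning"
    CRITICAL = "critical"


def classify_high_is_bad(value: float, warning: float, critical: float) -> Severity:
    """Classify a metric where larger values are worse (e.g. DataNodes Down %).

    `warning` plays the role of the lower threshold, `critical` the upper one.
    """
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical threshold")
    if value > critical:
        return Severity.CRITICAL
    if value > warning:
        return Severity.WARNING
    return Severity.HEALTHY


def classify_low_is_bad(value: float, warning: float, critical: float) -> Severity:
    """Classify a metric where smaller values are worse (e.g. Capacity Remaining %).

    `warning` plays the role of the upper threshold, `critical` the lower one.
    """
    if critical > warning:
        raise ValueError("critical threshold must not exceed warning threshold")
    if value < critical:
        return Severity.CRITICAL
    if value < warning:
        return Severity.WARNING
    return Severity.HEALTHY


# Warning thresholds (10, 30) are taken from the table; the critical
# thresholds (20, 10) are assumed values chosen only for illustration.
datanodes_down = classify_high_is_bad(12.0, warning=10.0, critical=20.0)
capacity_remaining = classify_low_is_bad(25.0, warning=30.0, critical=10.0)
```

With these assumed critical values, 12% of DataNodes down and 25% of capacity remaining would both be classified as warnings.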
Viewing
...