...
Name | Alert Message | Description | Threshold | |||
---|---|---|---|---|---|---|
Capacity Remaining | There is little or no space capacity remaining in HDFS. | Gives warning/critical alert if percentage of available space across all HDFS nodes is less than upper/lower threshold. | 30-Warning | |||
Under-Replicated Blocks | Number of under-replicated blocks in the HDFS is too high. | Gives warning/critical alert if percentage of under-replicated blocks is more than lower/upper threshold. | 1-Warning | |||
Corrupted Blocks | There are corrupted file blocks in HDFS. | Gives critical alert if number of corrupted blocks is more than threshold. | 1 | |||
DataNodes Down | A significant number of DataNodes are down in the cluster. | Gives warning/critical alert if percentage of dead HDFS data nodes in cluster is more than lower/upper threshold. | 10-Warning | |||
Failed Jobs | MapReduce jobs are failing too frequently. | Gives warning/critical alert if percentage of failed MapReduce jobs is more than lower/upper threshold. | 10-Warning | |||
Hive Metastore State | Hive Metastore server is not running. | Gives critical alert if a Hive Metastore service is unavailable. | ||||
HiveServer State | HiveServer service is not running. | Gives critical alert if a HiveServer service is unavailable. | ||||
Invalid TaskTrackers | There are TaskTracker nodes which are in the invalid state. | Gives critical alert if there is at least one blacklisted TaskTracker. | 1 | |||
JobTracker Service State | JobTracker service is not running. | Gives critical alert if a JobTracker service is unavailable. | ||||
Memory Heap Usage | JobTracker is working under high memory pressure. | Gives warning/critical alert if percentage of used JobTracker memory heap is more than lower/upper threshold. | 80-Warning | |||
Memory Heap Usage | NameNode is working under high memory pressure. | Gives warning/critical alert if percentage of used NameNode memory heap is more than lower/upper threshold. | 80-Warning | |||
NameNode Service State | NameNode service is not running. | Gives critical alert if a NameNode service is unavailable. | ||||
Oozie Server Service State | Oozie Server service is not running. | Gives critical alert if an Oozie Server service is unavailable. | ||||
Secondary NameNode Service State | Secondary NameNode service is not running. | Gives warning alert if a Secondary NameNode service is unavailable. | ||||
TaskTracker Service State | | Turns TaskTracker service to warning state if the TaskTracker service is unavailable. | ||||
TaskTrackers Down | A significant number of TaskTrackers are down in the cluster. | Gives warning/critical alert if percentage of dead MapReduce TaskTrackers is more than lower/upper threshold. | 10-Warning | |||
WebHCat Server Service State | WebHCat Server service is not running. | Gives critical alert if a WebHCat Server service is unavailable. | ||||
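
The lower/upper threshold wording used in the table can be read as a two-level comparison. The sketch below is only an illustration of that reading, not the monitoring product's actual logic: the function name `classify` and the critical values (90 and 10) are hypothetical, while the warning values (80 and 30) are taken from the table.

```python
def classify(value, warning, critical, higher_is_worse=True):
    """Map a percentage metric to 'OK', 'WARNING', or 'CRITICAL'.

    Hypothetical helper: the comparisons are strict, following the
    "more than" / "less than" wording of the table.
    """
    if not higher_is_worse:
        # For metrics such as Capacity Remaining, low values are bad.
        # Negating everything lets the same comparisons be reused.
        value, warning, critical = -value, -warning, -critical
    if value > critical:
        return "CRITICAL"
    if value > warning:
        return "WARNING"
    return "OK"


# Memory Heap Usage: warning above 80 (from the table), critical above 90 (assumed).
heap_state = classify(85, warning=80, critical=90)

# Capacity Remaining: warning below 30 (from the table), critical below 10 (assumed).
capacity_state = classify(25, warning=30, critical=10, higher_is_worse=False)
```

With these assumed values, both `heap_state` and `capacity_state` come out as `"WARNING"`.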
...