...

Name

Alert Message

Description

Capacity Remaining

There is little or no space capacity remaining in HDFS.

Gives warning/critical alert if the percentage of available space across all HDFS nodes is less than the upper/lower threshold.

Corrupted Blocks

There are corrupted file blocks in HDFS.

Gives critical alert if number of corrupted blocks is more than threshold.

DataNodes Down

A significant number of DataNodes are down in the cluster.

Gives warning/critical alert if the percentage of dead HDFS DataNodes in the cluster is more than the lower/upper threshold.

Failed Jobs

MapReduce jobs are failing too frequently.

Gives warning/critical alert if the percentage of failed MapReduce jobs is more than the lower/upper threshold.

Hive Metastore State

Hive Metastore server is not running.

Gives critical alert if a Hive Metastore service is unavailable.

HiveServer State

HiveServer service is not running.

Gives critical alert if a HiveServer service is unavailable.

Invalid TaskTrackers

There are TaskTracker nodes which are in the invalid state.

Gives warning alert if there is at least one graylisted TaskTracker. Gives critical alert if there is at least one blacklisted TaskTracker.

JobTracker Service State

JobTracker service is not running.

Gives critical alert if a JobTracker service is unavailable.

Memory Heap Usage

JobTracker is working under high memory pressure.

Gives warning/critical alert if the percentage of used JobTracker memory heap is more than the lower/upper threshold.

Memory Heap Usage

NameNode is working under high memory pressure.

Gives warning/critical alert if percentage of used NameNode memory heap is more than lower/upper threshold.

NameNode Service State

NameNode service is not running.

Gives critical alert if a NameNode service is unavailable.

Oozie Server Service State

Oozie Server service is not running.

Gives critical alert if an Oozie Server service is unavailable.

Secondary NameNode Service State

Secondary NameNode service is not running.

Gives warning alert if a Secondary NameNode service is unavailable.

TaskTracker Service State

 

Sets the TaskTracker service to a warning state if the TaskTracker service is unavailable.

TaskTrackers Down

A significant number of TaskTrackers are down in the cluster.

Gives warning/critical alert if the percentage of dead MapReduce TaskTrackers is more than the lower/upper threshold.

WebHCat Server Service State

WebHCat Server service is not running.

Gives critical alert if a WebHCat Server service is unavailable.

Under-Replicated Blocks

Number of under-replicated blocks in the HDFS is too high.

Gives warning/critical alert if percentage of under-replicated blocks is more than lower/upper threshold.
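The two-threshold wording used in the descriptions above can be read as the following minimal sketch. It is an illustration only, not the management pack's implementation: the function names, the `Health` enum, and every threshold value are hypothetical.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"
    WARNING = "warning"
    CRITICAL = "critical"


def evaluate_above(value, lower, upper):
    """Monitor where larger values are worse, e.g. percentage of dead DataNodes.

    Warning when the value is more than the lower threshold,
    critical when it is more than the upper threshold.
    """
    if value > upper:
        return Health.CRITICAL
    if value > lower:
        return Health.WARNING
    return Health.HEALTHY


def evaluate_below(value, lower, upper):
    """Monitor where smaller values are worse, e.g. HDFS capacity remaining.

    Warning when the value is less than the upper threshold,
    critical when it is less than the lower threshold.
    """
    if value < lower:
        return Health.CRITICAL
    if value < upper:
        return Health.WARNING
    return Health.HEALTHY


def evaluate_tasktrackers(graylisted, blacklisted):
    """Invalid TaskTrackers: any blacklisted node is critical,
    otherwise any graylisted node is a warning."""
    if blacklisted >= 1:
        return Health.CRITICAL
    if graylisted >= 1:
        return Health.WARNING
    return Health.HEALTHY


if __name__ == "__main__":
    # Hypothetical thresholds: warning above 10 %, critical above 20 %.
    print(evaluate_above(12.0, lower=10.0, upper=20.0))
    # Hypothetical thresholds: warning below 30 %, critical below 10 %.
    print(evaluate_below(5.0, lower=10.0, upper=30.0))
    print(evaluate_tasktrackers(graylisted=1, blacklisted=0))
```

Note that the "less than" monitor (Capacity Remaining) pairs warning with the upper threshold, while the "more than" monitors pair warning with the lower threshold.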

Viewing

...

The Cluster Diagram view shows when an alert has been raised on an object in the cluster. In the image below, this is indicated by a warning icon on the cluster icon.

...

You can also see the state changes of an object in the Health Explorer by selecting an alert and picking the State Changes tab on the right. This tab shows the time as well as the “from” and “to” state of any state change for the monitor associated with the selected alert. The tab also shows the state of the object that triggered the state change.
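As a rough illustration of what a state-change record of this kind contains, the sketch below derives time, "from" state, and "to" state entries from a chronological list of health samples. The `StateChange` type, the `state_changes` function, and the sample data are hypothetical and do not reflect how the Health Explorer stores or computes its data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class StateChange:
    time: datetime
    from_state: str
    to_state: str


def state_changes(samples):
    """Return one StateChange per transition in a chronological
    list of (timestamp, state) samples. Repeated states are skipped."""
    changes = []
    previous = None
    for time, state in samples:
        if previous is not None and state != previous:
            changes.append(StateChange(time, previous, state))
        previous = state
    return changes


if __name__ == "__main__":
    # Hypothetical samples for a single monitor.
    samples = [
        (datetime(2014, 1, 1, 9, 0), "healthy"),
        (datetime(2014, 1, 1, 9, 5), "healthy"),
        (datetime(2014, 1, 1, 9, 10), "warning"),
        (datetime(2014, 1, 1, 9, 15), "critical"),
        (datetime(2014, 1, 1, 9, 20), "healthy"),
    ]
    for change in state_changes(samples):
        print(change.time, change.from_state, "->", change.to_state)
```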

Customizing

By selecting Overrides, you can change the default values of the monitor (Critical Threshold, Warning Threshold, Interval). Check the override box and enter a new value. Then select the destination management pack where the overrides will be stored.
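Conceptually, an override replaces a monitor's default value while leaving the other defaults untouched. The sketch below models that idea only; the setting names, the default values, and the `effective_settings` function are hypothetical, and it is not how overrides are stored in a management pack.

```python
# Hypothetical default values for one monitor.
DEFAULTS = {
    "critical_threshold": 20.0,
    "warning_threshold": 10.0,
}


def effective_settings(defaults, overrides):
    """Return the defaults with any overridden values applied.

    Only settings that already exist in the defaults may be overridden.
    """
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise KeyError(f"unknown settings: {unknown}")
    # Later keys win, so override values replace the defaults.
    return {**defaults, **overrides}


if __name__ == "__main__":
    # Override only the warning threshold; the critical threshold keeps its default.
    print(effective_settings(DEFAULTS, {"warning_threshold": 15.0}))
```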

...