Name

Alert Message

Description

Threshold

Capacity Remaining

There is little or no space capacity remaining in HDFS.

Gives warning/critical alert if percentage of available space on all HDFS nodes together is less than upper/lower threshold.

30 (Warning)
10 (Critical)

Corrupted Blocks

There are corrupted file blocks in HDFS.

Gives critical alert if number of corrupted blocks is more than threshold.

1

DataNodes Down

A significant number of DataNodes are down in the cluster.

Gives warning/critical alert if percentage of dead HDFS DataNodes in cluster is more than lower/upper threshold.

Failed Jobs

MapReduce jobs are failing too frequently.

Gives warning/critical alert if percentage of failed MapReduce jobs is more than lower/upper threshold.

Hive Metastore State

Hive Metastore server is not running.

Gives critical alert if a Hive Metastore service is unavailable.

HiveServer State

HiveServer service is not running.

Gives critical alert if a HiveServer service is unavailable.

Invalid TaskTrackers

There are TaskTracker nodes which are in the invalid state.

Gives warning alert if there is at least one graylisted TaskTracker. Gives critical alert if there is at least one blacklisted TaskTracker.

JobTracker Service State

JobTracker service is not running.

Gives critical alert if a JobTracker service is unavailable.

Memory Heap Usage

JobTracker is working under high memory pressure.

Gives warning/critical alert if percentage of used JobTracker memory heap is more than lower/upper threshold.

Memory Heap Usage

NameNode is working under high memory pressure.

Gives warning/critical alert if percentage of used NameNode memory heap is more than lower/upper threshold.

80 (Warning)
90 (Critical)

NameNode Service State

NameNode service is not running.

Gives critical alert if a NameNode service is unavailable.

Oozie Server Service State

Oozie Server service is not running.

Gives critical alert if an Oozie Server service is unavailable.

Secondary NameNode Service State

Secondary NameNode service is not running.

Gives warning alert if a Secondary NameNode service is unavailable.

TaskTracker Service State

 

Puts the TaskTracker service into warning state if the TaskTracker service is unavailable.

TaskTrackers Down

A significant number of TaskTrackers are down in the cluster.

Gives warning/critical alert if percentage of dead MapReduce TaskTrackers is more than lower/upper threshold.

WebHCat Server Service State

WebHCat Server service is not running.

Gives critical alert if a WebHCat (formerly Templeton) Server service is unavailable.

Under-Replicated Blocks

Number of under-replicated blocks in the HDFS is too high.

Gives warning/critical alert if percentage of under-replicated blocks is more than lower/upper threshold.
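
The threshold rules in the table above can be sketched in a few lines of Python. This is only an illustration and not the monitoring system's actual implementation: the names Severity, classify_high, classify_low and classify_tasktrackers are hypothetical, and the strict comparisons are an assumption read from the "more than" / "less than" wording in the descriptions. Most alerts fire when a value rises above its thresholds (for example NameNode heap usage, 80 / 90), while Capacity Remaining fires when the value falls below them (30 / 10).

```python
from enum import Enum


class Severity(Enum):
    OK = "OK"
    WARNING = "WARNING"
    CRITICAL = "CRITICAL"


def classify_high(value, warning, critical):
    """Higher is worse, e.g. NameNode heap usage with warning=80, critical=90.

    Assumption: the alert fires only when the value is strictly
    "more than" the threshold, as the descriptions are worded.
    """
    if value > critical:
        return Severity.CRITICAL
    if value > warning:
        return Severity.WARNING
    return Severity.OK


def classify_low(value, warning, critical):
    """Lower is worse, e.g. HDFS capacity remaining with warning=30, critical=10.

    Assumption: the alert fires only when the value is strictly
    "less than" the threshold.
    """
    if value < critical:
        return Severity.CRITICAL
    if value < warning:
        return Severity.WARNING
    return Severity.OK


def classify_tasktrackers(graylisted, blacklisted):
    """Invalid TaskTrackers rule: any blacklisted node is critical,
    otherwise any graylisted node is a warning."""
    if blacklisted >= 1:
        return Severity.CRITICAL
    if graylisted >= 1:
        return Severity.WARNING
    return Severity.OK


if __name__ == "__main__":
    # 25% of HDFS capacity remaining is below the warning threshold (30)
    # but not below the critical threshold (10).
    print(classify_low(25, warning=30, critical=10).value)
    # 95% NameNode heap usage is above the critical threshold (90).
    print(classify_high(95, warning=80, critical=90).value)
    # One graylisted and no blacklisted TaskTrackers.
    print(classify_tasktrackers(graylisted=1, blacklisted=0).value)
```

Keeping the two directions as separate functions avoids a sign flag that is easy to get wrong when a new alert is added. A value exactly equal to a threshold is treated as not exceeding it under the assumption stated above.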
