The Apache OODT CAS Resource Management component (Resource Manager for short) sends jobs to resource nodes for execution, so it should be able to monitor each resource node of a cluster on demand. The idea is to leverage Ganglia [1] to monitor those resource nodes.
Ganglia is a BSD-licensed scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization.

System overview 

The Ganglia plugin connects to the Ganglia meta daemon (gmetad), pulls its XML dump and parses it. It does this by opening a socket to the relevant port of the machine running gmetad and reading the XML of the most recently monitored state, which contains the aggregated status of all resource nodes in the clusters.
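The fetch-and-parse step can be sketched in Java as below. This is a minimal illustration under stated assumptions, not the plugin's actual code: the class `GmetadXmlReader` and its methods are hypothetical, and the sketch assumes gmetad's XML port (8651 by default) writes the full dump and then closes the connection. It relies on Ganglia's XML layout, where each `HOST` element carries `TN`/`TMAX` attributes and nested `METRIC` elements with `NAME`/`VAL` attributes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: pulls gmetad's XML dump and extracts per-host metrics. */
class GmetadXmlReader {

    /** Reads everything gmetad sends on its XML port (assumed to close the socket when done). */
    static byte[] fetchDump(String host, int port, int timeoutMillis) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis);
            try (InputStream in = socket.getInputStream()) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                // Read until end of stream, i.e. until gmetad closes the connection.
                while ((n = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                return buffer.toByteArray();
            }
        }
    }

    /** Maps host NAME to (metric NAME -> VAL) for every HOST element in the dump. */
    static Map<String, Map<String, String>> parseHosts(byte[] xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml));
        Map<String, Map<String, String>> hosts = new HashMap<>();
        NodeList hostNodes = doc.getElementsByTagName("HOST");
        for (int i = 0; i < hostNodes.getLength(); i++) {
            Element host = (Element) hostNodes.item(i);
            Map<String, String> metrics = new HashMap<>();
            // Keep the host-level TN/TMAX attributes needed for the online check.
            metrics.put("TN", host.getAttribute("TN"));
            metrics.put("TMAX", host.getAttribute("TMAX"));
            NodeList metricNodes = host.getElementsByTagName("METRIC");
            for (int j = 0; j < metricNodes.getLength(); j++) {
                Element metric = (Element) metricNodes.item(j);
                metrics.put(metric.getAttribute("NAME"), metric.getAttribute("VAL"));
            }
            hosts.put(host.getAttribute("NAME"), metrics);
        }
        return hosts;
    }
}
```

For example, `parseHosts(fetchDump("localhost", 8651, 5000))` would read the dump from a gmetad on the default host and port used in resource.properties.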
After retrieving the most recently monitored state of the grid (a set of clusters), the Ganglia plugin calculates each resource node's load using the configured load calculator module.
The load calculator module normalizes the load into the range between 0 and that node's capacity.
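One possible normalization is a linear scale followed by a clamp. The sketch below is an assumption, not the load calculator's documented formula: `LoadNormalizer` is a hypothetical class, node capacity is assumed to be an integer, and `fullLoad` is an assumed reference value (for example the host's CPU count from Ganglia's `cpu_num` metric) that should map to the node's full capacity.

```java
/** Hypothetical normalizer: maps a raw load onto the integer range [0, capacity]. */
class LoadNormalizer {

    /**
     * Scales rawLoad linearly so that fullLoad maps to capacity, then clamps the
     * rounded result into [0, capacity]. fullLoad is an assumed reference value.
     */
    static int normalize(double rawLoad, double fullLoad, int capacity) {
        if (capacity <= 0) {
            return 0;
        }
        if (fullLoad <= 0 || Double.isNaN(rawLoad)) {
            // Unusable input: report the node as fully loaded so no jobs are sent.
            return capacity;
        }
        long rounded = Math.round(rawLoad / fullLoad * capacity);
        return (int) Math.max(0, Math.min(capacity, rounded));
    }
}
```

For example, on a 4-CPU host with capacity 10, a raw load of 2.0 normalizes to 5, while a raw load of 8.0 is clamped to 10.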

A host's online status can be determined from its TN and TMAX values: TN is the number of seconds since the host last reported, and TMAX is the maximum expected interval between reports. The load calculation modules consider a node offline, and ignore it, if TN > 4 * TMAX; in that case the node's capacity is assigned as its load so that no jobs are added to it. Likewise, as a safety measure, if any error occurs while getting a resource node's load, that node's capacity is assigned as its load.
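The offline rule and the capacity fallback can be sketched as follows. `SafeLoad` and its `effectiveLoad` method are hypothetical names; the TN and TMAX strings are assumed to be the host attributes from the gmetad XML, and the actual load computation is passed in as a supplier.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch of the offline check and the "assign capacity" safety rule. */
class SafeLoad {

    /** A host is treated as offline when TN > 4 * TMAX. */
    static boolean isOffline(long tn, long tmax) {
        return tn > 4L * tmax;
    }

    /**
     * Returns the computed load for an online host, or the node's capacity when the
     * host is offline or when parsing TN/TMAX or computing the load throws.
     */
    static int effectiveLoad(String tn, String tmax, int capacity, IntSupplier loadCalculation) {
        try {
            if (isOffline(Long.parseLong(tn.trim()), Long.parseLong(tmax.trim()))) {
                return capacity; // offline: report as full so it receives no jobs
            }
            return loadCalculation.getAsInt();
        } catch (RuntimeException e) {
            return capacity; // any error: fall back to capacity as the safe default
        }
    }
}
```

Returning the capacity in both failure paths makes the node look fully loaded, so the Resource Manager does not send it jobs while its state is unknown.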

Figure: Ganglia plugin skeleton class diagram

Currently, only the weighted load average calculation module is available with the plugin.

Configuration 

The following configuration entries must be added to resource.properties before using the plugin to monitor a grid with Ganglia.

  • This entry configures which resource monitor is used to monitor a grid. Since we are leveraging Ganglia for monitoring, it should be GangliaResourceMonitorFactory. (The value must be an implementation of the ResourceMonitorFactory interface; currently only Ganglia resource monitoring is available.)
    # resource nodes monitor factory
    org.apache.oodt.cas.resource.monitor.factory = org.apache.oodt.cas.resource.monitor.ganglia.GangliaResourceMonitorFactory
  • This entry configures which load calculation module is used. (The value must be an implementation of the LoadCalculatorFactory interface.)
    # ganglia resource monitor's load calculator factory
    org.apache.oodt.cas.resource.monitor.loadcalc.factory = org.apache.oodt.cas.resource.monitor.ganglia.loadcalc.WeightedAverageLoadCalcFactory
  • The following configuration applies to the WeightedAverageLoadCalculator. This load calculator takes the loadone, loadfive and loadfifteen values, combines them using the configured weights, and normalizes the weighted load into the range between zero and the node's capacity. Specify the weights as preferred.
    # Load calculation weights
    org.apache.oodt.cas.resource.monitor.loadcalc.weight.loadone=1
    org.apache.oodt.cas.resource.monitor.loadcalc.weight.loadfive=5
    org.apache.oodt.cas.resource.monitor.loadcalc.weight.loadfifteen=5
  • The following entries hold the address and port of the Ganglia meta daemon (gmetad) that the plugin should connect to.
    #ganglia meta daemon (gmetad) host details
    org.apache.oodt.cas.resource.monitor.ganglia.gemtad.host.address=localhost
    org.apache.oodt.cas.resource.monitor.ganglia.gemtad.host.port=8651
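The weight entries above can be read with `java.util.Properties` and combined with Ganglia's `load_one`, `load_five` and `load_fifteen` metrics. This is a hypothetical sketch (the class `WeightedLoadSketch` is not part of the plugin) that assumes a plain weighted mean, sum(w_i * load_i) / sum(w_i); the result would still need to be normalized into the range between zero and the node's capacity.

```java
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch: weighted mean of Ganglia load averages, weights from resource.properties. */
class WeightedLoadSketch {

    // Key prefix taken from the resource.properties weight entries.
    static final String PREFIX = "org.apache.oodt.cas.resource.monitor.loadcalc.weight.";

    private final double loadOneWeight;
    private final double loadFiveWeight;
    private final double loadFifteenWeight;

    WeightedLoadSketch(Properties props) {
        loadOneWeight = weight(props, "loadone");
        loadFiveWeight = weight(props, "loadfive");
        loadFifteenWeight = weight(props, "loadfifteen");
        if (loadOneWeight + loadFiveWeight + loadFifteenWeight <= 0) {
            throw new IllegalArgumentException("load weights must sum to a positive value");
        }
    }

    /** Reads one weight; a missing entry is treated as a configuration error. */
    private static double weight(Properties props, String name) {
        String value = props.getProperty(PREFIX + name);
        if (value == null) {
            throw new IllegalArgumentException("missing property " + PREFIX + name);
        }
        return Double.parseDouble(value.trim());
    }

    /** Weighted mean of load_one, load_five and load_fifteen; a missing metric throws. */
    double weightedLoad(Map<String, String> metrics) {
        double weighted = loadOneWeight * Double.parseDouble(metrics.get("load_one"))
                + loadFiveWeight * Double.parseDouble(metrics.get("load_five"))
                + loadFifteenWeight * Double.parseDouble(metrics.get("load_fifteen"));
        return weighted / (loadOneWeight + loadFiveWeight + loadFifteenWeight);
    }
}
```

With the example weights (1, 5, 5) and load_one = 1.0, load_five = 2.0, load_fifteen = 3.0, the weighted load is (1 + 10 + 15) / 11, roughly 2.36.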

References

[1] “Ganglia Monitoring System” [Online]. Available: http://ganglia.info/ [Accessed 4 August 2013].

[2] “Monitor that plugs into Ganglia” [Online]. Available: https://issues.apache.org/jira/browse/OODT-219 [Accessed 11 August 2013].
