1 Introduction

This document defines the latest Management and Monitoring requirements of GemFire while also exposing known problems and shortcomings of the existing GemFire Admin API and GemFire JMX Agent.

2 Related Documents

  • GemFire Enterprise Developer's Guide, Chapter 19: Developing System Administration Tools
  • GemFire Enterprise System Administrator's Guide, Chapter 9: Using JMX to Administer GemFire
  • Improvements to GemFire Management in Brandywine (4.0?) 
    (https://svn.gemstone.com/repos/gemfireSpecs/admin/Brandywine.txt)

3 Terminology

  • JMX – Java Management Extensions: a standard API for management and monitoring, bundled with the Java platform since J2SE 5.0; primarily specified by JSR 3 and JSR 160
  • JMX Agent – refers to the MBeanServer and all server-related features, including adaptors and connectors for enabling JMX clients to connect and access MBeans hosted by the MBeanServer
  • JMX Client – refers to any code which connects to a JMX Agent (locally or remotely) for access to MBeans in the JMX Agent's MBeanServer
  • MBean – a named management object representing a resource and providing a management interface which exposes attributes, operations, and notifications (events)
  • MBean Proxy – a convenience proxy implementing a standard Java interface which forwards all calls through the MBeanServer to the MBean, allowing simpler client code to be written
  • MBean Server – a repository of MBeans and primary API for manipulating MBeans
  • Model MBean – an MBean whose attributes and operations are specified at runtime using descriptor-based metadata
  • MXBean – a variant of Standard MBean where complex types are mapped to a standard set of types defined in the javax.management.openmbean package (known as open-types); JDK 1.6 allows for user-defined MXBeans whereas JDK 1.5 only allows for Standard MBeans that closely adhere to the same standards as MXBeans
  • Notification – an event that is emitted by an MBean
  • Platform MBeanServer – the default MBeanServer of each JVM which hosts the Platform MXBeans
  • Standard MBean – an MBean whose attributes and operations are deduced from a Java interface using certain naming patterns, similar to those used by JavaBeans

4 Requirements

This section outlines all requirements that aren't currently provided by the existing GemFire Admin API and JMX Agent. In addition, GemFire Management and Monitoring services should provide all functionality currently provided by the existing management tools and APIs (minus any problems or defects identified in the current offerings). The focus here is on what changes are needed and what requirements are not currently being met in GemFire.

4.1 Management model should parallel the current GemFire model.

The old API model uses AdminDistributedSystem which has SystemMembers which may have a SystemMemberCache with one or more SystemMemberRegions. The new approach should emphasize system aggregates while still allowing users to drill-down to member-specific metrics and information. This supports the preferred high-level direction for the GemFire product as well as the reported preference of customers and also the default views that one sees in enterprise-class management and monitoring tools.

4.2 Present the customer with just one well-polished, fully featured API.

This API should follow industry standards and be strictly JMX-based. No proprietary Admin API is needed or desirable. Having more than one API complicates the product, increases the workload on engineering, and confuses customers.

4.3 Management API should cover all GemFire features.

The Management API should also continue to evolve with the rest of the GemFire product. New features should require specification of what new stats, operations, MBeans or other changes need to be made to the Management API.

4.4 JMX Agent needs to be hostable from within any GemFire member or dedicated external JVM.

For example, this allows the JMX Agent to be hosted within a GemFire CacheServer or hosted in a non-GemFire dedicated process which doesn't utilize a GemFire license or use other GemFire APIs. This latter statement also implies that management and monitoring should not use existing GemFire connections/messaging for transport and communication if possible.

4.5 Management API needs to be accessible to JMX client code within any JVM.

The JMX client code should be usable from within any GemFire member or any non-member JVM. Simply stated, a JMX client can access JMX services locally or remotely, and customers expect the results to be the same either way.
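
One way to keep local and remote results consistent is to write client code against javax.management.MBeanServerConnection, which the local MBeanServer also implements. The following is only a sketch of that idea: the class name LocalOrRemoteClient is hypothetical, and it reads a Platform MXBean attribute because no GemFire MBeans are defined yet.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical client: the same method serves local and remote access.
class LocalOrRemoteClient {

    /** Reads the JVM name through whichever connection is supplied. */
    static String vmName(MBeanServerConnection connection) {
        try {
            ObjectName runtime = new ObjectName(ManagementFactory.RUNTIME_MXBEAN_NAME);
            return (String) connection.getAttribute(runtime, "VmName");
        } catch (Exception e) {
            throw new IllegalStateException("Could not read VmName", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Local access: the platform MBeanServer is itself an MBeanServerConnection.
        System.out.println("local:  " + vmName(ManagementFactory.getPlatformMBeanServer()));

        // Remote access: only attempted when a JMXServiceURL is given as the first argument.
        if (args.length > 0) {
            JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(args[0]));
            try {
                System.out.println("remote: " + vmName(connector.getMBeanServerConnection()));
            } finally {
                connector.close();
            }
        }
    }
}
```

Because vmName() never distinguishes between the two kinds of connection, the same client logic runs inside a member or in a separate JVM.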

4.6 The management service needs to be GemFire-version agnostic.

A JMX client needs to manage members running different versions of GemFire, and the JMX Agent needs to host MBeans for members running different versions of GemFire. This allows the management service to fully monitor a GemFire system as members are taken offline and brought back online during an upgrade to a new version of GemFire.

4.6.1 This implies that management and monitoring cannot use existing GemFire connections/messaging for transport and communication.

4.6.2 It is probably acceptable to begin by supporting the latest release of GemFire and then be version agnostic thereafter.

In other words, this support would not cover previous versions of GemFire before adding this feature, but would allow managing future versions.

4.7 Communication transport layer should not use GemFire.

Several requirements already imply this requirement. However, we need to also explicitly state this because it has previously come up as a specific customer requirement. It also has a couple of further benefits:

  1. It helps facilitate managing and monitoring of a GemFire system that is in trouble. 

  2. It imposes less overhead on GemFire resources for managing or monitoring.

4.7.1 This removes dependency on GemFire and allows monitoring to work even in the face of an internal failure in GemFire. GemFire and management/monitoring will still both be vulnerable to networking or other external failures.

4.7.2 A common protocol for JMX-based management, such as RMI, should be used.

RMI is ideal as it is the preferred and most feature-rich protocol defined for JMX. It also supports use of SSL for security, and ports can be specified to allow for access through a Firewall.

4.7.3 Overhead in manageable GemFire members should be much less than the overhead accrued by a member that actually hosts a managing JMX Agent.

Any design should seek to minimize this overhead. It may be acceptable to require the presence of the PlatformMBeanServer which is present by default in 1.5/1.6 JVMs, and its presence is actually expected by customers. Java 1.5/1.6 provides an RMI Connector, so using that might be a good option for communication between the managing JMX Agent and each manageable GemFire member.

4.8 Organization and presentation of the GemFire MBeans should lend well to using JConsole.

This same requirement should extend to other tools which support JMX as well. Other tools include Hyperic, JMX RI HttpAdaptorServer and any other JMX-based management tools, but the emphasis will be a clean presentation within JConsole since it will be the only out-of-box tool that customers will typically have access to.

4.8.1 GemFire MBeans must be well-organized, intuitive, well-documented and easy to navigate and use.

4.8.2 Open MBean open-types should be used wherever complex types are required.

4.8.3 A JConsole Plugin should be considered if a further level of front-end support is deemed necessary to ship with GemFire.

Java 1.6 introduces the JConsole API which supports JConsole Plugins that appear as new tabs in JConsole (http://download.oracle.com/javase/6/docs/jdk/api/jconsole/spec/index.html). GFmon or a subset of its features could be rewritten as a JConsole Plugin if it adds sufficient value to GemFire. Shipping Hyperic with GemFire may be more desirable but it's very unlikely we could do this.

4.9 Use only standard JMX classes, tools, and concepts from Java 1.5/1.6.

4.9.1 All dependencies on MX4J jars and classes should be removed.

This eliminates all possible ambiguity in running with multiple implementations. It reduces complexity and ensures that our JMX support matches what is already documented by Sun for Java 1.5/1.6.

4.9.2 Java 1.5/1.6 includes an implementation of the RMIConnector.

4.9.3 Sun's JMX RI is a separate download which includes HttpAdaptorServer (http://java.sun.com/javase/technologies/core/mntr-mgmt/javamanagement/download.jsp). We should consider supporting this as the industry considers this to be standard JMX technology.

4.9.3.1 Sun's JMX RI is GPL/CDDL so we cannot ship it with GemFire. The user can, however, easily download it, add it to their classpath with "-cp jmxri.jar;jmxtools.jar", and follow Sun's documentation.

4.10 Security should be handled and supported as documented for Java 1.5/1.6 JMX.

Java 1.5/1.6 JMX requires password and access files by default. Documentation encourages the user to take further steps to tighten down security using SSL.

4.10.1 Optional SSL support is configurable for RMI Connector in both Java 1.5 and 1.6.

4.10.2 Optional SSL support is configurable for RMI Registry only in Java 1.6 (not in 1.5 unless we can find a way to code around this limitation).
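
As a rough illustration of the RMI Connector half of this requirement, the environment map passed to a programmatically created connector server might be assembled as below. This is a sketch under assumptions: the class name SecureConnectorEnv and the file names are hypothetical, and the "jmx.remote.x.*" keys are the ones recognized by Sun's JDK implementation rather than part of the JMX specification. It does not address SSL for the RMI Registry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;
import javax.management.remote.rmi.RMIConnectorServer;
import javax.rmi.ssl.SslRMIClientSocketFactory;
import javax.rmi.ssl.SslRMIServerSocketFactory;

// Hypothetical helper that builds the environment for a secured RMI Connector.
class SecureConnectorEnv {

    static Map<String, Object> build(String passwordFile, String accessFile, boolean useSsl) {
        Map<String, Object> env = new HashMap<String, Object>();
        // Password and access files, as used by out-of-box JMX (Sun JDK specific keys).
        env.put("jmx.remote.x.password.file", passwordFile);
        env.put("jmx.remote.x.access.file", accessFile);
        if (useSsl) {
            // SSL socket factories for the RMI Connector only, not for the RMI Registry.
            env.put(RMIConnectorServer.RMI_CLIENT_SOCKET_FACTORY_ATTRIBUTE,
                    new SslRMIClientSocketFactory());
            env.put(RMIConnectorServer.RMI_SERVER_SOCKET_FACTORY_ATTRIBUTE,
                    new SslRMIServerSocketFactory());
        }
        return env;
    }

    public static void main(String[] args) {
        // File names are placeholders for illustration.
        Map<String, Object> env = build("jmxremote.password", "jmxremote.access", true);
        for (String key : new TreeSet<String>(env.keySet())) {
            System.out.println(key);
        }
    }
}
```

The resulting map would be handed to JMXConnectorServerFactory.newJMXConnectorServer when the agent creates its connector server.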

4.11 Management and monitoring must be accessible through Firewalls.

See http://blogs.sun.com/jmxetc/entry/java_5_premain_rmi_connectors for detailed discussion on why the following sub-requirement(s) are necessary.

4.11.1 GemFire must provide the means to configure the ports for both RMI Registry and RMI Connector.

The jmxremote.port property is for the RMI Registry; out-of-box Java 1.5/1.6 JMX does not allow specification of the RMI Connector port.

GemFire will need to programmatically emulate out-of-box Java 1.5/1.6 JMX and create a JMXServiceURL with these two ports (RMI Registry and RMI Connector). Configuration options need to allow specifying these ports when launching a process that will host the GemFire JMX Agent. Other enterprise-class software products also do this to ensure Firewall accessibility.
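
The two-port JMXServiceURL described above could be built roughly as follows. This is a minimal sketch, not the GemFire implementation: the class name FixedPortJmxAgent and the port numbers 9999 and 9998 are assumptions chosen for illustration, and no security environment is configured.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.net.MalformedURLException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.HashMap;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical agent launcher that pins both RMI ports.
class FixedPortJmxAgent {

    /** First port in the URL is the RMI Connector; the second is the RMI Registry. */
    static JMXServiceURL serviceUrl(String host, int registryPort, int connectorPort) {
        try {
            return new JMXServiceURL("service:jmx:rmi://" + host + ":" + connectorPort
                    + "/jndi/rmi://" + host + ":" + registryPort + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Starts a connector server; an RMI Registry must already listen on registryPort. */
    static JMXConnectorServer start(String host, int registryPort, int connectorPort)
            throws IOException {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(
                serviceUrl(host, registryPort, connectorPort),
                new HashMap<String, Object>(), mbs);
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        Registry registry = LocateRegistry.createRegistry(9999);
        JMXConnectorServer server = start("localhost", 9999, 9998);
        System.out.println("JMX Agent listening at " + server.getAddress());
        // Shut down immediately; a real agent would stay up until asked to stop.
        server.stop();
        UnicastRemoteObject.unexportObject(registry, true);
    }
}
```

With both ports fixed, a Firewall rule only has to open those two ports, and a JMX client such as JConsole can connect using the same service URL.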

4.12 Provide framework to allow custom MBeans to be hosted in JMX Agent.

This would allow application-specific MBeans written by customer to be hosted in the JMX Agent. Customer MBeans may be specific to their application or might simply be used to monitor and aggregate GemFire MBeans in custom ways. This provides a powerful mechanism for customers or field engineers or tool vendors to customize and extend GemFire Management and Monitoring.

4.13 Provide mechanism to specify which GemFire MBeans are automatically registered.

This gives the user better control over which MBeans will be hosted in the JMX Agent, allowing users or tools to focus on what's important and not be distracted by MBeans that might be less important in certain use-cases. For example, a user might only want the system-level aggregate MBeans to be registered.

4.13.1 This could be accomplished with a list of regex patterns for MBean object names, a list of explicit MBean types, or other similar approach.
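
A regex-based filter of this kind might look like the sketch below. The class name MBeanRegistrationFilter and the example object names are hypothetical; the filter matches each pattern against the canonical form of the ObjectName, in which key properties are sorted alphabetically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical filter consulted before an MBean is registered in the JMX Agent.
class MBeanRegistrationFilter {
    private final List<Pattern> allowed = new ArrayList<Pattern>();

    MBeanRegistrationFilter(String... regexes) {
        for (String regex : regexes) {
            allowed.add(Pattern.compile(regex));
        }
    }

    /** True if the whole canonical object name matches at least one configured regex. */
    boolean shouldRegister(ObjectName name) {
        String canonical = name.getCanonicalName();
        for (Pattern pattern : allowed) {
            if (pattern.matcher(canonical).matches()) {
                return true;
            }
        }
        return false;
    }

    /** Convenience overload for object names supplied as strings. */
    boolean shouldRegister(String name) {
        try {
            return shouldRegister(new ObjectName(name));
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Only the system-level aggregate MBean is allowed in this example.
        MBeanRegistrationFilter filter = new MBeanRegistrationFilter("GemFire:type=Cache");
        System.out.println(filter.shouldRegister("GemFire:type=Cache"));
        System.out.println(filter.shouldRegister("GemFire:member=m1,type=Cache.Region.Member"));
    }
}
```

A list of explicit MBean types would work the same way, with the comparison made against the "type" key property instead of the full name.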

4.14 Provide option to remotely expose Platform MXBeans from each member.

The Platform MXBeans are widely used and expected by customers. Although similar metrics can be and likely will be exposed via GemFire MBeans, it is highly recommended that we provide the option to expose each member's MXBeans remotely for visibility from a centralized GemFire MBeanServer. This should be a configurable option, allowing the customer to specify exactly which MXBeans should be exposed, if any. This probably requires registering a new MBean that represents the remote MXBean and is identified by a new, unique ObjectName by including information such as the GemFire Member Id.

4.14.1 This requires that each managed member hosts its own PlatformMBeanServer which is already the default for Java 1.5/1.6.
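
To illustrate how a remote Platform MXBean could be given a unique ObjectName in the centralized MBeanServer, the sketch below derives a new name from the original one plus the member id. Everything here is an assumption made for illustration: the class name RemotePlatformNames, the "member" and "platformDomain" keys, and the member id format are not defined by this document.

```java
import java.lang.management.ManagementFactory;
import java.util.Hashtable;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical naming helper for Platform MXBeans exposed from remote members.
class RemotePlatformNames {

    /** Maps e.g. java.lang:type=Memory to a GemFire-domain name that is unique per member. */
    static ObjectName remoteName(String memberId, String platformName) {
        try {
            ObjectName original = new ObjectName(platformName);
            Hashtable<String, String> keys =
                    new Hashtable<String, String>(original.getKeyPropertyList());
            // Member ids may contain ':' or ',' so the value is quoted.
            keys.put("member", ObjectName.quote(memberId));
            // Keep the original domain so that names from different domains cannot collide.
            keys.put("platformDomain", original.getDomain());
            return new ObjectName("GemFire", keys);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Recovers the unquoted member id from a name produced by remoteName(). */
    static String memberIdOf(ObjectName remoteName) {
        return ObjectName.unquote(remoteName.getKeyProperty("member"));
    }

    public static void main(String[] args) {
        ObjectName name = remoteName("host(1234):5678", ManagementFactory.MEMORY_MXBEAN_NAME);
        System.out.println(name.getCanonicalName());
        System.out.println(memberIdOf(name));
    }
}
```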

4.15 Offline members should be represented by MBeans in a persistent way.

This allows a management tool to present an entire GemFire Cache even when one or all of its members are currently offline. This requirement needs further research and should be a configurable option.

4.16 Follow the latest industry-accepted JMX Best Practices.

In particular, see Sun's documentation including http://java.sun.com/javase/technologies/core/mntr-mgmt/javamanagement/best-practices.jsp.

4.16.1 Use identifying domain: GemFire

4.16.2 Use standard object names: type=<type>,name=<name>,member=<member>,etc

4.16.2.1 Use hierarchical "type=" values which capture model containment relationships. For example, a GemFire Cache has Regions, and each Member might host data for some of those Regions. Instead of using "type=Region,subType=Member", use "type=Cache.Region.Member" or something similar.
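
A sketch of how such names could be constructed and queried is shown here. The helper class GemFireNames and the example region and member values are hypothetical, and the wildcard inside a property value relies on ObjectName pattern support that arrived in Java 1.6.

```java
import java.util.Hashtable;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical factory for object names in the GemFire domain.
class GemFireNames {
    static final String DOMAIN = "GemFire";

    /** Builds GemFire:type=<type>[,name=<name>][,member=<member>]. */
    static ObjectName of(String type, String name, String member) {
        Hashtable<String, String> keys = new Hashtable<String, String>();
        keys.put("type", type);
        // Values with special characters would need ObjectName.quote(); omitted here.
        if (name != null) {
            keys.put("name", name);
        }
        if (member != null) {
            keys.put("member", member);
        }
        try {
            return new ObjectName(DOMAIN, keys);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** True if the given ObjectName pattern matches the concrete name. */
    static boolean matches(String pattern, ObjectName name) {
        try {
            return new ObjectName(pattern).apply(name);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        ObjectName regionOnMember = of("Cache.Region.Member", "orders", "m1");
        System.out.println(regionOnMember.getCanonicalName());
        // One pattern selects everything at or below the Region level.
        System.out.println(matches("GemFire:type=Cache.Region*,*", regionOnMember));
    }
}
```

The hierarchical "type=" value lets a single pattern select a whole subtree of the model, which a separate "subType" key would not allow.
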
4.16.2.2 Use Standard MBean interfaces. This allows for much better javadoc documentation and allows client code to create MBean proxies that are easy to interact with.

In this context, a proxy is a well-typed accessor for the MBean as described in the javadocs for javax.management.MBeanServerInvocationHandler (http://download.oracle.com/javase/1.5.0/docs/api/javax/management/MBeanServerInvocationHandler.html) and javax.management.JMX (http://download.oracle.com/javase/6/docs/api/javax/management/JMX.html). This provides a much cleaner and easier-to-use client API for the customer, allowing them to avoid the more difficult, error-prone approach of making untyped invocation calls against the MBeanServer.
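
The following sketch shows what a proxy buys the client. The CacheStatsMBean interface and its attributes are hypothetical, invented only for this example; the proxy is created with JMX.newMBeanProxy, which requires Java 1.6 (under Java 1.5, MBeanServerInvocationHandler.newProxyInstance plays the same role). The interface methods declare java.io.IOException so that a remote communication failure surfaces as a checked exception.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import javax.management.JMX;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class ProxyExample {

    /** Hypothetical Standard MBean interface; the name must end in "MBean". */
    public interface CacheStatsMBean {
        long getHits() throws IOException;
        long getMisses() throws IOException;
    }

    /** Implementation; its name is the interface name minus the "MBean" suffix. */
    public static class CacheStats implements CacheStatsMBean {
        private long hits;
        private long misses;

        public synchronized long getHits() { return hits; }
        public synchronized long getMisses() { return misses; }
        public synchronized void recordHit() { hits++; }
        public synchronized void recordMiss() { misses++; }
    }

    /** Registers an MBean, reads "Hits" through a typed proxy, then unregisters it. */
    static long hitsViaProxy(int hitsToRecord) {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        try {
            ObjectName name = new ObjectName("GemFire:type=Cache,name=proxyExample");
            CacheStats stats = new CacheStats();
            for (int i = 0; i < hitsToRecord; i++) {
                stats.recordHit();
            }
            mbs.registerMBean(stats, name);
            try {
                // Typed call instead of mbs.getAttribute(name, "Hits") plus a cast.
                CacheStatsMBean proxy = JMX.newMBeanProxy(mbs, name, CacheStatsMBean.class);
                return proxy.getHits();
            } finally {
                mbs.unregisterMBean(name);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("hits = " + hitsViaProxy(3));
    }
}
```

The same proxy call works unchanged against a remote MBeanServerConnection, which is where the declared IOException becomes relevant.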

4.16.2.3 Declare all methods to throw java.io.IOException.

This is recommended in Sun's JMX Best Practices to avoid the customer seeing UndeclaredThrowableException.

4.16.2.4 Use simple standard data-types or use openmbean open-types where more complex types are required.

This is the best approach to mimic MXBeans. Java 1.6 makes the further recommendation to actually use custom-defined MXBeans, but Java 1.5 does not support this. So we should lean in this direction as strongly as possible while still having to support Java 1.5 and allow ourselves to embrace it down the road.
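
As an illustration of mimicking MXBeans under Java 1.5, a Standard MBean can return a CompositeData built by hand wherever a complex value is needed. The sketch below assumes a hypothetical "RegionSnapshot" type with two items; neither the type nor the item names come from an existing GemFire API.

```java
import java.io.IOException;
import javax.management.openmbean.CompositeData;
import javax.management.openmbean.CompositeDataSupport;
import javax.management.openmbean.CompositeType;
import javax.management.openmbean.OpenDataException;
import javax.management.openmbean.OpenType;
import javax.management.openmbean.SimpleType;

class OpenTypeExample {
    private static final String[] ITEMS = {"regionName", "entryCount"};

    /** Hypothetical Standard MBean interface that exposes only open-types. */
    public interface RegionInfoMBean {
        CompositeData getSnapshot() throws IOException;
    }

    /** Converts a complex value into an open-type that any JMX client can decode. */
    static CompositeData regionSnapshot(String regionName, long entryCount) {
        try {
            CompositeType type = new CompositeType(
                    "RegionSnapshot",
                    "Hypothetical per-region summary",
                    ITEMS,
                    new String[] {"Name of the region", "Number of entries"},
                    new OpenType<?>[] {SimpleType.STRING, SimpleType.LONG});
            return new CompositeDataSupport(
                    type, ITEMS, new Object[] {regionName, Long.valueOf(entryCount)});
        } catch (OpenDataException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        CompositeData snapshot = regionSnapshot("orders", 42L);
        System.out.println(snapshot.get("regionName") + " has "
                + snapshot.get("entryCount") + " entries");
    }
}
```

A client such as JConsole can display this value without having any GemFire classes on its classpath, which is the main reason to prefer open-types over custom serializable classes.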

4.16.2.4.1 Open MBeans allow richer open-types but we should avoid them if possible.

Sun bug 5045358 says that Open MBeans don't allow primitive types or arrays thereof. The recommendation is to use StandardMBeans with open-types only where complex data-types are required.

4.16.2.5 Notification source should be ObjectName of MBean sending Notification.
4.16.2.6 All data available via Notification should also be available via MBeans in case Notifications are missed.

For example, if there's a Notification whenever a member departs, then there should also be an MBean that provides the count of MembersDeparted as well as current membership info.
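
The pairing of a Notification with a pollable attribute might be sketched as follows. The MembershipMBean interface, the notification type string "gemfire.member.departed", and the object name are all hypothetical. Note that the Notification source is the MBean's ObjectName, as required above.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.MBeanServer;
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;
import javax.management.NotificationListener;
import javax.management.ObjectName;

class MembershipExample {

    /** Hypothetical interface: the count stays readable even if Notifications are missed. */
    public interface MembershipMBean {
        int getMembersDeparted() throws IOException;
    }

    public static class Membership extends NotificationBroadcasterSupport
            implements MembershipMBean {
        private final ObjectName self;
        private int membersDeparted;
        private long sequence;

        public Membership(ObjectName self) { this.self = self; }

        public synchronized int getMembersDeparted() { return membersDeparted; }

        /** Updates the attribute and emits a Notification whose source is the ObjectName. */
        public synchronized void memberDeparted(String memberId) {
            membersDeparted++;
            sendNotification(new Notification("gemfire.member.departed", self, ++sequence,
                    System.currentTimeMillis(), "Member departed: " + memberId));
        }
    }

    /** Returns {notifications received, value of the MembersDeparted attribute}. */
    static int[] run(String... departingMembers) {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        final AtomicInteger seen = new AtomicInteger();
        try {
            ObjectName name = new ObjectName("GemFire:type=Membership,name=example");
            Membership mbean = new Membership(name);
            mbs.registerMBean(mbean, name);
            try {
                mbs.addNotificationListener(name, new NotificationListener() {
                    public void handleNotification(Notification n, Object handback) {
                        if (n.getSource() instanceof ObjectName) {
                            seen.incrementAndGet();
                        }
                    }
                }, null, null);
                for (String member : departingMembers) {
                    mbean.memberDeparted(member);
                }
                int polled = (Integer) mbs.getAttribute(name, "MembersDeparted");
                return new int[] {seen.get(), polled};
            } finally {
                mbs.unregisterMBean(name);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = run("m1", "m2");
        System.out.println("notifications=" + result[0] + ", MembersDeparted=" + result[1]);
    }
}
```

A client that connects late, or that loses Notifications over a remote connection, can still recover the current count by reading the attribute.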

4.16.2.7 Use the PlatformMBeanServer instead of creating a new MBeanServer instance.
4.16.2.8 Use MXBeans as our MBean type or a close approximation under Java 1.5.

This reiterates the requirement because it is stated and implied in several places in the Java 1.6 JMX documentation.

5 Limitations and Problems with Existing Admin API and JMX Agent

This section discusses limitations and problems with the existing Admin API and JMX Agent. It is an implicit requirement that any new Management & Monitoring solution should overcome and avoid these problems while also meeting the requirements outlined in the previous section.

5.1 Admin API is wrapped by JMX layer using complex Model MBeans.

The layered nature of having the Admin API plus JMX model mbeans delegating to it is overly complex, making it difficult for other engineers to build upon as new features are added to GemFire. This pattern was introduced as a form of reuse in order to support the previous requirement for having a non-JMX Admin API as well as a JMX version of the same API.

The use of model mbeans as defined in the mbeans-descriptors.xml and subsequent adding of attributes and stats discovered at runtime results in MBeans that are poorly documented in javadocs.

Our use of the model mbean for attributes discovered at runtime is more complicated than the Model MBean implementation that ships with Java 1.5/1.6 can handle, so we are currently tied to an implementation that's originally from MX4J. The explanation is too long and technical for this document, but it essentially involves various implementations interpreting the original JMX specification differently, and we coded our usage of JMX to MX4J before Java 1.5 started shipping.

5.2 Object model is named inconsistently from the rest of GemFire.

The Admin API was purposely designed with class names that are different from the GemFire product's object model to avoid confusion (i.e., AdminDistributedSystem vs DistributedSystem; SystemMember vs DistributedMember; CacheVm vs CacheServer). This looks unprofessional and very inconsistent. The end result just causes confusion after all.

Additionally, the GemFire Caching API has had a facelift and the way we want customers to look at GemFire is dramatically different than how the Admin API currently represents a GemFire system.

5.3 Majority of new GemFire features are not supported or represented.

The existing Admin API reflects GemFire 3.5 more than it does GemFire 6.5 and now lacks nearly 50% of the product's features. Many of the GemFire features that were supported in 3.5 have now been removed from GemFire (GFX, SharedMemoryManagers, etc). So the Admin API has actually shrunk instead of growing with the product, and some of the object-modeling decisions were originally made because of features that are now absent.

5.4 Usage is difficult and valuable metrics are hidden.

Even JMX-savvy customers/users have trouble with our JMX API because it is member-centric and requires deep product knowledge to dig through the MBean operations to eventually get to everything including certain stats. The engineers who wrote the Hyperic GemFire Plugin were unable to find the stats for Cache hits/misses which are in fact available via JMX. The MBean model is too hierarchical, requiring manual invocation of other MBean operations to instantiate and register child MBeans, including those with important stats.

5.5 We use MX4J classes even within Java 1.5/1.6 JVMs.

Our RMI connection and Http Adaptor appear unusual to customers. The RMI connection string was standard for MX4J in early 2000s but is non-standard for Java 1.5/1.6 JMX. Our use of MX4J itself is non-standard now that the JVM ships with built-in JMX support.

In addition, many customers are now familiar with the JMX RI HttpAdaptorServer which is different from the MX4J Http Adaptor that we ship.

We use the MX4J classes within 1.5/1.6 JVMs for the HttpAdaptor, RMI Naming Registry, and our RequiredModelMBean. While it would be fairly straightforward to remove/replace HttpAdaptor and RMI Naming Registry classes from MX4J, our implementation is too dependent on the MX4J implementation of RequiredModelMBean to remove it without an extensive redesign (see 5.1 above).

5.6 Runtime discovered attributes are poorly documented and too numerous.

Because we took the shortcut of exposing many attributes and stats as "discovered at runtime" we don't need to alter the Admin/JMX API as we add stats and region attributes. The downside is that customers are not interested in the vast majority of the stats we expose and it would be better to hand-pick what we want to expose. Having to specify what stats, GemFire properties, and region attributes to expose via the Management API would require a little more work when new ones are added, but I believe the end result would be more usable and avoid drowning the user in too much noise. It would provide better documentation and make it easier for simple tools to know about these types of attributes. Finally, this would be easier for engineers to understand and would also get them in the habit of digging into the Management API.

5.7 Only management and monitoring of remote GemFire members is supported.

The Admin API requirement was simple at first: expose the guts of the GF Console as a public API. The internal code used by both the Admin API and the original GF Console is only for administering/monitoring remote GF members, so the existing Admin API and JMX Agent can only manage and monitor remote members (not a local member that might host them). Therefore we currently have the restriction that the JMX Agent must be in its own JVM, which is undesirable. In addition, colocated use of the Admin API confuses customers because the local member is invisible to the local usage of the Admin API.

5.8 MBeans representing member resources have lifecycle tied to member itself.

The current JMX implementation adds/removes MBeans as members join/depart. This causes MBeans to disappear and prevents tools from properly showing an offline GemFire system. Current customers typically have known GemFire members which are simply offline and thus we should have the option to keep an MBean around to represent that member.

Future tools are probably going to depend on this.

5.9 StatAlerts and EmailNotifications are incomplete and redundant in JMX.

The newer features of StatAlerts and EmailNotifications were never finished or integrated cleanly. They lack representation in the public APIs and language localization. They also use configuration that is inconsistent with the rest of the Admin/JMX API. Their existence also overlaps with other JMX features as well as some of the responsibility of external tools (such as Hyperic or possibly GFMon depending on its future). These could be moved into GFMon by implementing them in a different way (it's easy to monitor an MBean attribute which is how we expose stats) or removed altogether.

5.10 Object model is tied to the older way of viewing and using GemFire.

GemFire has a fresh new direction with an emphasis on the (aggregated) Cache rather than as a DistributedSystem, including API changes to support this. The Admin/JMX API models GemFire in an older, more system and member centric way. A better approach would be for the Management API to follow in the footsteps of the main product APIs.

Also, evidence indicates that tools and customers would prefer to view GemFire from top-down, emphasizing system-wide rollup over member-centric monitoring with the ability to drill-down deeper into specific members after starting from a system-wide view of GemFire.

5.11 Existing instrumentation breaks many of the current JMX best practices.

Currently accepted JMX Best Practices dictate using standard MBeans or MXBeans. This and other deviations from current best practices become evident as one explores our MBeans.

5.12 MBean object model presents poorly in JConsole.

Newer tools including JConsole are built with the expectation of certain best practices that didn't exist when the existing JMX Agent was written. The object names we use don't lend themselves to being well organized in JConsole's tree view. Our MBean object model also doesn't lend itself to use via JConsole.

5.13 GemFire itself is used as the discovery and communication layer.

This means that management and monitoring will also go down or experience failure if the GemFire system itself fails. It also means that using the Admin API or JMX Agent introduces a system-wide overhead which uses additional GemFire resources and could potentially impact proper functioning or performance of GemFire processes.

5.14 Customers and others are unhappy with our Admin API and JMX support.

Feedback from customers and others indicates that they are very unhappy with our Admin API and JMX Agent. Many of the reasons are already listed above, but complexity and inconsistency are the primary reasons as well as the absence of support for newer GemFire features.

Customers don't want a non-JMX Admin API. They want tools (JConsole and Hyperic for example) that can be easily configured to connect to and monitor GemFire. They want an easy to use JMX Management API which is simple, polished and represents GemFire as they use it.

Customers also want to manage GemFire from an aggregated top-down view, seeing it as a distributed Cache that they can monitor and then be able to drill-down into specific Members to get additional detail only when needed.