

We must replace the old JGroups build in Geode with something new due to licensing restrictions.  The old JGroups had an LGPL license, which is incompatible with Apache 2.0 licensing.  This document explains the requirements Geode has for a membership system and examines several alternatives for moving forward, concluding that we should replace JGroups with a custom Geode-centric solution.

 

How Geode uses JGroups

Geode’s dynamic membership model is based on the JGroups dynamic model provided by pbcast.GMS, where new nodes can be added at will without taking the distributed system down.  

Geode uses the membership view primarily for replication and event delivery. For replication we form a subset of the membership view that is known to have a cache Region and label this a DistributionAdvisor. Our current replication scheme requires a return-receipt from any specified recipient as long as the recipient is in the membership view.
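The return-receipt rule can be illustrated with a minimal sketch. The class AckTracker, its method names and the use of plain strings as member IDs are hypothetical and are not taken from the Geode code base; the sketch only shows the idea that a sender waits for a receipt from each recipient until that recipient either replies or leaves the membership view.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: tracks which recipients still owe a return receipt. */
final class AckTracker {
    private final Set<String> pending = new HashSet<>();

    /** Only recipients that are in the current membership view are waited for. */
    AckTracker(Set<String> recipients, Set<String> currentView) {
        for (String recipient : recipients) {
            if (currentView.contains(recipient)) {
                pending.add(recipient);
            }
        }
    }

    /** A return receipt arrived from the given member. */
    synchronized void ackReceived(String member) {
        pending.remove(member);
    }

    /** A new membership view no longer contains the given member. */
    synchronized void memberDeparted(String member) {
        pending.remove(member);
    }

    /** True once no recipient in the view still owes a receipt. */
    synchronized boolean isComplete() {
        return pending.isEmpty();
    }
}
```

The point of the sketch is that a view change has the same effect on the wait as a receipt, which is why timely view updates matter for replication.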

The membership view is also used for selecting an Elder for the distributed lock service. The Elder keeps track of who is allowed to grant lock requests, and it is the oldest non-admin member in the view.
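As a rough sketch of the Elder rule, assuming the membership view is available as a list ordered oldest-first (an assumption made here for illustration), the Elder is simply the first non-admin entry. The names Member and ElderSelector are hypothetical.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical view entry: a member name plus whether it is an admin member. */
record Member(String name, boolean admin) {}

final class ElderSelector {
    /**
     * Returns the oldest non-admin member.
     * Assumes the view list is ordered from oldest to youngest member.
     */
    static Optional<Member> selectElder(List<Member> viewOldestFirst) {
        return viewOldestFirst.stream()
                .filter(member -> !member.admin())
                .findFirst();
    }
}
```

Because every member applies the same rule to the same view, all members should agree on the Elder without exchanging extra messages.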

Network partition detection is also custom built into the JGroups GMS protocol, and the failure-detection protocols have been altered so that Geode can feed “suspicion” into JGroups to help it detect failures.

Peer authentication has been built into the 2.2.9 JGroups stack by introducing a custom AUTH protocol that intercepts Join requests and requires authentication checks to pass before allowing the request to reach the GMS membership protocol.
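The interception pattern might look roughly like the following. This is a sketch only: JoinRequest, AuthGate and the in-memory list standing in for the GMS layer are hypothetical and do not reflect the JGroups protocol API or Geode's actual AUTH implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical join request carrying the joiner's ID and credentials. */
record JoinRequest(String memberId, String credentials) {}

/** Hypothetical gate that sits in front of the membership layer. */
final class AuthGate {
    private final Predicate<JoinRequest> authenticator;
    // Stands in for the membership protocol that would process the join.
    private final List<String> passedToMembership = new ArrayList<>();

    AuthGate(Predicate<JoinRequest> authenticator) {
        this.authenticator = authenticator;
    }

    /** Returns true if the request passed authentication and was forwarded. */
    boolean onJoin(JoinRequest request) {
        if (!authenticator.test(request)) {
            return false; // rejected: the membership layer never sees the request
        }
        passedToMembership.add(request.memberId());
        return true;
    }

    /** Member IDs whose join requests were forwarded to the membership layer. */
    List<String> passedToMembership() {
        return List.copyOf(passedToMembership);
    }
}
```

Keeping the check in its own layer means the membership protocol itself needs no knowledge of credentials.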

Geode servers were also slated to rely on JGroups in the future for reliable UDP transmission of messages that are broadcast to the whole membership set, such as StartupMessage, ShutdownMessage, CreateRegionMessage and PDX registrations.  Sending these messages over TCP/IP stream connections is a barrier to increasing the size of the distributed system, especially at startup time when we must create 4M of these connections (M=member count) just to join the distributed system.
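To put the 4M figure in concrete terms, here is a small worked calculation. The factor of 4 is taken directly from the text above; the class and method names are hypothetical.

```java
/** Hypothetical helper for the connection-count arithmetic described above. */
final class JoinCost {
    /** TCP/IP connections one joining member creates: 4 per existing member. */
    static long tcpConnectionsToJoin(int memberCount) {
        if (memberCount < 0) {
            throw new IllegalArgumentException("memberCount must be non-negative");
        }
        return 4L * memberCount;
    }
}
```

For a distributed system of 200 members, a single join would therefore need 4 x 200 = 800 connections, which is the startup cost that reliable UDP broadcast is meant to avoid.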

Reliable UDP communication is also needed for out-of-band low-priority communication, such as sending alerts to management nodes.  Creating TCP/IP connections to send alerts can block operations at exactly the times when the system is already in trouble.  We recently saw this in a large production system, where an alert that members weren’t acknowledging a membership view change blocked operations because the management node that was to receive the alert was unhealthy and not accepting connections.

Geode integrates a JGroups GossipServer into the Locator service.  GossipServer is used to provide information on who is in the distributed system when a new member is joining the distributed system.

Finally, Geode clients use the membership system's classes to form IDs, and these contain a JGroups IpAddress.

 

Requirements

In brief, the membership service must

  1. deliver notification of membership changes to the DistributedSystem’s MembershipManager

  2. allow new members to join without taking down the system

  3. provide identity for each peer in the distributed system and allow clients to have a similar identity.  The identity must be unique for the peer and old identities should not be reused (at least not very quickly)

  4. transmit information about each member’s DistributedSystem characteristics (VM type, DirectChannel port, Groups, Name, etc) in the member’s ID

  5. efficiently and quickly detect loss of a member (failure detection)

  6. support the notion of an Elder member for Geode’s Distributed Lock Service

  7. support Geode’s model of handling network partitions (winning/losing partitions)

  8. allow Geode to give advice on which members might be sick or out of action

  9. support rolling upgrade (old members can’t rejoin once upgrade has begun & the service itself must support backward compatibility)

  10. integrate with Geode’s authentication service and require authentication before allowing a new member to join
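Requirements 3 and 4 could be sketched as an immutable ID that carries the member's characteristics plus a value that changes on every restart. Everything here is an assumption for illustration: the names PeerId and PeerIdFactory, the field set, and the use of a counter seeded from the clock as the uniqueness token.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical member ID: network address, an incarnation token, and characteristics. */
record PeerId(String host, int membershipPort, long incarnation,
              String vmType, int directChannelPort, List<String> groups, String name) {}

/** Hypothetical factory that never hands out the same incarnation twice. */
final class PeerIdFactory {
    // Seeded from the clock so that IDs from an earlier run are unlikely to recur.
    private final AtomicLong nextIncarnation = new AtomicLong(System.currentTimeMillis());

    PeerId create(String host, int membershipPort, String vmType,
                  int directChannelPort, List<String> groups, String name) {
        return new PeerId(host, membershipPort, nextIncarnation.getAndIncrement(),
                vmType, directChannelPort, List.copyOf(groups), name);
    }
}
```

With this shape, a member that restarts on the same host and port still gets an ID that compares unequal to its previous one, so stale references to the old incarnation cannot be mistaken for the new member.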

A UDP messaging service must

  1. Be compatible with the membership service’s IDs (an ID from membership identifies endpoints in the UDP messaging service)

  2. Support rolling upgrade (on-wire compatibility across releases)



Options for replacing JGroups v2.2.9


There are a number of options for us to choose from.  Here are a few:

Move to a newer version of JGroups

Use Zookeeper

Use Akka

Create a custom solution


One of the nice things about Geode is that membership management is dynamic and has no single point of failure.  Its weakest link is the Locator service: if all Locators fail, clients cannot get information about servers and new servers can't be added to the cluster until one of the Locators is available again.  Even when the Locators are down, the server cluster remains viable and available to clients that are already connected to servers.

For this reason a solution like Zookeeper seems inadequate.  Users would have to configure Zookeeper clusters and make sure the cluster is configured so that servers have minimal risk of losing contact with it.  Losing contact with the cluster would require a server to shut down.  Zookeeper clusters are typically pretty small, so a Geode user with 200 servers might feel at risk when using even a large 7-node Zookeeper cluster.  Zookeeper also doesn't answer requirements 7 through 10 or offer UDP messaging.

JGroups has evolved and solved a lot of problems that had to be fixed in the 2.2.9 copy currently in the Geode repository.  However, to use it we would have to fork it and modify certain parts to answer requirements 4, 7, 9 and 10.  We could fork only parts, such as GMS and the failure-detection protocols, but the View class needs to carry Credentials for authentication in order to be useful to Geode.  If not used for cluster membership, JGroups might still be useful for reliable UDP messaging.

Akka looks promising and a lot of people are using it.

A custom solution that does not leverage other projects for clustering could also be implemented, especially if JGroups is used for reliable UDP messaging.

 
