...

The membership manager can forcefully shut down a Geode cache if it detects it is no longer a member of the distributed system.

Interfaces

There are a number of existing interfaces in Geode that must be implemented by the membership manager:

...

void start(CancelCriterion c) - called after all services have been initialized with init() and all services are available via Services

void started() - called after all services have been started

void stop()

void stopped() - called after all services have been stopped

void installView(NetView v)

void beSick(), playDead(), beHealthy() - used for membership testing

void emergencyClose() - shut down threads & other resources like sockets
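The ordering implied by the lifecycle callbacks listed above can be sketched as follows. This is only an illustration under stated assumptions: LifecycleSketch, LifecycleRunnerSketch and RecordingServiceSketch are hypothetical names, not Geode types, and the CancelCriterion argument to start() is omitted so the sketch has no Geode dependency.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the service lifecycle described above.
// The real start() takes a CancelCriterion; it is left out here.
interface LifecycleSketch {
    void init();
    void start();
    void started();
    void stop();
    void stopped();
}

// Drives a group of services through each phase together, so that
// start() only runs once every service has been initialized, and
// started()/stopped() only run once every service has been
// started/stopped.
final class LifecycleRunnerSketch {
    static void startAll(List<? extends LifecycleSketch> services) {
        for (LifecycleSketch s : services) s.init();
        for (LifecycleSketch s : services) s.start();
        for (LifecycleSketch s : services) s.started();
    }

    static void stopAll(List<? extends LifecycleSketch> services) {
        for (LifecycleSketch s : services) s.stop();
        for (LifecycleSketch s : services) s.stopped();
    }
}

// Records each callback in a shared log so the ordering can be inspected.
final class RecordingServiceSketch implements LifecycleSketch {
    private final String name;
    private final List<String> log;

    RecordingServiceSketch(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    public void init() { log.add(name + ".init"); }
    public void start() { log.add(name + ".start"); }
    public void started() { log.add(name + ".started"); }
    public void stop() { log.add(name + ".stop"); }
    public void stopped() { log.add(name + ".stopped"); }
}
```

Running startAll over two recording services named "a" and "b" logs both init calls before either start call, and both start calls before either started call.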

 

 

ServiceFactory - used by the membership manager to instantiate its services

...

String rejectionMessage authenticate(InternalDistributedMember m)

Object getCredentials()

...

HealthMonitor - monitors members and instigates removal of those deemed dead

void contactedBy(InternalDistributedMember m) - tells the monitor that we've had contact with another member

void suspect(InternalDistributedMember m) - tells the monitor that the member is suspected of being ill or dead

void checkSuspect(InternalDistributedMember m) - requests a health check on another member.  This should initiate removal of the member if it does not pass the test.

...

void leave() - leaves the distributed system.  Should be invoked before stop()

void remove(InternalDistributedMember m) - force another member out of the system

InternalDistributedMember getMemberID()

 NetView getView()

 

Locator - used by TcpServer to handle peer-location requests.  Implements TcpHandler

...

void forceDisconnect(String reason)

boolean isShunned(DistributedMember mbr)

DistributedMember getLeadMember()

DistributedMember getCoordinator()

 

 

 

MessageHandler - receives messages from a Messenger

...

void send(DistributionMessage m) - sends an asynchronous message

InternalDistributedMember getMemberID() - returns the endpoint ID for this member

...

Properties getProperties()

 

Implementation Notes

To preserve as much of the current membership behavior as possible, and so foster adoption of Geode by the GemFire user base, the existing JGroupMembershipManager will be copied and most of its code will be preserved.  It will continue to hold the DirectChannel but will now also hold a ServiceConfig that it will use in place of the JGroups channel.

The implementation of each of the other components will be in separate packages to keep the code clean and possibly allow for different implementations to be plugged in.

The Authenticator implementation will use Geode's authentication API to authenticate another member and to get credentials for JoinLeave to use in sending membership views and join requests.

The HealthMonitor implementation will initially use the NetView to form a look-to-the-right ring in which each member monitors its neighbor.  HealthMonitor will keep a record of the last time a message was received from each member in the system (note - this must be done without clock probes, possibly following the pattern in EventTracker).  If the member it is watching has not made contact in the last member-timeout milliseconds, it will request a heartbeat from the member and perform a timed attempt to connect to the member's DirectChannel port (if available) and request a health response.  If the member does not respond within member-timeout milliseconds, HealthMonitor will remove it using the JoinLeave.removeMember() API.  The implementation of removeMember will forward the request to the current membership coordinator, which will perform its own health check on the member before removing it (sending out a new NetView).  Once the ping request has been sent, HealthMonitor will go on to examine the next member in the view.
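A minimal sketch of the look-to-the-right selection and the member-timeout test described above, under loud assumptions: RingMonitorSketch is a hypothetical class, member IDs are plain strings standing in for InternalDistributedMember, an ordered list stands in for NetView, and a member that has never made contact is treated as needing a check. Timestamps are passed in by the caller rather than read from the clock, in keeping with the note about avoiding clock probes.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; not the Geode HealthMonitor.
final class RingMonitorSketch {
    private final long memberTimeoutMillis;
    // Last time each member was heard from, as supplied by the caller.
    private final Map<String, Long> lastContact = new ConcurrentHashMap<>();

    RingMonitorSketch(long memberTimeoutMillis) {
        this.memberTimeoutMillis = memberTimeoutMillis;
    }

    // Counterpart of contactedBy(): records contact using a caller-supplied
    // timestamp instead of probing the clock here.
    void contactedBy(String member, long nowMillis) {
        lastContact.put(member, nowMillis);
    }

    // Returns the member that follows `from` in the view, wrapping around at
    // the end. Returns null if `from` is not in the view or is alone in it.
    static String neighborToTheRight(List<String> view, String from) {
        int index = view.indexOf(from);
        if (index < 0 || view.size() < 2) {
            return null;
        }
        return view.get((index + 1) % view.size());
    }

    // True when the member has not made contact within the timeout.
    // Assumption: a member with no recorded contact needs a check.
    boolean needsHealthCheck(String member, long nowMillis) {
        Long last = lastContact.get(member);
        return last == null || nowMillis - last > memberTimeoutMillis;
    }
}
```

With a view of a, b, c, member c watches a because the ring wraps around. A member last heard from at time 1000 with a 5000 ms timeout needs a check at time 7000 but not at time 5000.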

TCPConduit will be modified to check for a health request and respond with its membership ID.  The HealthMonitor will use this to ensure that the port hasn't been reused by another process.

The JoinLeave implementation will use Messenger, and possibly the membership manager, to communicate with other members.  It will use TcpClient to contact Locators when joining in order to find the current membership coordinator.  Once it knows the coordinator it will send it a Join message including authentication credentials.  JoinLeave will also implement membership coordination functions (i.e., replace what we're doing with JGroups GMS).  It will be responsible for detecting a network partition and invoking forceDisconnect() in the membership manager.

The Locator component will persist the current membership view and will respond to requests for the ID of the current membership coordinator.  If there is no membership coordinator (meaning the Locator is booting up), it will return its best guess of who the coordinator is based on who has contacted it.  The name of the locator's state file will be changed to membershipView.dat.

The Manager API is what should be used by all components to interact with the membership manager.

The Messenger component will use a trimmed-down modern JGroups stack channel to perform UDP messaging.  JGroups will no longer be forked for use in Geode but will be added as a dependency.  Messenger will be responsible for installing the current NetView in its JGroups protocol stack as a native JGroups View so that UDP broadcast works and multicast message garbage-collection can be properly performed.  Note that this switch to using off-the-shelf JGroups means we will start seeing more log messages from JGroups than in the past.

Also note that we may not be able to switch to a newer version of JGroups without risking rolling upgrade support.  If the new version of JGroups is not on-wire compatible with the previous version, people will not be able to perform a rolling upgrade.

It will be Messenger's responsibility to install Geode's settings from the DistributionConfig (gemfire.properties) into its JGroups channel.  The protocol stack should look something like this:  <UDP> <BARRIER> <pbcast.NAKACK2> <UNICAST3> <pbcast.STABLE> <MFC> <UFC> <FRAG2>.  Of course there will be lots of settings in each of these protocols to customize the stack.  There is no requirement that the JGroups stack configuration be in an external file.  It can be a string embedded in the Messenger implementation.  XML will need to be used because the JGroups PlainConfigurator still uses a colon as a protocol separator and this is incompatible with IPv6 addresses.
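As an illustration of an embedded XML configuration, the sketch below builds the stack as a string in the protocol order given above and checks that it is well-formed.  The assumptions are these: StackConfigSketch and its methods are hypothetical names; only two UDP attributes (bind_addr and mcast_port) are set and every other protocol setting is left out; the bind address is inserted without XML escaping.  Handing the string to JGroups, for example through a JChannel constructor that reads the XML from a stream, is deliberately omitted so the sketch does not depend on JGroups.  Note that an IPv6 address sits safely inside an XML attribute, which is the point of avoiding the colon-separated plain format.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch only; real stacks need many more protocol settings.
final class StackConfigSketch {

    // Builds the stack in the order given in the text. The values would
    // come from DistributionConfig in a real Messenger (assumption).
    static String buildConfig(String bindAddress, int mcastPort) {
        return "<config>"
            + "<UDP bind_addr=\"" + bindAddress + "\" mcast_port=\"" + mcastPort + "\"/>"
            + "<BARRIER/>"
            + "<pbcast.NAKACK2/>"
            + "<UNICAST3/>"
            + "<pbcast.STABLE/>"
            + "<MFC/>"
            + "<UFC/>"
            + "<FRAG2/>"
            + "</config>";
    }

    // Parses the XML and returns the protocol element names in order,
    // which doubles as a well-formedness check on the embedded string.
    static List<String> protocolNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> names = new ArrayList<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(child.getNodeName());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("stack config is not well-formed XML", e);
        }
    }
}
```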

All of the JGroups statistics in DistributionStats need to be removed or replaced with corresponding stats based on the new implementation.

Testing

Since this implements an existing interface in Geode, there are already a lot of tests that exercise it.  These tests will need some attention if they refer to any JGroups code.  The use of interfaces in this version of the MembershipManager should allow us to create real unit tests, as opposed to integration tests, for each component to achieve a higher level of code coverage.
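To show the kind of unit test that interfaces make possible, here is a small sketch in which a health check depends only on two narrow interfaces and can therefore be exercised with stubs.  Everything named here is hypothetical: RemoverSketch stands in for the removal call on JoinLeave, the Predicate stands in for the actual health probe, and member IDs are plain strings.

```java
import java.util.function.Predicate;

// Hypothetical stand-in for the JoinLeave removal call.
interface RemoverSketch {
    void remove(String member);
}

// Hypothetical component that follows the checkSuspect() contract:
// probe the member and initiate removal if it fails the test.
final class SuspectCheckerSketch {
    private final Predicate<String> healthProbe;
    private final RemoverSketch joinLeave;

    SuspectCheckerSketch(Predicate<String> healthProbe, RemoverSketch joinLeave) {
        this.healthProbe = healthProbe;
        this.joinLeave = joinLeave;
    }

    // Returns true if the member passed the probe; otherwise asks the
    // remover to force it out and returns false.
    boolean checkSuspect(String member) {
        if (healthProbe.test(member)) {
            return true;
        }
        joinLeave.remove(member);
        return false;
    }
}
```

A unit test can pass a probe such as member -> member.equals("healthy") and a remover that appends to a list, then assert that only the failing member was removed, with no sockets or running distributed system involved.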

...