...

void start(CancelCriterion c) - called after all services have been initialized with init() and all services are available via Services

void started() - called after all services have been started

void stop()

void stopped() - called after all services have been stopped

void installView(NetView v)

...

The implementation of each of the other components will be in separate packages to keep the code clean and possibly allow for different implementations to be plugged in.

The Authenticator implementation will use Geode's authentication API to authenticate another member and to get credentials for JoinLeave to use in sending membership views and join requests.

The HealthMonitor implementation will initially use the NetView to form a look-to-the-right ring in which each member monitors the next member in the view.  HealthMonitor will keep a record of the last time a message was received from each member in the system (note: this must be done without clock probes, possibly following the pattern in EventTracker).  If the member it is watching has not made contact in the last member-timeout milliseconds, it will request a heartbeat from the member and perform a timed attempt to connect to the member's DirectChannel port (if available) and request a health response.  If the member does not respond within member-timeout milliseconds, HealthMonitor will remove it using the JoinLeave.removeMember() API.  The implementation of removeMember will forward the request to the current membership coordinator, which will perform its own health check on the member before removing it (by sending out a new NetView).  Once the heartbeat (ping) request has been sent, HealthMonitor will go on to examine the next member in the view.
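The ring selection and the last-contact check described above can be sketched roughly as follows.  This is only an illustration under stated assumptions, not the planned HealthMonitor: the class name RingMonitorSketch and its methods are hypothetical, members are represented as plain strings rather than Geode member IDs, and the current time is passed in by the caller (for example from a periodically updated counter) so that the sketch itself never probes the clock.  Treating a member that has never been heard from as overdue is also an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of look-to-the-right monitoring; not Geode's actual HealthMonitor. */
final class RingMonitorSketch {
  // Last time (in caller-supplied milliseconds) a message was received from each member.
  private final Map<String, Long> lastContactMillis = new ConcurrentHashMap<>();
  private final long memberTimeoutMillis;

  RingMonitorSketch(long memberTimeoutMillis) {
    this.memberTimeoutMillis = memberTimeoutMillis;
  }

  /**
   * Returns the member immediately after self in the view, wrapping around at the end.
   * Returns null if self is not in the view or is the only member.
   */
  static String neighborToTheRight(List<String> view, String self) {
    int index = view.indexOf(self);
    if (index < 0 || view.size() < 2) {
      return null;
    }
    return view.get((index + 1) % view.size());
  }

  /** Records that a message from the given member arrived at the given time. */
  void recordContact(String member, long nowMillis) {
    lastContactMillis.put(member, nowMillis);
  }

  /**
   * Returns true if the member has been silent for more than member-timeout milliseconds.
   * Assumption: a member with no recorded contact is treated as overdue.
   */
  boolean isOverdue(String member, long nowMillis) {
    Long last = lastContactMillis.get(member);
    return last == null || nowMillis - last > memberTimeoutMillis;
  }
}
```

In this sketch a member that is overdue would be the one to receive the heartbeat request and the timed connection attempt; the request itself and the call to JoinLeave.removeMember() are left out because their details are not settled here.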

TCPConduit will be modified to check for a health request and respond with its membership ID.  The HealthMonitor will use this to ensure that the port hasn't been reused by another process.

The JoinLeave implementation will use Messenger, and possibly the membership manager, to communicate with other members.  It will use TcpClient to contact Locators when joining in order to find the current membership coordinator.  Once it knows the coordinator it will send it a Join message including authentication credentials.  JoinLeave will also implement membership coordination functions (i.e., replace what we're doing with JGroups GMS).  It will be responsible for detecting a network partition and invoking forceDisconnect() in the membership manager.

The Locator component will persist the current membership view and will respond to requests for the ID of the current membership coordinator.  If there is no membership coordinator (meaning the Locator is booting up) then it will return its best guess of who the coordinator is based on who has contacted it.  The name of the locator's state file will be changed to membershipView.dat.

The Manager API is what should be used by all components to interact with the membership manager.

The Messenger component will use a trimmed-down modern JGroups stack channel to perform UDP messaging.  JGroups will no longer be forked for use in Geode but will be added as a dependency.  Messenger will be responsible for installing the current NetView in its JGroups protocol stack as a native JGroups View so that UDP broadcast works and multicast message garbage-collection can be properly performed.  Note that this switch to using off-the-shelf JGroups means we will start seeing more log messages from JGroups than in the past.

Also note that we may not be able to switch to a newer version of JGroups without risking rolling upgrade support.  If the new version of JGroups is not on-wire compatible with the previous version, users will not be able to perform a rolling upgrade.

It will be Messenger's responsibility to install Geode's settings from the DistributionConfig (gemfire.properties) into its JGroups channel.  The protocol stack should look something like this:  <UDP> <BARRIER> <pbcast.NAKACK2> <UNICAST3> <pbcast.STABLE> <MFC> <UFC> <FRAG2>.  Of course there will be lots of settings in each of these protocols to customize the stack.  There is no requirement that the JGroups stack configuration be in an external file.  It can be a string embedded in the Messenger implementation.  XML will need to be used because the JGroups PlainConfigurator still uses a colon as a protocol separator and this is incompatible with IPv6 addresses.
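As a rough illustration of embedding the stack as an XML string, the sketch below holds a configuration with the protocols listed above in a Java constant and lists the protocol names using the JDK's XML parser.  The class name EmbeddedStackSketch, the bind_addr and mcast_port values, and the helper methods are hypothetical, and each protocol would need many more settings in practice.  The naiveColonSplit method is not how PlainConfigurator actually parses its input; it only shows why a colon separator is awkward when a value contains an IPv6 address.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch of an embedded JGroups stack definition; not the real Messenger. */
final class EmbeddedStackSketch {
  // Placeholder values: a real stack would take these from DistributionConfig.
  static final String STACK_XML =
      "<config xmlns=\"urn:org:jgroups\">"
          + "<UDP bind_addr=\"fe80::1\" mcast_port=\"10334\"/>"
          + "<BARRIER/>"
          + "<pbcast.NAKACK2/>"
          + "<UNICAST3/>"
          + "<pbcast.STABLE/>"
          + "<MFC/>"
          + "<UFC/>"
          + "<FRAG2/>"
          + "</config>";

  /** Returns the protocol element names in stack order, parsed with the JDK DOM parser. */
  static List<String> protocolNames(String xml) {
    try {
      NodeList children = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
          .getDocumentElement().getChildNodes();
      List<String> names = new ArrayList<>();
      for (int i = 0; i < children.getLength(); i++) {
        Node child = children.item(i);
        if (child.getNodeType() == Node.ELEMENT_NODE) {
          names.add(child.getNodeName());
        }
      }
      return names;
    } catch (Exception e) {
      throw new IllegalStateException("stack XML could not be parsed", e);
    }
  }

  /** Naive illustration only: splitting a colon-separated stack string on every colon. */
  static String[] naiveColonSplit(String plainConfig) {
    return plainConfig.split(":");
  }
}
```

With a plain string such as UDP(bind_addr=fe80::1):BARRIER, the naive split yields four fragments instead of two protocols because the colons inside the address are indistinguishable from separators, while the XML form keeps the address intact inside an attribute value.  If the chosen JGroups version offers a JChannel constructor that accepts an InputStream, the string's bytes could presumably be handed to it directly.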

All of the JGroups statistics in DistributionStats need to be removed or replaced with corresponding stats based on the new implementation.

Testing

Since this work implements an existing interface in Geode, there are already a lot of tests that exercise it.  These tests will need some attention if they refer to any JGroups code.  The use of interfaces in this version of the MembershipManager should allow us to create real unit tests, as opposed to integration tests, for each component to achieve a higher level of code coverage.

...