To be Reviewed By:

Authors: Dan Smith

Status: Draft | Discussion | Active | Dropped | Superseded

Superseded by: N/A

Related: N/A

Problem

Geode uses UDP messaging through JGroups for peer-to-peer messaging related to membership. Geode does not actually use the JGroups group membership system, just its reliable UDP messaging. Using UDP messaging through JGroups has a couple of issues:

  • We implemented our own non-standard encryption and key exchange system on top of JGroups UDP messaging in order to ensure that all P2P messages are encrypted. See Secure UDP Communication in Geode. This adds complexity to the configuration by requiring users to also set a separate UDP encryption property, and it adds the risk of security and functional issues in our implementation. We have recently been discussing deprecating this property on the mailing list because of its functional issues. See this thread.
  • JGroups does not support rolling upgrades which makes it more difficult to upgrade JGroups.

Rather than continue using UDP, we would like to replace the UDP messaging for membership with a TCP-based messaging system. This will allow us to use the standard encryption protocols for all peer-to-peer messaging, and it will remove our dependence on JGroups in the long term.

Goals

  • Set us up for removing JGroups as a dependency in future versions by moving to a protocol that does not require JGroups
  • Support rolling upgrades from the old JGroups protocol to the new TCP-based protocol
  • Use the existing SSL settings to control how membership messages are encrypted
  • Be as reliable in the face of networking failures as the previous protocol

Anti-Goals

  • It is not a goal to allow the user to configure their own messaging system.

Design


All messaging related to membership is handled by the JGroupsMessenger class, which implements the Messenger interface. We will create a new implementation of Messenger that uses TCP sockets rather than JGroups and UDP sockets.

Our proposal for the new Messenger is to implement a TCP server and client using Netty. The Netty server and client will use the existing cluster SSL configuration, so if cluster SSL is enabled, no additional properties will be required. See https://geode.apache.org/docs/guide/latest/managing/security/implementing_ssl.html for information on the relevant properties.

The host and port of this new TCP server socket need to be shared with other members. Currently, we distribute the jgroups UDP server port as part of the InternalDistributedMember. Those member ids are sent as part of view messages, which allow all members to discover the listening port of other members (see GMSJoinLeave message sequence diagrams). We propose adding a new port field to the InternalDistributedMember to keep track of this new port. (See the Rolling Upgrade section, below.)

The current Messenger interface is somewhat tied to Geode-specific group messaging. It has the concept of a view, and methods related to returning a QuorumChecker and performing a state flush. In order to separate group membership from the messaging system, we will create a new module and a new interface, UnicastMessenger, that is strictly focused on point-to-point messaging of Geode objects. This should also make it easier to swap out different messaging implementations if we want to experiment with them.

In order to support rolling upgrades, we are going to need to continue to run the old Messenger implementation that uses jgroups, until we drop support for old versions. Therefore, we will also need a BackwardsCompatibleMessenger that wraps the new Messenger and the JGroupsMessenger. The class diagram will look something like this:


Class diagram (types and members shown):

  • Messenger (existing interface, has group messaging capabilities): installView, start, send(MemberID, Message), addHandler(Class, MessageHandler)
  • UnicastMessenger (point to point messenger): start(): Address, send(Address, Object message), addHandler(Class, MessageHandler)
  • MessengerImpl
  • JGroupsMessenger
  • BackwardsCompatibleMessenger
  • NettyUnicastMessenger
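As a rough illustration of the UnicastMessenger contract, the three operations named in the diagram could be written as the Java interface below. The Address fields, the shape of MessageHandler, the generic type parameters, and the in-memory LoopbackUnicastMessenger are all assumptions made for this sketch; the proposal only names the types and methods.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical address type; the proposal does not define its fields.
final class Address {
  final String host;
  final int port;

  Address(String host, int port) {
    this.host = host;
    this.port = port;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof Address && ((Address) o).host.equals(host) && ((Address) o).port == port;
  }

  @Override
  public int hashCode() {
    return Objects.hash(host, port);
  }
}

// Hypothetical handler shape; only the name MessageHandler appears in the proposal.
interface MessageHandler<T> {
  void handle(Address sender, T message);
}

// Point-to-point messenger with the three operations named in the class diagram.
interface UnicastMessenger {
  Address start();

  void send(Address destination, Object message);

  <T> void addHandler(Class<T> messageType, MessageHandler<T> handler);
}

// In-memory stand-in used only to exercise the interface; no sockets are involved.
class LoopbackUnicastMessenger implements UnicastMessenger {
  private static final Map<Address, LoopbackUnicastMessenger> REGISTRY = new ConcurrentHashMap<>();
  private final Map<Class<?>, MessageHandler<?>> handlers = new ConcurrentHashMap<>();
  private final Address self;

  LoopbackUnicastMessenger(Address self) {
    this.self = self;
  }

  @Override
  public Address start() {
    REGISTRY.put(self, this);
    return self;
  }

  @Override
  public void send(Address destination, Object message) {
    LoopbackUnicastMessenger target = REGISTRY.get(destination);
    if (target != null) {
      target.dispatch(self, message);
    }
  }

  @Override
  public <T> void addHandler(Class<T> messageType, MessageHandler<T> handler) {
    handlers.put(messageType, handler);
  }

  // Looks up the handler registered for the exact class of the message.
  @SuppressWarnings("unchecked")
  private <T> void dispatch(Address sender, T message) {
    MessageHandler<T> handler = (MessageHandler<T>) handlers.get(message.getClass());
    if (handler != null) {
      handler.handle(sender, message);
    }
  }
}
```

Keeping the interface free of view and quorum concepts is what would let a Netty-based implementation and any experimental implementation be swapped behind the same three methods.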

The NettyUnicastMessenger will maintain one connection to each peer. When sending a message, the NettyUnicastMessenger will create a connection to each destination that does not already have one. Once a connection is established, it will remain open until the messenger is told to shut it down.
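The one-connection-per-peer rule can be sketched with a concurrent map that creates a connection on first use. PeerConnection, ConnectionTable, and the connector function are hypothetical names for this example; in the proposal the connection would presumably wrap a Netty channel.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical connection abstraction standing in for a real network channel.
interface PeerConnection {
  void write(Object message);

  void close();
}

// Keeps at most one connection per peer, created lazily on the first send.
class ConnectionTable<A> {
  private final Map<A, PeerConnection> connections = new ConcurrentHashMap<>();
  private final Function<A, PeerConnection> connector;

  ConnectionTable(Function<A, PeerConnection> connector) {
    this.connector = connector;
  }

  // Reuses the existing connection, or opens one if none exists yet.
  void send(A destination, Object message) {
    connections.computeIfAbsent(destination, connector).write(message);
  }

  // The connection stays open until the messenger is told to shut it down.
  void shutdown(A destination) {
    PeerConnection connection = connections.remove(destination);
    if (connection != null) {
      connection.close();
    }
  }

  int openConnections() {
    return connections.size();
  }
}
```

Note that computeIfAbsent runs the connector while holding an internal lock for that key, so a real implementation would likely store a future or use an asynchronous connect rather than block inside the map.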

Messages dispatched from NettyUnicastMessenger will be dispatched on Netty event loop threads. For this reason, message processing must not block, or it will prevent other messages from being received. The old JGroupsMessenger dispatched messages using a single JGroups dispatcher thread.


Handling TCP connection failures

The contract of the Messenger is that it keeps trying to deliver messages to the destination as long as those destinations are still in the view. Because individual TCP connections can fail, this basically forces us to implement a reliability layer above TCP that will continue to retry messages until a member is removed from the view. This layer needs to be able to:

  • Reestablish a connection to the destination if the existing TCP connection fails.
  • Retransmit messages that may not have been received. Because more than one message can be in the sender's TCP send buffer when a TCP connection fails, we need to retransmit some window of messages.
  • Because we are retransmitting messages, we need to include some sort of sequence number to prevent duplicate messages, and possibly also to deliver messages in order.
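One way the requirements above could fit together is sketched here: the sender keeps every message until it is acknowledged and resends the unacknowledged window after a reconnect, while the receiver uses the sequence number to drop duplicates. The class names, the acknowledgement scheme, and the strict in-order delivery rule are assumptions for illustration, not part of the proposal.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// A message tagged with a per-destination sequence number (hypothetical wire shape).
final class Sequenced {
  final long seq;
  final Object payload;

  Sequenced(long seq, Object payload) {
    this.seq = seq;
    this.payload = payload;
  }
}

// Sender side: remembers every message until the peer acknowledges it.
class RetransmitWindow {
  private final Deque<Sequenced> unacked = new ArrayDeque<>();
  private long nextSeq = 1;

  synchronized Sequenced enqueue(Object payload) {
    Sequenced message = new Sequenced(nextSeq++, payload);
    unacked.addLast(message);
    return message;
  }

  // The peer confirmed receipt of everything up to and including ackedSeq.
  synchronized void acknowledge(long ackedSeq) {
    while (!unacked.isEmpty() && unacked.peekFirst().seq <= ackedSeq) {
      unacked.removeFirst();
    }
  }

  // After a reconnect, these are resent in order on the new connection.
  synchronized List<Sequenced> pendingForRetransmit() {
    return new ArrayList<>(unacked);
  }
}

// Receiver side: delivers each sequence number once, in order.
class DuplicateFilter {
  private long lastDelivered = 0;

  // Returns true if the message should be delivered, false for a duplicate or a gap.
  synchronized boolean accept(Sequenced message) {
    if (message.seq != lastDelivered + 1) {
      return false;
    }
    lastDelivered = message.seq;
    return true;
  }

  synchronized long lastDelivered() {
    return lastDelivered;
  }
}
```

For example, if messages 1 to 3 were written, 1 and 2 arrived, but only the acknowledgement for 1 came back before the connection failed, the sender would resend 2 and 3, and the receiver would drop 2 and deliver 3.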

Possible options:

One concern about switching to a TCP-based protocol is that network outages may result in TCP sockets hanging on read or write operations. We need to ensure that, if a connection to one member is blocked, we still send messages to the other members. Each destination will have its own queue of messages to be sent, and adding to one queue should never block.
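A minimal sketch of the per-destination queue idea follows, assuming an unbounded lock-free queue per destination so that enqueueing never blocks. OutboundQueues and its methods are hypothetical names; a separate writer for each destination would call poll and perform the actual socket write.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// One outbound queue per destination, so a stalled peer only backs up its own queue.
class OutboundQueues<A> {
  private final Map<A, Queue<Object>> queues = new ConcurrentHashMap<>();

  // Never blocks: ConcurrentLinkedQueue is unbounded and lock-free.
  void enqueue(A destination, Object message) {
    queues.computeIfAbsent(destination, d -> new ConcurrentLinkedQueue<>()).add(message);
  }

  // Called by the writer for one destination; returns null when nothing is pending.
  Object poll(A destination) {
    Queue<Object> queue = queues.get(destination);
    return queue == null ? null : queue.poll();
  }

  int pending(A destination) {
    Queue<Object> queue = queues.get(destination);
    return queue == null ? 0 : queue.size();
  }

  // Drops queued messages, for example once a member has left the view.
  void discard(A destination) {
    queues.remove(destination);
  }
}
```

The trade-off in this sketch is memory: an unbounded queue for a hung peer grows until that peer is removed from the view, so a real implementation would probably need a size limit or a timeout policy.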

Changes and Additions to Public Interfaces

If you are proposing to add or modify public interfaces, those changes should be outlined here in detail.

Performance Impact

Do you anticipate the proposed changes to impact performance in any way? Are there plans to measure and/or mitigate the impact?

Backwards Compatibility and Upgrade Path

We need to be able to do a rolling upgrade from the old JGroups-based protocol to the new protocol. We will need to continue to support rolling upgrades for a certain range of versions before we can drop JGroups.

In order to accomplish this, a member will need to listen for connections on both protocols when it initially starts up. We will create a delegating Messenger (the BackwardsCompatibleMessenger from the Design section) that contains both a JGroupsMessenger and a NettyMessenger. It can install handlers in both of them and decide which Messenger to use when sending a message based on the version of the recipient. If a member receives a view that contains no old members, that is, no members that lack support for the new protocol, it could shut down the JGroups-based Messenger.
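The routing decision described above might look roughly like the following. SendOnlyMessenger, VersionRoutingMessenger, the integer version ordinal, and the firstTcpVersion threshold are assumptions for this sketch; the real Messenger interface has more methods, and Geode's actual version representation is not shown here.

```java
import java.util.Collection;
import java.util.function.ToIntFunction;

// Minimal hypothetical send-only view of a messenger for this sketch.
interface SendOnlyMessenger<M> {
  void send(M destination, Object message);
}

// Chooses the legacy or the TCP messenger based on the recipient's version.
class VersionRoutingMessenger<M> implements SendOnlyMessenger<M> {
  private final SendOnlyMessenger<M> legacy; // stands in for the JGroups-based messenger
  private final SendOnlyMessenger<M> tcp; // stands in for the Netty-based messenger
  private final ToIntFunction<M> versionOf; // hypothetical lookup of a member's version ordinal
  private final int firstTcpVersion; // first version that understands the TCP protocol

  VersionRoutingMessenger(SendOnlyMessenger<M> legacy, SendOnlyMessenger<M> tcp,
      ToIntFunction<M> versionOf, int firstTcpVersion) {
    this.legacy = legacy;
    this.tcp = tcp;
    this.versionOf = versionOf;
    this.firstTcpVersion = firstTcpVersion;
  }

  private boolean supportsTcp(M member) {
    return versionOf.applyAsInt(member) >= firstTcpVersion;
  }

  @Override
  public void send(M destination, Object message) {
    (supportsTcp(destination) ? tcp : legacy).send(destination, message);
  }

  // True when every member of the view speaks the TCP protocol,
  // meaning the legacy messenger could be shut down.
  boolean canStopLegacy(Collection<M> viewMembers) {
    return viewMembers.stream().allMatch(this::supportsTcp);
  }
}
```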

There is an issue here with the need to listen on two separate ports, because InternalDistributedMember currently only has support for a single port field. There are a few options we are evaluating:

  1. Just use the same port for JGroups and Netty. Since one is a UDP port and the other is a TCP port, they can both be open at the same time.
  2. Encode the second port in some other field in InternalDistributedMember, for example by using part of the UUID bytes. This is kind of hacky.
  3. Pass the new membership port around outside of InternalDistributedMember. This would probably involve sending it as part of the FindCoordinatorResponse, as well as including it in the NetView.
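Option 1 relies on UDP and TCP having separate port namespaces. The small check below, written with plain java.net sockets rather than JGroups or Netty, binds a TCP listener on an ephemeral loopback port and then a UDP socket on the same port number. SharedPortCheck is a hypothetical helper for illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.ServerSocket;

class SharedPortCheck {
  // Binds a TCP listener on an ephemeral port, then a UDP socket on the same number.
  // Returns the shared port number if both binds succeeded.
  static int bindBoth() {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    try (ServerSocket tcp = new ServerSocket(0, 50, loopback);
        DatagramSocket udp = new DatagramSocket(tcp.getLocalPort(), loopback)) {
      return udp.getLocalPort();
    } catch (IOException e) {
      // For example, another process already holds the UDP port with that number.
      throw new UncheckedIOException(e);
    }
  }
}
```

The failure case in the catch block is worth noting for this option: the ephemeral TCP port chosen by the operating system is not guaranteed to be free on the UDP side, so a member would need to retry with another port if the second bind fails.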


Prior Art

What would be the alternatives to the proposed solution? What would happen if we don’t solve the problem? Why should this proposal be preferred?

FAQ

Answers to questions you’ve commonly been asked after requesting comments for this proposal.

Errata

What are minor adjustments that had to be made to the proposal since it was approved?


