Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
/**
 * Start the server and start listening for messages.
 */
void start();

/**
 * returns the endpoint ID for this member
 */
InternalDistributedMember getMemberID();

/**
 * sends an asynchronous message. Returns destinations that did not receive the message due to no
 * longer being in the view
 */
Set<InternalDistributedMember> send(DistributionMessage m);

/**
 * adds a handler for the given class/interface of messages
 */
void addHandler(Class c, MessageHandler h);

/**
 * called when a new view is installed by Membership
 */
void installView(NetView v);

 

The start method of the messenger is responsible for starting a server that is listening on a socket. The getMemberId() method returns an InternalDistributedMember. Currently the InternalDistributedMember contains the host and port that the Messenger is listening on in it's getHost and getPort methods.

The send method takes a DistributionMessage. DistributionMessage has a getRecipients method, which returns an array of InternalDistributedMember objects. The messenger can send messages to those recipients because they have the host and port information.

The installView method is used by the messenger to determine which members are no longer in the view, and so we should stop trying to transmit messages to them.

Discovery of the InternalDistributedMembers view happens outside the Messenger (in GMSJoinLeave). We won't have to modify that functionality. The new messenger just needs to know how to start a TCP server, generate an InternalDistributedMember that contains the contact information for that server, and use InternalDistributedMember objects from other other members to send messages to them.

 

Implement a TCP Server Using Netty

...

When sending a message the Messenger will create a connection to all destinations if no connection exists yet. Once a connection is established, the connection will remain open as long as that member is still in the view. If IO errors or timeouts happen when writing messages, the Messenger will close the connection, reestablish a new connection, and retry. Messages will continue to be retried until we receive a view indicating that the member is no longer in the view. We will therefore guarantee the order of message delivery.??? What about data that was already in the socket write buffer but never transmitted? We need to make sure we transmit these messages as well. Does this mean we need our own ack protocol outside of TCP? Or do we just a keep a window of messages that will always be retransmitted when a connection fails?

Because we are retrying messages, we can end up delivering messages more than once. The old JGroups protocol ensured only once delivery. ??? Do messaging messages need only once delivery? What if we end up sending regular messages over this channel? If we need only once delivery, we also need to add sequence numbers to messages and drop duplicates on the receiving side.

Handling TCP connection failures

The contract of the Messenger is that it keep trying to deliver messages to the destination as long as those destinations are still in the view. Because individual TCP connections can fail, this basically forces us to implement a reliability layer above TCP that will continue to retry messages until a member is removed from the view. This layer needs to be able to

  • Reestablish a connection to the destination to if the existing tcp connection fails.
  • Retransmit messages that may not have been received. Because more than one message can be in the tcp send buffer of the sender when a tcp connection fails, we need to retransmit some window of messages.
  • Because we are retransmitting messages, we need include some sort of sequence number to prevent duplicate messages, and possibly also to deliver messages in order.

Possible options

One concern about switching to a TCP based protocol is that network outages may result in TCP sockets hanging on read or write operations. We need to ensure that if a connection to one members is blocked, that we still send messages to the other members. Each destination will have its own queue of messages to be sent, and adding to one queue will not should never block the others.

Rolling upgrade concerns

...