You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 4 Next »

IDIEP-67
Author
Sponsor
Created

  

Status

ACTIVE


Motivation

Changes proposed in IEP-61 allows us to lower requirements for consistency guarantees of membership/discovery protocols. Current Discovery SPI implementations are not scalable enough or rely on external distributed systems (ZooKeeper), so relaxing consistency guarantees and improving scalability is a beneficial goal.

Other goals are:

  • Ports unification. Instead of two separate ports for Discovery and Communication subsystems single port should be used for all network interactions. It simplifies overall configuration and eliminates situations of undefined behavior when only one port is available (for discovery or communication) so Ignite node cannot work properly.
  • Code base unification. Instead of relying on custom protocols and network code base well known protocols and libraries will be used: SWIM or Rapid-based discovery and membership module and communication part developed on top of netty library.
  • API simplification. Bringing all network-related APIs into a single module makes it easier to developers to navigate among network capabilities and find necessary API methods.

Description

Terminology

  • endpoint - host(IP)/port which should be configured in discovery configuration. This is the analog 'node' but 'node' is not an appropriate word on such low-level and it represents only host/port rather than the whole node.
  • instanceId/setupId - unique identification of every endpoint which should be bounded with a local process(this identification should be generated on every restart). Something like current nodeId but belongs to network endpoint rather than the whole node.
  • network member/member - connection + instanceId + messages? It can be used for sending messages. It represents a cluster participant which is able to send or receive some messages.

Responsibility

  • Discover and automatically gather all found network endpoints to group membership.
  • Detect the network events - new endpoints appears, the connection fails, etc.
  • Provide cluster membership changes events for subsystems that do not require strict consistency guarantees.
  • Retry to establishing connection if it was lost.
  • Send and receive messages(including custom messages) to any member of the group.
  • Serialize/deserialize messages.
  • Provide API for sending messages to a specific member with/without a delivery guarantee.

Lifecycle

  • Start the process.
  • Bind to the configured local port
  • Iterate over configured ip/port(IP finder?) and connect to endpoint if possible
  • Start the handshake with a successfully connected endpoint for exchange the initial information(instanceId, recoveryMessageId)
  • Notify subscribers about establishing connection to the cluster(Generate network started event)
  • Reconnect to endpoint if connection unexpectedly fails (without any events to subscribers)
  • Notify subscribers about the appearance of a new network member if it finds one.
  • Notify subscribers about the disappearance of network member if connection unrecoverably fails.
  • Stop the process.

API

Draft API
public interface NetworkService {
    static NetworkService create(NetworkConfiguration cfg);

    void shutdown() throws ???;

    NetworkMember localMember();
    
	Collection<NetworkMember> remoteMembers();
    
	void weakSend(NetworkMember member, Message msg);

    Future<?> guaranteedSend(NetworkMember member, Message msg);
    
	void listenMembers(MembershipListener lsnr);
    
	void listenMessages(Consumer<MessagingEvent> lsnr);
}

public interface MembershipListener {
    void onAppeared(NetworkMember member);
    void onDisappeared(NetworkMember member);
    void onAcceptedByGroup(List<NetworkMember> remoteMembers);
}

public interface NetworkMember {
    UUID id();
}


Implementation

  • It needs to implement the group membership over netty.
  • Integrate direct marshaling to netty(via netty handlers).
  • Implement sending idempotent messages with very weak guarantees:
    • no delivery guarantees required;
    • multiple copies of the same message might be sent;
    • no need to have any kind of acknowledgement;
  • Implement sending messages with delivery guarantees::
    • message must be sent exactly once with an acknowledgement that it has actually been received (not necessarily processed);
    • messages must be received in the same order they were sent.
    • These types of messages might utilize current recovery protocol with acks every 32 (or so) messages. This setting must be flexible enough so that we won't get OOM in big topologies.
  • Add ssl support to this module
  • Implement connection retrying algorithms.


Dependencies

Nettyis an asynchronous event-driven network application framework for rapid development of maintainable high performance protocol servers & clients. It will be used for interaction with network(read/write, serialize/deserialize).

Scalecube - is a lightweight decentralized cluster membership, failure detection, and gossip protocol library. It will be used for gathering open network interfaces to cluster membership via gossip protocol.


Risks and Assumptions


Discussion Links

// N/A

Reference Links

// N/A

Tickets

key summary type created updated due assignee reporter priority status resolution

JQL and issue key arguments for this macro require at least one Jira application link to be configured

  • No labels