IDIEP-15
Author

@Semen Boikov

Sergey Puchnin

SponsorYakov Zhdanov
Created 
StatusDRAFT


Motivation

The current implementation of Discovery SPI implies association of the cluster nodes in the topology of type "ring".

This decision, along with a host of advantages, has a fundamental drawback - the passage of messages is directly proportional to the number of nodes in the cluster.

Consequently, handling of events: join node, node leaving, detection of an occurrence of split-brain situations depends on cluster size. That is, this implementation is not optimal for large configurations.

 

Any implementation of Discovery SPI subsystem must provide:

  • a connection of a new node in the topology

  • node Disconnecting from the topology

  • of the order of grease retention, in which nodes are connected to the cluster and disconnect from it

  • perhaps through a subsystem send custom messages

  • Ability to save node attributes

  • Ability to establish authenticator to be validated connected nodes

Description

Glossary

Znode - Any node or ephemeral node.

Node - A node in a hierarchical structure that stores user data in the ZooKeeper cluster. A node exists until explicitly deleted.

Ephemeral node - Similarly, Node, but the node exists until explicitly deleted or until essentially created his session.

Watches - Customers sign up to receive notifications of changes to the Node.

ZooKeeper server - node is running Zookeeper

ZooKeeper client - host running the server node Apache Ignite

service / ZooKeeper cluster ZooKeeper- the union of server node ZooKeeper, providing service ZooKeeper


/discoveryEvents - znode storage discovery stage events in the cluster Apache Ignite

/alive - znode storage a list of all nodes in the topology

/joinData - znode storage Apache Ignite server node data at the time of connection

/customEvents - znode to store user messages

 

Description implementation

In order to eliminate these drawbacks is proposed implementation using ZooKeeper-a service. As a point of connection and synchronization for all server nodes of Apache Ignite cluster is a ZooKeeper .

Thus ZooKeeper cluster is to guarantee the storage of the current topology, attribute nodes, the next connection nodes, queues for storing user and service events.

Functional ZooKeeper

ZooKeeper main functionality is to provide for the allocation process through a shared hierarchy, service coordination, and synchronization.


At the same time, it provided a guarantee that:

  • All the changes will apply from users while maintaining the order in which they were received by the cluster.

  • Actions client nodes zooKeeper performs transaction, ie, or all will succeed, or all will be lost.

  • All customers Zookeeper "see" the same state of the data in the service, regardless of to what exactly the server they are connected to ZooKeeper.

  • After receiving confirmation that the data have been obtained, they will be present in service until removal.


To provide these guarantees all ZooKeeper servers providing service, combined in a cluster and synchronize the current status of connected clients and their transactions using Zab  protocol (https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+in+words). As long as the majority of ZooKeeper cluster servers "see each other" service is available (which is why in the cluster ZooKeeper must be an odd number of servers). ZooKeeper clients connect to an arbitrary server ZooKeeper on a TCP connection, send requests, receive responses, and notification information that the service is functioning correctly. If the connection is dropped, the client ZooKeeper node can reconnect to an arbitrary server ZooKeeper.


Another important functionality ZooKeeper-and technology is customer notifications on changes. ZooKeeper can store arbitrary, customer information directly to the service. After saving the recorded information can be available to all customers ZooKeeper. For this ZooKeeper customers can create any tree structure (similar to the structure of directories and files in the file system), each node which is ZooKeeper znode.Each znode allows you to store an arbitrary user information. Znode divided into node and ephemeral node.If the node was created as a node,it will exist until explicitly deleted. If the node was created as an ephemeral node it will exist until explicitly deleted or until long as there has created its client session.


Any client ZooKeeper may be asked to send a one-time or recurring notification when updates to certain znode.For example, when the changed data associated with this znode or znode ceased to exist, or at a given znode created new ( "daughter") znode.  

Features of realization

Zookeeper guarantees the consistency in message handling, but for Apache Ignite is necessary to guarantee the preservation of the sequence processing of events, and at their own level. To do this in Apache Ignite cluster remains the coordinator, whose role performs the very first node (from active now) are connected to the service ZooKeeper.


Connection node to the cluster Apache Ignite

  1. The configuration of each server node Apache Ignite specified range of addresses and ports for connection to ZooKeeper cluster.

  2. Each server Apache Ignite node is connected to an arbitrary server ZooKeeper.

    1. Creates a new entry in the / joinData data connection.

    2. Recorded in the ephemeral znode / alive - the list of all active nodes in the topology.

    3. It checks whether it is a focal point.

    4. Subscribes to notifications on changes in / discoveryEvents

    5. subscribes to receive notification of a registered output of the previous node from /alive

  3. coordinator, having received notification that a / alive, a new node

    1. processes data from / joinData, checking that the node may be included in the topology.

    2. Creates event nodeJoin and writes it to / discoveryEvents

  4. nodes topology, received notice that / discoveryEvents, a new message is processed topology change.

Disabling a node from the cluster Apache Ignite

  1. Coordinator, received notice that from / alive missing node creates event NodeFailEvent and writes it to / discoveryEvents

  2. nodes topology, received notice that / discoveryEvents, a new message is processed topology change.

  3. Node, connected for the disabled, subscribe to receive notifications of withdrawal of the previous registered site / alive.

Selection Apache Ignite coordinator

  1. When disconnecting the coordinator, the node connected to the second cluster, is notified of the registered output of the previous node from / alive

  2. becomes the coordinator, changing topology version

  3. Processes unprocessed messages / discoveryEvents, to synchronize the global and local topology.

  4. Processes the raw messages / customEvents

  5. sign up to receive notifications of changes to the / alive

  6. sign up to receive notifications of changes in / customEvents

Custom Posts

Discovery the SPI interface allows the user to transmit messages through Discovery subsystem. In the current implementation for the transmission of user messages used znode / customEvent. The basic algorithm implementation is as follows.

  1. Node initiating customEvent sending messages, publish it in znode / customEvent.

  2. Coordinator receives notification of a new message in / customEvents

  3. Subject to the processing order forms new message / discoveryEvents

  4. nodes topology, received notice that / discoveryEvents there is a new message, a new message is treated.

It should be noted that the mechanism customEvent implemented informing nodes of the early formation of a snapshot. This implies a modification of the message sent by topology. In the current implementation of the communication are immutable, which will require improvements in the mechanism of formation of snapshots.


Risks and Assumptions

TBD

Discussion Links

TBD

Reference Links

TBD

Tickets

key summary type created updated due assignee reporter priority status resolution

JQL and issue key arguments for this macro require at least one Jira application link to be configured

 

  • No labels