The class GMSJoinLeave is responsible for knowing who is in the distributed system. It interacts with Locators and Membership Coordinators and makes decisions about who should fill the role of Coordinator. It is also responsible for detecting network partitions.
The diagrams in this document show some of the interactions that take place when new members join and how network partitions are detected.
Joining an existing distributed system
PlantUML:

title This diagram shows a member (N) using a locator (L) discovering the Coordinator (C) and joining
hide footbox
entity N
entity L
entity C
N -> L : FindCoordinator(myId)
note right : via tcp/ip
L --> N : FindCoordinatorResponse(c)
N -> C : JoinRequest
note right : this and subsequent communications via UDP
note right of C : ViewCreator thread processes request
C -> N : PrepareView(c,l,n)
N --> C : ack
C -> L : PrepareView(c,l,n)
L --> C : ack
C -> N : InstallView(c,l,n)
N --> C : ack
note right of N
  continue startup after sending ack
end note
C -> L : InstallView(c,l,n)
L --> C : ack
The above diagram shows the interaction with a Locator and the membership Coordinator during startup. The joining member first connects to each locator it knows about and asks for the ID of the current membership coordinator. Locators know this because they are themselves part of the distributed system and receive all membership views. In the current implementation the new member does this in GMSJoinLeave.findCoordinator(). Each locator also tells the new member which membership view it is aware of, so that the new member can choose the most recent one should the coordinator be in the process of establishing a new view. This is especially important when a new coordinator is in the process of taking control.
Once the coordinator's ID is known, the new member sends it a JoinRequest. The new member knows it has been accepted when it receives a membership view containing its ID; at that point it sets the view-ID in its own identifier. The view-ID, in combination with its IP address and membership port (the UDP port used by JGroups), uniquely identifies the member.
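The "choose the most recent view" step can be sketched as follows. This is a hypothetical illustration rather than Geode's actual code; FindCoordinatorResponse here is a stand-in for the real response message, reduced to the two fields the decision needs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FindCoordinatorResponse:
    coordinator: str  # the member this locator believes is coordinator
    view_id: int      # the ID of the most recent view the locator has seen

def choose_coordinator(responses):
    """Given responses gathered from all reachable locators, trust the one
    carrying the most recent membership view (view IDs increase over time)."""
    if not responses:
        return None
    return max(responses, key=lambda r: r.view_id).coordinator
```

Picking the highest view-ID is what protects the new member when a freshly elected coordinator has installed a view that some locators have not yet seen.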
PlantUML:

title state transitions for joining member
[*] -> Discovery
Discovery : search for coordinator
Discovery : using Locator service
Discovery -> Joining
Joining : join req sent to coordinator
Joining -> Member
Member : unique ID established
Member : membership view installed
Member -> [*]
Simultaneous joins and leaves
PlantUML:

title This shows a member (M) leaving and another member (P) joining in the same view
hide footbox
entity M
entity C
entity P
M -> C : LeaveRequest
C --> M : LeaveResponse
note right of C : LeaveRequest is queued
destroy M
P -> C : JoinRequest
note right of C
  JoinRequest is queued. View Creator wakes up
  and creates view {C,P} removing M
end note
C -> P : Prepare {C,P}
P --> C : ack
C -> P : Install {C,P}
P --> C : ack
note right of P : P processes view and completes Join
Geode can handle simultaneous membership additions and removals. Join, Leave and Remove requests are queued in a list in GMSJoinLeave and are processed by a ViewCreator thread. It is the ViewCreator thread that creates and sends new membership views.
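A minimal sketch of this batching behavior, under the assumption that the ViewCreator thread folds a batch of queued requests into a single new view (create_view is a hypothetical name; the real logic lives in GMSJoinLeave's ViewCreator thread):

```python
def create_view(current_view, requests):
    """Fold a batch of queued (kind, member) requests into a single new
    membership view, so that simultaneous joins and leaves produce one
    view change, as in the diagram above where M leaves and P joins."""
    members = list(current_view)
    for kind, member in requests:
        if kind == "join" and member not in members:
            members.append(member)
        elif kind in ("leave", "remove") and member in members:
            members.remove(member)
    return members
```

Batching is the design point here: one prepare/install round trip covers any number of membership changes that arrived while the thread was busy.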
Two locators starting with no other members
PlantUML:

title This diagram shows two Locators starting up at the same time with no other members present.
hide footbox
entity L1
entity L2
L1 -> L2 : FindCoordinator
note right : via tcp/ip
L2 --> L1 : FindCoordinatorResponse(registrants={L2,L1})
note right of L2
  both members get {L1,L2} registrants and choose L1
  as coordinator due to ID sort order
end note
L2 -> L1 : FindCoordinator
L1 --> L2 : FindCoordinatorResponse(registrants={L1,L2})
note right of L2 : subsequent messaging is via UDP
L2 -> L1 : JoinRequest
L1 -> L1 : becomeCoordinator
note right of L1 : L1 ViewCreator thread processes request
L1 -> L2 : PrepareView(L1,L2)
L2 --> L1 : ack
L1 -> L2 : InstallView(L1,L2)
L2 --> L1 : ack
note right of L1 : L1 and L2 continue startup
It's best to stagger the start-up of locators, but Geode can handle simultaneous startup as well. GMSLocator maintains a collection of "registrants": processes that have contacted it asking for the ID of the current coordinator. If there is no coordinator it responds with the collection of registrants, and the processes that are trying to join use this information to determine which of them is most likely to be chosen as membership coordinator. They then send a JoinRequest to that process in the hope that it will figure out that it should become coordinator and act on the request.

In the above diagram we see L1 make the decision to become coordinator and create an initial membership view. Since L1 has received a JoinRequest from L2, it includes L2 in this initial view.
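The "ID sort order" tie-break can be illustrated in one line. Assuming member IDs sort the way the diagram implies (probable_coordinator is a hypothetical name, and plain strings stand in for real member IDs):

```python
def probable_coordinator(registrants):
    """Every joining process sorts the registrant IDs the same way and
    picks the smallest, so all of them converge on the same candidate
    (L1 in the diagram above) without any extra coordination."""
    return min(registrants) if registrants else None
```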
Locators restarting and taking control
PlantUML:

title
  Details of concurrent startup of two locators when the locators
  are preferred as membership coordinators. This diagram focuses
  on the first locator, L1
end title
hide footbox
entity C
entity L1
entity L2
note over L2
  L1 and L2 have been killed. C detects loss and becomes
  coordinator. L1 and L2 are somehow restarted simultaneously.
  This diagram tracks L1's restart activity
end note
L1 -> L1 : recoverFromFile
note right
  on startup locators recover their last membership view
  from .dat file and from other locators
end note
L1 -> L2 : recoverFromOthers
L2 --> L1 : old view
note right of L1
  L1 will try to join with the old coordinator and then
  fall into findCoordinatorFromView
end note
L1 -> C : FindCoordinator()
note right : (via UDP)
C --> L1 : response(coord=C)
L1 -> C : JoinRequest
C -> L1 : prepare/install view(coord=C)
L1 -> L1 : becomeCoordinator
L1 -> C : prepare/install view(coord=L1)
note right
  Upon receiving the new view with coord=C, L1 will determine
  that it should become coordinator and create a new view
end note
Geode prefers to have locators be the membership coordinator when network partition detection is enabled, or when peer-to-peer authentication is enabled. Other members will take on the role if there are no locators in the membership view but they will be deposed once a new locator joins the distributed system.
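The preference rule can be sketched as a simple scan of the view. This is an illustration of the rule as stated above, not Geode's implementation; Member and choose_coordinator_from_view are hypothetical names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Member:
    name: str
    is_locator: bool

def choose_coordinator_from_view(view):
    """Prefer the first locator in the view; if the view holds no locators,
    fall back to the first member, who will later be deposed as coordinator
    once a locator joins."""
    for member in view:
        if member.is_locator:
            return member
    return view[0] if view else None
```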
But what happens if two locators are started at the same time? Which one becomes the new coordinator? The diagram above shows this interaction from the perspective of one of the locators. The diagram below shows this same interaction from the perspective of the other locator.
In the initial implementation of GMSJoinLeave the current coordinator recognized that a locator was attempting to join and responded with a "become coordinator" message. This led to a lot of complications when a second locator was also trying to join, so we removed the whole "become coordinator" notion and had the current coordinator accept and process the JoinRequest. This allows the locator to join and then detect that it should be the coordinator.
Locators starting and taking control - continued
PlantUML:

title
  Details of concurrent startup of two locators when the locators
  are preferred as membership coordinators. This diagram focuses
  on the second locator, L2
end title
hide footbox
entity L1
entity C
entity L2
note right of L2
  L1 and L2 have been killed. C detects loss and becomes
  coordinator. L1 and L2 are restarted simultaneously
end note
L1 -> C : JoinRequest
C -> L1 : prepare view(coord=C,L1)
L2 -> L1 : recover
L1 --> L2 : old view + L1
note right of L2
  L2 will try to join with L1 and then fall into
  findCoordinatorFromView
end note
L2 -> L1 : JoinRequest
L1 -> L1 : queues JoinRequest from L2
C -> L1 : install view(coord=C,L1)
note left : L1 is a member now
L2 -> C : FindCoordinator
note right : (sent via UDP)
C --> L2 : response(coord=C)
note right
  At this time C is still coordinator and will tell L2,
  who will try to join with it
end note
L1 -> C : prepare/install view(coord=L1,C)
L2 -> C : JoinRequest
C --> L2 : JoinResponse(coord=L1)
note right
  C has received the deposing view from L1 and will respond to
  L2's JoinRequest with a response telling it that L1 is now
  coordinator
end note
L2 -> L2 : waits for response to initial JoinRequest
L1 -> L2 : prepare/install view(coord=L1,C,L2)
L2 -> L2 : continues startup
Here we see L2 starting up and attempting to join while L1 is in the process of joining and deposing C as the coordinator.
L2 contacts L1 to find the coordinator and concludes that L1 should become the coordinator. L2 attempts to join by sending a JoinRequest to L1, but L1 is not yet coordinator, so L1 merely queues the request and continues its own attempt to join.
L2 gives up waiting for a response from L1 and, having received a view from the GMSLocators it has contacted, attempts to join using coordinators selected from that view. Eventually it attempts to join using C as the coordinator.
By the time C receives L2's JoinRequest it has been deposed as coordinator. In response it sends L2 a JoinResponse telling it that L1 is now coordinator. Since L2 has already sent a JoinRequest to L1, it now knows that it must be patient and wait for a new view admitting it into the distributed system.
Detecting a network partition
PlantUML:

title
  Details of handling of a network partition between L, M and N
  on one side of the partition and A & B on the other side.
end title
hide footbox
entity L
entity M
entity N
entity A
entity B
note over L
  The initial membership view is (L,A,B,M,N)
end note
M -> L : suspect(A)
note right
  L receives a suspect message and asks the health monitor to
  check on the member. The check fails, initiating a removal.
end note
L ->x A : attempt to contact
L -> M : prepare view (L,B,M,N)
L -> N : prepare view (L,B,M,N)
L ->x B : prepare view (L,M,N,B)
note right
  this message does not reach B due to the network partition
end note
M --> L : ack
N --> L : ack
L -> L : wait for ack from\nB times out
L ->x B : attempt to contact
L -> L : quorum calculation on (L,M,N) fails!
L -> M : forceDisconnect!
L -> N : forceDisconnect!
|||
newpage surviving side of the partition
B -> A : suspect(L)
note left of A
  A & B start suspecting members on the other side of the split.
  When L is deemed dead A will become coordinator because it is
  next in line in view (L,A,B,M,N)
end note
A -> A : becomeCoordinator
note left of A
  after suspect processing A decides to become coordinator
  and send view (A,B,M,N)
end note
A ->x M : prepare view(A,B,M,N)
A ->x N : prepare view(A,B,M,N)
A -> B : prepare view(A,B,M,N)
B --> A : ack
A -> A : times out waiting for acks
A ->x M : attempt to contact
A ->x N : attempt to contact
A -> A : quorum calculation on (A,B) passes
A -> B : prepare/install view (A,B)
Here we see a network partition with three members on one side (L, M and N) and two members on the other (A and B). When this happens the HealthMonitor is responsible for initiating suspect processing on members that haven't been heard from and cannot be contacted. It eventually tells the JoinLeave component to remove one of the members. All members of the distributed system perform this kind of processing and it can lead to a member deciding that the current Coordinator is no longer there and electing itself as the new Coordinator.
In the above diagram we see this happen on the A,B side of the partition. B initiates suspect processing on the current Coordinator (process L) and it notifies A. A performs a health check on L and decides it is unreachable, electing itself coordinator. It then sends a prepare-view message with (A,B,M,N) and expects acknowledgement but receives none from M or N. After checking on M and N via the HealthMonitor it kicks them out, creating the new view (A,B). This view change passes the quorum check so A prepares and installs it.
On the losing side of the partition L is notified that A is suspect and kicks it out, creating view (L,B,M,N). It prepares the view but gets no response from B. It kicks B out, forming view (L,M,N). This view has lost quorum so instead of sending it out it notifies M and N that they should shut down and then shuts itself down as well.
Let's look at the quorum calculations themselves. If L is a locator and the others are normal members, the initial membership view (L,A,B,M,N) has weights (3,15,10,10,10): locators carry only 3 points of weight, regular members get 10, and the so-called Lead Member (the first non-locator member in the view, here A) gets an additional 5 points. View (A,B) represents a loss of 23 of those points, while view (L,M,N) represents a loss of 25 points, which is more than 50% of the initial view's total weight of 48. This is why A and B remained standing while L, M and N shut down, even though there were more processes on their side of the network split.
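The arithmetic above can be checked with a short sketch. The weights (3 for locators, 10 for regular members, plus 5 for the Lead Member) come from the text; the Member and has_quorum names are hypothetical, and the exact threshold comparison in Geode is configurable, so treat this as an illustration of the 50% rule rather than the real implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Member:
    name: str
    is_locator: bool

def weight(member, lead):
    """Locators weigh 3 points, regular members 10, and the lead
    member (the first non-locator in the view) an additional 5."""
    if member.is_locator:
        return 3
    return 15 if member == lead else 10

def has_quorum(old_view, surviving):
    """The surviving members keep quorum only if the weight they lost
    does not exceed 50% of the old view's total weight."""
    lead = next((m for m in old_view if not m.is_locator), None)
    total = sum(weight(m, lead) for m in old_view)
    lost = sum(weight(m, lead) for m in old_view if m not in surviving)
    return lost * 2 <= total
```

Running this against the example view reproduces the outcome in the diagram: (A,B) keeps quorum (a loss of 23 out of 48) while (L,M,N) does not (a loss of 25).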