
PlantUML
title
Details of concurrent startup of two locators when the 
locators are preferred as membership coordinators.  This
diagram focuses on the first locator, L1
end title
entity C
entity L1
entity L2

note right of L2
L1 and L2 have been killed.  C
detects loss and becomes coordinator.
L1 and L2 are somehow restarted
simultaneously.  This diagram tracks
L1's restart activity
end note

L1 -> L1 : recoverFromFile
note right
on startup locators recover their
last membership view from .dat file
and from other locators
end note

L1 -> L2 : recoverFromOthers
L2 -> L1 : old view
note right of L1
L1 will try to join with the old
coordinator and then fall into
findCoordinatorFromView
end note

L1 -> C : FindCoordinator()
C -> L1 : response(coord=C)

L1 -> C : JoinRequest
C -> L1 : New View(coord=C)
L1 -> L1 : continue startup
L1 -> C : New View(coord=L1)
note right
Upon receiving the new view with coord=C
L1 will determine that it should become
coordinator and create a new view
end note


 


In the initial implementation of GMSJoinLeave, the current coordinator recognized that a locator was attempting to join and responded with a "become coordinator" message.  This led to many complications when a second locator was also trying to join, so we decided to remove the whole "become coordinator" notion and have the current coordinator accept and process the JoinRequest.  This allows the locator to join first and then detect that it should be the coordinator.
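The "join, then detect" step can be sketched as a check the locator runs on each view it receives. This is a minimal illustration, not Geode's code: `Member`, `CoordinatorCheck`, and the rule "the first preferred member in view order should coordinate" are assumptions made for the example.

```java
import java.util.List;

// Hypothetical, simplified stand-in for a membership ID; not Geode's class.
final class Member {
    final String name;
    final boolean preferredForCoordinator; // assumed true for locators here

    Member(String name, boolean preferredForCoordinator) {
        this.name = name;
        this.preferredForCoordinator = preferredForCoordinator;
    }
}

// Hypothetical helper illustrating the post-join coordinator check.
final class CoordinatorCheck {
    // Assumed rule: the first preferred member in view order should
    // coordinate; if none is preferred, the first member does.
    // The view is assumed to be non-empty.
    static Member expectedCoordinator(List<Member> view) {
        for (Member m : view) {
            if (m.preferredForCoordinator) {
                return m;
            }
        }
        return view.get(0);
    }

    // After being admitted, a member takes over only if the view says it,
    // rather than the current coordinator, should be in charge.
    static boolean shouldBecomeCoordinator(Member self, Member current,
                                           List<Member> view) {
        return expectedCoordinator(view) == self && current != self;
    }
}
```

With a view of C followed by L1, the check is true for L1 and false for C, which corresponds to L1 sending `New View(coord=L1)` in the first diagram.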

 

PlantUML
title
Details of concurrent startup of two locators when the
locators are preferred as membership coordinators.  This
diagram focuses on the second locator, L2
end title

entity L1 #grey
entity C
entity L2

note right of L2
L1 and L2 have been killed.  C
detects loss and becomes coordinator.
L1 and L2 are restarted simultaneously
end note

L1 -> C : JoinRequest
C -> L1 : new view(coord=C,L1)

L2 -> L1 : recover
L1 -> L2 : old view + L1
note right of L2
L2 will try to join with L1 and then
fall into findCoordinatorFromView
end note

L2 -> L1 : JoinRequest
L1 -> L1 : queues JoinRequest from L2

L2 -> C : FindCoordinator
note right: (sent via UDP)
C -> L2 : response(coord=C)
note right
At this time C is still coordinator and
says so in its response. L2 will then try
to join with C
end note

L1 -> C : new view(coord=L1,C)

L2 -> C : JoinRequest
C -> L2 : JoinResponse(coord=L1)
note right
C has received the deposing view from L1
and will respond to L2's JoinRequest with
a response telling it that L1 is now coordinator
end note

L2 -> L2 : waits for response to initial JoinRequest 
L1 -> L2 : new view(coord=L1,C,L2)
L2 -> L2 : continues startup

 

Here we see L2 starting up and attempting to join while L1 is in the process of joining and deposing C as the coordinator.

L2 contacts L1 to find the coordinator and sees that L1 should become the coordinator.  L2 attempts to join by sending a JoinRequest to L1, but L1 is not yet coordinator, so it merely queues the request and continues its own attempt to join.
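The queuing behavior described here can be sketched as follows. The class `JoinRequestQueue` and its methods are hypothetical names for illustration; the sketch only shows that requests received before becoming coordinator are held and then folded into a later view.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch; names and structure are illustrative, not Geode's API.
final class JoinRequestQueue {
    private final Queue<String> pending = new ArrayDeque<>();
    private boolean coordinator = false;

    // A JoinRequest is always queued on arrival, whether or not this
    // member is coordinator yet.
    synchronized void onJoinRequest(String requester) {
        if (!pending.contains(requester)) {
            pending.add(requester);
        }
    }

    synchronized void becomeCoordinator() {
        coordinator = true;
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    // Builds the next view. Queued joiners are admitted only once this
    // member is coordinator; until then they stay in the queue.
    synchronized List<String> nextView(List<String> currentView) {
        List<String> next = new ArrayList<>(currentView);
        if (!coordinator) {
            return next;
        }
        while (!pending.isEmpty()) {
            String joiner = pending.remove();
            if (!next.contains(joiner)) {
                next.add(joiner);
            }
        }
        return next;
    }
}
```

In the second diagram this corresponds to L1 queuing L2's request, sending `new view(coord=L1,C)` to depose C, and only later sending `new view(coord=L1,C,L2)`.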

L2 gives up waiting for a response from L1 and, having received a view from the GMSLocators it has contacted, attempts to join using coordinators selected from that view.  Eventually it attempts to join using C as the coordinator.

By the time C receives L2's JoinRequest it has been deposed as coordinator.  In response to the request it sends L2 a JoinResponse telling it that L1 is now coordinator.  Since L2 has already sent a JoinRequest to L1, it now knows that it must be patient and wait for a new view admitting it into the distributed system.
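The decision L2 makes on receiving that JoinResponse can be sketched as below. `JoinAttempt` and its `Action` values are hypothetical; the sketch assumes the joiner simply remembers which members it has already sent a JoinRequest to.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the joiner's side; not Geode's actual class.
final class JoinAttempt {
    enum Action { SEND_JOIN_REQUEST, WAIT_FOR_VIEW }

    // Members this joiner has already sent a JoinRequest to.
    private final Set<String> requested = new HashSet<>();

    void recordJoinRequestSentTo(String member) {
        requested.add(member);
    }

    // Called when a JoinResponse names a coordinator. If a request is
    // already outstanding with that member, there is nothing to resend:
    // the joiner waits for the view that admits it. Otherwise the caller
    // is expected to send a JoinRequest, which is recorded here.
    Action onJoinResponse(String namedCoordinator) {
        if (requested.contains(namedCoordinator)) {
            return Action.WAIT_FOR_VIEW;
        }
        requested.add(namedCoordinator);
        return Action.SEND_JOIN_REQUEST;
    }
}
```

For L2 in the diagram, requests to L1 and C have both been recorded by the time C's response arrives, so `onJoinResponse("L1")` yields `WAIT_FOR_VIEW`.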