GMSHealthMonitor makes sure that each member in the distributed system is alive and communicating with this member. To do so, it arranges the members into a ring based on the current view. On this ring, each member makes sure that the next member in the ring (its neighbor) is communicating with it. For that, it records the timestamp of the last message received from its neighbor. If the neighbor has not communicated within the last period (member-timeout), the member checks whether the neighbor is still alive. If the neighbor is not alive, the member informs the probable coordinators so that they can remove the neighbor from the view.
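
The ring and timeout check described above can be illustrated with a minimal sketch. It is not the GMSHealthMonitor code: the class name, the method names, and the use of plain strings as member IDs are all assumptions made for illustration. It only shows how a neighbor might be picked from an ordered view and how silence longer than member-timeout could mark that neighbor as suspect.

```java
import java.util.List;

// Hypothetical sketch; names and types are illustrative, not Geode's API.
public class RingNeighborSketch {

    /** Returns the member that follows self in the view, wrapping around at the end. */
    static String nextNeighbor(List<String> view, String self) {
        int i = view.indexOf(self);
        if (i < 0 || view.size() < 2) {
            return null; // self is not in the view, or there is no other member to watch
        }
        return view.get((i + 1) % view.size());
    }

    /** True when the neighbor has been silent for longer than memberTimeoutMillis. */
    static boolean isSuspect(long lastMessageMillis, long nowMillis, long memberTimeoutMillis) {
        return nowMillis - lastMessageMillis > memberTimeoutMillis;
    }

    public static void main(String[] args) {
        List<String> view = List.of("C", "M", "N");
        System.out.println(nextNeighbor(view, "M"));           // N
        System.out.println(nextNeighbor(view, "N"));           // C (wraps around)
        System.out.println(isSuspect(1_000L, 7_000L, 5_000L)); // true: silent for 6000 ms
    }
}
```

In this sketch a suspect neighbor is only a candidate for removal; the liveness check and the message to the probable coordinators happen afterwards, as described above.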

HeartbeatMessage

Each member periodically sends a HeartbeatMessage (UDP) to all the other members in the distributed system, including the coordinator. Upon receiving a HeartbeatMessage, the receiving member updates the receive timestamp it keeps for the sender. These periodic HeartbeatMessages carry a requestID of -1, and the receiver does not reply to them.
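
As a rough illustration of the bookkeeping on the receiving side, the sketch below keeps one timestamp per sender and distinguishes periodic heartbeats (requestID -1) from heartbeats that carry a real requestID. The class, field, and method names are hypothetical, and member IDs are simplified to strings; the actual message handling in GMSHealthMonitor may differ.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names and types are illustrative, not Geode's API.
public class HeartbeatRecordSketch {

    /** requestID carried by periodic heartbeats, which are never answered. */
    static final int PERIODIC_REQUEST_ID = -1;

    /** Last receive time per sender, in milliseconds. */
    private final Map<String, Long> lastHeard = new ConcurrentHashMap<>();

    /**
     * Records the receive time for the sender.
     * Returns true for a periodic heartbeat (requestID -1),
     * false for a heartbeat that carries a real requestID.
     */
    boolean onHeartbeat(String sender, int requestId, long nowMillis) {
        lastHeard.put(sender, nowMillis);
        return requestId == PERIODIC_REQUEST_ID;
    }

    /** Returns the last receive time for the sender, or null if nothing was received yet. */
    Long lastHeardFrom(String sender) {
        return lastHeard.get(sender);
    }

    public static void main(String[] args) {
        HeartbeatRecordSketch record = new HeartbeatRecordSketch();
        boolean periodic = record.onHeartbeat("M", PERIODIC_REQUEST_ID, 1_000L);
        System.out.println(periodic + " " + record.lastHeardFrom("M"));
    }
}
```

A concurrent map is used here only because heartbeats from many senders could arrive on different threads; that is a design assumption of the sketch, not a statement about the real implementation.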

 

PlantUML
title This diagram shows a member (M) sending HeartbeatMessage to all the other members (N, C) in the distributed system
hide footbox
entity M
entity N
entity C
M -> N: HeartbeatMessage(-1)
note right : update the record of timestamp
M -> C : HeartbeatMessage(-1)
note right : update the record of timestamp

...

 

PlantUML
title This diagram shows a member (M) notifying the Coordinator (C) of a Suspect Member (S) and the Final Check Process
hide footbox
entity M
entity S
entity C
M -> S : HeartbeatRequestMessage(requestID)
note right : via UDP
note right of M
No response from S
after timeout
end note
M -> C : SuspectMembersMessage
note right : via UDP
note right of C : start final check of suspect member
C -> S : HeartbeatRequestMessage(requestID)
note right : via UDP
S --> C : HeartbeatMessage(requestID)
note right : via UDP
C -> S : (Version, ViewID, UUID)
note right : via TCP
S --> C : OK
note right : via TCP
 

...

 

PlantUML
title This diagram shows a member (M) notifying the Coordinator (C) of a Suspect Member (S) and the Failed Final Check Process
hide footbox
entity M
entity S
entity C
M -> S : HeartbeatRequestMessage(requestID)
note right : via UDP
note right of M
no response from S
after timeout
end note
M -> C : SuspectMembersMessage
note right : via UDP
note right of C : start final check of suspect member
C -> S : HeartbeatRequestMessage(requestID)
note right : via UDP
C -> S : (Version, ViewID, UUID)
note right : via TCP
note right of C
no response from S after timeout
ask GMSJoinLeave to remove S
end note
 

...