Geode members form a ring topology for health monitoring and failure detection.

Here's what system logs might show after a 10-member cluster had been running for a while.

At row 6 we see a failing member that's eventually removed from the view. Row 5 is the member that was monitoring that one. The coordinator is at row 3.

As expected, a typical member sends about 1.5 times as many heartbeat messages as it receives. Each member sends heartbeats to the two members to its "left" (or "counter-clockwise") in the view and also to the coordinator, but it receives ring heartbeats only from the two members to its "right". That is three sent for every two received.

Also as expected, the coordinator receives about 1/3 of all messages sent by the other members, since every other member sends it heartbeats. It also sends about 2/3 as many heartbeat messages as a typical member: as the coordinator, it has no other coordinator to send heartbeats to.
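The ratios above can be checked with a small idealized model of one heartbeat period. This is a hypothetical sketch, not Geode's GMSHealthMonitor code. The class `HeartbeatRingModel` and its methods are illustrative names. The model assumes every member sends exactly one heartbeat to each of the two members to its "left" and that every non-coordinator member sends one more heartbeat to the coordinator. It counts duplicate targets near the coordinator as separate messages, so it does not reproduce the coordinator-adjacent skew described below.

```java
/**
 * Hypothetical, idealized model of one heartbeat period in a ring of members.
 * Assumptions (not taken from Geode's implementation):
 *  - each member sends one heartbeat to each of the two members to its "left";
 *  - each non-coordinator member sends one additional heartbeat to the coordinator;
 *  - duplicate targets (e.g. a left neighbor that is also the coordinator)
 *    are counted as separate messages.
 */
public class HeartbeatRingModel {
    private final int[] sent;          // heartbeats sent by each member
    private final int[] ringReceived;  // heartbeats received from the two members to the "right"
    private final int coordinator;     // index of the coordinator in the view
    private int coordinatorReceived;   // extra heartbeats received only because of the coordinator role

    public HeartbeatRingModel(int members, int coordinator) {
        if (members < 3 || coordinator < 0 || coordinator >= members) {
            throw new IllegalArgumentException("need >= 3 members and a valid coordinator index");
        }
        this.coordinator = coordinator;
        this.sent = new int[members];
        this.ringReceived = new int[members];
        for (int m = 0; m < members; m++) {
            // Two heartbeats to the members "to the left" (counter-clockwise), wrapping around the ring.
            for (int step = 1; step <= 2; step++) {
                int target = Math.floorMod(m - step, members);
                sent[m]++;
                ringReceived[target]++;
            }
            // One more heartbeat to the coordinator, unless this member is the coordinator.
            if (m != coordinator) {
                sent[m]++;
                coordinatorReceived++;
            }
        }
    }

    public int sent(int member) {
        return sent[member];
    }

    /** Total heartbeats a member receives in one period. */
    public int received(int member) {
        return ringReceived[member] + (member == coordinator ? coordinatorReceived : 0);
    }

    /** Heartbeats the coordinator receives only because it is the coordinator. */
    public int coordinatorOnlyReceived() {
        return coordinatorReceived;
    }

    public static void main(String[] args) {
        int members = 10;
        HeartbeatRingModel model = new HeartbeatRingModel(members, 0);
        int sentByOthers = 0;
        for (int m = 0; m < members; m++) {
            System.out.printf("member %d: sent=%d received=%d%n", m, model.sent(m), model.received(m));
            if (m != model.coordinator) {
                sentByOthers += model.sent(m);
            }
        }
        // Member 5 is far enough from the coordinator to be "typical".
        System.out.printf("typical member sent/received: %.2f%n",
                (double) model.sent(5) / model.received(5));
        System.out.printf("coordinator-only share of others' sends: %.2f%n",
                (double) model.coordinatorOnlyReceived() / sentByOthers);
        System.out.printf("coordinator sent vs typical member: %.2f%n",
                (double) model.sent(0) / model.sent(5));
    }
}
```

For a 10-member view with the coordinator at index 0, `main` prints ratios of 1.50, 0.33 and 0.67, in line with the proportions described above. The real counts in a running cluster vary with timing and with how the actual implementation picks its targets.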

Due to details of the heartbeat-generation loop (see the GMSHealthMonitor.Heart class), the two members to the "left" of the coordinator receive an extra portion of the heartbeat messages.
