A Geode member may be forcibly disconnected from a Geode distributed system if the member is unresponsive for a period of time, or if a network partition separates one or more members into a group that is too small to act as the distributed system. After being disconnected, the member shuts down and then automatically restarts in a "reconnecting" state, in which it periodically attempts to rejoin the distributed system by contacting a list of known locators. If the member succeeds in reconnecting to a known locator, it rebuilds its view of the distributed system from existing members and receives a new distributed system ID.

Reconnect on Forced Disconnect

  [Diagram: a member (M) is forcibly disconnected from the distributed system (DS) and then reconnects. Participants: M, DS. Sequence: Force Disconnect, Disconnect from DS, Reconnect, Recreate cache.]
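The rejoin behaviour described above can be modelled as a loop over the known locators. The following is a minimal sketch, not Geode's implementation: the function `try_rejoin`, the `contact` callback, and the `max_rounds` and `retry_delay_seconds` parameters are all hypothetical names introduced for illustration.

```python
import time
from typing import Callable, Optional, Sequence


def try_rejoin(
    locators: Sequence[str],
    contact: Callable[[str], bool],
    max_rounds: int,
    retry_delay_seconds: float = 0.0,
) -> Optional[str]:
    """Return the first locator that accepts the member, or None.

    `contact` is a hypothetical callback that returns True when the
    given locator lets the member rejoin the distributed system.
    """
    for _ in range(max_rounds):
        # Try every known locator once per round, in order.
        for locator in locators:
            if contact(locator):
                return locator
        # No locator answered in this round; wait before trying again.
        time.sleep(retry_delay_seconds)
    return None
```

On success, the member in this model would go on to rebuild its membership view and recreate its cache; a `None` result means every round was exhausted without reaching a locator.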

Quorum-based Reconnect

If the member is a locator, or if multicast discovery is available, the member performs a quorum-based reconnect: it attempts to contact a quorum of the members that were in the membership view just before it was disconnected. If a quorum of members can be contacted, startup of the distributed system is allowed to begin. Because the reconnecting member does not know which members survived the network partition event, all members in a reconnecting state keep their UDP unicast ports open and respond to ping requests. For quorum calculation, please refer to this.

  [Diagram: a locator (L) performs a quorum-based reconnect. Participants: L, M, N. Sequence: Ping via UDP, Pong via UDP, Ping via UDP, Pong via UDP. The quorum check succeeds, and L starts the location service before the distributed system is available.]
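The ping-and-count step can be illustrated with a small model. This sketch assumes a simple majority rule over the members of the previous view; that rule is an assumption made for illustration only, and Geode's actual quorum calculation (referenced above) may differ. The names `has_quorum` and `ping` are hypothetical.

```python
from typing import Callable, Iterable


def has_quorum(previous_view: Iterable[str], ping: Callable[[str], bool]) -> bool:
    """Return True if more than half of the previous view answers a ping.

    ASSUMPTION: simple majority by member count. The real quorum
    calculation is defined elsewhere and may weigh members differently.
    `ping` is a hypothetical callback standing in for the UDP ping/pong.
    """
    members = list(previous_view)
    if not members:
        return False
    responders = sum(1 for member in members if ping(member))
    return responders * 2 > len(members)
```

For example, with a previous view of `["L", "M", "N"]` and a `ping` that succeeds only for M and N, two of three members respond, so this model reports a quorum.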

Cache Recreation Failure During Reconnect

When a member reconnects, it first connects to the distributed system and then creates the cache. If cache creation fails, the member disconnects from the distributed system again and repeats the process: it reconnects to the distributed system and then recreates the cache.

  [Diagram: a member (M) is forcibly disconnected from the distributed system (DS), reconnects, fails to recreate the cache, and retries. Participants: M, DS. Sequence: Force Disconnect, Disconnect from DS, Reconnect, Recreating cache failed, Disconnect from DS, Reconnect, Recreate cache.]
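The retry behaviour on cache creation failure can be sketched as follows. This is an illustrative model under stated assumptions, not Geode code: `reconnect_with_cache` and its `connect`, `create_cache`, and `disconnect` callbacks are hypothetical, and the `max_attempts` bound is added only to keep the example finite.

```python
from typing import Any, Callable, Optional, Tuple


def reconnect_with_cache(
    connect: Callable[[], Any],
    create_cache: Callable[[Any], Any],
    disconnect: Callable[[Any], None],
    max_attempts: int,
) -> Tuple[Optional[Any], int]:
    """Connect, then create the cache; on failure, disconnect and retry.

    Returns (cache, attempts_used). The cache is None if every attempt
    failed. All three callbacks are hypothetical stand-ins.
    """
    for attempt in range(1, max_attempts + 1):
        connection = connect()  # step 1: join the distributed system
        try:
            cache = create_cache(connection)  # step 2: recreate the cache
        except Exception:
            # Cache creation failed: leave the distributed system and
            # start over from the connect step.
            disconnect(connection)
            continue
        return cache, attempt
    return None, max_attempts
```

The ordering matters in this model: a failed cache creation always triggers a disconnect before the next connect, so the member never holds a connection without a cache across attempts.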

References

http://geode.docs.pivotal.io/docs/managing/autoreconnect/member-reconnect.html
