...

Apart from the state transfer, all communication between follower and leader is done using a single data structure:

class QuorumPacket { 
    int type; // Request, Ack, Commit, Ping, etc
    long zxid; 
    buffer data; 
    vector<org.apache.zookeeper.data.Id> authinfo; // only used for requests
} 
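
As a rough illustration of how such a packet can travel over a connection, the sketch below mirrors the type, zxid, and data fields in plain Java and round-trips them through a byte array. The class name QuorumPacketSketch and the byte layout are assumptions made for this example only; they are not the generated ZooKeeper class or its actual wire format, and authinfo is left out for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative stand-in for the QuorumPacket record; not the real ZooKeeper class. */
final class QuorumPacketSketch {
    final int type;     // message kind (Request, Ack, Commit, Ping, etc.)
    final long zxid;    // transaction id the packet refers to
    final byte[] data;  // opaque payload; may be empty

    QuorumPacketSketch(int type, long zxid, byte[] data) {
        this.type = type;
        this.zxid = zxid;
        this.data = data;
    }

    /** Encodes with an assumed layout: int type, long zxid, int length, payload bytes. */
    byte[] toBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(type);
            out.writeLong(zxid);
            out.writeInt(data.length);
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Decodes a packet written by toBytes(), reading the fields in the same order. */
    static QuorumPacketSketch fromBytes(byte[] encoded) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            int type = in.readInt();
            long zxid = in.readLong();
            byte[] data = new byte[in.readInt()];
            in.readFully(data);
            return new QuorumPacketSketch(type, zxid, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because every message shares this one shape, a receiver can decode the packet first and then dispatch on type, whatever the message kind.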

The following structure is used for the SERVERINFO packet:

...

Note that if a follower connects while the leader is already established (in phase 3), the follower still goes through the phases, but the leader ignores any ACKs.

There are two noticeable differences between this description of phase 1 and the one found in http://research.yahoo.com/files/YL-2010-007.pdf:

  1. The leader does not sync with the most up-to-date follower.
  2. The followers do not send their histories in the ACK of the NEWEPOCH message.

The leader election protocols that we use try to guarantee that the elected leader has the most up-to-date history from a quorum of processes. They also try to guarantee that only one leader is elected. These guarantees may fail to hold because of race conditions, stale information, or even implementation errors, so we verify them in phase 1. The tech report does not assume that the elected leader has the most up-to-date history, so there the leader must find the most up-to-date follower and sync with it. With the leader election protocols that we use, however, the elected leader should already be the most up-to-date process, so we just check the condition in step 7. If the leader meets the conditions of step 7, it is the most up-to-date process, so it has implicitly synced with the most up-to-date process in the quorum. If it does not, we go back to leader election.

In this description and in our implementation we use the notion of connections, which map to TCP connections. A follower is only ever connected to a single leader at a time. Also, a follower only changes its history under the direction of a leader. In phase 1 as described in the tech report, the follower sends its history in the ACK to NEWEPOCH, but we do not send the history in step 5. Since we only need to verify that the leader's history is a superset of the follower's history, we only need to know the lastZxid of the follower's history. In the protocol described here, the lastZxid is sent in step 2 rather than in step 5. If the follower executes step 5 and sends the ACK to the leader, it has been connected to the leader for the whole period between steps 2 and 5. Since the follower can only be connected to one leader at a time, has been connected to the same leader from step 2 to step 5, and has not been asked by that leader to change its history, the lastZxid sent in step 2 is still the lastZxid in step 5, so there is no reason to resend it.
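
The idea of checking only the lastZxid values can be sketched as below. This is a minimal illustration, not the ZooKeeper implementation: the class and method names are hypothetical, and it assumes that a numerically larger lastZxid means a more up-to-date history and that the leader has already collected one lastZxid from each follower in the quorum. The exact condition of step 7 may involve more than this comparison.

```java
import java.util.List;

/** Hypothetical helper; the names are illustrative and not taken from the ZooKeeper code base. */
final class LeaderHistoryCheck {
    /**
     * Returns true if no follower reported a lastZxid beyond the leader's own.
     * Assumption: a larger zxid means a more up-to-date history.
     */
    static boolean leaderIsMostUpToDate(long leaderLastZxid, List<Long> followerLastZxids) {
        for (long followerLastZxid : followerLastZxids) {
            if (followerLastZxid > leaderLastZxid) {
                return false; // a follower is ahead, so the caller should go back to leader election
            }
        }
        return true;
    }
}
```

A caller would pass the values the followers reported when they connected; no second round of reporting is needed, since by the argument above those values cannot have changed while the followers stayed connected to the same leader.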

Another difference from the protocol described in the tech report is the extra information sent in SERVERINFO and the SERVERINFO sent by the leader in step 3. The extra information serves implementation purposes: it makes sure that the protocol version is compatible between client and server, and it provides information for debugging.

Phase 2: Sync with followers

...