...

  • upon receipt of <activate-config, M’, leader(M’)> in configuration M’
    • start serving operations in M’
    • the first operation should clean up any incomplete reconfig request from M’
    • forward the message to members(M')
    • send <activate-config-ack> to leader(M) and to leader(M')
    • A non-leader server can start processing client operations in M’
  • upon receipt of <activate-config, M’, leader(M’)> from leader(M) and <activate-config-ack> from quorum of members(M'), leader(M') does the following:
    • commit all operations sent by leader(M) to members(M') during phase-1.
    • start processing client operations in M’
  • upon receipt of <retire, version(M)>
    • garbage-collect M
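
The steps above might look roughly as follows from the point of view of a non-leader server in M’. This is only a sketch under assumed names (Config, ActivateConfig, ActivateConfigAck, Retire, send), not the actual ZooKeeper implementation.

    import java.util.*;

    // Sketch only: all types and helper names here are assumptions made for illustration.
    class ReconfigSketch {
        static class Config {
            final long version; final List<String> members; final String leader;
            Config(long v, List<String> m, String l) { version = v; members = m; leader = l; }
        }
        static class ActivateConfig {
            final Config newConfig; final String oldLeader;
            ActivateConfig(Config c, String oldLeader) { this.newConfig = c; this.oldLeader = oldLeader; }
        }
        static class ActivateConfigAck { final long version; ActivateConfigAck(long v) { version = v; } }
        static class Retire { final long version; Retire(long v) { version = v; } }

        private Config current;                                    // configuration this server runs in
        private final Map<Long, Config> knownConfigs = new HashMap<>();

        // upon receipt of <activate-config, M', leader(M')> in configuration M'
        void onActivateConfig(ActivateConfig msg) {
            current = msg.newConfig;                               // start serving operations in M'
            cleanupIncompleteReconfig();                           // first operation: clean up incomplete reconfig state
            for (String member : msg.newConfig.members)
                send(member, msg);                                 // forward the message to members(M')
            send(msg.oldLeader, new ActivateConfigAck(current.version));        // ack to leader(M)
            send(msg.newConfig.leader, new ActivateConfigAck(current.version)); // ack to leader(M')
            // a non-leader server can now process client operations in M'
        }

        // upon receipt of <retire, version(M)>
        void onRetire(Retire msg) {
            knownConfigs.remove(msg.version);                      // garbage-collect M
        }

        private void cleanupIncompleteReconfig() { /* remove partially applied reconfig state */ }
        private void send(String server, Object msg) { /* network stub */ }
    }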

...

  • Instead of line 1, try to connect to M’ and transfer state, so that a quorum of M’ is connected and up-to-date. If unsuccessful, skip the reconfiguration.
  • If a quorum of M’ indicate that they’ve already activated M’, skip to phase 3.
  • Otherwise, if next(M) = M’ was returned by a quorum of M, skip to phase 2.

The server s that submitted the reconfig operation to the leader should check what the current configuration is after a new leader is established. If it is still the old one, it can re-submit the reconfig. If s fails, the client can re-submit the reconfig to another server. Recall that version(M) is included by the client in the request, so a server receiving it should first check that the configuration is still M (using local state) before submitting it to the leader, and the leader should also make this check (since it has the latest state).
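
A small sketch of the version check described above, assuming hypothetical ReconfigRequest and Config types; the actual request handling in ZooKeeper would of course look different.

    // Sketch only: illustrative names, not ZooKeeper's actual classes.
    class ReconfigSubmission {
        static class Config { long version; }
        static class ReconfigRequest { long fromVersion; /* plus the requested membership change */ }

        // A server receiving the request checks, using its local state, that the configuration
        // is still the one the client observed (version(M)); the leader repeats the same check
        // against its own, latest state before executing the reconfig.
        boolean accept(ReconfigRequest req, Config localConfig, boolean isLeader) {
            if (req.fromVersion != localConfig.version)
                return false;                 // configuration already changed; caller may re-submit
            if (!isLeader)
                forwardToLeader(req);         // the leader will re-check before executing
            return true;
        }

        private void forwardToLeader(ReconfigRequest req) { /* network stub */ }
    }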

4.3. Reconfiguration API

A choice that has to be made is what kind of operations to support – incremental changes like “add server A” or “remove server B” (e.g., as in the survey on reconfiguration of R/W objects, #DynaStore) or full membership specification, as in “reconfigure so that the membership becomes A, B, C” (e.g., the survey on reconfiguration with virtual synchrony, #Rambo).
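
To make the two styles concrete, here is a hypothetical sketch of what the client-facing calls could look like; the interface and method names below are assumptions, not a committed API.

    import java.util.List;

    // Hypothetical sketches of the two API styles; neither is an actual ZooKeeper API.
    interface IncrementalReconfig {
        // incremental changes, e.g. "add server A" or "remove server B"
        void addServer(String serverSpec);
        void removeServer(long serverId);
    }

    interface FullMembershipReconfig {
        // full membership specification: "reconfigure so that the membership becomes A, B, C"
        void reconfigure(List<String> newMembership);
    }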

...

The idea of an “off-line” strategy for reconfiguration (survey on reconfiguration with virtual synchrony, survey on reconfiguring state-machine replication) is to stop operations in the old configuration, transfer the state to the new configuration, and then enable operations in the new configuration. In contrast, an online reconfiguration approach (#RAMBO, #DynaStore) never stops the service while reconfiguring.
One of the complexities arising in the online approach is that a normal operation can execute concurrently with a reconfiguration, yet the state must still be transferred correctly to the next configuration. The easy case is when the operation occurs in the old configuration and the reconfiguration transfers the state. It is possible, however, that the reconfiguration misses the operation when it transfers the state and completes. In this case, existing online reconfiguration solutions (#RAMBO, #DynaStore) continue the operation and execute it in the new configuration.
Unfortunately, this may violate the global primary order property in ZooKeeper: operations issued in the new configuration (potentially by a different primary) may have already completed, in which case global primary order does not allow operations issued by an old primary to be applied. In other words, a situation where a new primary commits operations and only then the old primary's commit arrives is not allowed in ZAB. We therefore choose the offline reconfiguration strategy, but try to minimize the period of unavailability by pre-transferring the state to the new configuration before the reconfig begins. The algorithm avoids violating global primary order by having leader(M') commit the operations proposed by leader(M) during the reconfiguration, and it does so before committing any other operations that it itself proposed in M'.
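
The commit ordering at leader(M') could be sketched as follows, assuming two illustrative queues (operations forwarded by leader(M) during phase-1, and leader(M')'s own proposals); this is only an illustration of the ordering constraint, not the real ZAB code.

    import java.util.*;

    // Sketch only: commit ordering at leader(M') once <activate-config-ack> has been
    // received from a quorum of members(M'). All names are illustrative assumptions.
    class NewLeaderActivation {
        static class Op { long zxid; }                              // an operation identified by its zxid

        private final Deque<Op> fromOldLeader = new ArrayDeque<>(); // ops proposed by leader(M) during phase-1
        private final Deque<Op> ownProposals = new ArrayDeque<>();  // ops leader(M') proposed itself in M'

        void onActivated() {
            while (!fromOldLeader.isEmpty())
                commit(fromOldLeader.poll());   // first commit everything leader(M) proposed...
            while (!ownProposals.isEmpty())
                commit(ownProposals.poll());    // ...and only then the new leader's own operations,
                                                // preserving ZAB's global primary order
        }

        private void commit(Op op) { /* deliver op to the replicated state machine */ }
    }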

4.7. Other issues

4.7.1. Bootstrapping the cluster

We should be able to bootstrap ZooKeeper using the reconfig API instead of the configuration files that are currently used.

4.7.2. Formal Liveness Specification

Should be similar to the one in the survey on reconfiguration with virtual synchrony and #DynaStore.

4.7.3. Informing clients and servers about new configuration

We should probably have a DNS-like solution as a fallback for clients that were not around during the reconfiguration (suppose that all of the servers
that the client knew about are already down). For normal operation we should also have a push-based solution to notify clients about the configuration change.
Some preliminary thoughts:

  1. It is preferable to move clients from the old configuration to the new one gradually so that we don't need to set up hundreds of new connections simultaneously.
  2. One possibility could be transferring the client list as part of the state to M' and having leader(M') inform them.
  3. It's probably good to include the version of the current config in all client operation requests. This way, if the server knows about a later config (which is active), it can let the client know (see the sketch below).
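
A sketch of item 3, assuming a hypothetical request header that carries the client's configuration version; if the server knows of a newer active configuration it piggybacks it on the reply. The types below are assumptions for illustration only.

    import java.util.List;

    // Sketch only: hypothetical request/response types carrying the configuration version.
    class ConfigAwareServer {
        static class Config { long version; List<String> members; }
        static class Request { long clientConfigVersion; /* plus the operation payload */ }
        static class Response { Config newerConfig; /* null if the client is up to date */ }

        private Config activeConfig;    // the configuration this server currently considers active

        Response handle(Request req) {
            Response resp = new Response();
            if (req.clientConfigVersion < activeConfig.version)
                resp.newerConfig = activeConfig;   // let the client know about the later config
            // ...process the operation itself...
            return resp;
        }
    }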

For servers in M', should the server periodically broadcast the new configuration (or its id) to try to make sure that all servers in M' know about M'?

4.6. Bibliography

Surveys:
1. [VSSurvey] Ken Birman, Dahlia Malkhi, and Robbert van Renesse. Virtually Synchronous Methodology for Dynamic Service Replication. Microsoft Research Technical Report MSR-TR-2010-151, November 2010.
2. [SMRSurvey] Leslie Lamport, Dahlia Malkhi, and Lidong Zhou. Reconfiguring a State Machine. SIGACT News 41(1): 63-73, 2010.
3. [ASSurvey] Marcos K. Aguilera, Idit Keidar, Dahlia Malkhi, Jean-Philippe Martin, and Alexander Shraer. Reconfiguring Replicated Atomic Storage: A Tutorial. Bulletin of the EATCS 102, pages 84-108, Distributed Computing Column, October 2010.

...