...
One design choice is which kind of operation to support: incremental changes such as “add server A” or “remove server B” (e.g., as in (3)(5), survey on reconfiguration of R/W objects, #DynaStore), or full membership specification, as in “reconfigure so that the membership becomes A, B, C” (e.g., (1)(4), survey on reconfiguration with virtual synchrony, #Rambo).
One notable disadvantage of non-incremental requests arises when multiple reconfigurations are proposed concurrently. Suppose that the initial configuration is A, B, C, and one process proposes to add a server D whereas another proposes to remove B. If each process has to specify the full new configuration, then the first process would propose A, B, C, D whereas the second would propose A, C. One of these would succeed first; suppose it is A, C. The other proposal must then be aborted, since otherwise the resulting configuration would be A, B, C, D, which again contains B. With incremental requests, both changes can be applied in either order and yield A, C, D.
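The difference can be illustrated with a minimal sketch. The helper names `apply_incremental` and `apply_full` are hypothetical and are not part of any ZooKeeper API; the full-membership variant assumes each proposal records the configuration it was computed against, so that a stale proposal can be detected and aborted.

```python
def apply_incremental(config, add=(), remove=()):
    """Apply an incremental request to whatever the current membership is."""
    return (set(config) | set(add)) - set(remove)


def apply_full(current, expected_base, new_config):
    """Apply a full-membership request only if it was computed against the
    configuration that is still current; otherwise abort (return None)."""
    if set(current) != set(expected_base):
        return None  # stale proposal: the membership changed underneath it
    return set(new_config)


initial = {"A", "B", "C"}

# Incremental requests: "add D" and "remove B" give the same result in either order.
order_1 = apply_incremental(apply_incremental(initial, remove={"B"}), add={"D"})
order_2 = apply_incremental(apply_incremental(initial, add={"D"}), remove={"B"})

# Full-membership requests: both were computed against the initial configuration.
after_first = apply_full(initial, initial, {"A", "C"})
after_second = apply_full(after_first, initial, {"A", "B", "C", "D"})
```

Here `order_1` and `order_2` are both the set A, C, D, while `after_second` is `None`: the second full-membership proposal is aborted rather than silently reintroducing B.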
...
The idea of an “off-line” strategy for reconfiguration (survey on reconfiguration with virtual synchrony, reconfigurable survey on reconfiguring state-machine replication) is to stop operations in the old configuration, transfer the state to the new configuration, and then enable operations in the new configuration. In contrast, an online reconfiguration approach (#RAMBO, #DynaStore) never stops the service while reconfiguring.
One of the complexities of the online approach is that a normal operation can execute concurrently with a reconfiguration, yet the state must still be transferred correctly to the next configuration. The easy case is when the operation takes effect in the old configuration and the reconfiguration then transfers the resulting state. It is possible, however, that the reconfiguration misses the operation when it transfers the state and completes without it. In this case, existing online reconfiguration solutions (#RAMBO, #DynaStore) continue the operation and execute it in the new configuration.
Unfortunately, this may violate the global primary order in ZooKeeper: operations issued in the new configuration (potentially by a different primary) may have already completed, in which case global primary order does not allow operations issued by an old primary to be applied.
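The constraint can be sketched with a deliberately simplified model. It assumes each primary is identified by an increasing epoch number and that a replica tracks the highest epoch it has applied an operation from; the `Replica` class and its fields are hypothetical and do not reflect ZooKeeper's actual implementation.

```python
class Replica:
    """Toy replica that refuses operations from a superseded primary."""

    def __init__(self):
        self.latest_epoch = 0  # highest primary epoch applied so far (assumed model)
        self.log = []          # applied operations, in order

    def apply(self, epoch, op):
        # Once an operation of a newer primary has been applied, an operation
        # issued by an older primary can no longer be applied after it.
        if epoch < self.latest_epoch:
            return False
        self.latest_epoch = epoch
        self.log.append((epoch, op))
        return True


replica = Replica()
applied_new = replica.apply(2, "write-from-new-primary")
applied_old = replica.apply(1, "late-write-from-old-primary")
```

In this model `applied_new` is `True` and `applied_old` is `False`, which is why simply re-executing a missed operation in the new configuration is not an option here.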
We therefore choose the offline reconfiguration strategy, but we try to minimize the period of unavailability by pre-transferring the state to the new configuration before the reconfiguration begins.
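A rough sketch of this strategy follows, assuming the state is an ordered list of applied operations. The `Service` class, the `reconfigure` function, and the `concurrent_ops` parameter (which simulates operations arriving during the pre-transfer) are all hypothetical names introduced for illustration.

```python
class Service:
    """Toy replicated service for one configuration."""

    def __init__(self, members):
        self.members = set(members)
        self.state = []        # ordered list of applied operations
        self.accepting = True  # whether this configuration serves operations

    def submit(self, op):
        if not self.accepting:
            return False
        self.state.append(op)
        return True


def reconfigure(old, new_members, concurrent_ops=()):
    """Offline reconfiguration with pre-transfer; returns (new_service, delta_size)."""
    new = Service(new_members)
    new.accepting = False

    # Phase 1: pre-transfer a snapshot while the old configuration keeps serving.
    new.state = list(old.state)
    for op in concurrent_ops:
        old.submit(op)

    # Phase 2: stop the old configuration; this starts the unavailability window.
    old.accepting = False

    # Phase 3: transfer only what arrived after the snapshot, then enable
    # operations in the new configuration.
    delta = old.state[len(new.state):]
    new.state.extend(delta)
    new.accepting = True
    return new, len(delta)


old = Service({"A", "B", "C"})
old.submit("w1")
old.submit("w2")
new, delta_size = reconfigure(old, {"A", "C", "D"}, concurrent_ops=["w3"])
```

Only the delta (here a single operation, `w3`) is copied while the service is stopped, so in this sketch the unavailable period depends on the amount of recent activity rather than on the total state size.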
...