ID | IEP-61
Author |
Sponsor |
Created |
Status | DRAFT
In Ignite 2.x there are several different mechanisms (some of them nested) that serve a similar goal:
...
Such a metastorage becomes the single source of truth for metadata on every node in the cluster. Cluster configuration (including caches), additional cache metadata (indexes, data schema), affinity assignment, baseline topology, and service assignments will be moved to the metadata storage. Discovery custom events will be eliminated as a low-level synchronization primitive in favor of ordered data updates in the metadata storage.
An approximate metastorage interface is as follows:
public interface DistributedMetaStorage {
    /** Reads the value and version of a key. */
    public Future<ReadResponse> read(Read read);

    /** Writes a value for a key. */
    public Future<WriteResponse> write(Write write);

    /** Deletes a key. */
    public Future<DeleteResponse> delete(Delete del);

    /** Executes a conditional batch of operations (see the Update class below). */
    public Future<UpdateResponse> update(Update update);

    /** Subscribes the observer to ordered updates of the watched keys. */
    public WatchControl watch(Watch watch, WatchObserver observer);
}

// Read, Write, Delete and Update extend the Request class.
public class Update extends Request {
    /** Conditions to evaluate; if all hold, onSuccess requests are applied, otherwise onFailure. */
    private List<Condition> cond;
    private List<Request> onSuccess;
    private List<Request> onFailure;
}

public interface WatchObserver {
    /** Called for each update of a watched key, in the same order on all nodes. */
    public void onUpdate(Entry oldEntry, Entry newEntry);

    /** Called when the watch is cancelled. */
    public void onCancelled();
}
A typical usage pattern for the distributed metastorage in pseudocode may look as follows:
res = metastorage.read(new Read(key)).get();

newVal = update(...); // Update the property value according to the running action.

updated = metastorage.update(new Update(
    key,
    new VersionCondition(EQUAL, res.version()),
    new Write(key, newVal)
)).get();

if (!updated.succeeded()) {
    // Handle a concurrent update to the same property.
}
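In practice the conditional update is usually wrapped in a retry loop that re-reads the value whenever the version condition fails. A minimal self-contained sketch of this read/compare-version/write pattern, using an in-memory map as a stand-in for the distributed metastorage (all names here are illustrative, not the proposed API):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.UnaryOperator;

public class CasRetryExample {
    /** A value together with its metastorage version. */
    record Versioned(String value, long version) {}

    /** In-memory stand-in for the distributed metastorage. */
    static final ConcurrentMap<String, Versioned> STORE = new ConcurrentHashMap<>();

    /**
     * Applies {@code transform} to the current value of {@code key} and writes the
     * result back only if the version has not changed in between; otherwise another
     * node updated the property concurrently, so re-read and retry.
     * Assumes the key already exists.
     */
    static Versioned updateWithRetry(String key, UnaryOperator<String> transform) {
        while (true) {
            Versioned cur = STORE.get(key); // Stands in for read(new Read(key)).

            Versioned next = new Versioned(transform.apply(cur.value()), cur.version() + 1);

            // Stands in for update(new Update(key,
            //     new VersionCondition(EQUAL, cur.version()), new Write(key, next))).
            if (STORE.replace(key, cur, next))
                return next; // Condition held, write applied.
        }
    }

    public static void main(String[] args) {
        STORE.put("prop", new Versioned("a", 1));

        Versioned res = updateWithRetry("prop", v -> v + "b");

        System.out.println(res.value() + "@" + res.version()); // prints "ab@2"
    }
}
```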
...
propVal, propVersion = currentState(key); // Get the latest property value the local node has seen/processed.

metastorage.watch(new Watch(key, propVersion), (oldEntry, newEntry) -> {
    // Process updates propagated to the metastorage. Executed in the same order on all nodes.
});
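The watch contract above (deliver every update newer than a known version, in the same order on all nodes) can be illustrated with a small in-memory sketch. `LocalLog`, `Entry` and the replay logic below are stand-ins invented for the example, not part of the proposed API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public class WatchExample {
    /** A key/value pair together with the revision that produced it. */
    record Entry(String key, String value, long revision) {}

    /** In-memory stand-in for the ordered update log of the metastorage. */
    static class LocalLog {
        private final List<Entry> log = new ArrayList<>();

        void append(Entry e) {
            log.add(e);
        }

        /**
         * Replays every update with revision greater than {@code fromRevision},
         * oldest first, mirroring WatchObserver.onUpdate(oldEntry, newEntry).
         */
        void watch(long fromRevision, BiConsumer<Entry, Entry> observer) {
            Entry prev = null;

            for (Entry e : log) {
                if (e.revision() > fromRevision)
                    observer.accept(prev, e);

                prev = e;
            }
        }
    }

    public static void main(String[] args) {
        LocalLog log = new LocalLog();
        log.append(new Entry("prop", "a", 1));
        log.append(new Entry("prop", "b", 2));
        log.append(new Entry("prop", "c", 3));

        // A node that last saw revision 1 receives updates 2 and 3, in order.
        List<String> seen = new ArrayList<>();
        log.watch(1, (oldE, newE) -> seen.add(newE.value()));

        System.out.println(seen); // prints [b, c]
    }
}
```

Because every node replays the same log from its last seen revision, all nodes converge on the same sequence of observed updates, which is exactly the property discovery custom events provided.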
...
As a fundamental building block for the distributed metastorage, an implementation of the Raft consensus protocol will be used [1]. The protocol is well-studied and has a large number of implementations; we can use one of them as a library, adapt the code of an existing implementation for Ignite, or write a custom one.
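For sizing purposes: a Raft group of n = 2f + 1 voting members needs a majority quorum of f + 1 responses and therefore tolerates f member failures. A trivial sketch of this arithmetic (helper names are ours, not from the IEP):

```java
public class QuorumExample {
    /** Smallest majority of a group of {@code n} voting members. */
    static int quorum(int n) {
        return n / 2 + 1;
    }

    /** Maximum number of failed members a Raft group of {@code n} still tolerates. */
    static int tolerated(int n) {
        return n - quorum(n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 7; n += 2)
            System.out.println(n + " nodes: quorum=" + quorum(n)
                + ", tolerates " + tolerated(n) + " failure(s)");
        // 3 nodes: quorum=2, tolerates 1 failure(s)
        // 5 nodes: quorum=3, tolerates 2 failure(s)
    }
}
```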
...
Assuming we have an HA, split-brain resistant distributed metastorage, we can implement other replication protocols with different availability and performance properties, such as PacificA [2]. Both protocols provide the same distributed log abstraction for command replication. The table below summarizes the differences between the two protocols:
 | Raft | PacificA
---|---|---
Availability conditions | A majority of the group must be online to make progress | Can make progress even with one member left
Dependencies | Independent | Requires an external consistent configuration manager
Latency | Can acknowledge an operation once a majority of the group has responded | Must await responses from all group members, or wait for the failure detection timeout and reconfigure the group, before acknowledging an operation
Other | | Relies on clock timeouts to ensure linearizability of operations
...
An instance of the replication protocol can then be used to implement the various data structures and synchronization primitives that are currently placed in a separate system cache.
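As an illustration of building such primitives on top of the metastorage, a mutual-exclusion lock can be expressed as a conditional write that succeeds only if the lock key is absent. The sketch below models this with an in-memory map; the key layout and helper names are assumptions for the example, not a proposed design:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MetastorageLockExample {
    /** In-memory stand-in for the metastorage key space. */
    static final ConcurrentMap<String, String> STORE = new ConcurrentHashMap<>();

    /**
     * Tries to take a named lock by writing the owner under the lock key,
     * conditioned on the key being absent (a "key does not exist" condition
     * in an Update against the real metastorage).
     */
    static boolean tryLock(String lockKey, String owner) {
        return STORE.putIfAbsent(lockKey, owner) == null;
    }

    /** Releases the lock only if it is still held by {@code owner}. */
    static void unlock(String lockKey, String owner) {
        STORE.remove(lockKey, owner);
    }

    public static void main(String[] args) {
        boolean first = tryLock("locks/rebalance", "node-1");
        boolean second = tryLock("locks/rebalance", "node-2");
        System.out.println(first + " " + second); // prints "true false"

        unlock("locks/rebalance", "node-1");
        System.out.println(tryLock("locks/rebalance", "node-2")); // prints "true"
    }
}
```

Since every node observes the same ordered sequence of writes to the lock key, all nodes agree on the lock owner without a dedicated system cache.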