Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.


IDIEP-61
Author
Sponsor
Created

  

Status
DRAFT

Table of Contents

Motivation

In Ignite 2.x there are several different mechanisms (some of them are 'nested') that share a semi-common goal:

...

Such a metastorage becomes a golden source of truth for metadata for each node in the cluster. Cluster configuration (including caches), additional cache metadata (indexes, data schema), affinity assignment, baseline topology, services assignments will be moved to the metadata storage. Discovery custom events will be eliminated as a low-level synchronization primitive in favor of ordered data updates in the metadata storage.

Metastorage Interface

An approximate metastorage interface is as follows:

Code Block
languagejava
titleMetastorage
public interface DistributedMetaStorage {
    public Future<ReadResponse> read(Read read);

    public Future<WriteResponse> write(Write write);
    public Future<DeleteResponse> delete(Delete del);
    public Future<UpdateResponse> update(Update update);

    public WatchControl watch(Watch watch, WatchObserver observer);
}

// Read, Write, Delete, Update extend Request class

public class Update extends Request {
    private List<Condition> cond;
    private List<Request> onSuccess;
    private List<Request> onFailure;
}

public interface WatchObserver {
    public void onUpdate(Entry oldEntry, Entry newEntry);
    public void onCancelled();
}


A typical usage pattern for the distributed metastorage in pseudocode may look as follows:

Code Block
titleUpdating entry in metastorage
propVal, propVersionres = metastorage.read(new Read(key)).get();

newVal = update(...); // Update property value according to the running action

updated = metastorage.putConditional(key, newVal, propVersionupdate(new Update(
	key,
    new VersionCondition(EQUAL, res.version()),
    new Write(key, newVal)
)).get();

if (!updated.succeeded()) {
    // Handle a concurrent update to the same property.
}

...

Code Block
titleFollowing metastorage updates
propVal, propVersion = currentState(key); // Get the latest property value the the local node has seen/processed.

metastorage.watch(new Watch(key, propVersion), (keyoldEntry, value, versionnewEntry) -> {
    // Process updates propagated to the metastorage. Executed in the same order on all nodes.
});

...

As a fundamental building block for distributed metastorage, an implementation of the Raft consensus protocol will be used [1]. The protocol is well-studied and has a large number of implementations, we can use one of them as a library, adopt the code of existing implementation for Ignite, or write a custom one. 

...

Assuming we have a HA split-brain resistant distributed metastorage, we can implement other replication protocols with different availability and performance properties, such as PacificA [2]. Both protocols provide the same distributed log abstraction for commands replication. The table below summarizes the difference between the two protocols:


RaftPacificA
Availability conditionsA majority of the group must be online to make progressCan make progress even with one member left
DependenciesIndependentRequires external consistent configuration manager
LatencyCan acknowledge an operation after a majority of the group respondedMust await responses from either all group members or wait for failure detection timeout and reconfigure the group before acknowledging an operation
Other
Relies on clock timeouts to ensure linearizability of operations

Further Applications

Partition Replication

...

An instance of replication protocol can be used to further implement various data structures and synchronization primitives that are currently placed in a separate system cache.

Links

  1. Raft Consensus Protocol
  2. PacificA Replication Protocol