Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Besides the existing key-value storage, each node should introduce an additional versioned key-value storage used to filter out not-ready-to-be-read versions. Each versioned read attempt should merge results obtained from versioned store and general storage. Below is one of the possible data structures designs for SQL indexes (main PK index should be the same with the only exception that it returns a single value instead of a range cursor) .

Code Block
languagejava
titleMVCC LSM
class VersionedIndex {
    private NavigableMap<Key, List<VersionedRow>> versionedVals;
 
    public Cursor find(SearchRow lower, SearchRow upper, VersionInfo readVersion) {
        // Here we must first get a range of keys matching the search boundaries and then filter out the visible versions.
        // Note that versionedVals must explicitly contain deleted keys in order to be able to correctly merge results.
    }
}
 
class MergeIndex {
    public Cursor find(SearchRow lower, SearchRow upper, VersionInfo readVersion) {
        Cursor versioned = versionedIdx.find(lower, upper, readVersion);
 
        // Committed index does not need readVersion because it contains values that are 100% safe to read.
        Cursor committed = committedIdx.find(lower, upper);
 
        // Here we must return a merge cursor. In merge cursor keys from versioned cursor must override keys from committed cursor.
    }
}

 

 

...

When a node receives a safe-to-keep version from coordinator, it should merge latest value (wrt safe-to-keep version) from versioned index to committed index.

The size of the Versioned Index should be limited to avoid OOME. When the size of the versioned index exceeds maximum allowed value, the oldest committed value is merged to the committed index and the versioned index itself must me marked with this version value to disallow invalid reads based on the evicted version (an exception must be thrown for such read).

Transactional Protocol Changes

Commit Protocol Changes

For the first implementation only optimistic serializable transaction will be supported (semantics of sequential pessimistic locking and snapshotable isolation is not yet defined).

On transaction prepare step, when all transactional locks are acquired, the transaction coordinator must send request to the version coordinator to acquire the transaction write version. At this step acquired version is added to coordinator pending versions. 

Recovery Protocol Changes

Read (getAll and SQL changes)