Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Here we take into account the isolation property of a transaction. The strongest isolation is known to be Serializable, implying all transactions pretend to execute sequentially. This is very convenient to a user, because it prevents hidden data corruptions https://pmg.csail.mit.edu/papers/adya-phd.pdf and related security issues http://www.bailis.org/papers/acidrain-sigmod2017.pdf. The price for this can be reduced throughput/latency due to increased overhead from CC protocol. Another options is to allow a user to choose a weaker isolation level, like SNAPSHOT. The ultimate goal is to implement Serializability without sacrificing performance too much, having Serializable as default isolation level. I measure it with 2

...

Such transactions can be used to build complex OLAP reports, without affecting concurrent OLTP load. Any SQL read query is naturally mapped to this type of a transaction. Such transactions can also read snapshot data in the past, at some timestamp. Must have, I measure it with 3

...

Looking at the evaluation, it's easy to notice what a our freshly-baked protocol design favors usability over performance. It doesn't mean we don't need performance - we just need the acceptable level of performance, and, more importantly, scalability. Optimizations can be postponed until later.

Let’s define key points of a design. It’s necessary to have have:

  1. Interactive transactions 
  2. Long running report-like read-only queries, which are able to execute on backupsreplicas.
  3. Serializable isolation
  4. Optimized latency for geo-distributed setups, when replicas are spread between geographic regions for high availability(HA)
  5. Unlimited (or very large) txn size

...

The third requirement implies a CC protocol (an extension to actually a kind of MVCC) which allows for serialized schedules. We will look for alternatives later.

The fourth requirement means we need to reduce to a minimum a number of inter data center communications during transaction execution.

...

The system also have to be horizontally scalable. To achieve scalability, the data will be partitioned using hash or range partitioning method.  The exact partitioning method is not important for the purpose of this document. We treat a data partition here as a synonym for a data shard. Each partition is assigned to a cluster node and replicated to a predefined number of replicas to achieve high availability. Adding more nodes increases a number of partitions in the cluster (or reduces a number of partitions per node in case of static partitioning), thus increasing the scalability. A distributed transaction protocol makes cross-partition transactions preserve ACID guaranties.

Note that a correct data partitioning is a key factor for a cluster efficiency and scalability. . This is a topic for another IEP and will not be covered here.

A transaction can span multiple partitions, making it distributed. Providing atomicity on a commit is an additional difficulty in distributed environment. Typically this is achieved by using two-phase commit protocol or it's improved consensus based version https://www.microsoft.com/en-us/research/uploads/prod/2004/01/twophase-revised.pdf.

Turns out we want a Google Spanner clone. It seems it was designed keeping the similar goals in mind. Other known clones are cockroachdbCockroachdb, yugabyteYugabyte.

We aim to reuse common replication infrastructure. This means partition data records will be durably replicated using a consensus based protocol, like RAFT. This approach tolerates f failed nodes from n total nodes, where n >= 2f + 1. Other products can do better, for example FoundationDB tolerates f failed nodes from n, where n >= f + 1 (but the consensus is still required). A CC protocol is not tied to the underliying replication protocol. We can change the replication protocol in the future, if we want.

Description

// Provide the design of the solution.

Consistency model

// Describe the model

Risks and Assumptions

...