Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Let's take a look at each feature in detail and give it a value.

Strong

...

isolation

Here we take into account the isolation property of a transaction. The strongest isolation is known to be Serializable, implying all transactions pretend to execute sequentially. This is very convenient to a user, because it prevents hidden data corruptions https://pmg.csail.mit.edu/papers/adya-phd.pdf and security issues http://www.bailis.org/papers/acidrain-sigmod2017.pdf. The price for this can be reduced throughput/latency due to increased overhead from CC protocol. Another options is to allow a user to choose a weaker isolation level, like SNAPSHOT. The ultimate goal is to implement Serializability without sacrificing performance too much, having Serializable as default isolation level. I measure it with 2

...

The fifth requirement implies not buffering a whole transaction on the coordinator nodein memory.

The system also have to be horizontally scalable. To achieve scalability, the data will be partitioned using a hash or range partitioning method.  The exact partitioning method is not important for the purpose of this document. We treat a partition here as a synonym for a shard. Each partition is assigned to a cluster node and replicated to a predefined number of replicas to achieve high availability. Adding more nodes increases a number of partitions in the cluster (or reduces a number of partitions per node in case of static partitioning), thus increasing the scalability.

...

Before continuing further towards the discussion of CC protocol, which provides serializable schedules, let's dive into the serialization theory.

Example 1. Assume we have 3 transactions:

...

S2 = w3[x] r1[x] r3[y] r2[y] w3[z] r2[z] r1[z] w2[y] w1[x]

We assume a commit of each transaction. What we can tell about serializability of S2 ? Recall the serializable schedule definition: to be serializable, it must be equivalent to some serial execution order of transactions T1, T2, T3.

...

The S1 is obviously serial: it corresponds to execution sequence: T3, T2, T1. It's not that obvious for S2 if it's serializable or not. To prove it me should find a equivalent serializable schedule. We can attempt to swap non-conflicting operation (preseving the order of conflicting) until the equivalent schedule is produced.

w3[x] r1[x] r3[y] r2[y] w3[z] r2[z] r1[z] w2[y] w1[x] → w3[x] r3[y] r1[x] r2[y] w3[z] r2[z] r1[z] w2[y] w1[x] → w3[x] r3[y]r2[y]r1[x] w3[z] r2[z] r1[z] w2[y] w1[x] w2[y] → w3[x] r3[y] r2[y] w3[z] r1[x] r2[z] r1[z] w2[y] w1[x] → w3[x] r3[y]w3[z] r2[y] r1[x] r2[z] w1 r1[xz] w2[y] w1[x]  → w3[x] r3[y]w3[z] r2[y] r2[z] r1[x] r1[z] w2[y] w1[x] w2[y]  → w3[x] r3[y]w3[z] r2[y] r2[z] w2[y]   r1[x] r1[z] w1[x] 

So, S2 is serializable and equivalent to T3, T2, T1. Such schedules are called conflic serializable - they can be converted to serial schedule by swapping non-conflicting operations, which have no effect on execution outcome. Every conflict serializable schedule is serializable, but not vice versa.

Example 2. Consider the transactions:

T1 = r1[x] w1[x] c1, T2 = w2[x] c2

and a schedule: S = r1[x] w2[x] c2 w1[x] c1

This schedule S is serializable (it's equivalent to T1, T2), but not conflict serializable.

...

Intuitively, two schedules are equivalent if the executions that produced them leave the database in the same state. The formal definition of equivalence is:

Two schedules are computationally view equivalent if and only if:

  • Both schedules have the same initial reads.
  • Each read operation reads from the same write operation in both schedules , and
  • Both schedules have the same final writes.

...

A schedule is view serializable if it is computationally view equivalent to some serial schedule. Every conflict serializable schedule is view serializable, although the converse is not true.

...

Enforcing or testing view serializability turns out to be much more expensive, and the concept therefore has little practical use.

TBDExample 3. Consider the transaction:

T1=r1[x] w1[x], T2=r2[x] w2[x]

and a schedule: S = r2[x] r1[x] w2[x] w1[x]

S  is not serializable schedule.

Until now we have talked about schedules having only committed transactions. We must also take into consideration schedules containing aborted transactions. Consider  

Example 4. Consider two transactions

T1 = w1[x] a1, T2 = r2[x] w2[x] c2

...

Such schedule is unrecoverable, because T2 is committed and can't be undone on T1 abort.

Example 5. Another example:

T1 = w1[x] c1, T2 = w2[x] a2

...

For a schedule to be recoverable, it's necessary to read only committed data. Recoverable schedules also avoid cascading aborts.

TBD strict schedules

CC protocols

MV2PL (S2PL vs non-S2PL)

MVOCC (SNAPSHOT)

MVTO

SGT

Description

// Provide the design of the solution.

...