...

One of the major features of AI3, as a distributed database, is the ability to execute multiple table operations as a single atomic operation, known as a transaction. We need to design a modern and robust distributed transaction protocol, taking into account current best practices. Both key-value and SQL database access methods will rely upon it. Compared to AI2, we aim to support transactional SQL from the beginning and to remove limitations such as the limit on transaction size. Our transactions may span several nodes in the cluster, making them distributed.

Definitions

In this section I'll give some definitions used throughout the text, for easier understanding.

...

  1. Strong transaction isolation
  2. Cascading aborts avoidance
  3. Support for interactive transactions
  4. Avoid tx restarts
  5. Long lived lightweight read-only transactions
  6. Consistent replica reads
  7. Optimized for fast path execution
  8. Geo-distribution friendly when replicas are in different regions
  9. Unlimited or very large transaction size
  10. Transactional DDL
  11. How many node failures we can tolerate without data loss

...

Here we take into account the isolation property of a transaction. The strongest isolation level is known to be Serializable, implying that all transactions appear to execute sequentially. This is very convenient for a user, because it prevents hidden data corruption https://pmg.csail.mit.edu/papers/adya-phd.pdf and avoids security issues http://www.bailis.org/papers/acidrain-sigmod2017.pdf. The price for this can be reduced throughput and increased latency due to the overhead of the CC protocol. Another option is to allow a user to choose a weaker isolation level, like SNAPSHOT. The ultimate goal is to implement Serializability without sacrificing performance too much, having Serializable as the default isolation level. I measure it with 2.
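To make the difference between SNAPSHOT and Serializable concrete, here is a minimal sketch of the classic write skew anomaly. It is a toy in-memory model, not the AI3 protocol: the `Db` and `Tx` classes, the `serializable` flag and the on-call example are all hypothetical names introduced for illustration. The assumption is that snapshot isolation checks only written keys at commit (first committer wins), while the serializable variant additionally validates every key that was read.

```python
class Aborted(Exception):
    """Raised when a transaction fails its commit-time check."""


class Db:
    """Toy database; version[k] counts committed writes to key k."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = {k: 0 for k in data}


class Tx:
    """Hypothetical transaction reading from a private snapshot."""

    def __init__(self, db, serializable):
        self.db = db
        self.serializable = serializable
        self.snapshot = dict(db.data)   # consistent view taken at start
        self.seen = dict(db.version)    # key versions observed at start
        self.reads = set()
        self.writes = {}

    def read(self, key):
        self.reads.add(key)
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Snapshot isolation checks only written keys (first committer wins).
        # The serializable variant in this sketch also validates the read set.
        checked = set(self.writes) | (self.reads if self.serializable else set())
        for key in checked:
            if self.db.version[key] != self.seen[key]:
                raise Aborted(key)
        for key, value in self.writes.items():
            self.db.data[key] = value
            self.db.version[key] += 1


def run(serializable):
    """Each doctor goes off call only if the other one is still on call."""
    db = Db({"alice": True, "bob": True})
    t1, t2 = Tx(db, serializable), Tx(db, serializable)
    # Both transactions read and write before either one commits.
    for tx, me, other in ((t1, "alice", "bob"), (t2, "bob", "alice")):
        if tx.read(other):
            tx.write(me, False)
    outcomes = []
    for tx in (t1, t2):
        try:
            tx.commit()
            outcomes.append("committed")
        except Aborted:
            outcomes.append("aborted")
    return db.data, outcomes


snapshot_data, snapshot_outcomes = run(serializable=False)
serial_data, serial_outcomes = run(serializable=True)
```

In this sketch both transactions commit under snapshot isolation and nobody remains on call, an outcome that no sequential execution could produce. With read-set validation the second transaction is aborted and can be retried against the new state.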

...

Cascading aborts avoidance

This is a useful thing to have, reducing the number of transaction restarts. I measure it with 1

Support for interactive transactions

This is the most intuitive way to use transactions. I measure it with 3

Restart avoidance

This is a general property of a transactional protocol, defining how many transactions will be restarted, causing a loss of work, in case of a serialization conflict or an unstable topology. For example, optimistic CC causes more frequent restarts, because the conflict check is delayed until commit. I measure it with 1

Read-only long lived transactions

Such transactions can be used to build complex OLAP reports, possibly running for several minutes, without affecting concurrent OLTP load. They should be lightweight and guaranteed to commit on a stable topology. Any SQL read query is naturally mapped to this type of transaction. This is a very useful feature, I measure it with 3
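The restart and read-only properties discussed above can be illustrated with a small multi-version store. This is a minimal sketch under stated assumptions, not the AI3 design: it assumes multi-version storage with a single logical commit clock and backward validation at commit, and `VersionedStore`, `OptimisticTx`, `ReadOnlyTx` and `Conflict` are hypothetical names. A read-write transaction validated optimistically discovers a conflict only at commit and must restart, while a read-only transaction pinned to a snapshot timestamp keeps a consistent view and never conflicts with writers.

```python
class Conflict(Exception):
    """Raised when optimistic validation fails and the transaction must restart."""


class VersionedStore:
    """Toy multi-version store: each commit installs versions at a new timestamp."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # timestamp of the latest commit

    def read_at(self, key, ts):
        """Return the newest value of key committed at or before ts."""
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= ts:
                return value
        return None

    def last_commit_ts(self, key):
        chain = self.versions.get(key)
        return chain[-1][0] if chain else 0

    def install(self, writes):
        self.clock += 1
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((self.clock, value))


class OptimisticTx:
    """Read-write transaction: buffers writes, checks conflicts only at commit."""

    def __init__(self, store):
        self.store = store
        self.start_ts = store.clock
        self.read_set = set()
        self.writes = {}

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        self.read_set.add(key)
        return self.store.read_at(key, self.start_ts)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Delayed conflict check: fail if anything we read was overwritten
        # by a transaction that committed after we started.
        for key in self.read_set:
            if self.store.last_commit_ts(key) > self.start_ts:
                raise Conflict(key)
        self.store.install(self.writes)


class ReadOnlyTx:
    """Read-only transaction pinned to the snapshot taken at its start."""

    def __init__(self, store):
        self.store = store
        self.snapshot_ts = store.clock

    def read(self, key):
        return self.store.read_at(key, self.snapshot_ts)


store = VersionedStore()
store.install({"x": 10, "y": 20})

report = ReadOnlyTx(store)   # long-running report, sees the initial state
t1 = OptimisticTx(store)
t2 = OptimisticTx(store)

t1.write("x", t1.read("x") + 1)
t2.write("x", t2.read("x") + 5)

t1.commit()
try:
    t2.commit()
    restarted = False
except Conflict:
    restarted = True         # all work done by t2 is lost and must be redone

total = report.read("x") + report.read("y")
```

Here `t2` loses its work only at commit time, which is the cost of the optimistic approach noted above. The report still reads the values from its own snapshot even though `x` has since been updated.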

Consistent replica reads

This is a very useful feature for load balancing. I measure it with 3

Optimized for fast path execution (short transactions, low contention, whatever ?)

...

How many node failures we can tolerate without data loss

I measure it with 1

There are two main aspects: CC and atomic commitment.

High level overview

Description

// Provide the design of the solution.

...