ID | IEP-91 |
Author | Alexey Scherbakov |
Sponsor | Alexey Scherbakov |
Created |
|
Status | DRAFT |
If I have seen further it is by standing on ye sholders of Giants
Isaac Newton
One of the major features of AI3, as a distributed database, is the ability to execute multiple table operations as a single atomic operation, known as a transaction. We need to design a modern and robust distributed transaction protocol, taking current best practices into account. Both the key-value and SQL database access methods will rely on it. Compared to AI2, we aim to support transactional SQL from the beginning and to remove limitations such as the limit on transaction size.
This section gives definitions of the terms used throughout the text, for easier understanding.
Record (aka Row, Tuple, Relation) - a collection of attribute-value pairs.
Transaction - a sequence of logically related partially ordered actions (reads or writes) over the database objects.
Atomicity - a transaction property which declares: either all actions are carried out or none are.
Consistency - a transaction property which guarantees that a transaction moves the database from one consistent state to another upon completion. The meaning of a consistent state is defined by the user.
Isolation - a measure of mutual influence between interleaved transactions.
Durability - a transaction property which guarantees that the changes made by a committed transaction are preserved despite any failures.
Schedule - a way of executing interleaved transactions.
Serial schedule - a schedule where all transactions are executed sequentially.
Serializable schedule - a schedule which is equivalent to some serial execution of interleaved transactions.
Concurrency control (CC) - a technique to preserve database consistency in case of interleaved transactions.
Multi-version concurrency control (MVCC) - a family of concurrency control techniques based on writing multiple record versions (copy-on-write).
Recoverable schedule - a schedule which is not affected by aborting some of the involved transactions. To achieve this, a transaction reads only committed values.
Interactive transaction - a transaction whose operation set is not known a priori. It can be aborted at any time before it is committed.
Cascading abort - a situation in which the abort of one transaction causes the abort of another dependent transaction to avoid inconsistency.
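The definitions of a schedule and a serializable schedule can be made concrete with a small check. The sketch below tests conflict serializability, a commonly used sufficient condition for serializability: it builds a precedence graph between transactions and reports whether the graph is acyclic. The function name and the `(txn, op, key)` tuple encoding are assumptions made for this illustration and are not part of the proposal.

```python
from collections import defaultdict


def is_conflict_serializable(schedule):
    """Check a schedule given as a list of (txn, op, key) tuples.

    op is 'r' (read) or 'w' (write); the list order is the execution order.
    The encoding is hypothetical and used only for this illustration.
    """
    # Precedence graph: an edge t1 -> t2 means t1 must come before t2.
    edges = defaultdict(set)
    for i, (t1, op1, k1) in enumerate(schedule):
        for t2, op2, k2 in schedule[i + 1:]:
            # Two actions conflict if they belong to different transactions,
            # touch the same key, and at least one of them is a write.
            if t1 != t2 and k1 == k2 and "w" in (op1, op2):
                edges[t1].add(t2)

    # Depth-first search for a cycle in the precedence graph.
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def has_cycle(node):
        color[node] = GREY
        for nxt in edges[node]:
            if color[nxt] == GREY:
                return True
            if color[nxt] == WHITE and has_cycle(nxt):
                return True
        color[node] = BLACK
        return False

    txns = {t for t, _, _ in schedule}
    # The schedule is conflict serializable iff the graph has no cycle.
    return not any(color[t] == WHITE and has_cycle(t) for t in txns)


# T1 finishes before T2 starts: trivially equivalent to a serial schedule.
ok = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x")]
# T2 writes x between T1's read and write of x: the graph has a cycle.
bad = [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")]
```

Here `is_conflict_serializable(ok)` holds, while `bad` is rejected because T1 must both precede and follow T2.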
To define the key points of the protocol design, let's look at some features that the product can provide and rate them from 1 to 3, where 3 means maximum importance for product success.
Let's take a look at each feature in detail and give it a value.
Here we take into account the isolation property of a transaction. The strongest isolation level is known to be Serializable, which implies that all transactions appear to execute sequentially. This is very convenient for a user, because it prevents hidden data corruption https://pmg.csail.mit.edu/papers/adya-phd.pdf and avoids security issues http://www.bailis.org/papers/acidrain-sigmod2017.pdf. The price for this can be reduced throughput and increased latency due to the added overhead of the CC protocol. Another option is to allow a user to choose a weaker isolation level, such as SNAPSHOT. The ultimate goal is to implement Serializability without sacrificing too much performance, with Serializable as the default isolation level. I rate it 2.
This is useful to have because it reduces the number of transaction restarts. I rate it 1.
This is the most intuitive way to use transactions. I rate it 3.
This is a general property of a transactional protocol: it defines how many transactions are restarted, losing the work already done, when a serialization conflict occurs. For example, optimistic CC causes more frequent restarts, because the conflict check is delayed until commit. I rate it 1.
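To illustrate why delaying the conflict check until commit leads to lost work, here is a minimal single-node sketch of optimistic CC with version validation at commit time. The class `OptimisticStore` and its methods are hypothetical names chosen for this example; they do not describe the AI3 protocol.

```python
class OptimisticStore:
    """Toy optimistic CC: reads are validated only at commit (illustrative)."""

    def __init__(self):
        self.data = {}  # key -> (value, version)

    def begin(self):
        # A transaction tracks the versions it read and its buffered writes.
        return {"reads": {}, "writes": {}}

    def read(self, txn, key):
        if key in txn["writes"]:
            return txn["writes"][key]
        value, version = self.data.get(key, (None, 0))
        # Remember the first version observed for later validation.
        txn["reads"].setdefault(key, version)
        return value

    def write(self, txn, key, value):
        # Writes are buffered and become visible only after commit.
        txn["writes"][key] = value

    def commit(self, txn):
        # Validation: every key read must still be at the observed version.
        for key, seen in txn["reads"].items():
            if self.data.get(key, (None, 0))[1] != seen:
                return False  # conflict: all work of this transaction is lost
        for key, value in txn["writes"].items():
            _, version = self.data.get(key, (None, 0))
            self.data[key] = (value, version + 1)
        return True


store = OptimisticStore()
t1, t2 = store.begin(), store.begin()
store.read(t1, "x")
store.read(t2, "x")
store.write(t1, "x", 1)
store.write(t2, "x", 2)
first = store.commit(t1)   # succeeds, bumps the version of "x"
second = store.commit(t2)  # fails validation and must be restarted
```

In this run `t2` performs all of its reads and writes before learning at commit time that it has to restart, which is the cost this feature is meant to capture.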
Such transactions can be used to build complex OLAP reports without affecting the concurrent OLTP load. Any SQL read query maps naturally to this type of transaction. This is a very useful feature; I rate it 3.
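One common way to keep long reads from interfering with concurrent writes is to read from a multi-version store at a fixed timestamp, in the spirit of the MVCC definition given earlier. The sketch below assumes a single logical clock and a hypothetical `VersionedStore` class; the text does not state that AI3 implements such transactions this way.

```python
class VersionedStore:
    """Toy multi-version store with snapshot reads (illustrative only)."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # logical commit timestamp, an assumption

    def commit_write(self, key, value):
        # Each write appends a new version instead of overwriting the old one.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self):
        # A read-only transaction fixes its read timestamp once, at start.
        return self.clock

    def read_at(self, key, ts):
        # Return the newest version committed at or before the timestamp.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= ts:
                return value
        return None


store = VersionedStore()
store.commit_write("balance", 100)
report_ts = store.snapshot()        # a long-running report starts here
store.commit_write("balance", 250)  # concurrent OLTP update

old_view = store.read_at("balance", report_ts)
new_view = store.read_at("balance", store.snapshot())
```

The report keeps seeing the value as of `report_ts`, while later readers see the update; the writer never waits for the reader.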
This is a very useful feature for load balancing. I rate it 3.
- 1
- 2
- 3
- 1
- 1
The protocol has two main components: concurrency control (CC) and atomic commitment.
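As an illustration of what atomic commitment means, the sketch below shows classic two-phase commit in its simplest form. The text does not say which commitment protocol is chosen, so 2PC is used here purely as a well-known example; `Participant` and `two_phase_commit` are hypothetical names, and failure handling, timeouts, and logging are omitted.

```python
class Participant:
    """A node holding part of the transaction's data (illustrative only)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Vote yes only if the local part of the transaction can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit everywhere only if all votes are yes, else abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


healthy = [Participant("n1"), Participant("n2")]
mixed = [Participant("n1"), Participant("n2", can_commit=False)]
committed = two_phase_commit(healthy)
rolled_back = two_phase_commit(mixed)
```

Either every participant ends in the committed state or every participant ends in the aborted state, which matches the atomicity property defined earlier.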