
IDIEP-91

Author:
Sponsor:
Created:
Status: DRAFT

Table of Contents

If I have seen further it is by standing on ye shoulders of Giants

Isaac Newton

Motivation

One of the major features of AI3, as a distributed database, is the ability to execute multiple table operations as a single atomic operation, known as a transaction. We need to design a modern and robust distributed transaction protocol, taking into account current best practices. Compared to AI2, we aim to support transactional SQL from the beginning and to remove limitations like the maximum transaction size.

Definitions

In this section I'll give definitions of some terms encountered throughout the text, for easier understanding.

...

Cascading abort - a situation in which the abort of one transaction causes the abort of another dependent transaction to avoid inconsistency.
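For example, consider a schedule in which T1 writes x and T2 then reads x before T1 has committed: w1(x) r2(x). If T1 subsequently aborts, T2 must be aborted as well, because it has read data that logically never existed.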

Design Goals

To define the key points of the protocol design, let's look at some features the product can provide and rate them from 1 to 3, where 3 means maximum importance for product success.

...

Let's take a look at each feature in detail and assign it a rating.

Strong isolation

Here we take into account the isolation property of a transaction. The strongest isolation level is known to be Serializable, implying that all transactions appear to execute sequentially. This is very convenient for a user, because it prevents hidden data corruptions (https://pmg.csail.mit.edu/papers/adya-phd.pdf) and security issues (http://www.bailis.org/papers/acidrain-sigmod2017.pdf). The price for this can be reduced throughput/latency due to the increased overhead of the CC protocol. Another option is to allow a user to choose a weaker isolation level, like SNAPSHOT. The ultimate goal is to implement Serializability without sacrificing too much performance, having Serializable as the default isolation level. I measure it with 2
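A classic example of what Serializable prevents is write skew. Suppose a constraint requires that at least one of two doctors stays on call: T1 reads both records and takes doctor A off call, while T2 concurrently reads both records and takes doctor B off call. Under SNAPSHOT isolation both transactions commit and the constraint is violated; under Serializable one of them must be rejected, because no serial order of T1 and T2 could produce this outcome.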

Support for interactive transactions

This is the most intuitive way to use transactions. I measure it with 3
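In an interactive transaction the client issues statements one at a time, interleaving them with its own application logic, so the server doesn't know the full set of operations in advance. A minimal sketch of how this might look from the application side (the API names here are illustrative only, not the final AI3 API):

    // Illustrative API names; the actual AI3 API may differ.
    Transaction tx = ignite.transactions().begin();

    double from = balances.get(tx, fromId);   // read inside the transaction
    double to = balances.get(tx, toId);

    if (from >= amount) {                     // arbitrary client-side logic between statements
        balances.put(tx, fromId, from - amount);
        balances.put(tx, toId, to + amount);
        tx.commit();
    } else {
        tx.rollback();
    }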

Conflict resistance

This is a general property of a transactional protocol, defining how many transactions will be restarted in case of a serialization conflict, causing progress loss. For example, optimistic CC causes more frequent restarts under contention, because a conflict check is delayed until commit. Avoiding cascading aborts also reduces the number of restarts. I measure it with 1
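To illustrate: under optimistic CC, T1 and T2 can both read x, compute new values, and only discover at commit-time validation that their write sets conflict; one of them is restarted and all of its work is lost. Under pessimistic (lock-based) CC the second transaction would simply block on the lock for x and continue once the first one completes.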

Read-only long lived transactions

Such transactions can be used to build complex OLAP reports without affecting concurrent OLTP load. Any SQL read query is naturally mapped to this type of transaction. Such transactions can also read snapshot data in the past, at some timestamp. Must have, I measure it with 3
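A sketch of how such a transaction might be started at a given timestamp (again, the API names are illustrative only):

    // Illustrative API names; a read-only transaction pinned to a snapshot timestamp.
    Transaction ro = ignite.transactions().beginReadOnly(timestamp);

    // A long-running scan sees the data as of 'timestamp' and does not block concurrent writers.
    long open = orders.query(ro, "SELECT COUNT(*) FROM orders WHERE status = 'OPEN'").single();

    ro.commit(); // nothing to roll back; committing just releases the snapshot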

Consistent replica reads

A very useful feature for load balancing, especially in conjunction with the previous one. I measure it with 3

Optimized for common scenarios

We can try to optimize the protocol to handle common scenarios better. For example, small transactions can be optimized by buffering writes until commit to reduce the lock hold time. I measure it with 1
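As an illustration, a transaction that updates only a handful of keys can keep its writes in a client-side buffer and ship them to the servers only at commit time; exclusive locks are then held roughly for the duration of the commit instead of for the whole lifetime of the transaction.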

Geo-distribution awareness

Geo-distributed clusters are gaining popularity. While they suffer from network latency issues due to the speed-of-light limit, they are the best option for high availability. So, the protocol should minimize the number of messages sent between regions. I measure it with 2

Unlimited or very large transaction size

Some databases limit the number and total size of records enlisted in a transaction. This is not convenient for a user. I measure it with 3

Transactional DDL

Nice to have, can help with migration scenarios. I measure it with 1

Data loss toleration

It's important to know how many node failures we can tolerate before declaring unavailability due to temporary data loss (or permanent loss in the case of an in-memory deployment). More is better. I measure it with 2

High level interpretation

Looking at the evaluation, it's easy to notice that our freshly-baked protocol design favors usability over performance. It doesn't mean we don't need performance - we just need an acceptable level of performance and, more importantly, scalability. Optimizations can be postponed until later.

...

We aim to reuse the common replication infrastructure. This means data records will be durably replicated using a consensus-based protocol, like RAFT. This approach tolerates f failed nodes out of n total nodes, where n >= 2f + 1. Other products can do better; for example, FoundationDB tolerates f failed nodes out of n, where n >= f + 1 (but consensus is still required). A CC protocol is not tied to the underlying replication protocol. We can change the replication protocol in the future if we want.
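For example, with n = 5 replicas a RAFT-based group remains available with up to f = 2 failed nodes (5 >= 2*2 + 1), whereas a scheme requiring only n >= f + 1, such as FoundationDB's, could in principle tolerate up to f = 4 failures out of the same 5 copies.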

Serializability

Before continuing further towards the discussion of the CC protocol, which provides serializable schedules, let's dive into serializability theory.

...

This theoretical insight will come in handy when reasoning about the CC protocol.

CC protocol (WIP)

A CC protocol ensures that only schedules with desirable properties are generated.

...

Comparison to known protocols

Description

// Provide the design of the solution.

Consistency model

// Describe the model

Risks and Assumptions

// Describe project risks, such as API or binary compatibility issues, major protocol changes, etc.

Choosing MV2PL seems like a well-rounded solution, but there are some risks... a range lock may be too prohibitive

Discussion Links

// Links to discussions on the devlist, if applicable.

Reference Links

// Links to various reference documents, if applicable.

Tickets

// Links or report with relevant JIRA tickets.