ID: IEP-3
Author: Semen Boikov, Alexey Goncharuk, Sergey Puchnin
Sponsor: Vladimir Ozerov
Created: 22 Sep 2017
Status: DRAFT

...

TX SQL will be implemented on top of the existing snapshot-based MVCC infrastructure. Writes obtain locks on keys. Reads do not obtain locks, and writes do not block reads. Reads can be converted to blocking mode using a SELECT ... FOR UPDATE statement.

...

  • Every operation first acquires a global sequence number on the coordinator. Both SELECT and update operations use this sequence number to filter out more recent updates. This way an operation sees only the rows that existed at the time the operation began.
  • An update operation then acquires locks on the target rows one by one. A row might already be locked by a concurrent transaction at this point. If the concurrent transaction is rolled back, or the lock is acquired on the expected version, no additional actions are required. If the concurrent transaction modifies the row and commits, the current transaction acquires the lock and re-evaluates the original condition. If the condition still evaluates to true, the lock is retained. Otherwise the lock is released and the row is ignored.
  • Subsequent SELECTs within the transaction see its previous updates.
  • On TX commit the client requests another sequence number, which is applied to all modified rows.
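
The steps above can be illustrated with a small single-process model. This is only a sketch under stated assumptions, not Ignite code: the names MvccSketch, Coordinator, Row, Version, lockAndRecheck and commit are hypothetical, waiting on a locked row is replaced by an exception, and a transaction's visibility of its own uncommitted updates is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongPredicate;

/** Hypothetical single-process model of the protocol steps; not Ignite code. */
class MvccSketch {
    /** Stand-in for the coordinator that hands out global sequence numbers. */
    static final class Coordinator {
        private final AtomicLong seq = new AtomicLong();

        long next() { return seq.incrementAndGet(); }
    }

    /** One committed version of a row. */
    static final class Version {
        final long commitSeq;
        final long value;

        Version(long commitSeq, long value) {
            this.commitSeq = commitSeq;
            this.value = value;
        }
    }

    /** A row: committed versions in commit order plus a write-lock owner (0 = unlocked). */
    static final class Row {
        final List<Version> versions = new ArrayList<>();
        long lockOwner;

        /** Latest version committed at or before the snapshot, or null if none. */
        Version visibleAt(long snapshotSeq) {
            Version res = null;
            for (Version v : versions)
                if (v.commitSeq <= snapshotSeq)
                    res = v;
            return res;
        }

        Version latest() {
            return versions.isEmpty() ? null : versions.get(versions.size() - 1);
        }
    }

    /**
     * Locks the row for txId and returns true if the lock is retained.
     * Assumes any concurrent holder has already finished; a real system would wait.
     */
    static boolean lockAndRecheck(Row row, long txId, long snapshotSeq, LongPredicate cond) {
        if (row.lockOwner != 0 && row.lockOwner != txId)
            throw new IllegalStateException("Row is locked; waiting is not modeled");

        row.lockOwner = txId;

        Version seen = row.visibleAt(snapshotSeq);
        Version latest = row.latest();

        // Expected version: nobody committed after our snapshot, nothing more to do.
        if (latest == seen)
            return true;

        // A concurrent transaction committed a newer version: re-evaluate the condition.
        if (latest != null && cond.test(latest.value))
            return true;

        // Condition no longer holds: release the lock and ignore the row.
        row.lockOwner = 0;
        return false;
    }

    /** Commit: request one more sequence number and apply it to all modified rows. */
    static long commit(Coordinator crd, long txId, Map<Row, Long> writes) {
        long commitSeq = crd.next();
        for (Map.Entry<Row, Long> e : writes.entrySet()) {
            Row row = e.getKey();
            row.versions.add(new Version(commitSeq, e.getValue()));
            row.lockOwner = 0;
        }
        return commitSeq;
    }
}
```

In this model a reader never touches lockOwner, which mirrors the statement that reads take no locks and are not blocked by writes. The only place where a transaction looks past its own snapshot is the re-evaluation branch in lockAndRecheck.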

All DML requests are split into two groups: with and without a reduce step. If a reduce step is not needed, locks are obtained on the map nodes immediately. If a reduce step is needed (e.g. non-collocated aggregation), we cannot lock rows on the mapper immediately, because the target row set is not known in advance. In this case the filter condition should be re-evaluated as well, by executing the distributed query again (TBD).

Transactional protocol

A typical DML operation may modify any number of rows, which means we cannot store all modified rows on a near node. The current TX protocol must be extended so that updates are stored on primary/backup nodes only and are not transferred to the near node.
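
One possible shape of this extension is sketched below. It is an illustration under assumptions, not the proposed design: the names NearTxSketch, PrimaryNode, NearTx, enlist and updatedRows are hypothetical, backups and failure handling are omitted, and the idea that the near node keeps only the set of participating nodes and a row counter is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model: pending updates live on primary nodes, not on the near node. */
class NearTxSketch {
    /** Stand-in for a primary node; backups are not modeled. */
    static final class PrimaryNode {
        /** Pending (uncommitted) updates per transaction id. */
        final Map<Long, Map<String, String>> pending = new HashMap<>();
        /** Committed key-value state. */
        final Map<String, String> committed = new HashMap<>();

        /** Stores the updates locally and reports only how many rows were touched. */
        int enlist(long txId, Map<String, String> updates) {
            pending.computeIfAbsent(txId, id -> new HashMap<>()).putAll(updates);
            return updates.size();
        }

        void commit(long txId) {
            Map<String, String> upd = pending.remove(txId);
            if (upd != null)
                committed.putAll(upd);
        }
    }

    /** Near-node view of the transaction: participants and a counter, no row data. */
    static final class NearTx {
        final long txId;
        final Set<PrimaryNode> participants = new HashSet<>();
        long updatedRows;

        NearTx(long txId) { this.txId = txId; }

        /** Models a DML statement whose rows were resolved and stored on the given node. */
        void update(PrimaryNode node, Map<String, String> updates) {
            updatedRows += node.enlist(txId, updates);
            participants.add(node);
        }

        void commit() {
            for (PrimaryNode node : participants)
                node.commit(txId);
            participants.clear();
        }
    }
}
```

With this layout the near node's memory footprint depends on the number of participating nodes rather than on the number of modified rows.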

...