Database Transaction - ACID & Isolation Level
ACID
ACID is a set of properties of database transactions intended to guarantee data validity despite concurrency, errors, power failures, and other mishaps.
A - Atomcity - 原子性
ALL-or-Nothing
Transactions are often composed of multiple statements. Atomicity guarantees that each transaction is treated as a single “unit”, which either succeeds completely or fails completely: if any of the statements constituting a transaction fails to complete, the entire transaction fails and rollback
the data to the unchanged.
C - Consistency - 一致性
The guarantee that database constraints are not violated, particularly once a transaction commits.
The guarantee that any transactions started in the future necessarily see the effects of other transactions committed in the past.
I - Isolation - 隔离性
Transactions are often executed concurrently (e.g., multiple transactions reading and writing to a table at the same time). Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially. Isolation is the main goal of concurrency control(isolation level).
the effects of an incomplete transaction might not be visible to other transactions depending on the isolation level
D - Durability - 持久性
It guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure. This usually means that completed transactions (or their effects) are recorded in non-volatile memory(e.g. disk).
Isolation Level - Concurrency Control
Serializable(可序列化) - level 3
It’s the most restrictive of all isolation levels. All Transactions that may affect others are executed serially.
All transactions are protected by read-write(RW) lock with the level of range/table.
- Pros: best consistency of all committed value
- Cons: extremely bad performance, usually 30x slower than repeatable read
Repeatable Read(可重复读) - level 2 - Default level in most DBs
- Pros: consistency of row-level committed value
- Cons: range-based query may meet phantom read
Phantom Read(幻读)
Transaction A reads the same range again and will get the new record that Transaction B just inserted. The results of these two range-based readings are different. e.g. count operation.
Read Committed(读已提交) - level 1
A transaction can’t read data that is not yet committed by other transactions.
- Pros: solve dirty read
- Cons: query may hit non-repeated/Phantom read
Non-Repeated Read(不可重复读)
Read uncommitted - level 0 - 未提交读 ~ 无锁
A transaction can read the latest data modified by other transactions which may be even uncommitted (Dirty Read).
Dirty Read(脏读):
Transaction B may read the uncommitted data written by A.
Isolation Summary
Consistency Problem | Dirty Read | Non-repeatable read | Phantom read |
---|---|---|---|
SERIALIZABLE | no | no | no |
REPEATABLE_READ | no | no | yes in isolation definition No for MVCC based snapshot read |
READ_COMMITTED | no | yes | yes |
READ_UNCOMMITTED | yes | yes | yes |
Implementation
Atomicity + Durability
Common mechanisms:
Redo/Undo Logging: Write ahead log(WAL) records all operations before execution. It could redo or undo incompleted operations when failures happen.
Shadow Paging: a copy-on-write technique for avoiding in-place updates of pages. Instead, when a page is to be modified, a shadow page is allocated. Since the shadow page has no references (from other pages on disk), it can be modified liberally, without concern for consistency constraints, etc. When the page is ready to become durable, all pages that referred to the original are updated to refer to the new replacement page instead. Because the page is “activated” only when it is ready, it is atomic.
Consistency + Isolation
Legacy - LBCC - Pessimistic(悲观) Lock
Lock Based Concurrency Control
Database with legacy version applies lock-based concurrency control.
- Serializable(可序列化) - table/range RW lock
- Repeatable Read(可重复读) - row-level RW lock
- Read Committed(读已提交)
- It’s still row-level RW lock based but the read operation just checks W(X) lock but did not acquire R(X) lock, write operation acquires W(X) Lock to block other operations and releases the lock after commit.
- Read uncommitted(读未提交)
- Just control concurrency on the statement level: it’s almost the same with
Read
Committed` but release lock right after statement execution instead of commit.
- Just control concurrency on the statement level: it’s almost the same with
It usually uses shared(read) and exclusive(write) lock to achieve the read-write(RW) lock.
- A shared (S, read) lock permits the transaction that holds the lock to read a row.
- An exclusive (X, write) lock permits the transaction that holds the lock to update or delete a row.
- Pros: easy to implement and understand
- Cons: too many blockers => too much read waiting!!!!
Modern - MVCC - Version as Optimistic(乐观) Lock
Multi-Version Concurrency Control
Mysql default storage engine InnoDB applies MVCC to optimize READ_COMMITED
and REPEATED_READ
, since there is no need to check/acquire lock when reading.
It enables snapshot read, that is, the read operation gets data with a bit early version without waiting even when it’s operated by the write operation.
There is no difference on level READ_UNCOMMITTED and SEARIALIZABLE, the write lock is still maintained by sql Server which is out of InnoDB engine to avoid parallel write operations on the same record (row).
Two additional hidden columns per row
- transaction id
DB_TRX_ID
:
It indicates the transaction identifier for the last transaction that inserted or updated the row. Also, a deletion is treated internally as an update where a special bit in the row is set to mark it as deleted. - rollback pointer
DB_ROLL_PTR
:
It points to an undo log record written to the rollback segment. If the row was updated, the undo log record contains the information necessary to rebuild the content of the row before it was updated.
How version(transcation id) works?
Every transaction is created along with a monotonically increasing ID (e.g. timestamps). In other words, the latest created transaction ID owns the largest ID.
Write
INSERT
: insert the new record(row) settingDB_TRX_ID
with current transaction IDDELETE
: do not remove the record, but updateDB_TRX_ID
of the record(row) to current transaction ID and mark it asdeleted
.UPDATE
: keep the original record(row) but mark it asdeleted
, then create a new record(row) with changed data and current transaction ID.
All write operations will add rollback segment to undo log and set the pointer in the changed record.
Snapshot Read(快照读)
SELECT
: transcation is only able to read the record whoseDB_TRX_ID
is <= current transaction ID and not in the ReadView.- ReadView: a transaction id list stands for all active(uncommitted) transactions at the moment of executing
Select
. Read View is a static view of active transactions id and created right before executing- the first
select
in transaction for REPEATED_READ - every
select
for READ_COMMITED
- the first
In REPEATED_READ level, static view protects the current transaction from the impact of the other. For example, even the active transaction commits after creating read view, its effect is ignored due to either a bigger ID or exsitence of ReadView. So snapshot read could prevent phantom read.
*Further Reading: Locking Read(当前读, 锁定读)
However database also supports the statement to read the latest committed values in the REPEATED_READ level,such as
- SELECT … LOCK IN SHARE MODE; # R(S) LOCK
- SELECT … FOR UPDATE; # X(W) LOCK
- In a transaction, SELECT … after
- INSERT INTO values … # X(W) LOCK
- DELETE FROM WHERE … # X(W) LOCK
- UPDATE SET … # X(W) LOCK
Modern database will add lock to avoid confict and the Lock Reading should wait for the lock release in
The Phantom read may happen during Locking Read if the database does not support range-based(next-key, gap) RW lock for Locking Read.
Relationship between MVCC and index
In short, MVCC is transparent to index. They are independent.
MVCC creates additional rows&columns to keep multiple versions of one row in the table at the same time, index regards all rows as valid rows so that it also keeps multiple index entries for different versions of a single row.