What’s distributed transaction?

The article describes some common algorithms to achieve the ACID transaction in Distributed System. Please review the following pages first if you are not familiar with some basic concepts, such as ACID, CAP, BASE.

The distributed transaction aims to achieve ACID among multiple databases.

In a distributed transaction environment, multiple processes participate in a transaction, each executing its own sub-transaction that can commit only if there is unanimous consensus by all participants to do so. Each system runs a transaction manager, a process that is responsible for participating in the commit algorithm algorithm to decide whether to commit or abort its sub-transaction. One of these transaction managers may be elected as the coordinator and initiates and runs the commit algorithm.

2-Phase Commit

This protocol requires a coordinator. The client contacts the coordinator and proposes a value. The coordinator then tries to establish the consensus among a set of processes (a.k.a Participants) in two phases, hence the name.

Workflow

Phase 1: Vote Phase
- (1) Coordinator sends a request (“can you commit?”) to every participant (reliably, retransmitting as often as needed until all replies are received)
- (2) Participants write WAL first to support rollback and run transactions without committing and
- (3) Phase 1 is complete when each participant responds.
Phase 2: Decision Phase
- (4) If the coordinator gets even a single abort response from a participant, it will send requests to all participants to abort the entire transaction. Otherwise, it will tell all participants to commit.
- (5) Participant gets the request to commit/rollback
- (6) Regard the transaction as finished if all decisions are done well.

Cons

it’s easy to implement but has several drawbacks since
EVERY STEP CAN FAIL!!

The two-phase commit protocol is a blocking protocol that relies on a fail-restart failure model.

Low Performance
- Synchronous blocking model: all transactions have to wait for the release of the locked resource
- Easy Rollback: All participants have to roll back when a participant says No.
Single-Point of failure(SPOF)
- If the coordinator crashes after (1) and before (4), all participants would hang forever
Inconsistent Data
- Some participants may not complete commit(5) due to network or self-crash. And the coordinator after (4), there is no one to record the status of participants even if the system elects a new coordinator.

3-Phase Commit

This is an extension of 2PC with the following two major optimizations:

Introduce timeout to both coordinator and participant
1. regard the timeout of participant response as No
2. regard the timeout of coordinator request as move on if participant is ready to commit.
Introduce one more phase
- Make sure all participants are able to run transcation to avoid some rollback
- enables the use of a recovery coordinator

If the participant is found to be in phase 2, that means that every participant has completed phase 1 and voted on the outcome. The completion of phase 1 is guaranteed. It is possible that some participants may have received commit requests (phase 3). The recovery coordinator can safely resume at phase 2.
If the participant was in phase 1, that means NO participant has started commits or aborts. The protocol can start at the beginning..
If the participant was in phase 3, the coordinator can continue in phase 3 – and make sure everyone gets the commit/abort request

Pros & Cons

Pros:
- reduce blocking waiting and the possibility of rollback to improve performance
- recovery coordinator to handle SPOF
Cons:
- Inconsistent data is still there due to (same as 2PC).

Distributed Transaction - 2PC, 3PC