DEV Community

Cover image for Ensuring Data Integrity in Multi-User Environments
Drimil Mendapara
Drimil Mendapara

Posted on • Originally published at blue-tornado-e20.notion.site

Ensuring Data Integrity in Multi-User Environments

What is in Concurrency Control?

Concurrency Control in databases refers to the methods and techniques used to ensure that multiple transactions can occur simultaneously without leading to data inconsistencies, corruption, or loss.

A database handles more than one transaction at the same time, and some transactions may read and update the same data. This can lead to issues like dirty reads and write-write conflicts. To manage this, databases use techniques known as concurrency control techniques.

  1. Lock-Based Concurrency Control
  2. Timestamp-Based Concurrency Control
  3. Multiversion Concurrency Control (MVCC)

Lock Based Concurrency Control

In this technique, transactions use a locking mechanism to update or modify data in the database. This method helps the database prevent conflicts and maintain consistency

Image description

We basically use two types of locks for any database resources.

  • Shared Locks (S): Allow multiple transactions to read (or acquire a shared lock on) the same resource but prevent any transaction from modifying it.
  • Exclusive Locks (X): Allow a single transaction to both read and modify a resource. When a transaction holds an exclusive lock, no other transaction can acquire any type of lock (shared or exclusive) on that resource.

A shared lock is used when a transaction reads some data. During this time, the transaction acquires a shared lock, and more than one transaction can acquire a shared lock on the same database resource. However, when a transaction wants to update data, it acquires an exclusive lock. This lock is used when a transaction modifies or updates data. When one transaction acquires an exclusive lock, other transactions cannot acquire either a shared or an exclusive lock on that resource.

We use two type of method into lock base concurrency control

  • Two-Phase Locking (2PL): A protocol where transactions acquire all the necessary locks before releasing any of them. It consists of two phases:
    • Growing Phase: The transaction acquires locks but does not release any.
    • Shrinking Phase: The transaction releases locks and cannot acquire any new locks.
  • Strict Two-Phase Locking (Strict 2PL): A stricter form of 2PL where transactions must hold all their locks until they commit or abort.

We have to consider trade-offs between concurrency and locking. In lock-based concurrency control, a transaction may lock an entire page or table of the resource. While this approach reduces the number of locks the database has to manage, it also results in lower concurrency. This type of lock is called Coarse-Grained Locking. Alternatively, we can lock only the specific row that the transaction is used. Although this approach requires the database to manage more locks, it allows for higher concurrency, and is referred to as Fine-Grained Locking. Depending on our use case, we should choose the appropriate technique.

In this concurrency technique, a deadlock situation can occur when one transaction acquires a resource that is needed by another transaction, and the other transaction acquires a resource that is needed by the first transaction. In such cases, a deadlock occurs. To resolve this, the database uses deadlock prevention and deadlock detection techniques.

  • Dead Lock Prevention

    We have two techniques Wait-Die and Wound-Wait Schemes

    • Wait-Die: If an older transaction requests a lock held by a younger transaction, it waits (wait). If a younger transaction requests a lock held by an older transaction, the younger transaction is aborted (die).
    • Wound-Wait: If an older transaction requests a lock held by a younger transaction, the younger transaction is aborted (wound). If a younger transaction requests a lock held by an older transaction, it waits (wait).
  • Dead Lock Detection

    The database system builds a graph where each node represents a transaction. An edge from T1 to T2 indicates that T1 is waiting for a lock held by T2. If this graph contains a cycle, a deadlock is detected.

  • Deadlock Resolution

    Once a deadlock is detected, the system resolves it by rolling back one or more transactions. The chosen transaction is known as the "victim.”

    The selection of a victim can be based on various criteria

    • Minimum Cost: The transaction that has made the least progress (e.g., has executed the fewest statements) is rolled back.
    • Lowest Priority: Transactions are assigned priorities, and the one with the lowest priority is rolled back.

    After rolling back a transaction, it can be retried either immediately or after a certain delay.

Timestamp-Based Concurrency Control

Timestamp-Based Concurrency Control (TBCC) is a method used in database systems to ensure that transactions are executed in a serialisable manner without using locks. Instead of relying on locks, TBCC assigns a unique timestamp to each transaction and uses these timestamps to enforce the order of transaction execution.

Image description

In this technique, when a transaction is incoming, we assign a unique timestamp to that transaction. If more than one transaction arrives at the same time, the system ensures that each transaction receives a unique timestamp.

The database data has two types of timestamps that help handle concurrency in transactions.

  • Read Timestamp (RTS): The RTS of a data item is the largest timestamp of any transaction that has successfully read that data item.
  • Write Timestamp (WTS): The WTS of a data item is the largest timestamp of any transaction that has successfully written to that data item.

Operation Rules in Timestamp-Based Concurrency Control

In TBCC, when a transaction tries to perform a read or write operation on a data item, the system compares the transaction’s timestamp with the RTS and WTS of the data item to decide whether the operation is allowed, aborted, or delayed.

  1. Read Operation

    When transaction Ti try to access resource X with timestamp TS

    Condition 1: TS ≤ WTS(X): The transaction is reading outdated data, meaning another transaction with a later timestamp has already written to X. then Ti is abort and rollback.

    Condition 2: TS ≥ WTS(X): The transaction is allowed to read the data. The RTS of X is updated with TS timestamp (RTS(X) = TS ).

    Example

    Transaction T1 (with timestamp 5) writes to data item X, setting WTS(X) = 5

    Transaction T2 (with timestamp 3) tries to read X. Since TS(T2) is less than WTS(X), T2 is aborted.

  2. Write Operation

    When transaction Ti try to write resource X with timestamp TS

    Condition 1: TS < RTS(X): The write operation would violate the order of operations, as a transaction with a later timestamp has already read X. then Ti is abort and rollback.

    Condition 2: TS < WTS(X): The write operation is ignored because the data item X has already been written by a newer transaction.

    Condition 3: TS ≥ WTS(X) and RTS(X): The write operation is allowed. The WTS of X is updated to TS(Ti)

    Example

    Transaction T1 (with timestamp 7) reads data item X, setting RTS(X) = 7

    Transaction T2 (with timestamp 5) tries to write to X, TS (T2) is less than RTS(X) so transaction is aborted and rollback

    Transaction T3 (with timestamp 8) tries to write to X, TS (T3) is grater then RTS(X) and WTS(X) so transaction is allow to write or update

Advantage

  • TBCC does not use locks, so there is no risk of deadlocks occurring.
  • TBCC can be simpler to implement compared to some lock-based protocols because it avoids the complexities of lock management and deadlock resolution.

Disadvantages

  • TBCC may result in a higher number of transaction aborts, especially in systems with high contention, as transactions can frequently violate the timestamp order.
  • TBCC requires maintaining additional information (RTS and WTS) for each data item, leading to higher storage and computational overhead.
  • Certain transactions, especially those with earlier timestamps, may be repeatedly aborted if they conflict with many other transactions, leading to starvation.

Multiversion Concurrency Control (MVCC)

In this technique, the database maintains versions of the data to handle concurrency. When a transaction updates or adds data, the version of the data changes. This version is maintained in the database and used to manage concurrency.

Image description

There are some key concepts which use into MVCC

Versions

  • Data Item Versions: Every data item in the database can have multiple versions. Each version represents the state of the data item at a particular point in time.
  • Transaction Timestamps: Each transaction is assigned a unique timestamp or transaction ID (TxID) when it begins. This timestamp is used to identify which version of the data a transaction should see and modify.

Visibility Rules

  • Readers: A transaction can only see the version of a data item that was committed before the transaction began. This ensures that the transaction operates on a consistent snapshot of the database.
  • Writers: When a transaction updates a data item, it creates a new version of that data item rather than modifying the existing version. This new version is not visible to other transactions until the transaction commits.

Commit and Garbage Collection

  • When a transaction commits, its changes (new versions of data items) become visible to other transactions. Old versions that are no longer needed (i.e., versions that are not visible to any active transaction) can be garbage collected or deleted.

How MVCC Works

Scenario

  • Initial Balance: $1000
  • Transaction T1 (TS1): Reads the balance.
  • Transaction T2 (TS2): Updates the balance to $1200.
  • Transaction T3 (TS3): Reads the balance.
  • Transaction T4 (TS4): Updates the balance to $1500.

Step-by-step ordering

  • Initial State:
    • The balance is $1000.
    • One version exists in the database:
      • Balance: $1000, Timestamp: TS=0
  • Transaction T1 (TS1):
    • T1 starts with a start timestamp TS1.
    • T1 reads the balance. Since no other transactions have updated it yet, T1 sees the version of the balance with TS=0, which is $1000.
    • T1 does not make any changes and continues with its operations.
  • Transaction T2 (TS2):
    • T2 starts with a start timestamp TS2.
    • T2 updates the balance to $1200.
    • The database creates a new version of the balance with the value $1200 and tags it with T2’s start timestamp (TS2).
    • The previous version (balance = $1000, TS=0) is still retained for transactions that started before T2.
  • Transaction T3 (TS3):
    • T3 starts with a start timestamp TS3.
    • T3 attempts to read the balance. Since T3’s start timestamp is greater than T2’s timestamp but T2 has not yet committed, T3 sees the most recent committed version of the balance, which is still $1000 (TS=0).
    • T3 does not see the new version created by T2 because T2 has not committed yet.
  • Transaction T4 (TS4):
    • T4 starts with a start timestamp TS4.
    • T4 updates the balance to $1500.
    • The database creates another new version of the balance with the value $1500 and tags it with T4’s start timestamp (TS4).
    • The versions in the database are now:
      • Balance: $1000, TS=0
      • Balance: $1200, TS=TS2 (uncommitted)
      • Balance: $1500, TS=TS4 (uncommitted)
  • Commit Order:
    • T2 Commits: When T2 commits, the version with $1200 becomes visible to subsequent transactions. The versions are now:
      • Balance: $1000, TS=0 (for older transactions)
      • Balance: $1200, TS=TS2 (newly committed)
      • Balance: $1500, TS=TS4 (uncommitted)
    • T4 Commits: When T4 commits, its version with $1500 becomes visible. The versions are now:
      • Balance: $1000, TS=0 (for older transactions)
      • Balance: $1200, TS=TS2
      • Balance: $1500, TS=TS4 (newly committed)
  • Post-Commit State:
    • If any new transaction starts after both T2 and T4 have committed, it will see the balance of $1500.

Benefits Of MVCC

  • Non-Blocking Reads: Both T1 and T3 read the balance without waiting for T2 and T4 to commit, avoiding delays.
  • Consistent Views: Each transaction operates on a consistent snapshot of the database. T1 and T3 see the initial balance, while T2 and T4 see updated values when they read after writing.
  • No Locks Needed: Transactions proceed without locking the data, reducing contention and allowing more transactions to proceed in parallel.

Disadvantages of MVCC

Increased Storage Overhead Keeping multiple versions of data items requires additional storage. Although garbage collection helps mitigate this, it still increases the complexity and resource usage of the database.

Complexity in Implementation The implementation of MVCC is more complex than traditional locking mechanisms, which can lead to higher maintenance and debugging costs.

Potential for Long-Running Transactions to Cause Issues Long-running transactions can see outdated data for extended periods, potentially leading to less timely or relevant reads.

Practical Examples

  • Lock Base Concurrency Control
    • When multiple transactions frequently update the same set of data, lock-based concurrency control can effectively prevent conflicts.
    • If the application requires strong consistency guarantees, locks ensure that only one transaction can modify a data item at a time.
    • Where transactions involve complex operations that need to be executed atomically, locking ensures that other transactions cannot interfere until the transaction is complete.
  • Timestamp-Based Concurrency Control
    • When transactions frequently read and write the same data, TBC ensures that transactions are executed in a logical order based on their timestamps, reducing the chances of conflicts.
    • Where transactions must be processed in a strict order to maintain consistency. TBC ensures that the order of transactions reflects the order in which they arrive.
    • Where transactions can be easily rolled back and retried if they conflict with others.
  • Multiversion Concurrency Control
    • In applications where reads significantly outnumber writes, MVCC allows reads to proceed without blocking, improving overall performance.
    • When applications need to ensure that transactions operate on a consistent snapshot of the database, MVCC allows transactions to see a consistent view of the data as it was when they started.
    • Where multiple users interact with the database simultaneously, and the system needs to support high levels of concurrency without sacrificing performance.

Summary Of Use-Case

  • Lock-Based Concurrency Control: Best for scenarios with high write contention, strict consistency needs, and complex transactions.
  • Timestamp-Based Concurrency Control: Ideal for real-time systems, applications with high read-write conflicts, and scenarios requiring strict transactional order.
  • Multi-version Concurrency Control (MVCC): Suited for read-heavy workloads, systems needing snapshot isolation, and high-concurrency environments like OLTP systems.

Conclusion

We explored various concurrency control techniques, including Lock-Based Concurrency Control, Timestamp-Based Concurrency Control, and Multiversion Concurrency Control (MVCC). Detailed explanations and examples were provided to demonstrate how each method handles simultaneous transactions, particularly focusing on scenarios with multiple reads and writes.

Top comments (0)