DEV Community

Anh Trần Tuấn
Anh Trần Tuấn

Posted on • Originally published at tuanh.net on

Secrets Behind Deleting or Updating Records in PostgreSQL

1. Understanding PostgreSQL's MVCC (Multi-Version Concurrency Control)

Image

PostgreSQL uses Multi-Version Concurrency Control (MVCC) to handle data consistency and isolation. Unlike databases that use locks for concurrency, MVCC allows multiple transactions to occur simultaneously without conflicts. Here's how it works:

1.1 What Happens Internally During an Update?

Image

When you update a record in PostgreSQL, the database doesn't immediately modify the existing data. Instead, it creates a new version of the record with the updated data while keeping the old version intact. This approach ensures that other transactions can still access the previous data version until they are committed or rolled back.

UPDATE employees SET salary = 60000 WHERE id = 5;
Enter fullscreen mode Exit fullscreen mode

This query creates a new version of the employee record with id = 5 having a salary of 60000. The old version remains in the database for any transactions that started before the update.

1.2 Benefits of MVCC in Updates

  • Concurrent Access : Transactions can run concurrently without blocking each other, providing better performance in multi-user environments.
  • Consistency : Ensures data consistency for all ongoing transactions, as they only see the data state when they started.

1.3 Drawbacks of MVCC in Updates

  • Increased Storage : Each update creates a new row version, which can lead to increased storage requirements.
  • Vacuuming Overhead : Old versions of rows remain in the database, requiring periodic cleanup using the VACUUM operation to free up space and prevent table bloat.

2. The Mechanics of Deleting a Record

Deleting a record in PostgreSQL follows a similar approach to updating. When a record is deleted, it isn't immediately removed from the disk. Instead, it is marked as "dead" and becomes invisible to future transactions.

2.1 What Happens Internally During a Delete?

When a DELETE command is executed, PostgreSQL marks the tuple (row) as "dead," and it is left for a future vacuuming process to physically remove it from the disk.

DELETE FROM employees WHERE id = 5;
Enter fullscreen mode Exit fullscreen mode

The row with id = 5 is marked as dead. Until a vacuum is performed, this space is not reclaimed.

2.2 Benefits of This Approach

  • Transaction Safety : The delete operation is transaction-safe, meaning that it will only be visible to transactions that occur after the commit.
  • Quick Deletes : Since no immediate disk operation occurs, deletes are generally faster.

2.3 Drawbacks of This Approach

  • Table Bloat : Like updates, deletes can lead to table bloat, where dead rows accumulate, reducing query performance over time.
  • Maintenance Overhead : Requires regular VACUUM and ANALYZE operations to manage space and keep the database optimized.

3. Performance

3.1 Without Indexes

  • When there are no indexes on the columns involved in DELETE or UPDATE operations, PostgreSQL must perform a sequential scan. This means it scans every row in the table to find the rows to delete or update. As a result, these operations can be very slow, especially for large tables.
  • Moreover, both DELETE and UPDATE generate dead tuples in PostgreSQL. After a DELETE operation, the rows are not physically removed but marked as dead. Similarly, an UPDATE operation is treated as a combination of DELETE and INSERT , which leaves behind dead tuples. Without regular VACUUM processes, these dead tuples accumulate, leading to increased table size and reduced performance.

3.2 With Indexes:

  • When indexes are present on the columns involved in DELETE or UPDATE operations, PostgreSQL uses the index to quickly locate the relevant rows, significantly reducing the time required. However, updating or deleting indexed rows requires additional work to maintain the indexes, which can increase the cost of these operations.
  • For DELETE , every index associated with the table must be updated to remove the index entries corresponding to the deleted rows.
  • For UPDATE , if the indexed columns are updated, the index itself must be modified, which adds overhead. If the updated data does not fit in the same page, PostgreSQL may need to allocate new pages, leading to potential page splits and additional I/O.

3.3 Why Can Indexed Queries Be Slower

Cost of Maintaining Indexes:

  • Every time a row is inserted, deleted, or updated, PostgreSQL must also update the relevant indexes. For DELETE and UPDATE operations, this can mean significant overhead if there are multiple indexes on the table.
  • For large batch operations (bulk DELETE or UPDATE ), the cost of maintaining indexes can outweigh the benefits provided by them, causing a slowdown.

Index Bloat:

  • Over time, as INSERT , UPDATE , and DELETE operations are performed, indexes can become bloated with dead entries. This bloat makes the index less efficient, leading to slower search times and higher maintenance costs.
  • Regular maintenance through VACUUM and REINDEX commands is necessary to prevent index bloat and ensure optimal performance.

Query Planner and Cost Estimation

  • PostgreSQL’s query planner decides whether to use an index or not based on a cost estimation model. In some cases, the planner may choose not to use an index even when one is available, if it estimates that a sequential scan would be more efficient (e.g., when a query is expected to return a large percentage of the rows).
  • If the statistics are outdated or not accurate, the query planner may make suboptimal decisions, resulting in slower queries.

4. Optimizing PostgreSQL Performance When Deleting or Updating

Understanding the implications of deletes and updates can help you optimize your PostgreSQL database for better performance and efficiency.

4.1 Strategies for Handling Frequent Updates

Use VACUUM Regularly: Regularly schedule VACUUM operations to clean up dead rows and prevent table bloat.

VACUUM employees;
Enter fullscreen mode Exit fullscreen mode

Employ VACUUM FULL When Necessary: In cases of severe bloat, use VACUUM FULL to reclaim space, although it locks the table for its duration.

4.2 Techniques for Managing Deletes

Partition Large Tables : Partitioning can help manage large datasets by splitting them into more manageable chunks, reducing the impact of deletes.

Use RETURNING Clause : When deleting records, use the RETURNING clause to get feedback about the rows being deleted, which can be helpful for auditing purposes.

DELETE FROM employees WHERE salary < 20000 RETURNING *;
Enter fullscreen mode Exit fullscreen mode

5. Conclusion

While deleting or updating records in PostgreSQL may seem like simple operations, understanding the underlying processes is crucial for database optimization. The benefits of MVCC, such as concurrent access and consistency, come with trade-offs like storage overhead and maintenance complexity. By employing the right strategies and being aware of PostgreSQL’s mechanics, you can maximize your database’s performance and maintain data integrity.

Feel free to comment below if you have any questions or need further clarification on how these processes work in PostgreSQL.

Read posts more at : Secrets Behind Deleting or Updating Records in PostgreSQL

Top comments (0)