Franck Pachot for YugabyteDB

Triggers in SQL database to extend its declarative data logic (but not business logic)

Database triggers have a bad reputation because, like any powerful feature, they can be abused. However, when used appropriately, triggers are useful for adding data logic without changing the application’s business logic. They combine two important features of SQL databases: the ability to execute code near the data, like stored procedures, and the ability to be enabled declaratively, without modifying the application code.

SQL database: from data ingest to queries

An SQL database is commonly used as a system of record. Its main purpose is to ingest data for future querying. This includes all writes, such as inserting new facts, updating existing information, and deleting data. Seen this way, a database is a stream of logs: the first action of an SQL database is to log all transaction intents and make them visible at commit.

However, querying this stream of logs isn’t efficient. Therefore, databases maintain data in structures that are easy to retrieve, partitioned, and sorted based on the key that will be used by point or range queries. Since various use cases require querying data from different perspectives, these structures are often redundant. Multiple indexes or materialized views may store the same data as the primary table but with different physical organizations.

This could be implemented with two purpose-built databases — one optimized for fast data ingestion and the other for read-only queries. This can be achieved using NoSQL, an event-streaming message broker, and a columnar database. However, SQL databases provide additional benefits such as ensuring data integrity.

When multiple applications and users make changes to a system of record, ensuring data integrity becomes crucial. One important access pattern for achieving it is reading before writing. This involves verifying the nonexistence of the primary key when inserting new rows and checking its existence when updating or deleting them. Additionally, referential integrity requires us to read before writing to validate foreign keys and avoid exposing incomplete data structures.
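To make this concrete, here is a minimal sketch (the table and column names are hypothetical) of how declaring constraints makes the database do this read-before-write on your behalf:

```sql
CREATE TABLE customers (
  customer_id bigint PRIMARY KEY
);

CREATE TABLE orders (
  order_id    bigint PRIMARY KEY,          -- an INSERT first verifies this key does not exist
  customer_id bigint NOT NULL
    REFERENCES customers (customer_id)     -- an INSERT or UPDATE first verifies the parent row exists
);
```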

The model above, with two decoupled databases, no longer works. The SQL database cannot simply persist a log of changes: it maintains relational tables organized by their primary key for data consistency, so there is no need to stream to another database. Secondary indexes are maintained transparently, with a different physical organization and a subset of rows and columns, to optimize queries for different use cases. With Distributed SQL databases, this single logical database can scale out to be distributed and replicated.

What does this have to do with triggers? When users make requests to manipulate data in a database, additional data logic may be applied. Some of this logic is purely declarative and handled by the database code, like maintaining secondary indexes or checking referential integrity. More complex logic can be customized with procedural code, which is what triggers are for. Some databases even implement referential integrity with internal triggers.

Here are some quick examples:

maintain a counter on each insert/update to avoid long select count(*) queries (a sketch follows this list)
set a timestamp column on each update to query for the latest changes (with an additional secondary index on it)
write a history of previous values into another table for auditing purposes

Doing the same without triggers would bring more complexity to the application code, where the business logic would be interleaved with additional database-specific actions. Maintaining a redundant data store, like a counter or an aggregate, is difficult to scale. SQL databases provide ACID properties to do it without compromising data integrity when race conditions occur.
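As a minimal sketch of the first example, assuming a hypothetical items table and a single-row item_counts table (names invented for illustration), a trigger can maintain the count transactionally:

```sql
CREATE TABLE items (
  item_id bigint PRIMARY KEY,
  payload text
);

-- a single-row table holding the current count
CREATE TABLE item_counts (
  singleton boolean PRIMARY KEY DEFAULT true CHECK (singleton),
  n         bigint NOT NULL
);
INSERT INTO item_counts (n) VALUES (0);

CREATE OR REPLACE FUNCTION items_count_maintain() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE item_counts SET n = n + 1;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE item_counts SET n = n - 1;
  END IF;
  RETURN NULL;  -- AFTER row trigger: the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_count
  AFTER INSERT OR DELETE ON items
  FOR EACH ROW EXECUTE FUNCTION items_count_maintain();
```

Note that a single counter row serializes concurrent writers; splitting the counter into multiple buckets, as in the pgbench hotspot example linked below, avoids that bottleneck.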

Good and bad use of triggers

When you want to maintain derived or redundant information, triggers are the way to do it consistently and scalably, without adding more complexity to the application. Think of a trigger as a custom index or materialized view where you define the maintenance action with procedural code over the old and new values.

However, this code must stick to data logic and not business logic. Let’s take some examples to explain why.

An example: maintain a ‘last_update_timestamp’ column.

You add the column to the table, create a trigger that sets it to the current system timestamp on insert and update, backfill the existing rows with the timestamp you choose, and you are done. As your goal is to query it, you create an index on it. Without changing the application, you can now efficiently identify the rows modified in the past 24 hours or delete those not modified after a while.
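A minimal sketch of those steps, assuming a hypothetical table t (adapt the names to your schema):

```sql
CREATE TABLE t (id bigint PRIMARY KEY, val text);  -- stand-in for your existing table

ALTER TABLE t ADD COLUMN last_update_timestamp timestamptz;
UPDATE t SET last_update_timestamp = now();        -- backfill with the timestamp you choose

CREATE OR REPLACE FUNCTION t_set_last_update() RETURNS trigger AS $$
BEGIN
  NEW.last_update_timestamp := now();
  RETURN NEW;  -- BEFORE trigger: the returned row is what gets written
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_last_update
  BEFORE INSERT OR UPDATE ON t
  FOR EACH ROW EXECUTE FUNCTION t_set_last_update();

CREATE INDEX t_last_update_idx ON t (last_update_timestamp);

-- rows modified in the past 24 hours, served by the index:
SELECT * FROM t WHERE last_update_timestamp > now() - interval '24 hours';
```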

This is the correct approach when this information belongs to the data logic. Knowing the last changes can be useful to replicate data elsewhere, invalidate a cache, and monitor the data evolution.

However, be careful if this information has a business meaning, like displaying it to users or taking specific actions. One day, an UPDATE will be run for a reason other than the business use case you had in mind, like a data migration or a manual fix of incorrect data. You may not want to update this timestamp for those non-business operations.

That’s the problem with triggers: they fire on all INSERT, UPDATE, or DELETE statements, independently of the business use case that originated them. If the action is specific to certain business cases of INSERT, UPDATE, or DELETE, it belongs in the application code, not in a SQL trigger.
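If you do end up in that situation, PostgreSQL lets the table owner disable a trigger for a one-off maintenance operation; here is a sketch reusing the hypothetical t_last_update trigger from above. Note that ALTER TABLE takes an exclusive lock on the table, so this is for maintenance windows, not a routine pattern:

```sql
BEGIN;
ALTER TABLE t DISABLE TRIGGER t_last_update;   -- takes an ACCESS EXCLUSIVE lock
UPDATE t SET val = 'corrected' WHERE id = 42;  -- data fix without touching the timestamp
ALTER TABLE t ENABLE TRIGGER t_last_update;
COMMIT;
```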

Other examples

Here are other examples of data logic that triggers let you declare and test once, without changing the application code:

You may want to maintain an aggregate, a counter, or a last-value information in another table, to avoid long scans when querying this value: Most recent record (of many items) with YugabyteDB

You may want to maintain a reverse index in an existing table to find detailed information: Triggers & Stored Procedures for pure data integrity logic and performance

You may want to split one row into multiple buckets for better scalability: Avoiding hotspots in pgbench

You may define some referential integrity for which your database has no simple declaration: Global Unique Constraint on a partitioned table in PostgreSQL and YugabyteDB

You may temporarily maintain one value in two columns when your schema evolves: Retype a column in YugabyteDB (and PostgreSQL)

I’ve also used triggers for debugging purposes to log the old and new values during a database update that I wanted to track.
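A minimal sketch of such a debugging trigger, again on the hypothetical table t, capturing the old and new rows as JSON in a side table:

```sql
CREATE TABLE t_audit (
  changed_at timestamptz NOT NULL DEFAULT now(),
  operation  text        NOT NULL,
  old_row    jsonb,
  new_row    jsonb
);

CREATE OR REPLACE FUNCTION t_audit_log() RETURNS trigger AS $$
BEGIN
  -- OLD is unassigned on INSERT and NEW on DELETE, so branch on the operation
  IF TG_OP = 'INSERT' THEN
    INSERT INTO t_audit (operation, new_row) VALUES (TG_OP, to_jsonb(NEW));
  ELSIF TG_OP = 'UPDATE' THEN
    INSERT INTO t_audit (operation, old_row, new_row) VALUES (TG_OP, to_jsonb(OLD), to_jsonb(NEW));
  ELSE  -- DELETE
    INSERT INTO t_audit (operation, old_row) VALUES (TG_OP, to_jsonb(OLD));
  END IF;
  RETURN NULL;  -- AFTER row trigger: the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_audit_trg
  AFTER INSERT OR UPDATE OR DELETE ON t
  FOR EACH ROW EXECUTE FUNCTION t_audit_log();
```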


SQL triggers are a valid tool for OLTP applications as long as they stay out of the business logic. They exist in SQL databases for this reason, and choosing a database that supports them means you don't miss out on their benefits. Keep in mind that DML statements may carry additional data logic in triggers, while the logic for business transactions belongs in the application code.
