Rishabh Rawat

Posted on Feb 5, 2022 • Originally published at rrawat.com

How To Use PostgreSQL Triggers for Automated Processing of Data

#database #postgres #programming #tutorial

Database triggers are a way to run a piece of code when a predefined operation occurs on the database. You can relate to triggers as event handlers in javascript. In javascript, you can set up an event handler for a button click. Database triggers are just like that.

In this article, we'll go through the basics of setting up triggers in PostgreSQL and a few examples to see how they can be used. Triggers can be created for data changes or database events. We'll be looking at data changes i.e., triggers that run BEFORE/AFTER INSERT, UPDATE, DELETE operations on a table.

The article is divided into the following sections:

What is a trigger
Types of triggers
Trigger function
2 steps to write a trigger in PostgreSQL
Data visibility in triggers
Examples
Drop trigger
Conclusion

What is a trigger

Taking from the docs, a trigger is a specification that the database should automatically execute a particular function whenever a certain type of operation is performed.
We can use triggers on tables (partitioned or not), views, and foreign tables.

Triggers can be defined to execute either before or after any INSERT, UPDATE, or DELETE operation, either once per modified row, or once per SQL statement.

A trigger must have a trigger function that runs whenever a supported operation occurs. Creating a trigger includes providing the trigger function. It takes no arguments and returns the type trigger.

NOTE: The trigger function receives its input arguments through a particular variable inbuilt into PostgreSQL. We don't need to pass any function arguments in the definition.

Types of triggers

There are a lot variations in how a trigger gets invoked. Let's look at them:

Row level triggers
Statement level triggers

Row-level triggers run for each row in the table whereas statement-level triggers run once per transaction.
For example, if a single transaction updates a thousand rows then the statement-level trigger will only run once but the row-level trigger will run a thousand times.

Triggers are also classified according to whether they fire before, after, or instead of the operation. These are referred to as BEFORE triggers, AFTER triggers, and INSTEAD OF triggers respectively.

For per-row triggers:

The return value is ignored for row-level triggers fired AFTER an operation so they can return NULL.
BEFORE level trigger can return NULL to skip the operation for the current row. This instructs the executor to not perform the row-level operation that invoked the trigger (the insertion, modification, or deletion of a particular table row).
For row-level BEFORE INSERT/UPDATE triggers, the returned row becomes the row that will get inserted or updated. It allows the trigger function to modify the row being inserted or updated.
Trigger functions invoked by per-statement triggers should always return NULL.

NOTE: Triggers in a transaction execute in the same transaction (BEFORE or AFTER). Doing a rollback for those X transactions means all the trigger changes will be undone in the same X transactions.

Trigger function

A trigger function is created with the CREATE FUNCTION command, declaring it as a function with no arguments and a return type of trigger for data change triggers.
Special local variables named TG_something are automatically defined to describe the condition that triggered the call.

Trigger functions can be written in most of the procedural languages including PL/pgsql, Perl, Python.
With PL/pgSQL function as a trigger, there are some variables automatically created at the top-level block.

Note that the function must be declared with no arguments even if it expects to receive some arguments specified in CREATE TRIGGER.
Such arguments are passed via TG_ARGV.

We'll take a look at some of the common ones:

NEW

Holds the new database row for INSERT/UPDATE operations in row-level triggers. This variable is null in statement-level triggers and for DELETE operations.

OLD

Holds the old database row for UPDATE/DELETE operations in row-level triggers. This variable is null in statement-level triggers and for INSERT operations.

TG_WHEN

A string of BEFORE, AFTER, or INSTEAD OF, depending on the trigger's definition.

TG_LEVEL

Indicates whether trigger is ROW or STATEMENT level.

TG_OP

Shows the operation that triggered the trigger. Can be INSERT, UPDATE, DELETE, or TRUNCATE.

TG_ARGV[]

The arguments from the CREATE TRIGGER statement. The index counts from 0. Invalid indexes (less than 0 or greater than or equal to tg_nargs) result in a null value.

2 steps to write a trigger in PostgreSQL

A fully functioning trigger in PostgreSQL can be created in just two steps:

Define a trigger function
Create the trigger using that trigger function

We'll create a trigger function first which can be used by a trigger:

CREATE OR REPLACE FUNCTION function_name()
  RETURNS trigger
  LANGUAGE plpgsql
AS $$
DECLARE
  -- declare variables if needed
BEGIN
  -- function body goes here
END;
$$;

Now that we have the trigger function, we can create the trigger:

CREATE OR REPLACE TRIGGER name
  { BEFORE | AFTER } { event }
  ON table_name
  [ FOR [ EACH ] { ROW | STATEMENT } ]
  [ WHEN ( condition ) ]
  EXECUTE { FUNCTION | PROCEDURE } function_name ( arguments )

CREATE OR REPLACE TRIGGER will either create a new trigger or replace an existing trigger. If you don't want that, use CREATE TRIGGER instead. It won't create the trigger if it already exists.

The trigger will be associated with the specified table and will execute the specified trigger function when certain operations are performed on that table BEFORE/AFTER the operation has happened.

NOTE: The same trigger function can be used to create multiple triggers if written properly.

You can also put conditions on when the trigger should fire using the WHEN clause.
In row-level triggers, the WHEN condition can examine the old and/or new values of columns of the row.
It is not useful in the case of statement-level triggers because they cannot refer to any values in the table.

NOTE: If multiple triggers of the same kind are defined for the same event, they will be fired in alphabetical order by name.

Some use-cases for different triggers

Row-level BEFORE triggers are used for checking or modifying the data that will get inserted or updated.
For example, a BEFORE trigger might get used to insert the current time into a timestamp column or to check that two elements of the row are consistent. It can also act as a validation stage where you reject the unwanted data (eg. null checks).

Row-level AFTER triggers are often used to propagate the updates to other tables or make consistency checks against other tables.
Reason? AFTER trigger always sees the final value of the row while a BEFORE might not. There might be other BEFORE triggers firing after it.

If possible, using the BEFORE trigger is better from a performance standpoint. We shouldn't save the information about the operation until the end of a statement if not needed.

NOTE: If a trigger function executes SQL commands then these commands might fire triggers again. This is known as cascading triggers. There is no direct limitation on the number of cascade levels.
The Cascades can cause a recursive invocation of the same trigger.

For example, an INSERT trigger might execute a command that inserts an additional row into the same table, causing the INSERT trigger to be fired again.
It is the trigger programmer's responsibility to avoid infinite recursion in such scenarios.

Data visibility in triggers

There are many use-cases for triggers and accessing the new or old values of a row in a table is one of them.
The ability to look into the data allows us to perform various actions that wouldn't be possible otherwise.
Audit logs, check constraints, and version history are some of the common and straightforward use-cases.

Keeping this in mind, it is important to know when we can access the old and new data of a row in the table.

In a statement-level trigger, none of the changes are visible to BEFORE triggers, whereas all changes are visible to AFTER triggers.
Any modifications to the data that happen during INSERT, UPDATE, DELETE is not visible to BEFORE level triggers because the change hasn't happened yet.
BEFORE trigger can see the effects of previous runs of BEFORE triggers.
All the changes are visible to AFTER triggers because the change has already happened.

Examples

Let's take a look at some examples showing different use cases of triggers. These are very straightforward examples, but they are not the only use cases.

Checking constraints

We can set a check constraint on a table to ensure that certain column values are not null or have certain values.
This can be achieved with the help of a BEFORE INSERT trigger.

CREATE OR REPLACE FUNCTION name_null_checker() RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
    IF NEW.name IS NULL THEN
        RAISE EXCEPTION 'Name cannot be null';
    END IF;
    RETURN NEW;
END;
$$;

We have written a basic trigger function that checks if the name of the book is null and throws an exception if it is.
With this step, we have created a re-usable trigger function that can be used by multiple triggers to check the name constraint.
Properly written, this trigger function would be independent of the specific table it is triggering on.

Now, let's create a trigger that will fire BEFORE an INSERT operation on the books table.

CREATE TRIGGER name_null_checker_trigger BEFORE INSERT ON books
FOR EACH ROW EXECUTE FUNCTION name_null_checker();

And now we have a basic trigger in place to perform constraint check on books table.

Audit logging

Audit logs are a way to keep track of changes to a table. We will set up a trigger to log all INSERT, UPDATE and DELETE on the books table.

We need to create a books audit table to store the audit logs. Let's first create it:

CREATE TABLE IF NOT EXISTS books_audit_store (
  name VARCHAR(128) UNIQUE NOT NULL,
  price float(2) NOT NULL,
  rating INT NOT NULL,
  operation VARCHAR(64) NOT NULL,
  modified_at TIMESTAMP NOT NULL
);

Now that we have our books_audit_store table we can create the trigger function which will perform the INSERT SQL command:

CREATE OR REPLACE FUNCTION audit_logger()
  RETURNS trigger
  LANGUAGE plpgsql
AS $$
BEGIN
  INSERT INTO books_audit_store (name, price, rating, operation, modified_at) 
    VALUES (NEW.name, NEW.price, NEW.rating, TG_OP, now());
  RETURN NEW;
END;
$$;

Finally, creating the trigger on books_audit_store table:

CREATE TRIGGER audit_logger_trigger
AFTER UPDATE ON books
FOR EACH ROW
WHEN (OLD.* IS DISTINCT FROM NEW.*)
EXECUTE FUNCTION audit_logger();

Here's the books table before update:

We've updated the "Scrum" book title to be "Scrum master":

As a result of the update on books table with a different value, we got our audit log entry in books_audit_store table:

Drop trigger

A trigger can be dropped by issuing a DROP TRIGGER statement.

DROP TRIGGER triggerName ON tableName;

To drop the trigger, the user must be the owner of the table on which the trigger is defined.

We can add IF EXISTS to the statement to drop the trigger only if it exists:

DROP TRIGGER IF EXISTS triggerName ON tableName;

NOTE: In the SQL standard, trigger names are not local to tables, so you don't need to specify the table name in the DROP TRIGGER statement.

Conclusion

This is just the introduction to triggers and how you can use them with functions to create post processors, validators, and audit loggers.
The possibilities are endless, going through the official docs of CREATE TRIGGER would be a good next step. It has some important notes and examples which you can explore.

I'd be writing about more extensive applications of triggers once this information fits better with my current understanding of databases. Just connecting the dots would be a good start.

DEV Community