DEV Community

Cover image for PostgreSQL lock trees
Bolaji Wahab
Bolaji Wahab

Posted on • Updated on

PostgreSQL lock trees

There are two forms of locks in PostgreSQL, row level and object level. While most operations would simply take row level locks, migrations usually take the more aggressive ones (object level locks).

PostgreSQL keeps a queue of locks, with first in first out (FIFO). While readers do not block writers, often times migration ends up blocking several other operations on the database leading to lock contention. This is due to the FIFO nature of the lock queue, there is no jumping the queue.
Imagine a migration which needs to alter a table (requiring access exclusive lock on the table), it blocks other operations even those doing ordinary SELECT. Also, the migration might have to wait to take the necessary lock if there are conflicting locks on the table leading to lock contention.

Often times when there is a lock contention, we end up with questions such as what (human or application) is the main root of the lock contention.

Here is an example of other people's work on the subject which you might find helpful:
postgresql-lock-trees by Nikolay Samokhvalov.

After experimenting with the above example and the drawback being the call on the function pg_blocking_pids(int), I started looking at how we can simply do away with calling the function.
Then I rewrote the query, relying on lock conflicts as listed on the documentation Conflicting Lock Modes, pg_locks and pg_stat_activity.

The query:

WITH conflicts(lock, conflict) AS (VALUES
    ('AccessShare', 'AccessExclusive'), ('RowShare', 'Exclusive'),
    ('RowShare', 'AccessExclusive'), ('RowExclusive', 'Share'),
    ('RowExclusive', 'ShareRowExclusive'), ('RowExclusive', 'Exclusive'),
    ('RowExclusive', 'AccessExclusive'), ('ShareUpdateExclusive', 'ShareUpdateExclusive'),
    ('ShareUpdateExclusive', 'Share'), ('ShareUpdateExclusive', 'ShareRowExclusive'),
    ('ShareUpdateExclusive', 'Exclusive'), ('ShareUpdateExclusive', 'AccessExclusive'),
    ('Share', 'RowExclusive'), ('Share', 'ShareUpdateExclusive'),
    ('Share', 'ShareRowExclusive'), ('Share', 'Exclusive'),
    ('Share', 'AccessExclusive'), ('ShareRowExclusive','RowExclusive'),
    ('ShareRowExclusive', 'ShareUpdateExclusive'), ('ShareRowExclusive', 'Share'),
    ('ShareRowExclusive', 'ShareRowExclusive'), ('ShareRowExclusive', 'Exclusive'),
    ('ShareRowExclusive', 'AccessExclusive'), ('Exclusive', 'RowShare'),
    ('Exclusive', 'RowExclusive'), ('Exclusive', 'ShareUpdateExclusive'),
    ('Exclusive', 'Share'), ('Exclusive', 'ShareRowExclusive'),
    ('Exclusive', 'Exclusive'), ('Exclusive', 'AccessExclusive'),
    ('AccessExclusive', 'AccessShare'), ('AccessExclusive', 'RowShare'),
    ('AccessExclusive', 'RowExclusive'), ('AccessExclusive', 'ShareUpdateExclusive'),
    ('AccessExclusive', 'Share'), ('AccessExclusive', 'ShareRowExclusive'),
    ('AccessExclusive','Exclusive'), ('AccessExclusive','AccessExclusive')
),
activity AS (
    SELECT
        datname,
        pid,
        usename,
        application_name,
        client_addr,
        xact_start,
        query_start,
        state_change,
        backend_xid,
        backend_xmin,
        state,
        wait_event_type,
        wait_event,
        query
    FROM pg_stat_activity
    WHERE pid != pg_backend_pid()
),
locks AS (
   SELECT
        pid,
        locktype,
        granted,
        relation::regclass AS relation,
        left(mode,-4) AS mode,
        ROW(locktype,database,relation::regclass,page,tuple,virtualxid,transactionid,classid,objid,objsubid) AS obj
    FROM pg_locks
),
pairs AS (
    SELECT
        w.pid AS waiter,
        l.pid AS locker,
        l.obj,
        l.mode,
        l.relation
    FROM locks AS w
    JOIN locks AS l ON l.obj IS NOT DISTINCT FROM w.obj AND l.locktype = w.locktype AND NOT l.pid = w.pid AND l.granted
    WHERE NOT w.granted
),
leaders AS (
    SELECT
        DISTINCT(locker),
        coalesce(relation, (select max(relation) from locks where pid = locker)) AS relation,
        mode
    FROM (SELECT *, rank() OVER (PARTITION BY locker ORDER BY relation NULLS FIRST) AS pos FROM pairs) AS ss
    WHERE pos = 1
),
lock_queue AS (
    SELECT
        array_agg(l.pid::text || '.' || l.mode order by p.query_start) AS queue,
        rel.relation
    FROM locks AS l
    JOIN activity p ON p.pid = l.pid
    JOIN activity a ON a.pid = l.pid,
    LATERAL (SELECT coalesce(relation, max(relation)) AS relation FROM locks WHERE pid = l.pid AND relation IS NOT NULL GROUP BY relation) AS rel
    WHERE NOT l.granted
    GROUP BY rel.relation
),
push_root_on_lock_queue AS (
    SELECT
        l.relation,
        array_prepend(p.locker::text || '.' || p.mode, queue) AS queue
    FROM lock_queue AS l
    JOIN leaders AS p ON l.relation = p.relation
),
lock_queue_element_num AS (
    SELECT
        relation,
        a.element,
        a.num
    FROM push_root_on_lock_queue, unnest(queue) WITH ORDINALITY a(element, num)
),
main AS (
    SELECT
        a.relation,
        split_part(a.element, '.', 1) AS prev_pid,
        split_part(a.element, '.', 2) AS prev_mode,
        split_part(b.element, '.', 1) AS pid,
        split_part(b.element, '.', 2) AS mode,
        a.num AS start,
        b.num AS next,
        CASE
            WHEN EXISTS (SELECT * FROM conflicts AS c WHERE c.lock = split_part(a.element, '.', 2) AND c.conflict = split_part(b.element, '.', 2)) THEN true
            ELSE false
        END AS conflict
    FROM lock_queue_element_num AS a
    JOIN lock_queue_element_num b ON a.relation = b.relation
    WHERE a.num < b.num
    ORDER BY b.num DESC
),
tree AS (
    SELECT
        DISTINCT ON (pid, prev_pid)
        *,
        dense_rank() OVER (PARTITION BY relation ORDER BY start) - 1 AS level
    FROM main
),
stat AS (
    SELECT
        m.pid,
        p.locker,
        array_agg(DISTINCT m.prev_pid || ':' || m.level) AS blocked_by
    FROM pairs AS p
    JOIN tree AS m ON p.waiter = m.pid::int
    WHERE conflict
    AND m.pid != m.prev_pid
    GROUP BY m.pid, p.locker
),
/* sometimes we can miss some blockers in pairs due to the lock path, the query should be fix normally but for now this is workaround
   TODO: fix pairs query to include all path
*/
sub AS (
    SELECT
        pidd,
        count(pidd) AS blocked_cnt
    FROM (
        SELECT DISTINCT ON (pid) split_part(unnest(blocked_by), ':', 1) AS pidd
        FROM stat AS s
    ) AS blocked_cnt
    WHERE NOT EXISTS (SELECT FROM pairs AS p WHERE p.locker = blocked_cnt.pidd::int)
    GROUP BY pidd
),
result AS (
    /* add details of waiters */
    SELECT
        DISTINCT(s.pid::int),
        s.blocked_by,
        (clock_timestamp() - a.xact_start)::interval(0) AS ts_age,
        (clock_timestamp() - a.state_change)::interval(0) AS change_age,
        CASE
            WHEN s.blocked_by = '{}' THEN NULL::interval(0)
            ELSE (clock_timestamp() - a.query_start)::interval(0)
        END AS blocking_age,
        a.backend_xid AS xid,
        a.backend_xmin AS xmin,
        replace(a.state, 'idle in transaction', 'idletx') AS state,
        a.datname,
        a.usename,
        a.client_addr,
        a.application_name,
        a.wait_event_type || ':' || a.wait_event AS wait,
        blocked.cnt AS blocked_cnt,
        CASE
            WHEN s.blocked_by = '{}' THEN true
            ELSE false
        END AS root,
        trim(trailing ';' from format(
            '%s %s%s',
            '[' || s.pid::text || ']',
            coalesce(repeat('.', array_length(s.blocked_by, 1)+1), '') || ' ',
            left(query, 1000)
        )) AS query
    FROM stat AS s
    JOIN activity AS a ON s.pid::int = a.pid,
    LATERAL (
        SELECT
            coalesce((CASE
                WHEN count(p.waiter) = 0 THEN (SELECT blocked_cnt FROM sub WHERE sub.pidd = s.pid UNION ALL SELECT 0 FROM sub WHERE NOT EXISTS (SELECT FROM sub WHERE sub.pidd = s.pid))
                ELSE count(p.waiter)
            END), 0) AS cnt
    FROM pairs AS p WHERE p.locker = s.pid::int)AS blocked
    UNION ALL
    /* add details of lockers */
    SELECT
        DISTINCT(p.locker::int),
        '{}'::text[] AS blocked_by,
        (clock_timestamp() - a.xact_start)::interval(0) AS ts_age,
        (clock_timestamp() - a.state_change)::interval(0) AS change_age,
        NULL::interval AS blocking_age,
        a.backend_xid AS xid,
        a.backend_xmin AS xmin,
        replace(state, 'idle in transaction', 'idletx') AS state,
        a.datname,
        a.usename,
        a.client_addr,
        a.application_name,
        a.wait_event_type || ':' || a.wait_event AS wait,
        blocked.cnt AS blocked_cnt,
        true AS root,
        trim(trailing ';' from format(
            '%s %s%s',
            '[' || p.locker::text || ']',
            coalesce(repeat('.', 0), '') || ' ',
            left(query, 1000)
        )) AS query
    FROM pairs AS p
    JOIN activity AS a ON p.locker::int = a.pid,
    LATERAL (SELECT count(t.prev_pid) AS cnt FROM tree t WHERE t.prev_pid = p.locker::text) blocked
    WHERE NOT EXISTS (SELECT FROM stat AS s WHERE s.pid::int = p.locker)
    /* every other pids not in locker and waiter pairs */
    UNION ALL
    SELECT
        DISTINCT(a.pid),
        '{}'::text[] AS blocked_by,
        (clock_timestamp() - a.xact_start)::interval(0) AS ts_age,
        (clock_timestamp() - a.state_change)::interval(0) AS change_age,
        NULL::interval AS blocking_age,
        a.backend_xid AS xid,
        a.backend_xmin AS xmin,
        replace(state, 'idle in transaction', 'idletx') AS state,
        a.datname,
        a.usename,
        a.client_addr,
        a.application_name,
        a.wait_event_type || ':' || a.wait_event AS wait,
        0::int AS blocked_cnt,
        true AS root,
        trim(trailing ';' from format(
            '%s %s%s',
            '[' || a.pid::text || ']',
            coalesce(repeat('.', 0), '') || ' ',
            left(query, 1000)
        )) AS query
    FROM activity AS a WHERE NOT EXISTS (SELECT FROM pairs AS p WHERE p.locker = a.pid OR p.waiter = a.pid)
    ORDER by blocked_cnt DESC, root, pid
)
SELECT
    pid,
    blocked_by,
    ts_age,
    change_age,
    blocking_age,
    xid,
    xmin,
    state,
    datname,
    usename,
    client_addr,
    application_name,
    wait,
    blocked_cnt,
    root,
    query
FROM result;
Enter fullscreen mode Exit fullscreen mode

Example output:

  pid  |    blocked_by     |  ts_age  | change_age | blocking_age |  xid   |  xmin  | state  | datname | usename | client_addr | application_name |             wait             | blocked_cnt | root |                         query
-------+-------------------+----------+------------+--------------+--------+--------+--------+---------+---------+-------------+------------------+------------------------------+-------------+------+-------------------------------------------------------
 40188 | {}                | 00:10:03 | 00:10:00   | <null>       |  87324 | <null> | idletx | bolaji  | bolaji  | <null>      | bolaji-psql      | Client:ClientRead            |           4 | t    | [40188]  update folder set id = id
 40224 | {40188:0}         | 00:09:57 | 00:09:56   | 00:09:56     |  87325 |  87324 | active | bolaji  | bolaji  | <null>      | bolaji-psql      | Lock:transactionid           |           3 | f    | [40224] .. delete from folder
 40245 | {40188:0,40224:0} | 00:09:51 | 00:09:50   | 00:09:50     |  87326 |  87324 | active | bolaji  | bolaji  | <null>      | bolaji-psql      | Lock:relation                |           2 | f    | [40245] ... alter table folder add column data1 jsonb
 40258 | {}                | 00:09:41 | 00:09:40   | <null>       |  87327 | <null> | idletx | bolaji  | bolaji  | <null>      | bolaji-psql      | Client:ClientRead            |           1 | t    | [40258]  drop table sample
 40222 | {40245:2}         | 00:09:45 | 00:09:45   | 00:09:45     | <null> |  87324 | active | bolaji  | bolaji  | <null>      | bolaji-psql      | Lock:relation                |           0 | f    | [40222] .. select * from folder where id = 1
 40253 | {40245:2}         | 00:09:47 | 00:09:47   | 00:09:47     | <null> |  87324 | active | bolaji  | bolaji  | <null>      | bolaji-psql      | Lock:relation                |           0 | f    | [40253] .. select * from folder
 40260 | {40258:0}         | 00:09:38 | 00:09:38   | 00:09:38     | <null> |  87324 | active | bolaji  | bolaji  | <null>      | bolaji-psql      | Lock:relation                |           0 | f    | [40260] .. select * from sample
  3640 | {}                | <null>   | <null>     | <null>       | <null> | <null> | <null> | <null>  | <null>  | <null>      |                  | Activity:CheckpointerMain    |           0 | t    | [3640]
  3641 | {}                | <null>   | <null>     | <null>       | <null> | <null> | <null> | <null>  | <null>  | <null>      |                  | Activity:BgWriterHibernate   |           0 | t    | [3641]
  3642 | {}                | <null>   | <null>     | <null>       | <null> | <null> | <null> | <null>  | <null>  | <null>      |                  | Activity:WalWriterMain       |           0 | t    | [3642]
  3643 | {}                | <null>   | <null>     | <null>       | <null> | <null> | <null> | <null>  | <null>  | <null>      |                  | Activity:AutoVacuumMain      |           0 | t    | [3643]
  3645 | {}                | <null>   | <null>     | <null>       | <null> | <null> | <null> | <null>  | bolaji  | <null>      |                  | Activity:LogicalLauncherMain |           0 | t    | [3645]
(12 rows)

Time: 4.372 ms
Enter fullscreen mode Exit fullscreen mode

A little explanation:

  • {40188:0} - translates to pid:lock_level
  • pid 40245 blocked by {40188:0,40224:0} - translates to pids 40188 on level 0 (the root), 40224 also level 0(the root) and 40245 on level 1.
  • [40224] .. delete from folder - translates to pid 40224 being on level 1 on the lock queue/tree behind the root.

I hope you find the above query useful for lock issue troubleshooting.

Discussion (0)