DEV Community 👩‍💻👨‍💻

Dmitry Romanoff
Dmitry Romanoff

Posted on

Troubleshooting High CPU Utilization in PostgreSQL Databases: A How-To Guide

Image description

The article was initially published in JFrog blog.

PostgreSQL is one of the most popular databases in the World.

Central processing unit (CPU) usage is one of the critical metrics to check for a PostgreSQL DB (database) instance. Checking CPU utilization allows for understanding if the DB is experiencing performance issues like inefficient SQL queries, lack of indexes, and contention.

After seeing a high CPU of the PostgreSQL DB instance, it’s essential to find the root cause of the issue. Queries can be poorly written, too frequent, or heavy.

This blog post describes useful SQL queries that can be run on the PostgreSQL DB instance to investigate the high CPU utilization. It will also help you understand what’s running in the database instance that impacts CPU usage.

SQL Query #1 – Connections summary

One of the patterns of PostgreSQL DB’s leading to high CPU utilization is a high number of active connections. The following SQL query lists the:

Total number of connections
Number of non-idle connections
Number of maximum available connections
Connections utilization percentage

select 
    A.total_connections, 
    A.non_idle_connections, 
    B.max_connections,
    round((100 * A.total_connections::numeric / B.max_connections::numeric), 2) connections_utilization_pctg
from
  (select count(1) as total_connections, sum(case when state!='idle' then 1 else 0 end) as non_idle_connections from pg_stat_activity) A,
  (select setting as max_connections from pg_settings where name='max_connections') B;
Enter fullscreen mode Exit fullscreen mode

Here’s an example of the SQL output:

 total_connections | non_idle_connections | max_connections | connections_utilization_pctg 
-------------------+----------------------+-----------------+------------------------------
              3457 |                    3 | 9057            |                        38.17
Enter fullscreen mode Exit fullscreen mode

In the above output example, the PostgreSQL DB instance currently has 3,457 connections, and the maximum number of concurrent connections to the DB server is 9,057, which means 38.17% of the connection slots are occupied. The number of non-idle connections is 3.

What’s the recommendation for SQL Query #1?

The recommendation for SQL Query one is to examine running sessions on the PostgreSQL DB instance, trying to identify and analyze using EXPLAIN long-running, badly written, too-frequent, queries. In case the number of active connections is more than one per CPU core it’s recommended to check and tune the application(s) working with the DB.

To check the maximum number of connections, run the following SQL command:

show max_connections;
Enter fullscreen mode Exit fullscreen mode

Typically, the default is 100 connections.

In case the number of connections approaches the maximum number of connections value set for the given PostgreSQL instance, it’s recommended to analyze application activity and/or application logic, reduce the number of connections arriving at the DB, tune the max_connections parameter, or scale up DB instance.

In addition, I recommend using and/or tuning the connection pool: an external one like PgBouncer, or an internal one like HikariCP for Java or PGX for Go.

The parameter max_connections defines the maximum number of concurrent connections to the PostgreSQL DB Server. Changing this parameter requires that the PostgreSQL DB instance be rebooted after a change so that the new value can take effect. For this instance, it’s recommended to set the max_connections PostgreSQL parameter to the maximum number of connections expected at peak load.

On the other hand, the max_connections parameter needs to be aligned with the available resources of the PostgreSQL DB machine. It should be tuned carefully to avoid system out-of-memory issues, due to each PostgreSQL DB connection allocating a chunk of the shared_buffer memory, as well as non-shared memory.

To tune parameter max_connections, and other key PostgreSQL DB instance parameters I recommend using free online tools, like, for example, PostgreSQL configuration builder at https://www.pgconfig.org.

SQL Query #2 – Distribution of non-idle connections per database

Use the below query to check the distribution of non-idle connections number per database, sorted in descending order:

select 
datname as db_name, 
count(1) as num_non_idle_connections 
from pg_stat_activity 
where state!='idle' 
group by 1 
order by 2 desc;
Enter fullscreen mode Exit fullscreen mode

Here’s an example of the SQL output:

          db_name           | num_non_idle_connections 
----------------------------+--------------------------
 my_db_1                    |                      133
 my_db_2                    |                        6
 my_db_3                    |                        3
Enter fullscreen mode Exit fullscreen mode

In the above example, the top non-idle sessions are running on the DB my_db_1.

What’s the recommendation for SQL Query #2?

The recommendation in such a case would be to examine running sessions of the DB my_db_1 on the PostgreSQL DB instance, trying to identify long-running, badly written, too-frequent queries.

SQL Query #3 – Distribution of non-idle connections per database and per query

Check the distribution of non-idle connections per database and per query, sorted in descending order as seen in the example below:

select 
datname as db_name, 
query, 
count(1) as num_non_idle_connections 
from pg_stat_activity 
where state!='idle' 
group by 1, 2 
order by 3 desc;
Enter fullscreen mode Exit fullscreen mode

The output text in the resultset may look too long. In such cases a modified version of the query can be helpful as seen in the example below:

select 
datname as db_name, 
substr(query, 1, 200) short_query, 
count(1) as num_non_idle_connections 
from pg_stat_activity 
where state!='idle' 
group by 1, 2 
order by 3 desc;
Enter fullscreen mode Exit fullscreen mode

Here’s an example of the SQL output:

db_name  |              short_query              | num_non_idle_connections 
----------+---------------------------------------+--------------------------
 db_1     | select * from table_1                |                       40
 db_1     | select * from table_2                |                        1
Enter fullscreen mode Exit fullscreen mode

In this example, the application/client connected to db_1 is running two PostgreSQL queries. The upper query, which selects data from the table_1 has 40 non-idle connections.

What’s the recommendation for SQL Query #3?

The recommendation in such a case would be to examine the SQL queries having the top non-idle connections. It happens that a high number of non-idle connections may appear to indicate ineffective, not scalable architecture or workload, not matching the system resources.

SQL Query #4 – Non-idle sessions detailed

List non-idle PostgreSQL sessions that take more than five seconds, sorted by the runtime in descending order like in the example below:

select 
    now()-query_start as runtime, 
    pid as process_id, 
    datname as db_name, 
    client_addr,
    client_hostname,
    query
from pg_stat_activity
where state!='idle'
and now() - query_start > '5 seconds'::interval
order by 1 desc;
Enter fullscreen mode Exit fullscreen mode

In case the resultset looks too wide, the modified version of the query can be run like in the example below:

select 
    now()-query_start as runtime, 
    pid as process_id, 
    datname as db_name, 
    client_addr,
    client_hostname,
    substr(query, 1, 200) the_query
from pg_stat_activity
where state!='idle'
and now() - query_start > '5 seconds'::interval
order by 1 desc;
Enter fullscreen mode Exit fullscreen mode

Here’s an example of the SQL output:

 runtime  | process_id | db_name  |  client_addr   | client_hostname | query              
----------+------------+----------+----------------+-----------------+----------------
 00:12:34 |       7770 | db_1     | 192.168.12.208 |                 | select …
Enter fullscreen mode Exit fullscreen mode

For each long-running query, the resultset contains the corresponding runtime, process id, database name, client address, and hostname.

What’s the recommendation for SQL Query #4?

In some scenarios, long-running queries can cause high CPU utilization. In these instances, the queries obtained in the resultset should be analyzed and appropriately tuned.

In case the query runs too long, causing a high load on the DB CPU and other resources, you may want to terminate it explicitly. To terminate a PostgreSQL DB session by run the following command:

select pg_terminate_backend(<process_id>);
Enter fullscreen mode Exit fullscreen mode

SQL Query #5 – Running frequent SQL queries

The root cause of high CPU utilization in PostgreSQL databases may not be a necessary long-running query. Quick, but too frequent queries running hundreds of times per second can cause high CPU utilization too.

To find the top frequent PostgreSQL queries run the following SQL query as shown in the example below:

with
a as (select dbid, queryid, query, calls s from pg_stat_statements),
b as (select dbid, queryid, query, calls s from pg_stat_statements, pg_sleep(1))
select
        pd.datname as db_name, 
        substr(a.query, 1, 400) as the_query, 
        sum(b.s-a.s) as runs_per_second
from a, b, pg_database pd
where 
  a.dbid= b.dbid 
and 
  a.queryid = b.queryid 
and 
  pd.oid=a.dbid
group by 1, 2
order by 3 desc;
Enter fullscreen mode Exit fullscreen mode

Here’s an example of the output:

             db_name              |            the_query             |      runs_per_second 
----------------------------------+------------------------------------------------------------
               db_1               | select ... from ... where ...    |               10
               db_1               | insert into ... values (...)     |               2 
Enter fullscreen mode Exit fullscreen mode

This resultset includes a list of queries and corresponding frequency: how many times each query runs per second.

What’s the recommendation for SQL Query #5?

Examine SQL queries and corresponding application logic. Try to improve the application architecture to use caching mechanisms. This will help to prevent running SQL queries too frequently on the PostgreSQL Database Server.

SQL Query #6 – PostgreSQL Database CPU distribution per database, and per query

This query checks how much each query in each database uses the CPU. It provides a resultset sorted in descending order by the most CPU-intensive queries.

For PostgreSQL versions 12 and earlier:

SELECT 
        pss.userid,
        pss.dbid,
        pd.datname as db_name,
        round(pss.total_time::numeric, 2) as total_time, 
        pss.calls, 
        round(pss.mean_time::numeric, 2) as mean, 
        round((100 * pss.total_time / sum(pss.total_time::numeric) OVER ())::numeric, 2) as cpu_portion_pctg,
        pss.query
FROM pg_stat_statements pss, pg_database pd
WHERE pd.oid=pss.dbid
ORDER BY pss.total_time 
DESC LIMIT 30;
Enter fullscreen mode Exit fullscreen mode

For PostgreSQL versions starting 13:

SELECT 
        pss.userid,
        pss.dbid,
        pd.datname as db_name,
        round((pss.total_exec_time + pss.total_plan_time)::numeric, 2) as total_time, 
        pss.calls, 
        round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 2) as mean, 
        round((100 * (pss.total_exec_time + pss.total_plan_time) / sum((pss.total_exec_time + pss.total_plan_time)::numeric) OVER ())::numeric, 2) as cpu_portion_pctg,
        pss.query
FROM pg_stat_statements pss, pg_database pd 
WHERE pd.oid=pss.dbid
ORDER BY (pss.total_exec_time + pss.total_plan_time)
DESC LIMIT 30;
Enter fullscreen mode Exit fullscreen mode

In case the output text is too long, SQL query #6 can be modified as follows:

For PostgreSQL versions 12 and earlier:

SELECT 
        pss.userid,
        pss.dbid,
        pd.datname as db_name,
        round(pss.total_time::numeric, 2) as total_time, 
        pss.calls, 
        round(pss.mean_time::numeric, 2) as mean, 
        round((100 * pss.total_time / sum(pss.total_time::numeric) OVER ())::numeric, 2) as cpu_portion_pctg,
        substr(pss.query, 1, 200) short_query
FROM pg_stat_statements pss, pg_database pd
WHERE pd.oid=pss.dbid
ORDER BY pss.total_time 
DESC LIMIT 30;
Enter fullscreen mode Exit fullscreen mode

For PostgreSQL versions starting 13:

SELECT 
        pss.userid,
        pss.dbid,
        pd.datname as db_name,
        round((pss.total_exec_time + pss.total_plan_time)::numeric, 2) as total_time, 
        pss.calls, 
        round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 2) as mean, 
        round((100 * (pss.total_exec_time + pss.total_plan_time) / sum((pss.total_exec_time + pss.total_plan_time)::numeric) OVER ())::numeric, 2) as cpu_portion_pctg,
        substr(pss.query, 1, 200) short_query
FROM pg_stat_statements pss, pg_database pd 
WHERE pd.oid=pss.dbid
ORDER BY (pss.total_exec_time + pss.total_plan_time)
DESC LIMIT 30;
Enter fullscreen mode Exit fullscreen mode

Here’s an example of the SQL output:

userid | dbid  | db_name | total_time  |  calls   |  mean   | cpu_portion_pctg |  short_query
 16409 | 16410 |  db_1   | 27349172.12 |  3905898 |    7.00 |           25.15  |  select ... from table_1
 16409 | 16410 |  db_1   |   391755.00 |      105 | 3731.00 |           16.76  |  select ... from table_2
 ...
Enter fullscreen mode Exit fullscreen mode

From the output example, you can see:

  • The first query (i.e., select … from table_1) takes the most portion of the CPU, because of a high number of calls. I recommend looking at how often the application is running queries connected to this PostgreSQL DB.
  • Examine the second query (i.e., select … from table_2) too, because the high mean time spent executing the statement is 3731 ms, which exceeds three seconds.

What’s the recommendation for SQL Query #6?

Check SQL queries that use a lot of CPU or time. Also, look for queries with a high mean time and/or a number of calls.

In case the output of the queries indicates “insufficient privileges” the following command should be run as seen in the example below:

GRANT pg_read_all_stats TO <db_user>;
Enter fullscreen mode Exit fullscreen mode

SQL queries #5 and #6 are based on the pg_stat_statements extension of PostgreSQL. The pg_stat_statements extension allows tracking statistics for top/all SQL statements running on the PostgreSQL DB instance. Before running the queries, pg_stat_statements should be enabled for the PostgreSQL DB instance.

To enable pg_stat_statements set the PostgreSQL server configuration parameter pg_stat_statements.track to TOP, and run the following command connected to the DB PostgreSQL as administrator:

create extension pg_stat_statements;
Enter fullscreen mode Exit fullscreen mode

The pg_stat_statements.track configuration parameter controls which statements are counted by the module. Specify top to track top-level statements (those issued directly by clients), all to also track nested statements (such as statements invoked within functions), or none to disable statement statistics collection. The default value is top. Only superusers can change this setting.

To verify that the pg_stat_statements extension is enabled, the following command should return some positive number of records:

select count(1) from pg_stat_statements;
Enter fullscreen mode Exit fullscreen mode

To reset all statistics gathered by pg_stat_statements run the following command:

select pg_stat_statements_reset();
Enter fullscreen mode Exit fullscreen mode

To save PostgreSQL query results to a file, use the following approach:

postgres=> \o some_output_file.trc
postgres=> select * from some_table_1;
postgres=> select * from some_table_2;
postgres=> select * from some_table_3;
postgres=> \q
dmitryr@dmitryr-mac my_postgres % ls -rtogla some_output_file.trc
-rw-r--r--  1   59 Oct 24 23:32 some_output_file.trc
dmitryr@dmitryr-mac my_postgres %
Enter fullscreen mode Exit fullscreen mode

SQL Query #7 – Check PostgreSQL DB tables statistics

Outdated PostgreSQL statistics can be another root cause for high CPU utilization. When statistical data isn’t updated, the PostgreSQL query planner may generate non-efficient execution plans for queries, which will lead to a bad performance of the entire PostgreSQL DB Server.

To check the last date and time the statistics were updated for each table in the PostgreSQL DB Server for a specific DB, connect to the DB and run the following query:

 select
  schemaname,
  relname,
  DATE_TRUNC('minute', last_analyze) last_analyze,
  DATE_TRUNC('minute', last_autoanalyze) last_autoanalyze
from
  pg_stat_all_tables
where
  schemaname = 'public'
order by
  last_analyze desc NULLS FIRST,
  last_autoanalyze desc NULLS FIRST;
Enter fullscreen mode Exit fullscreen mode

Here’s an example of the SQL output:

 schemaname |                         relname                         |      last_analyze      |    last_autoanalyze    
------------+---------------------------------------------------------+------------------------+------------------------
 public     | my_table_1                                              | 2022-11-05 04:05:00+00 | 
 public     | my_table_3                                              | 2022-11-05 04:05:00+00 | 
 public     | my_table_2                                              | 2022-11-05 04:05:00+00 | 
Enter fullscreen mode Exit fullscreen mode

What’s the recommendation for SQL Query #7?

Ensure tables are analyzed regularly.

To collect statistics manually for a specific table and its associated indexes run the command:

ANALYZE <table_name>;
Enter fullscreen mode Exit fullscreen mode

SQL Query #8 – Check PostgreSQL database bloat

In cases of intensive data updates, both with frequent UPDATE and with INSERT / DELETE operations, PostgreSQL tables and their indices become bloated. Bloat refers to disk space that was allocated by a table or index and is now available for reuse by the database, but has not been reclaimed. Because of this bloat, the performance of the PostgreSQL DB Server is degraded, which can lead to high CPU utilization scenarios.

Under normal PostgreSQL operations, tuples that are deleted or stale because of an update aren’t physically removed from the table— they’re stored there until the VACUUM command is issued. VACUUM releases the space occupied by “dead” tuples. Thus, it’s necessary to perform a VACUUM periodically, especially for tables that change often.

To check information about dead tuples, and when vacuum / autovacuum was run for each table in the PostgreSQL DB Server for specific DB, connect to the DB and run the following query:

select 
  schemaname, 
  relname, 
  n_tup_ins, 
  n_tup_upd, 
  n_tup_del, 
  n_live_tup, 
  n_dead_tup, 
  DATE_TRUNC('minute', last_vacuum) last_vacuum, 
  DATE_TRUNC('minute', last_autovacuum) last_autovacuum
from 
  pg_stat_all_tables 
where 
  schemaname = 'public'
order by 
  n_dead_tup desc;
Enter fullscreen mode Exit fullscreen mode

Here’s an example of the output:

 schemaname | relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup |      last_vacuum       |    last_autovacuum     

 public     | table_1 |  30237555 |  41784024 |     26858 |   30184398 |     142226 | 2022-11-05 04:00:00+00 | 
 public     | table_4 |  26204826 |         0 |  23898236 |    3628982 |      11688 | 2022-11-05 04:03:00+00 | 2022-11-04 16:13:00+00
 public     | table_2 |  25934622 |         0 |  23741303 |    3447190 |      11647 | 2022-11-05 04:03:00+00 | 2022-11-04 00:15:00+00
 public     | table_3 |    577825 |   4573009 |         0 |    6476040 |      11132 | 2022-11-05 04:04:00+00 | 
Enter fullscreen mode Exit fullscreen mode

What’s the recommendation for SQL Query #8?

Ensure tables are vacuumed regularly.

To run VACUUM (regular, not FULL) for a specific table and all its associated indexes run the command:

VACUUM <table_name>;
Enter fullscreen mode Exit fullscreen mode

VACUUM can be run by an autovacuum process together with statistics collection. Autovacuum is a background process that executes VACUUM and ANALYZE commands automatically.

SQL Query #9 – Check PostgreSQL database tables statistics and bloat

You can also combine SQL queries #7 and #8 into a single SQL query.

select 
  schemaname, 
  relname, 
  n_tup_ins, 
  n_tup_upd, 
  n_tup_del, 
  n_live_tup, 
  n_dead_tup, 
  last_vacuum, 
  last_autovacuum, 
  last_analyze, 
  last_autoanalyze 
from 
  pg_stat_all_tables 
where 
  schemaname = 'public'
order by 
  n_dead_tup desc;
Enter fullscreen mode Exit fullscreen mode

What’s the recommendation for SQL Query #9?

Get a list of tables that were either never analyzed or vacuumed, or that were analyzed a long time ago, or that have had a lot of changes since the last time DB statistics were collected and vacuum run. Tune the autovacuum PostgreSQL process to ensure the more frequently a table or its indexes are getting changes, the more frequently vacuum and analyze will be performed.

To collect DB statistics and to do vacuum (regular, not FULL) on all the objects of all the DBs of the PostgreSQL DB instance run the following command:

vacuumdb -h <db_host> -p <db_port> -U <db_user> -j 4 -z -v -a
Enter fullscreen mode Exit fullscreen mode

To collect DB statistics and to do vacuum (regular, not FULL) on all the objects of some specific database of the PostgreSQL DB instance run the following command:

vacuumdb -h <db_host> -p <db_port> -U <db_user> -j 4 -z -v <db_name>
Enter fullscreen mode Exit fullscreen mode

Conclusion

In this blog post, I examined the most comprehensive and effective troubleshooting practices for high CPU utilization issues in PostgreSQL Databases. I also provided an in-depth classification of the different types of SQL queries that can be used to troubleshoot high CPU utilization, as well as examples of how to use those queries. This knowledge will undoubtedly become part of the playbook for any production mission-critical PostgreSQL DB environment.

Top comments (0)

Create an Account!

👀 Just want to lurk?

That's fine, you can still create an account and turn on features like 🌚 dark mode.