
Cheap count(*) in Postgres

Johan Lindstrom ・ 1 min read

Running select count(*) from my_table in Postgres can be expensive for large tables. Because of its MVCC concurrency model, Postgres must do a lot of work to get an exact figure.

If we're not careful, the database can start using a lot of CPU, disk IO and cache memory. Not great in production, where the database is presumably busy doing other important work...

But if we're new to a database and just looking around, we don't need the exact number. We just need to know whether it would be dangerous to run queries on the table, e.g. to start investigating the broken data after that nasty bug.

There are other ways to do this, but here's a simple and easy-to-remember trick: use the query plan from explain.

explain select count(*) from my_table;
                                   QUERY PLAN                                    
---------------------------------------------------------------------------------
 Aggregate  (cost=1901498.40..1901498.41 rows=1 width=8)
   ->  Seq Scan on my_table  (cost=0.00..1815135.52 rows=34545152 width=0)
(2 rows)

See that rows=34545152? That's the planner's estimate of the number of rows in the table.

35 million. Careful with those full table scans...
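If you find yourself doing this a lot, the estimate can be pulled out programmatically. Here's a minimal sketch in Python, assuming you feed it the lines of explain's text output; estimated_rows is a hypothetical helper, not any Postgres API:

```python
import re

def estimated_rows(plan_lines):
    # Look for the table's scan node (e.g. "Seq Scan on my_table ...") and
    # pull out its rows=N figure. We skip other nodes on purpose: the
    # top-level Aggregate node reports rows=1, not the table size.
    for line in plan_lines:
        if "Scan" in line:
            match = re.search(r"rows=(\d+)", line)
            if match:
                return int(match.group(1))
    raise ValueError("no scan node with a rows= estimate found in plan")

# The plan from the post:
plan = [
    "Aggregate  (cost=1901498.40..1901498.41 rows=1 width=8)",
    "  ->  Seq Scan on my_table  (cost=0.00..1815135.52 rows=34545152 width=0)",
]
print(estimated_rows(plan))  # 34545152
```

In practice explain (format json) is easier to parse reliably; there the estimate shows up as the "Plan Rows" field of the scan node.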

Posted on Oct 23 '18

Discussion


I don't know PostgreSQL, but explain reads this information from the statistics, and the statistics are not calculated automatically.

Other than that, it sounds like a good trick.
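As the comment says, the estimate comes from the planner statistics, which are refreshed by ANALYZE and (on most setups) by autovacuum. Those same statistics can be queried directly from the pg_class catalog, skipping explain entirely; a sketch, assuming the my_table from the post:

```sql
-- reltuples holds the planner's row estimate for the table, maintained by
-- ANALYZE and autovacuum. A value of -1 (or 0 on older versions) means the
-- table has never been analyzed, so the estimate isn't usable yet.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'my_table';
```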