Cost Estimation In PostgreSQL

#postgres #apacheage

Cost estimation in PostgreSQL refers to the process by which the query planner estimates how much time a query will take to execute. The planner uses both a startup cost and a total cost for each operation to determine the cheapest option for executing a query. The costs are calculated using both constants and metadata about the contents of the database, often referred to as "statistics". This is not an absolute performance indicator of the final execution but gives a relative idea of execution of queries in different ways.
The costs of a query are estimated by functions defined in costsize.c and all the operations of executor have their corresponding cost function.
There are three types of cost in PostgreSQL which are as follows:

1. Start-up Cost: It is cost expended before the first tuple is fetched.
2. Run Cost: It is cost to fetch all the tuples.
3. Total Cost: It is sum of both start-up cost and run cost.

Explain command shows both the start-up cost and total cost in the operation. The command can be used like this.

testdb=# EXPLAIN SELECT * FROM tbl;
                       QUERY PLAN                        
---------------------------------------------------------
 Seq Scan on tbl  (cost=0.00..145.00 rows=10000 width=8)
(1 row)

Cost Estimation in Sequential Scan

The cost of the sequential scan is estimated by cost_seqscan() function. For this, startup cost is 0 and the run cost is defined using the following formula:

'run cost' = 'cpu run cost' + 'dis run cost'
           = ('cpu tuple cost' + 'cpu operator cost')xNtupule +'seq page cost'xNpage

Here 'seq page cost', 'cpu tuple cost' and 'cpu operator cost' are set in the postgresql.conf file, and the default values are 1.0, 0.01, and 0.0025, respectively.
For Ntuple = 10000 and Npage = 45 the run cost comes out to be 170 which is same as total cost.

Cost Estimation in Sort

The sort path is used for a variety of sorting tasks, including ORDER BY, preprocessing merge join procedures, and more. The cost sort() method calculates the cost of sorting.
If all of the tuples that need to be sorted can be stored in work mem, the quicksort algorithm is used otherwise tuples are broken into multiple parts in temporary files and merge sort algorithm is used.
The start-up cost of the sort path is the cost of sorting the target tuples. The run cost of the sort path is the cost of reading the sorted tuples.
The following formula is used for the cost estimation:

'start up cost' = C + 'comparison cost' x Nsort x log2(Nsort)
'run cost' = 'cpu operator cost' x Nsort

where C is the total cost of index scan, 'comparison cost' = 2 x 'cpu operator cost'.

Latest comments (1)

jiwoh • Sep 12 '25 • Edited

Cost estimation in PostgreSQL predicts query execution cost using startup cost, run cost, and total cost. Startup cost is the cost before fetching the first tuple, run cost is for fetching all tuples, and total cost is their sum. Check now functions like cost_seqscan() and cost_sort() calculate these costs based on constants (e.g., cpu_tuple_cost, seq_page_cost) and database statistics. The EXPLAIN command shows these costs, helping the planner choose the most efficient query execution plan.