DEV Community

Cover image for PostgreSQL DB Indexes Maintenance Script
Dmitry Romanoff
Dmitry Romanoff

Posted on

PostgreSQL DB Indexes Maintenance Script

Indexes are needed to optimize query performance in a relational database such as PostgreSQL.

Indexes allow faster data retrieval, more efficient query execution, reduced I/O operations, efficient sorting, faster join operations, constraint enforcement, avoiding full table scans, optimizing aggregation operations, and improve overall database performance.

Careful design and management of indexes are required to achieve the right balance between performance gains and potential drawbacks.

Indexes come with some trade-offs. It includes space overhead, insert/update/delete overhead, and the need to be maintained over time to ensure they remain effective.

When database indexes remain not maintained for a long time, they may cause:

  • growth in size
  • bad clustering factor
  • fragmentation and spread among MBs/GBs of storage space
  • poor performance; the index becomes not effective
  • non-efficient extra disk utilization which translated to additional cost
  • negative experience when storage associated with DB is growing up all the time, and reclaiming it, is impossible

Regular vacuum maintenance is not changing or rebuilding the index itself, it only marks dead tuples.

This blog post demonstrates the PostgreSQL DB Indexes Maintenance Script to address this challenge. It will recreate DB indexes online, concurrently. The PostgreSQL DB Indexes Maintenance Script can be run online and doesn’t require downtime. It can be implemented as a background process, seamlessly and transparently handling indexes.

PostgreSQL DB Indexes Maintenance Script allows reclaiming storage space.

The script works by connecting to some PostgreSQL DB Server according to its input parameters.

It retrieves a list of databases in the PostgreSQL DB Server and iterates all the databases in the PostgreSQL DB Server connecting to each of them and getting a list of the indexes sorted by their sizes in descending order.

Then the script checks if an index is valid (not corrupted). In case an index is corrupted the script will be exited. Further, the script is checking the size of the index before recreating, recreates the index concurrently, and then checks if the index has been recreated and whether it is valid.

The script accumulates the storage reclaimed as a result of the index's recreation. The script assumes that each database is owned by the user.

I believe this script will be useful for DevOps Engineers, DBAs, SREs, IT specialists, and any engineer working with PostgreSQL DBs. It can be part of Production business-critical database-related systems.

#!/bin/bash

############################################################
#
# index_maintanance_procedure.sh
#
# This script performs index maintenance procedure
#
# The script works connecting to some PostgreSQL DB Server. 
#
# It gets a list of databases in the PostgreSQL DB Server
# and iterates all the databases in the PostgreSQL DB Server
# connecting to each of them and getting a list of the indexes
# sorted by their sizes in the descending order.
#
# Then the script checks if an index is valid (not corrupted).
# In case an index is corrupted the script will be exited.
# Next, the script is checking the size of the index before recreate,
# recreates the index concurrently, and then check 
# if the index has recreated and whether it is valid.
#
# The script accumulates the storage reclaimed as result of the indexes recreation.
#
# My script assumes that each database is owned by the user, 
# that is having the same name as the database.
#
# Date: 03-Jan-2023
#
# Author: Dmitry
#
############################################################

helpFunction()
{
   echo ""
   echo "Usage: $0 -h hostname -p port -x db_pattern -a offset -b limit"
   echo -e "\t-h Postgres hostname"
   echo -e "\t-p Postgers port"
   echo -e "\t-x Postgres db pattern"
   echo -e "\t-a Report offset starting from 0"
   echo -e "\t-b Report limit"
   echo -e " "
   echo -e "Example how to run: $0 -h <db_hostname> -p <db_port> -x % -a 0 -b 10 "
   echo -e " "
   exit 1 # Exit script after printing help
}

trim() {
    local var="$*"
    # remove leading whitespace characters
    var="${var#"${var%%[![:space:]]*}"}"
    # remove trailing whitespace characters
    var="${var%"${var##*[![:space:]]}"}"
    printf '%s' "$var"
}

while getopts "h:p:x:a:b:" opt
do
   case "$opt" in
      h ) inpHost="$OPTARG" ;;
      p ) inpPort="$OPTARG" ;;
      x ) inpDBPattern="$OPTARG" ;;
      a ) inpOffset="$OPTARG" ;;
      b ) inpLimit="$OPTARG" ;;
      ? ) helpFunction ;; # Print helpFunction in case parameter is non-existent
   esac
done

# Print helpFunction in case parameters are empty
if [ -z "$inpHost" ] || [ -z "$inpPort" ] || [ -z "$inpDBPattern" ] || [ -z "$inpOffset" ] || [ -z "$inpLimit" ]
then
   echo "Some or all of the parameters are empty";
   helpFunction
fi

storage_reclaimed_accumulated=0

echo " "
echo "Input parameters:"
echo "---------------- "
echo "inpHost=$inpHost"
echo "inpPort=$inpPort"
echo "inpDBPattern=$inpDBPattern"
echo "inpOffset=$inpOffset"
echo "inpLimit=$inpLimit"

num_of_dbs_to_process=$(psql -h $inpHost -p $inpPort -t -c "select count(M.*) n from (SELECT pg_database.datname as db_name, round(pg_database_size(pg_database.datname)/1024/1024) AS size_in_mb FROM pg_database WHERE pg_database.datname like '${inpDBPattern}' and pg_database.datname not in ('postgres', 'template0', 'template1', 'rdsadmin') ORDER by pg_database_size(pg_database.datname) DESC offset ${inpOffset} limit ${inpLimit} ) M;" postgres)
num_of_dbs_to_process=`trim $num_of_dbs_to_process`

echo " "
echo "Number of DBs in the PostgreSQL DB instance $inpHost - $inpPort: ${num_of_dbs_to_process}"
echo " "

export the_yyyymmdd=$(date '+%Y%m%d')
export hh24miss=$(date '+%H%M%S')

report_name="index_maintanance_procedure"
output_name="/tmp/temp_report_${report_name}_${the_yyyymmdd}_${hh24miss}.tmp"

echo "DBs to process:"
echo "---------------"

psql -h $inpHost -p $inpPort -c "select M.* from (SELECT row_number() over(ORDER by pg_database_size(pg_database.datname) DESC) n, pg_database.datname as db_name, round(pg_database_size(pg_database.datname)/1024/1024) AS db_size_mb FROM pg_database WHERE pg_database.datname like '${inpDBPattern}' and pg_database.datname not in ('postgres', 'template0', 'template1', 'rdsadmin') ORDER by pg_database_size(pg_database.datname) DESC offset ${inpOffset} limit ${inpLimit} ) M;" postgres

idx=1

for DBs_to_process in $(psql -h $inpHost -p $inpPort -t -c "select M.db_name from (SELECT row_number() over(ORDER by pg_database_size(pg_database.datname) DESC) n, pg_database.datname as db_name, round(pg_database_size(pg_database.datname)/1024/1024) AS size_in_mb FROM pg_database WHERE pg_database.datname like '${inpDBPattern}' and pg_database.datname not in ('postgres', 'template0', 'template1', 'rdsadmin') ORDER by pg_database_size(pg_database.datname) DESC offset ${inpOffset} limit ${inpLimit} ) M;" postgres); 
do

echo " "

export the_db=$DBs_to_process
echo "($idx:$num_of_dbs_to_process) the_db: $the_db"

num_of_invalid_indexes=$(psql -h $inpHost -p $inpPort -U $the_db -d $the_db -t -c "SELECT count(relname) n_of_invalid_indexes FROM pg_class, pg_index WHERE pg_index.indisvalid = false AND pg_index.indexrelid = pg_class.oid;")
num_of_invalid_indexes=`trim $num_of_invalid_indexes`
echo "Check invalid indexes: $num_of_invalid_indexes"
echo " "

zero=0;

if [[ $num_of_invalid_indexes -ne $zero ]]; then
  echo "Invalid Index Found!"
  psql -h $inpHost -p $inpPort -U $the_db -d $the_db -t -c "SELECT index_name, relname n_of_invalid_indexes FROM pg_class, pg_index WHERE pg_index.indisvalid = false AND pg_index.indexrelid = pg_class.oid order by index_name, relname;"
  echo "Exit the process!"
  exit 2;
fi

echo "The biggest indexes according to size:"

psql -h $inpHost -p $inpPort -U $the_db -d $the_db -qX << EOF
SELECT 
  indexrelname "index_name",
  schemaname "schema_name",
  round(pg_relation_size(indexrelid)/1024/1024) "index_size_mb",
  i.relname "table_name",
  round(pg_relation_size(relid)/1024/1024) as "table_size_mb",
  reltuples::bigint "estimated_table_row_count"
 FROM pg_stat_all_indexes i 
 JOIN pg_class c ON i.relid=c.oid
 WHERE round(pg_indexes_size(relid)/1024/1024) > 0
   AND schemaname not in ('pg_catalog', 'pg_toast')
 ORDER BY pg_relation_size(indexrelid) DESC;
EOF

echo "recreate indexes"
echo " "

for INDEXEs_to_process in $(psql -h $inpHost -p $inpPort -U $the_db -d $the_db -t -c "SELECT indexrelname FROM pg_stat_all_indexes i JOIN pg_class c ON i.relid=c.oid WHERE round(pg_relation_size(indexrelid)/1024/1024) >0 AND schemaname not in ('pg_catalog', 'pg_toast') ORDER BY pg_relation_size(indexrelid) DESC;")
do

echo "========================================================"
echo "Index: $INDEXEs_to_process"
echo " "

num_of_invalid_indexes=$(psql -h $inpHost -p $inpPort -U $the_db -d $the_db -t -c "SELECT count(relname) n_of_invalid_indexes FROM pg_class, pg_index WHERE pg_index.indisvalid = false AND pg_index.indexrelid = pg_class.oid;")
num_of_invalid_indexes=`trim $num_of_invalid_indexes`
# echo "Check invalid indexes before recreate: $num_of_invalid_indexes"
# echo " "
zero=0;
if [[ $num_of_invalid_indexes -ne $zero ]]; then
  echo "Invalid Index Found!"
  psql -h $inpHost -p $inpPort -U $the_db -d $the_db -t -c "SELECT index_name, relname n_of_invalid_indexes FROM pg_class, pg_index WHERE pg_index.indisvalid = false AND pg_index.indexrelid = pg_class.oid order by index_name, relname;" 
  echo "Exit the process!"
  exit 2;
fi

size_of_the_index_before_recreate=$(psql -h $inpHost -p $inpPort -U $the_db -d $the_db -t -c "SELECT round(pg_relation_size(indexrelid)/1024/1024) index_size_mb FROM pg_class, pg_index WHERE relname='$INDEXEs_to_process' and pg_index.indisvalid = true AND pg_index.indexrelid = pg_class.oid;")
size_of_the_index_before_recreate=`trim $size_of_the_index_before_recreate`

echo "Recreate index online"
psql -h $inpHost -p $inpPort -U $the_db -d $the_db -qX << EOF
reindex index concurrently "$INDEXEs_to_process";
EOF

chk_the_idx_after=$(psql -h $inpHost -p $inpPort -U $the_db -d $the_db -t -c "SELECT count(relname) n_of_invalid_indexes FROM pg_class, pg_index WHERE relname='$INDEXEs_to_process' and pg_index.indisvalid = true AND pg_index.indexrelid = pg_class.oid;")
chk_the_idx_after=`trim $chk_the_idx_after`
one=1;
if [[ $chk_the_idx_after -ne $one ]]; then
  echo "Index $INDEXEs_to_process Not Found!"
  echo "Exit the process!"
  exit 4;
fi
echo " "

num_of_invalid_indexes=$(psql -h $inpHost -p $inpPort -U $the_db -d $the_db -t -c "SELECT count(relname) n_of_invalid_indexes FROM pg_class, pg_index WHERE pg_index.indisvalid = false AND pg_index.indexrelid = pg_class.oid;")
num_of_invalid_indexes=`trim $num_of_invalid_indexes`
# echo "Check invalid indexes after recreate: $num_of_invalid_indexes"
# echo " "
zero=0;
if [[ $num_of_invalid_indexes -ne $zero ]]; then
  echo "Invalid Index Found!"
  psql -h $inpHost -p $inpPort -U $the_db -d $the_db -t -c "SELECT index_name, relname n_of_invalid_indexes FROM pg_class, pg_index WHERE pg_index.indisvalid = false AND pg_index.indexrelid = pg_class.oid order by index_name, relname;"
  echo "Exit the process!"
  exit 2;
fi

size_of_the_index_after_recreate=$(psql -h $inpHost -p $inpPort -U $the_db -d $the_db -t -c "SELECT round(pg_relation_size(indexrelid)/1024/1024) index_size_mb FROM pg_class, pg_index WHERE relname='$INDEXEs_to_process' and pg_index.indisvalid = true AND pg_index.indexrelid = pg_class.oid;")
size_of_the_index_after_recreate=`trim $size_of_the_index_after_recreate`

storage_reclaimed=$(( size_of_the_index_before_recreate - size_of_the_index_after_recreate ))
echo "Storage reclaimed (MB): $storage_reclaimed"
echo " "

storage_reclaimed_accumulated=$(( $storage_reclaimed_accumulated + $storage_reclaimed ))
echo "Storage reclaimed accumulated (MB): $storage_reclaimed_accumulated"

done

idx=$(( idx + 1 ))

echo "::::::::::::::::::::::::::::::::::::::::::::"
done

echo " "
echo "Total Storage reclaimed accumulated (MB): $storage_reclaimed_accumulated"
echo " "
echo "End"
echo " "

Enter fullscreen mode Exit fullscreen mode

Example, how the PostgreSQL DB Indexes Maintenance Script:

dima@dima-mac shell % ./dmitry_index_maintanance_procedure.sh -h localhost -p 5432 -x % -a 0 -b 2

Input parameters:
---------------- 
inpHost=localhost
inpPort=5432
inpDBPattern=%
inpOffset=0
inpLimit=2

Number of DBs in the PostgreSQL DB instance localhost - 5432: 2

DBs to process:
---------------
 n |               db_name                | db_size_mb 
---+--------------------------------------+------------
 1 | my_db_1                              |       1674
 2 | my_db_2                              |         73
(2 rows)

(1:2) the_db: my_db_1
Check invalid indexes: 0

The biggest indexes according to size:
           index_name           | schema_name | index_size_mb |       table_name       | table_size_mb | estimated_table_row_count 
--------------------------------+-------------+---------------+------------------------+---------------+---------------------------
 my_table_1_idx1                | public      |           312 | my_table_1             |           316 |                   2150135
 my_table_1_idx2                | public      |            58 | my_table_1             |           288 |                    496207
 my_table_1_idx3                | public      |            56 | my_table_1             |            78 |                    714031
 my_table_2_idx1                | public      |            46 | my_table_2             |           316 |                   2150135
...


========================================================
Index: my_table_1_idx1

Recreate index online

Storage reclaimed (MB): 1

Storage reclaimed accumulated (MB): 1
========================================================
Index: my_table_1_idx2

Recreate index online

Storage reclaimed (MB): 1

Storage reclaimed accumulated (MB): 2
...

::::::::::::::::::::::::::::::::::::::::::::

Total Storage reclaimed accumulated (MB): 123

End

...
Enter fullscreen mode Exit fullscreen mode

Top comments (0)