DEV Community

Dmitry Romanoff
Dmitry Romanoff

Posted on

Automating Database Health Checks with Python: A Step-by-Step Guide

In the world of database management, ensuring the health and performance of your databases is crucial. One effective way to achieve this is by automating your database health checks and generating reports. In this article, we’ll explore a Python script that performs these tasks using psycopg2, yaml, and jinja2 libraries. By the end, you'll have a clear understanding of how to automate database checks and produce HTML reports for easy visualization.

Overview of the Script
Our script performs the following tasks:

  • Load Configuration: Reads the database check configuration from a YAML file.
  • Connect to the Database: Establishes a connection to the PostgreSQL database.
  • Execute Queries: Runs queries specified in the configuration file and collects results.
  • Generate HTML Report: Creates an HTML report of the check results.

Let’s break down each part of the script.

1. Loading Configuration
The configuration for our database checks is stored in a YAML file. We use the pyyaml library to load this configuration.

import yaml

def load_config(config_file):
    with open(config_file, 'r') as file:
        return yaml.safe_load(file)
Enter fullscreen mode Exit fullscreen mode

This function reads the YAML file and loads it into a Python dictionary, which allows us to access our database checks configuration.

2. Connecting to the Database
We use psycopg2, a popular PostgreSQL adapter for Python, to connect to the database.

import psycopg2

def connect_db(hostname, port, dbname, username, password):
    try:
        conn = psycopg2.connect(
            host=hostname,
            port=port,
            dbname=dbname,
            user=username,
            password=password
        )
        return conn
    except Exception as e:
        print(f"Error connecting to database: {e}")
        raise
Enter fullscreen mode Exit fullscreen mode

This function attempts to connect to the database with the provided credentials and handles any connection errors that may arise.

3. Executing Queries
Once connected, we execute queries defined in the configuration file and gather the results.

def execute_queries(conn, queries):
    results = []
    try:
        with conn.cursor() as cursor:
            for check in queries:
                description = list(check.keys())[0]
                query = check[description]['query']
                cursor.execute(query)
                result = cursor.fetchall()
                columns = [desc[0] for desc in cursor.description]  # Get column names
                results.append({
                    'description': check[description]['description'],
                    'query': query,
                    'columns': columns,
                    'result': result
                })
    except Exception as e:
        print(f"Error executing queries: {e}")
        raise
    return results
Enter fullscreen mode Exit fullscreen mode

In this function, we:

  • Iterate over each query in the configuration.
  • Execute the query and fetch the results.
  • Extract column names and store everything in a results list.

4. Generating HTML Report
The final step is to generate an HTML report using the jinja2 templating engine.

from jinja2 import Template

def generate_html_report(results, output_file):
    template = Template("""
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <title>Database Check Results</title>
        <style>
            table {
                width: 100%;
                border-collapse: collapse;
            }
            table, th, td {
                border: 1px solid black;
            }
            th, td {
                padding: 8px;
                text-align: left;
            }
            th {
                background-color: #f2f2f2;
            }
            pre {
                white-space: pre-wrap; /* Allows wrapping of long queries */
            }
        </style>
    </head>
    <body>
        <h1>Database Check Results</h1>
        {% for check in results %}
        <h2>{{ check.description }}</h2>
        <pre>Query: {{ check.query }}</pre>
        <h3>Result:</h3>
        <table>
            <thead>
                <tr>
                    {% for column in check.columns %}
                    <th>{{ column }}</th>
                    {% endfor %}
                </tr>
            </thead>
            <tbody>
                {% for row in check.result %}
                <tr>
                    {% for value in row %}
                    <td>{{ value }}</td>
                    {% endfor %}
                </tr>
                {% endfor %}
            </tbody>
        </table>
        {% endfor %}
    </body>
    </html>
    """)
    html_content = template.render(results=results)
    with open(output_file, 'w') as file:
        file.write(html_content)
Enter fullscreen mode Exit fullscreen mode

This function uses a Jinja2 template to create an HTML file. It includes:

  • A header section with the title and styles.
  • A loop that iterates over each database check result and formats it into an HTML table.

Main Function
Finally, we tie everything together in the main function.

def main():
    # Database parameters
    db_hostname = 'localhost'
    db_port = 5439
    db_name = 'my_database'
    db_username = 'my_user'
    db_pwd = 'my_password'

# File paths
    config_file = 'dbcheck_config.yaml'
    output_file = 'result.html'
    # Load configuration
    config = load_config(config_file)
    dbchecks = config['dbchecks']
    # Connect to the database
    conn = connect_db(db_hostname, db_port, db_name, db_username, db_pwd)
    try:
        # Execute queries and get results
        results = execute_queries(conn, dbchecks)
        # Generate HTML report
        generate_html_report(results, output_file)
    finally:
        # Close the database connection
        conn.close()
if __name__ == "__main__":
    main()
Enter fullscreen mode Exit fullscreen mode

In the main function, we:

  • Define database connection parameters and file paths.
  • Load the configuration and connect to the database.
  • Execute the queries and generate the HTML report.
  • Ensure the database connection is closed properly.

dbcheck_config.yaml

dbchecks:

  - dbcheck_1:
      description: "Check the PostgreSQL DB version"
      query: 
             "select version()"

  - dbcheck_2:
      description: "Check DB connections"
      query: 
             "select 
                A.total_connections, 
                A.active_connections,
                B.max_connections,
                round((100 * A.total_connections::numeric / B.max_connections::numeric), 2) connections_utilization_pctg
              from
                (select count(1) as total_connections, sum(case when state='active' then 1 else 0 end) as active_connections from pg_stat_activity) A,
                (select setting as max_connections from pg_settings where name='max_connections') B"

  - dbcheck_3:
      description: "Distribution of active connections per DB"
      query:  
             "select 
                datname as db_name, 
                count(1) as num_of_active_connections 
              from pg_stat_activity 
              where state='active' 
              group by 1 
              order by 2 desc"

  - dbcheck_4:
      description: "Distribution of active connections per database and per query"
      query:  
             "select 
                datname as db_name, 
                substr(query, 1, 200) short_query, 
                count(1) as num_active_connections 
              from pg_stat_activity 
              where state='active' 
              group by 1, 2 
              order by 3 desc"

  - dbcheck_5:
      description: "Active sessions detailed running more than 5 seconds"
      query:  
             "select
                now()-query_start as runtime,
                pid as process_id, 
                datname as db_name,
                client_addr,
                client_hostname,
                substr(query, 1, 200) the_query
              from pg_stat_activity
              where state='active'
              and now() - query_start > '5 seconds'::interval
              order by 1 desc"

  - dbcheck_6:
      description: "Running frequent PostgreSQL queries"
      query:  
             "WITH a AS (
                    SELECT dbid, queryid, query, calls AS s
                    FROM pg_stat_statements
                ),
                b AS (
                    SELECT dbid, queryid, query, calls AS s
                    FROM pg_stat_statements, pg_sleep(1)
                )
                SELECT
                    pd.datname AS db_name,
                    substr(a.query, 1, 400) AS the_query,
                    SUM(b.s - a.s) AS runs_per_second
                FROM a
                JOIN b ON a.dbid = b.dbid AND a.queryid = b.queryid
                JOIN pg_database pd ON pd.oid = a.dbid
                GROUP BY 1, 2
                HAVING SUM(b.s - a.s) > 10
                ORDER BY runs_per_second DESC"

  - dbcheck_7:
      description: "PostgreSQL Database CPU distribution per database, and per query"
      query:  
             "SELECT 
                  pss.userid,
                  pss.dbid,
                  pd.datname as db_name,
                  round((pss.total_exec_time + pss.total_plan_time)::numeric, 2) as total_time, 
                  pss.calls, 
                  round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 2) as mean, 
                  round((100 * (pss.total_exec_time + pss.total_plan_time) / sum((pss.total_exec_time + pss.total_plan_time)::numeric) OVER ())::numeric, 2) as cpu_portion_pctg,
                  pss.query
              FROM pg_stat_statements pss, pg_database pd 
              WHERE pd.oid=pss.dbid
              ORDER BY (pss.total_exec_time + pss.total_plan_time)
              DESC LIMIT 30"

  - dbcheck_8:
      description: "List Databases and their sizes"
      query:  
             "SELECT
                  pg_database.datname AS database_name,
                  pg_size_pretty(pg_database_size(pg_database.datname)) AS size
              FROM
                  pg_database
              ORDER BY
                  pg_database_size(pg_database.datname) DESC"
Enter fullscreen mode Exit fullscreen mode

This Python script automates database health checks and generates a well-formatted HTML report.

db_check.py

import psycopg2
import yaml
from jinja2 import Template

# Define your function to load configuration
def load_config(config_file):
    with open(config_file, 'r') as file:
        return yaml.safe_load(file)

# Define your function to connect to the database
def connect_db(hostname, port, dbname, username, password):
    try:
        conn = psycopg2.connect(
            host=hostname,
            port=port,
            dbname=dbname,
            user=username,
            password=password
        )
        return conn
    except Exception as e:
        print(f"Error connecting to database: {e}")
        raise

# Define your function to execute queries and get results
def execute_queries(conn, queries):
    results = []
    try:
        with conn.cursor() as cursor:
            for check in queries:
                description = list(check.keys())[0]
                query = check[description]['query']
                cursor.execute(query)
                result = cursor.fetchall()
                columns = [desc[0] for desc in cursor.description]  # Get column names
                results.append({
                    'description': check[description]['description'],
                    'query': query,
                    'columns': columns,
                    'result': result
                })
    except Exception as e:
        print(f"Error executing queries: {e}")
        raise
    return results

# Define your function to generate HTML report
def generate_html_report(results, output_file):
    template = Template("""
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <title>Database Check Results</title>
        <style>
            table {
                width: 100%;
                border-collapse: collapse;
            }
            table, th, td {
                border: 1px solid black;
            }
            th, td {
                padding: 8px;
                text-align: left;
            }
            th {
                background-color: #f2f2f2;
            }
            pre {
                white-space: pre-wrap; /* Allows wrapping of long queries */
            }
        </style>
    </head>
    <body>
        <h1>Database Check Results</h1>
        {% for check in results %}
        <h2>{{ check.description }}</h2>
        <pre>Query: {{ check.query }}</pre>
        <h3>Result:</h3>
        <table>
            <thead>
                <tr>
                    {% for column in check.columns %}
                    <th>{{ column }}</th>
                    {% endfor %}
                </tr>
            </thead>
            <tbody>
                {% for row in check.result %}
                <tr>
                    {% for value in row %}
                    <td>{{ value }}</td>
                    {% endfor %}
                </tr>
                {% endfor %}
            </tbody>
        </table>
        {% endfor %}
    </body>
    </html>
    """)

    html_content = template.render(results=results)

    with open(output_file, 'w') as file:
        file.write(html_content)

def main():
    # Database parameters
    db_hostname = 'localhost'
    db_port = 5439
    db_name = 'my_database'
    db_username = 'my_user'
    db_pwd = 'my_password'

    # File paths
    config_file = 'dbcheck_config.yaml'
    output_file = 'result.html'

    # Load configuration
    config = load_config(config_file)
    dbchecks = config['dbchecks']

    # Connect to the database
    conn = connect_db(db_hostname, db_port, db_name, db_username, db_pwd)

    try:
        # Execute queries and get results
        results = execute_queries(conn, dbchecks)

        # Generate HTML report
        generate_html_report(results, output_file)

    finally:
        # Close the database connection
        conn.close()

if __name__ == "__main__":
    main()
Enter fullscreen mode Exit fullscreen mode

Conclusion
By leveraging libraries psycopg2 for database interaction, yaml configuration management, and jinja2 for templating, you can efficiently monitor and document your database performance. This approach can be customized and extended to fit various database environments and reporting requirements.

Top comments (0)