PGzlan

Leveraging Google Colab to run Postgres: A Comprehensive Guide

Google Colab is a powerful cloud-based platform that provides a convenient environment for running Python code, performing data analysis, and executing machine learning tasks. While Colab offers built-in support for popular data manipulation libraries, such as Pandas and NumPy, connecting to a PostgreSQL database requires additional setup. In this guide, we'll explore how you can use Google Colab with Postgres, enabling you to leverage the power of SQL database operations within your Colab notebooks.

Prerequisites

Before we dive into the process, ensure you have the following prerequisites in place:

  1. A Google account: You'll need a Google account to access Google Colab.
  2. A PostgreSQL database: Ensure you have a working PostgreSQL database instance with the necessary credentials.

Setting up Google Colab

  1. Open your web browser and navigate to Google Colab.

  2. Sign in with your Google account if you haven't already.

  3. Click on "New Notebook" to create a new Colab notebook.

Installing Required Libraries

Before doing anything else, we need to install the Postgres version we want on the Colab VM, and then we can proceed. For the purposes of this walkthrough, we'll use Postgres 15, installed from the official PostgreSQL apt repository:



# Install the prerequisites for adding an external apt repository
!sudo apt update
!sudo apt install dirmngr ca-certificates software-properties-common gnupg gnupg2 apt-transport-https curl -y
# Import the PostgreSQL signing key and register the PGDG repository
# (the repo line targets an Ubuntu 22.04 "jammy" base)
!curl -fSsL https://www.postgresql.org/media/keys/ACCC4CF8.asc | gpg --dearmor | sudo tee /usr/share/keyrings/postgresql.gpg > /dev/null
!echo 'deb [arch=amd64,arm64,ppc64el signed-by=/usr/share/keyrings/postgresql.gpg] http://apt.postgresql.org/pub/repos/apt/ jammy-pgdg main' | sudo tee /etc/apt/sources.list.d/pgdg.list
!sudo apt update
# Install the Postgres 15 server and client, then start the service
!sudo apt install postgresql-client-15 postgresql-15 -y
!sudo service postgresql start



Running these steps should end with a screen similar to this:

Successful installation

Once Postgres is in place, we should configure our users with proper permissions so that we can easily access our database. Let's create the following users:



# Create a superuser role for the notebook's root user and set the password `postgres` for the role `postgres`
!sudo -u postgres psql -U postgres -c "CREATE ROLE root WITH SUPERUSER;"
!sudo -u postgres psql -U postgres -c "ALTER ROLE root WITH LOGIN;"
!sudo -u postgres psql -U postgres -c "CREATE ROLE postgres WITH PASSWORD 'postgres';"



Let's break down each line of the snippet:

  1. !sudo -u postgres psql -U postgres -c "CREATE ROLE root WITH SUPERUSER;": This line executes a PostgreSQL command to create a new role (user) named "root" with superuser privileges. Superuser privileges provide extensive permissions and control over the database system. By creating this role, the snippet aims to establish a powerful user for administrative purposes.
  2. !sudo -u postgres psql -U postgres -c "ALTER ROLE root WITH LOGIN;": This line executes a PostgreSQL command to allow the "root" role to log in. The WITH LOGIN clause grants the role the ability to authenticate and access the database system.
  3. !sudo -u postgres psql -U postgres -c "CREATE ROLE postgres WITH PASSWORD 'postgres';": This line creates a new role named "postgres" with the password "postgres". The role is created with default privileges, which are typically limited compared to a superuser role. This setup allows for a more "secure" configuration by setting a password for the "postgres" role.

In my experience, giving root the LOGIN attribute from the get-go usually fails for some reason, and the same happens with the postgres user (which, as noted above, already exists). If you encounter that, just run:



!sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'postgres'"



Now that that's out of the way, let's see how we can use Postgres.

One annoyance with Colab is that while a cell is running a long job, it blocks the rest of the notebook from executing. Luckily, we aren't trapped in all cases: there's an IPython extension called colab-xterm that provides a terminal-like interface from a cell within the notebook, and we can use it to our advantage. To install and load it, we just do the following:



!pip install colab-xterm
%load_ext colabxterm



After this is done, you can run



%xterm



To spawn a console

Console spawned

We can use this terminal to connect with psql and run our queries, for example with psql -h localhost -U postgres (using the password we set above)!

psql run

To connect to a PostgreSQL database from Colab, we need to install the psycopg2 library, which allows Python to interact with PostgreSQL databases. Additionally, we'll install the sqlalchemy library, which provides a higher-level interface for working with databases.

To install the required libraries, add the following code snippet in a Colab cell:



!pip install psycopg2-binary sqlalchemy



Importing Necessary Modules

In the first cell of your Colab notebook, import the required modules:



import psycopg2
from sqlalchemy import create_engine, text



Connecting to the PostgreSQL Database

To establish a connection to your PostgreSQL database, you need to provide the necessary connection details, such as the host, port, database name, username, and password.

Use the following code snippet as an example to establish a connection using psycopg2:



try:
    connection = psycopg2.connect(
        host="your_host",
        port="your_port",
        database="your_database",
        user="your_username",
        password="your_password"
    )
    cursor = connection.cursor()
    print("Connection established successfully!")
except Exception as e:
    print("Error connecting to the database:", e)



Replace the placeholders (your_host, your_port, your_database, your_username, your_password) with your actual database connection details.
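
For the local instance we installed earlier in this notebook, the values would look something like the sketch below, assuming the default port 5432, the built-in postgres database, and the postgres password we set above:


try:
    connection = psycopg2.connect(
        host="localhost",        # the server runs inside the same Colab VM
        port="5432",             # default Postgres port
        database="postgres",     # built-in default database
        user="postgres",
        password="postgres"      # password set earlier with ALTER USER
    )
    cursor = connection.cursor()
    print("Connection established successfully!")
except Exception as e:
    print("Error connecting to the database:", e)
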

Alternatively, you can use the sqlalchemy library to create an engine and connect to the database:



try:
    engine = create_engine('postgresql://your_username:your_password@your_host:your_port/your_database')
    connection = engine.connect()
    print("Connection established successfully!")
except Exception as e:
    print("Error connecting to the database:", e)



Replace the placeholders (your_host, your_port, your_database, your_username, your_password) with your actual database connection details.
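
For the local instance from earlier in this notebook, the connection URL would look something like this (same assumptions as above):


engine = create_engine('postgresql+psycopg2://postgres:postgres@localhost:5432/postgres')
connection = engine.connect()
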

Executing SQL Queries

Once the connection is established, you can execute SQL queries and fetch the results using the cursor object (if using psycopg2) or the execute method of the connection object (if using sqlalchemy).

Here's an example that executes a simple SELECT query and fetches the results using psycopg2:



try:
    cursor.execute("SELECT * FROM your_table;")
    results = cursor.fetchall()

    for row in results:
        print(row)
except Exception as e:
    print("Error executing the query:", e)



Replace your_table with the name of the table you want to query from your database.
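
Since the freshly installed instance doesn't contain any tables yet, here is a minimal sketch that creates a hypothetical demo_items table (the table and column names are made up for illustration), inserts a row, and reads it back with the cursor from above:


try:
    cursor.execute("CREATE TABLE IF NOT EXISTS demo_items (id serial PRIMARY KEY, name text);")
    cursor.execute("INSERT INTO demo_items (name) VALUES (%s);", ("hello colab",))
    connection.commit()  # persist the table and the row before querying

    cursor.execute("SELECT id, name FROM demo_items;")
    for row in cursor.fetchall():
        print(row)
except Exception as e:
    connection.rollback()
    print("Error running the demo:", e)
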

For sqlalchemy, you can execute queries using the execute method of the connection object; recent SQLAlchemy versions require raw SQL strings to be wrapped in text():



try:
    result = connection.execute(text("SELECT * FROM your_table"))
    for row in result:
        print(row)
except Exception as e:
    print("Error executing the query:", e)



Replace your_table with the name of the table you want to query from your database.

Closing the Connection

After you finish executing the queries, remember to close the connection to the database to release any resources.

Use the following code to close the connection when using psycopg2:



if connection:
    cursor.close()
    connection.close()
    print("Connection closed successfully!")



For sqlalchemy, close the connection as follows:



if connection:
    connection.close()
    print("Connection closed successfully!")



Here's what this would look like:

SQLAlchemyDemo

Conclusion

In this guide, we explored how to use Google Colab with a PostgreSQL database. By following the steps outlined here, you can establish a connection to your database, execute SQL queries, and retrieve the results within your Colab notebooks. This integration lets you combine Colab's data analysis capabilities with the SQL operations provided by Postgres, enabling you to perform advanced data manipulations and gain valuable insights.

One of the key advantages of using Google Colab with Postgres is the ability to seamlessly combine the data analysis capabilities of Colab with the power of SQL queries. With Colab, you have access to a wide range of Python libraries for data manipulation and visualization. By connecting to a Postgres database, you can leverage the SQL language to efficiently retrieve and manipulate data directly from your database.

For example, you can use SQL queries to perform aggregations, filtering, and join operations on large datasets stored in your Postgres database. These operations can be executed efficiently on the database server, reducing the data transfer overhead and improving performance. Once you have the desired data subset, you can use Colab's data analysis libraries, such as Pandas and Matplotlib, to further explore and visualize the data.
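
As a rough sketch of that workflow, assuming the local instance and the hypothetical demo_items table from earlier, you can run the aggregation on the database side and pull the result straight into a Pandas DataFrame:


import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql+psycopg2://postgres:postgres@localhost:5432/postgres')

# The grouping and counting happen in Postgres; only the result set is transferred
df = pd.read_sql("SELECT name, COUNT(*) AS n FROM demo_items GROUP BY name;", engine)
print(df.head())
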

Integrating Google Colab with Postgres opens up a world of possibilities for data analysis and exploration. By combining the strengths of Colab's Python libraries and Postgres' SQL capabilities, you can efficiently work with large datasets, perform advanced data manipulations, and gain valuable insights. So, give it a try and unlock the full potential of your data analysis workflows with Google Colab and Postgres!

GL, HF!
