Todd Birchard for Hackers And Slackers

Posted on May 15, 2022 • Originally published at hackersandslackers.com on Jan 9, 2019

Databases in Python Made Easy with SQLAlchemy

#sqlalchemy #sql #python #data

When we think about software interacting with databases directly, many envision a similar pattern of managing connections, explicitly committing queries, and similarly tiresome boilerplate. Python libraries like PyMySQL and Psycopg2 do a fine job fitting this paradigm and upholding the status quo of what we've come to accept. SQLAlchemy has become one of Python's most iconic libraries by abstracting the mundane and trivializing databases in software.

It is also home to the most piss poor documentation for a Python library I've ever had the misfortune of navigating.

SQLAlchemy's API offers an objectively better, simpler, and faster way to work with relational databases. This is useless to those attempting to make sense of the esoteric jargon masquerading as SQLAlchemy's documentation that slaps you across the face and says, "I don't give a shit about you."

I'm here to say I do give a shit about you and would like nothing more than to express these feelings in the form of an SQLAlchemy tutorial.

SQL or ORM?

SQLAlchemy can serve two purposes: making SQL interactions easier, and serving as a full-blown ORM. Even if you aren't interested in implementing an ORM (or don't know what an ORM is), SQLAlchemy is invaluable as a tool to simplify connections to SQL databases and performing raw queries. I thought that was an original statement as I wrote it, but it turns out SQLAlchemy describes itself this way:

SQLAlchemy consists of two distinct components, known as the Core and the ORM. The Core is itself a fully featured SQL abstraction toolkit, providing a smooth layer of abstraction over a wide variety of DBAPI implementations and behaviors, as well as a SQL Expression Language which allows expression of the SQL language via generative Python expressions.

Connecting to a Database

SQLAlchemy gives us a few options for managing database connections, but they all begin with the concept of an engine. An "engine" is a Python object representing a database. Engines are created one time upfront by passing in connection info for a target database.

Once an engine is created, we can swiftly work with our database at any time by interacting with the engine object. Passing a SQL statement to an engine directly lets SQLAlchemy handle opening a new connection, executing, and immediately closing the connection. In the case of sophisticated web apps sustaining a high volume of data transfer, it's preferable to persistent continuous database connections via a session. We'll get to that in the next post.

The syntax for creating an engine is refreshingly simple. create_engine() requires one positional argument, which is a string representing the connection information to connect to a database:

"""Create database connection."""
from sqlalchemy import create_engine

engine = create_engine(
    "mysql+pymysql://user:password@host:3600/database",
)

SQLAlchemy database engine

That string being passed into create_engine() is a connection URI. Assuming you have a database to work with, we'll figure out what your URI should look like together.

Database Connection URIs

Dissecting a URI into its parts looks like this:

[DB_TYPE]+[DB_CONNECTOR]://[USERNAME]:[PASSWORD]@[HOST]:[PORT]/[DB_NAME]

Database connection URI structure

[DB_TYPE]: Specifies the kind (dialect) of database we're connecting to. SQLAlchemy can interface with all mainstream flavors of relational databases. Depending on which database you're connecting to, replace [DB_TYPE] with the matching dialect:
- MySQL : mysql
- PostgreSQL : postgresql
- SQLite : sqlite
- Oracle (ugh): oracle
- Microsoft SQL (slightly less exasperated "ugh"): mssql
[DB_CONNECTOR]: To manage your database connections, SQLAlchemy leverages whichever Python database connection library you chose to use. If you aren't sure what this means, here are the libraries I'd recommend per dialect:
- MySQL : pymysql, mysqldb
- PostgreSQL : psycopg2, pg8000
- SQLite : (none needed)
- Oracle : cx_oracle
- Microsoft SQL : pymssql, pyodbc

The variables that come after should all look familiar; these refer to your target database's URL, a database user, that user's password, etc.

With that out of the way, we can install SQLAlchemy along with your connector:

$ pip install sqlalchemy pymysql

Install SQLAlchemy & connector

Additional Engine Configuration

Besides a connection URI, engines can be configured to accommodate specific needs (or preferences) of your particular database setup. I personally require connections to happen over SSL; this requires a PEM key to be provided for extra security, which we're able to pass via the connect_args keyword argument when creating our engine:

"""Create database connection."""
from sqlalchemy import create_engine
from config import SQLALCHEMY_DATABASE_PEM

engine = create_engine(
    "mysql+pymysql://user:password@host:3600/database",
    connect_args={"ssl": {"key": SQLALCHEMY_DATABASE_PEM}},
    echo=True,
)

SQLAlchemy database engine

If you're the curious type, you can also pass echo=True to your engine, which will print all SQL statements being executed to your console as they happen (including connections, table creation, etc.). This is great for getting insight into what's happening for newcomers but quickly gets annoying and spammy.

Executing Queries

We can run ad hoc queries on an engine object directly by using engine.execute("SELECT * FROM mytable"). When we call execute() on an engine, SQLAlchemy handles:

Opening a connection to our database.
Executing raw SQL on our database and returning the results.
Immediately closes the database connection used to run this query to avoid hanging connections.

This way of interacting with a database is called explicit connectionless execution_. This type of operation is "explicit" in the sense that queries are automatically committed upon execution (contrast this to conventional Python libraries, which abide by PEP-249, which do not run queries unless followed by a commit()).

This is great when we want to run ad hoc queries on demand (ie: nearly all cases that are not user-facing applications). We can fetch or mutate the data we want without worrying about the extra stuff.

SELECT Data

I've set up a database of dummy data for this example. To demonstrate how easy it is to work with SQLAlchemy, I'm going to start by fetching some data from a table called nyc_jobs :

...
from sqlalchemy import text

result = engine.execute(
    text(
        "SELECT job_id, agency, business_title, \
        salary_range_from, salary_range_to \
        FROM nyc_jobs ORDER BY RAND();"
    )
)
print(result)

Execute a SELECT query

I've passed a simple SQL query to engine.execute() which SELECTs a few columns from my nyc_jobs table (returned in random order). Let's check the output of that.

<sqlalchemy.engine.result.Result object at 0x10c59bdc0>

Queries return a Result object

Oh God, what the hell is that?! What's a Result? Was this all too good to be true?!

As of SQLAlchemy 1.4.X, the object ResultProxy has been renamed as Result. Check your version of SQLAlchemy to know which object to except when fetching a result.

Parsing Query Results

Result (is a useful data structure summarizing the result of our query and wrapping the results themselves. The Result object makes it easy to get a high-level look at how our query succeeded by giving us access to attributes, like rowcount:

...
from sqlalchemy import text

result = engine.execute(
    text(
        "SELECT job_id, agency, business_title, \
        salary_range_from, salary_range_to \
        FROM nyc_jobs ORDER BY RAND();"
    )
)
print(f"Selected {result.rowcount} rows.")

Number of rows returned by query

Which gives us...

Selected 3123 rows.

Output of results.rowcount

...But enough horsing around. Let's see what our data looks like by calling fetchall() on our Result:

...
from sqlalchemy import text

result = engine.execute(
    text(
        "SELECT job_id, agency, business_title, \
        salary_range_from, salary_range_to \
        FROM nyc_jobs ORDER BY RAND() LIMIT 10;"
    )
)
print(f"Selected {result.rowcount} rows.")
for row in result.fetchall():
    print(row)

Print rows fetched from query

Now we're talking:

Selected 10 rows.
(399274, 'NYC HOUSING AUTHORITY', 'Vice President Portfolio Planning, Project Development and Asset Mgt', 86346, 217244)
(399276, 'BOROUGH PRESIDENT-QUEENS', 'Director of Graphic Design', 52524, 81535)
(399276, 'BOROUGH PRESIDENT-QUEENS', 'Director of Graphic Design', 52524, 81535)
(399300, 'NYC HOUSING AUTHORITY', 'VICE PRESIDENT TRANSACTIONS', 86346, 217244)
(399300, 'NYC HOUSING AUTHORITY', 'VICE PRESIDENT TRANSACTIONS', 86346, 217244)
(399349, 'CONSUMER AFFAIRS', 'Deputy Director of Special Applications and Operations', 67324, 90686)
(399349, 'CONSUMER AFFAIRS', 'Deputy Director of Special Applications and Operations', 67324, 90686)
(399355, 'DEPT OF INFO TECH & TELECOMM', 'Assistant General Counsel', 66326, 99394)
(399357, 'DEPARTMENT OF BUILDINGS', 'Deputy Commissioner, Development and Technical Affairs', 86346, 212960)
...

Query results

Hey, those look like rows containing job listings! These "rows" look to be a list of tuples at first glance, but SQLAlchemy has one more trick up its sleeve:

...

for row in results.fetchall():
    print(row['business_title'])

Get column value per row

Whoa, did I just try to fetch a key from a tuple with row['business_title']? Have I lost my mind? As it turns out, our "rows" aren't tuples at all; they're data structures called RowProxy:

Selected 10 rows.
COMPUTER SPECIALIST (SOFTWARE)
Resident Engineer
Resident Engineer
Supervising Demolition Inspector
Supervising Demolition Inspector
Senior Inspector
Senior Inspector
Director for Pre-Development Planning
Director for Pre-Development Planning
...

Output of row['business_title']

business_title is one of our table's column names. By looping over row['business_title'], we're seeing every job title in our database; just like looping through a list of dictionaries. In fact, let's do just that: let's return the result of our query as a list of dictionaries:

...

rows = [dict(row) for row in results.fetchall()]
print(rows)

Parse rows as list of dictionaries

Now you have a format that will surely suit your needs regardless of what you're trying to achieve:

[
...
  {
    "agency": "DEPT OF INFO TECH & TELECOMM",
    "business_title": "Information Security Analyst (Data Protection)",
    "job_id": 390003,
    "salary_range_from": 75000,
    "salary_range_to": 100000
  },
  {
    "agency": "HRA/DEPT OF SOCIAL SERVICES",
    "business_title": "JAVA DEVELOPER",
    "job_id": 384199,
    "salary_range_from": 75000,
    "salary_range_to": 180000
  },
  {
    "agency": "DEPT OF HEALTH/MENTAL HYGIENE",
    "business_title": "Early Childhood Education Consultant, Bureau of Childcare",
    "job_id": 386535,
    "salary_range_from": 64507,
    "salary_range_to": 80118
  },
  {
    "agency": "DEPT OF ENVIRONMENT PROTECTION",
    "business_title": "Air Pollution Inspector",
    "job_id": 401509,
    "salary_range_from": 43014,
    "salary_range_to": 43014
  },
  {
    "agency": "DEPT OF DESIGN & CONSTRUCTION",
    "business_title": "Safety Code Compliance Auditor",
    "job_id": 385796,
    "salary_range_from": 54409,
    "salary_range_to": 73124
  },
  {
    "agency": "DEPT OF ENVIRONMENT PROTECTION",
    "business_title": "Procurement Specialist",
    "job_id": 378936,
    "salary_range_from": 48399,
    "salary_range_to": 78211
  },
  {
    "agency": "DEPT OF DESIGN & CONSTRUCTION",
    "business_title": "College Aide",
    "job_id": 401628,
    "salary_range_from": 16,
    "salary_range_to": 20
  },
  {
    "agency": "DEPT OF ENVIRONMENT PROTECTION",
    "business_title": "Engineering Technician I",
    "job_id": 348589,
    "salary_range_from": 36239,
    "salary_range_to": 41675
  },
  {
    "agency": "DEPT OF DESIGN & CONSTRUCTION",
    "business_title": "Executive Assistant",
    "job_id": 395553,
    "salary_range_from": 38851,
    "salary_range_to": 60990
  }
...
]

UPDATE Rows

We can run UPDATE queries the same way we ran our SELECT query, with the caveat that there is such thing as a bad UPDATE query. Let's make sure we avoid those.

I'm picking a random row from my table to update. I'm going to change row 229837 , which looks like this in JSON form:

[
 {
   "job_id": 229837,
   "agency": "DEPT OF INFO TECH & TELECOMM",
   "business_title": "Senior Quality Oversight Analyst",
   "job_category": "Information Technology & Telecommunications",
   "salary_range_from": 58675,
   "salary_range_to": 125000,
   "salary_frequency": "Annual",
   "work_location": "75 Park Place New York Ny",
   "division": "General Counsel",
   "created_at": "2016-02-17T00:00:00.000",
   "updated_at": "2016-02-17T00:00:00.000"
 }
]

SELECT * FROM nyc_jobs WHERE job_id = 229837;

Let's write a potentially dangerous SQL query that introduces some problematic characters:

UPDATE
    nyc_jobs
SET
    business_title = 'Senior QA Scapegoat 🏆',
    job_category = 'Info <>!#%%Technology%%#^&%* & Telecom' 
WHERE
    job_id = 229837;

UPDATE query containing quotes and emojis

It's a bad idea to attempt an UPDATE where the value contains operators that are sure to break the operation. Let's see what happens:

result = engine.execute(
    text(
        "UPDATE nyc_jobs SET business_title = 'Senior QA Scapegoat 🏆', \
        job_category = 'Information? <>!#%%Technology!%%#^&%* & Telecom' \
        WHERE job_id = 229837;"
    )
)
print(result.rowcount)

Executing a problematic UPDATE query

Running this pisses off everything:

2021-01-09 15:44:14,122 INFO sqlalchemy.engine.base.Engine UPDATE nyc_jobs SET business_title = 'Senior QA Scapegoat 🏆', job_category = 'Information <>!#%%Technology%%#^&%* & Telecommunications' WHERE job_id = 229837;
2021-01-09 15:44:14,123 INFO sqlalchemy.engine.base.Engine {}
2021-01-09 15:44:14,123 INFO sqlalchemy.engine.base.Engine ROLLBACK
Traceback (most recent call last):
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/main.py", line 4, in <module>
    init_script()
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/sqlalchemy_tutorial/ __init__.py", line 17, in init_script
    rows_updated = update_job_listing(db)
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/sqlalchemy_tutorial/queries.py", line 18, in update_job_listing
    result = db.execute(
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2235, in execute
    return connection.execute(statement, *multiparams, **params)
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1003, in execute
    return self._execute_text(object_, multiparams, params)
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1172, in _execute_text
    ret = self._execute_context(
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
    self._handle_dbapi_exception(
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1514, in _handle_dbapi_exception
    util.raise_(exc_info[1], with_traceback=exc_info[2])
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 609, in do_execute
    cursor.execute(statement, parameters)
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/pymysql/cursors.py", line 146, in execute
    query = self.mogrify(query, args)
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/pymysql/cursors.py", line 125, in mogrify
    query = query % self._escape_args(args, conn)
TypeError: * wants int

Stack trace for bad SQL UPDATE query

Luckily for us, SQLAlchemy has a method called text() which escapes dangerous characters found in queries like these. Always wrap your queries in this method:

...
from sqlalchemy import text

result = db.execute(
    text(
        "UPDATE nyc_jobs SET business_title = 'Senior QA Scapegoat 🏆', \
        job_category = 'Information <>!#%%Technology%%#^&%* & Telecom' \
        WHERE job_id = 229837;"
    )
)
print(result.rowcount)

Wrap query in text()

We now receive no errors, and a count of 1 is returned to indicate that a single row was updated.

And So Much More

Unless you're a masochist, the SQLAlchemy workflow of running queries is objectively better than the boilerplate you'd be dealing with while using a vanilla DB connector. I can't begin to quantify the amount of time I've personally saved from this simplified workflow, which makes the following assertion seem outlandish: there is still so much more to gain from SQLAlchemy.

Anyway, one step at a time. I put together a working demo containing all the source code of what we covered in this tutorial on Github below. Join us next time!

hackersandslackers / sqlalchemy-tutorial

🧪🔬 Use SQLAlchemy to connect, query, and interact with relational databases.

SQLAlchemy Tutorial

This repository contains the source code for a four-part tutorial series on SQLAlchemy:

Getting Started

Get set up locally in two steps:

Environment Variables

Replace the values in .env.example with your values and rename this file to .env:

SQLALCHEMY_DATABASE_URI: Connection URI of a SQL database.
SQLALCHEMY_DATABASE_PEM (Optional): PEM key for databases requiring an SSL connection.

Remember never to commit secrets saved in .env files to Github.

Installation

Get up and running with make deploy:

$ git clone https://github.com/hackersandslackers/sqlalchemy-tutorial.git
$ cd sqlalchemy-tutorial
$ make deploy

Hackers and Slackers tutorials are free of charge. If you found this tutorial helpful, a small donation would be greatly appreciated to keep us in business. All proceeds go towards coffee, and all coffee goes towards more content.

View on GitHub

DEV Community