SqlAlchemy: simplify complex joins

#sql #python #sqlalchemy

I am writing this article after coming across the same issue a second time and having to spend a good chunk of time figuring my issue out. I found this to be a super sweet feature that sqlAlchemy allows us to do, and since I can abstract away the sqlAlchemy calls, making the code and my thinking a bit simpler than if I were to do it using simple sql.

The problem

I have two tables, and for my case, I need to join the results of an sql query in one table with the results of an sql query in another table.
While I cant show the exact endpoint, I will give a lay of the land, and setup 2 sample tables and a usecase that I hope will demonstrate the problem well.

The DB Setup

Books Table

id	location_in_library	price	author

Bookmarks Table

id	page_number	book_id	bookmark_date

In the case of these tables, suppose we have a case:

Simple Query

fetch bookmark_date, for existing books. If a book does not have any bookmarks, we would like to show it (using some processing in the endpoint, does not concern us how that happens).

How to go about doing that? well here it is simple, just do a join on the basis of Books.id == Bookmarks.book_id.

Complicated Query

How do we proceed in the case of a more complicated query though? Say, For books having a location like X in the bookshop, we want to fetch their price, author information, as well as the number of bookmarks they have.
What I find to be a really intuitive approach here is to utilize what sqlalchemy calls subqueries. Sql also has subqueries, but instead of being written in the same line of code, these feel like more of an isolation. I really like how simple an operation like this can be made in sqlalchemy, greatly improving readability and the time I spend figuring out the query.

Using SqlAlchemy

Lets split this complex query, rather easily into different parts.

For books having a location like X, return price, author information
Fetch number of bookmarks for the book

In sqlalchemy this will be:

# Book and BookMark are the sqlalchemy classes corresponding
# to each table
books = Book.query.filter(Book.location == X) \
        .with_entities(Book.id.label('book_id'), \
        Book.location.label('location'), \
        Book.price.label('price'), \
        Book.author.label('author')).subquery()
bookmarks = BookMark.query.with_entities( \
            BookMark.book_id.label('book_id') \
            func.count(BookMark.id).label(bookmark_count)) \
            .group_by(BookMark.id).subquery()

Now here we have separated out books, and bookmarks into 2 individual subqueries, and can just think about them in terms of what each contains. All we need to do is join them, and we can do that as follows:

# The reason we do books.c.book_id is to index the column of
# a particular subquery
res = db.session.query(books.c.book_id, \
      books.c.location, books.c.price, \
      books.c.author, bookmarks.c.bookmark_count) \ 
      .join(books, books.c.book_id == bookmarks.c.book_id)

And, that is it. That is all we need to do to get this and even more complicated queries, maybe something requiring more operations from each or a single table.

Conclusion

This was an example I replicated from something I was working on at work, of course, I cant show exactly that, but my problem was similar to that of the example shown. I had tried looking around a bunch of blogs and stack exchange posts failing to find a good way of implementing this in SQL. I later realized that I would have had to port that SQL to sqlAlchemy code. This in my opinion works better just development-wise, because we are able to abstract out the two complicated queries, thinking just in terms of data we need, and not getting too involved in the syntax or ordering.

Anyways, hope this helps.

DEV Community