Martin Heinz

Posted on • Originally published at martinheinz.dev

Advanced SQLAlchemy Features You Need To Start Using

If you are a Python developer and you work with SQL databases, then SQLAlchemy is most likely a library you are familiar with. It's a powerful yet flexible toolkit for working with SQL in Python, with lots of features. Some of these features, like the ORM and basic queries, are common knowledge, but there are quite a few features you might not know about and should definitely be taking advantage of. So, let's see how to leverage things like hybrid properties, nested queries, table metadata, dialects and more!

Column Properties

Let's start simple. I think it's pretty common that you might want to create a mapped attribute based on other columns - essentially creating a computed column. The simplest example would be string concatenation:

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    firstname = Column(String(50))
    lastname = Column(String(50))
    fullname = column_property(firstname + " " + lastname)

This is nice, but it's much more useful when we use SQL expressions to create such attribute:

class CreditCard(Base):
    __tablename__ = 'card'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('user.id'), nullable=True)


class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    firstname = Column(String(50))
    lastname = Column(String(50))
    fullname = column_property(firstname + " " + lastname)
    credit_card = relationship(CreditCard, backref='report')
    has_credit_card = column_property(
        exists().where(CreditCard.user_id == id)
    )

john = User(id=1, firstname='John', lastname='Doe')
session.add(john)
session.commit()
print(john.has_credit_card)
# False
johns_card = CreditCard(user_id=1)
session.add(johns_card)
session.commit()
print(john.has_credit_card)
# True

For the example above we added a little more code. We created a CreditCard class which has a many-to-one relationship with User. This user - on top of the columns and attributes from the first example - also has a column property named has_credit_card, which is computed by checking whether a credit card with the user's ID exists.

One thing you should be mindful of when using this feature is that column properties won't be populated before you commit the session, which might be unexpected when working with a freshly created record:

john = User(firstname='John', lastname='Doe')
print(john.fullname)
# None
session.add(john)
session.commit()

print(john.fullname)
# John Doe

Hybrid Properties

To follow up on the previous tip, let me also show you hybrid properties. They are similar to column properties in the sense that they produce computed attributes. Hybrid properties, however, produce the value from a Python expression at the instance level and from a SQL expression at the class level. A little confusing? Alright, let's see an example:

class Order(Base):
    __tablename__ = 'order'

    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('user.id'))
    state = Column(String(20))  # Pending/Complete


class User(Base):
    __tablename__ = 'user'

    id = Column(Integer, primary_key=True)
    orders = relationship("Order")
    name = Column(String(50))

    @hybrid_property
    def has_pending_orders(self):
        return any(order.state == "Pending" for order in self.orders)  # -> produces value

    @has_pending_orders.expression
    def has_pending_orders(cls):
        return (
            select([
                case([(exists().where(and_(
                    Order.user_id == cls.id,
                    Order.state == "Pending", )).correlate(cls), True)], else_=False,
                ).label("has_pending_order")
            ]).label("number_of_pending_orders")
        )  # -> produces SQL expression

user = User(
    name="John",
    orders=[
        Order(state="Complete"),
        Order(state="Pending"),
    ]
)

print(user.has_pending_orders)  # evaluate as Python expression
# True
user = session.query(User).filter(User.has_pending_orders).scalar()  # evaluate as SQL expression (Filter)
# SELECT * FROM user 
# WHERE (
#   SELECT CASE WHEN (EXISTS (
#       SELECT * 
#       FROM order 
#       WHERE order.user_id = user.id AND order.state = 'Pending'
#   )) THEN 1 ELSE 0 END AS has_pending_order)

To show off the capabilities of hybrid_property, we implement a simple relationship between User and Order, where each user has a list of orders, each with a .state - in this case either Pending or Complete. Now, if we want to find out whether a user has any Pending orders, we need to think of 2 cases. If we are working with rows that were already loaded into Python objects, then we can just use a Python expression and produce a Python value (has_pending_orders(self)). If, on the other hand, we are querying this information directly from the database, we can't use a Python expression, as the database engine won't understand it. Therefore, for this case (has_pending_orders(cls)) we write a SQL expression that can be run against the database.

As a side note - if your expression is the same for both Python and SQL evaluation, then you can omit the second function decorated with .expression and SQLAlchemy will use the first one for both cases.
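The single-function case can be sketched like this - a minimal, self-contained example (using an in-memory SQLite database and the 1.x-style declarative base for illustration) where plain string concatenation is valid both as Python and as SQL, so one getter serves both levels:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    firstname = Column(String(50))
    lastname = Column(String(50))

    @hybrid_property
    def fullname(self):
        # Concatenation works on Python strings and on Column objects alike,
        # so no separate .expression function is needed
        return self.firstname + " " + self.lastname

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add(User(firstname='John', lastname='Doe'))
session.commit()

# Class level: SQLAlchemy renders the getter as a SQL concatenation
john = session.query(User).filter(User.fullname == 'John Doe').first()
# Instance level: the same getter runs as ordinary Python
print(john.fullname)
# John Doe
```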

Mixins

One of my favourite features is Mixin classes. Mixins aren't specific to SQLAlchemy, but they are especially useful in conjunction with ORM models. Quite often you might run into a situation where you have multiple classes (models) that require the same attribute or the same classmethod. One such example is the User model below:

class MixinAsDict:
    def as_dict(self):
        return {c.name: getattr(self, c.name) for c in self.__table__.columns}

class MixinGetByUsername:
    username = Column(String(200), unique=True, nullable=True)

    @classmethod
    def get_by_username(cls, username):
        return session.query(cls).filter(cls.username == username).first()


class User(MixinAsDict, MixinGetByUsername, Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)

user = User(id=1, username="John")
print(user.as_dict())
# {'username': 'John', 'id': 1}
session.add(user)
session.commit()

john = User.get_by_username("John")
print(f"User: {john.username} with ID: {john.id}")
# User: John with ID: 1

In this example we have 2 Mixin classes from which the User model inherits. The first of them - MixinAsDict - provides the method as_dict(self), which can be used to get a dict representation of the model. The other one, MixinGetByUsername, provides both the username column as well as a classmethod for querying users by their username.

Defining these functions as Mixins allows us to make them reusable and add them to other models without copy-pasting the same code everywhere.

If you don't want to write all the Mixins yourself, then you can take a look at https://github.com/absent1706/sqlalchemy-mixins which is a collection of common SQLAlchemy Mixins.

Working with Metadata

Sometimes you might need to access table column names, check constraints on the table or maybe check whether a column is nullable. All of this can be done with the MetaData() class:

class Address(Base):
    __tablename__ = 'address'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('user.id'), nullable=True)
    street = Column(String(50))

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    firstname = Column(String(50))
    lastname = Column(String(50))
    address = relationship(Address, backref='report')

Base.metadata.create_all(engine)

meta = Base.metadata  # Metadata()

for t in meta.sorted_tables:
    print(t.name)

# user
# address

print(meta.tables["user"].c)
# ['user.id', 'user.firstname', 'user.lastname']
print(meta.tables["user"].c["lastname"].type)
# VARCHAR(50)
print(meta.tables["user"].c["lastname"].nullable)
# True
print(meta.tables["address"].foreign_keys)
# {ForeignKey('user.id')}
print(meta.tables["address"].primary_key)
# PrimaryKeyConstraint(Column('id', Integer(), table=<address>, primary_key=True, nullable=False))

The important part here is the print statements at the bottom of the code snippet. Each of them demonstrates some of the things you can access through the metadata object. This includes table names, column names, column types, foreign and primary keys, as well as other constraints.
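Metadata doesn't have to come from your own models, either - MetaData.reflect can load table definitions straight from a live database. A minimal sketch (using SQLite in-memory, with a table created up front to stand in for a pre-existing schema):

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine

engine = create_engine('sqlite://')

# Pretend this schema already exists in the database
original = MetaData()
Table('user', original,
      Column('id', Integer, primary_key=True),
      Column('firstname', String(50)))
original.create_all(engine)

# Reflect the live database into a fresh MetaData object
meta = MetaData()
meta.reflect(bind=engine)

user_table = meta.tables['user']
print([c.name for c in user_table.c])
# ['id', 'firstname']
print(user_table.c['firstname'].nullable)
# True
```

This is handy when you need to inspect or query a database whose schema isn't mapped in your code.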

Configuring Tables

Some of your database tables might require a bit more extensive initial setup. For example - you might want to include a few check constraints, indexes or specify a different schema:

class Card(Base):
    __tablename__ = 'card'
    __table_args__ = (
        CheckConstraint("created < valid_until", name="validity_check"),
        CheckConstraint("card_type ~* '^(debit|credit){1}$'", name="type_check"),
        Index("index", "id"),
        ForeignKeyConstraint(['id'], ['remote_table.id']),
        {'extend_existing': True, "schema": "default"},
    )

    id = Column(Integer, primary_key=True)
    created = Column(Date)
    valid_until = Column(Date)
    card_type = Column(String(50))

All of these things can be configured using the __table_args__ class attribute. Here, we set up 2 check constraints, 1 index on the ID column and a foreign key constraint. We also set extend_existing, which means that if a Table with this name is already present in the metadata, redefining it extends the existing definition instead of raising an error. Lastly, we also specify which schema this table belongs to.
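Check constraints defined this way are enforced by the database itself, not by SQLAlchemy. A minimal sketch showing this (using SQLite in-memory, which understands the simple comparison constraint - the PostgreSQL regex check from above is left out since it wouldn't work there):

```python
from datetime import date

from sqlalchemy import CheckConstraint, Column, Date, Integer, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Card(Base):
    __tablename__ = 'card'
    __table_args__ = (
        # Enforced by the database engine on every INSERT/UPDATE
        CheckConstraint("created < valid_until", name="validity_check"),
    )
    id = Column(Integer, primary_key=True)
    created = Column(Date)
    valid_until = Column(Date)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Invalid row: created is after valid_until
session.add(Card(created=date(2030, 1, 1), valid_until=date(2020, 1, 1)))
try:
    session.commit()
    violated = False
except IntegrityError:
    session.rollback()
    violated = True

print(violated)
# True - the database rejected the invalid row
```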

Using Custom Dialects

Every database engine has some custom features, which you might want to make use of. For me - as a PostgreSQL user - I would like to use some of the custom column types that PostgreSQL has. So how would one use those with SQLAlchemy?

from uuid import uuid4
from sqlalchemy.dialects.postgresql import ARRAY, UUID, INT4RANGE, NUMRANGE, JSON

engine = create_engine('postgresql+psycopg2://postgres:postgres@localhost/testdb', echo=True)

class Example(Base):
    __tablename__ = 'example'

    id = Column(Integer, primary_key=True)
    uuid = Column(UUID(as_uuid=True), unique=True, nullable=False, default=uuid4)
    int_range = Column(INT4RANGE)
    num_range = Column(NUMRANGE)
    pg_json = Column(JSON)
    pg_array = Column(ARRAY(Integer), server_default='{}')


from psycopg2.extras import NumericRange

example = Example(
    uuid=uuid4(),
    int_range=NumericRange(1, 3),
    num_range=NumericRange(1, 3),
    pg_json={"key": "value"},
    pg_array=[1, 5, 7, 24, 74, 8],
)

print(session.query(Example).filter(Example.pg_array.contains([5])).scalar())
# SELECT * FROM example WHERE example.pg_array @> [5]

# <__main__.Example object at 0x7f2d600a4070>  # Object we previously inserted

print(session.query(Example).filter(Example.pg_json["key"].astext == "value").scalar())
# SELECT *
# FROM example 
# WHERE example.pg_json ->> 'key' = 'value'

# <__main__.Example object at 0x7f04dee05070>  # Object we previously inserted

The code above shows one Example table that has PostgreSQL UUID, INT4RANGE, NUMRANGE, JSON and ARRAY columns. All of these and more can be imported from sqlalchemy.dialects.postgresql.

Creating rows that include values of these types is pretty self-explanatory. When it comes to querying them, though, you will need to use dialect- and type-specific comparators, as shown above with the PostgreSQL ARRAY type and the .contains comparator.

For other types like JSON you might be able to get away with just comparing them as text (using .astext).

To make your life easier when creating these queries, I recommend setting echo=True when creating the engine, which will make SQLAlchemy print all SQL queries to the console, so that you can check whether your code actually produces correct queries.

All of the dialects, their types and comparators are documented at https://docs.sqlalchemy.org/en/13/dialects/.

Full-text Search with PostgreSQL

While on the topic of PostgreSQL features - what about full-text search with tsquery and tsvector? We can do that with SQLAlchemy too:

class MixinSearch:

    @classmethod
    def fulltext_search(cls, session, search_string, field):
        return session.query(cls). \
            filter(func.to_tsvector('english', getattr(cls, field)).match(search_string, postgresql_regconfig='english')).all()

class Book(MixinSearch, Base):
    __tablename__ = 'book'

    id = Column(Integer, primary_key=True)
    title = Column(String(100))
    body = Column(Text)

book = Book(
    title="The Catcher in the Rye",
    body="""First page of the book..."""
)

success = Book.fulltext_search(session, "David & Copperfield", "body")
# SELECT *
# FROM book 
# WHERE to_tsvector(english, book.body) @@ to_tsquery('english','David & Copperfield')
print(success)
# [<__main__.Book object at 0x7fdac5e44520>]
fail = Book.fulltext_search(session, "cat & dog", "body")
# SELECT *
# FROM book 
# WHERE to_tsvector(english, book.body) @@ to_tsquery('english', 'cat & dog')
print(fail)
# []

Once again we create a Mixin class for full-text search, as this is something that a lot of models can use. This Mixin has a single classmethod, which takes a search string and the column to search in (field). To do the actual search we use func.to_tsvector, to which we pass the language and a reference to the table's column. On that, we chain a call to the .match function, which really is a call to to_tsquery in PostgreSQL, and we give it the search string and search configuration as arguments.

From the generated SQL we can see that the Python code really produces correct SQL queries.

Tracking Last Update on Rows

Creating a created_at or updated_at column is pretty common practice. This can be done very simply with SQLAlchemy:

class Example(Base):
    __tablename__ = 'example'

    id = Column(Integer, primary_key=True)
    updated_at = Column(DateTime(timezone=True), server_default=func.now(), onupdate=func.now())
    data = Column(String(100))


example = Example(
    data="Some data..."
)

row = session.query(Example).scalar()
print(row.updated_at)
# 10:13:14.001813+00:00

time.sleep(...)
row.data = "Some new data..."
session.add(row)
session.commit()

row = session.query(Example).scalar()
print(row.updated_at)
# 10:13:16.590945+00:00

For updated_at you just need to set onupdate to func.now(), which will make it so that every time the row is updated, this column is set to the current timestamp. As for the created_at column, you can omit the onupdate argument and keep only server_default, which sets the function that is called when the row is created.
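Putting both columns together, a minimal runnable sketch (using SQLite in-memory, where func.now() renders as CURRENT_TIMESTAMP) looks like this:

```python
from sqlalchemy import Column, DateTime, Integer, String, create_engine, func
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Example(Base):
    __tablename__ = 'example'

    id = Column(Integer, primary_key=True)
    # created_at: set once by the database when the row is inserted
    created_at = Column(DateTime(timezone=True), server_default=func.now())
    # updated_at: set on insert, then refreshed on every UPDATE
    updated_at = Column(DateTime(timezone=True), server_default=func.now(),
                        onupdate=func.now())
    data = Column(String(100))

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add(Example(data="Some data..."))
session.commit()

row = session.query(Example).first()
print(row.created_at, row.updated_at)  # both populated by the database
```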

Self-Referencing Tables

It's not uncommon to have recursive/self-referential relations in a database - whether it's a manager -> employee relationship, tree structures or some materialized path. This tip shows how you can set up this kind of relationship using SQLAlchemy:

class Node(Base):
    __tablename__ = 'node'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('node.id'))
    data = Column(String(50))
    children = relationship("Node",
                            backref=backref('parent', remote_side=[id])
                            )

    def __str__(self, level=0):
        ret = f"{'    ' * level} {repr(self.data)} \n"
        for child in self.children:
            ret += child.__str__(level + 1)
        return ret

    def __repr__(self):
        return self.data


node = Node(
    data="Node 1",
    children=[
        Node(data="Node 2"),
        Node(
            data="Node 3",
            children=[
                Node(data="Node 5")
            ]
        ),
        Node(data="Node 4"),
    ]
)
rows = session.query(Node).all()
print(rows[0])
# 'Node 1' 
#     'Node 2' 
#     'Node 3' 
#         'Node 5' 
#     'Node 4' 
print(rows[2])
# 'Node 3' 
#     'Node 5'

For this example we use a tree structure created from Node records. Each node has some data, a reference to its parent and a list of its children. As convenience methods we also include __str__ and __repr__ to help us visualize the tree a little better.

If you are fine with a normal one-to-many relationship, then you can do it the same way as for any non-self-referential relationship. To make it work as a bi-directional relationship, however, you also need to include the backref with remote_side=[id], as shown above.
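The pay-off of the bi-directional setup is that you can also walk up the tree through the generated .parent attribute. A minimal sketch (SQLite in-memory, condensed to a two-node tree):

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import backref, relationship, sessionmaker

Base = declarative_base()

class Node(Base):
    __tablename__ = 'node'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('node.id'))
    data = Column(String(50))
    # remote_side tells SQLAlchemy which side of the self-join is the "one" side
    children = relationship("Node", backref=backref('parent', remote_side=[id]))

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

root = Node(data="root", children=[Node(data="child")])
session.add(root)
session.commit()

child = session.query(Node).filter(Node.data == "child").one()
print(child.parent.data)  # walk back up through the backref
# root
```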

Binding Multiple Databases with Flask

Last one is for all the Flask users. If you ever need to connect to multiple databases - for example because of multiple geographies or multiple data sources - then you can use SQLALCHEMY_BINDS to specify extra database binds:

# Config
SQLALCHEMY_DATABASE_URI = 'postgresql+psycopg2://localhost/emea'  # default bind: Europe, the Middle East and Africa
SQLALCHEMY_BINDS = {
    'emea': 'postgresql+psycopg2://localhost/emea',  # Europe, the Middle East and Africa
    'ap':   'mysql+mysqldb://localhost/ap',          # Asia Pacific
    'la':   'postgresql+psycopg2://localhost/la',    # Latin America
}

# Models
class User(db.Model):
    __bind_key__ = 'emea'  # Declare to which database the model belongs to
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(80), unique=True)

In the code snippet above we configure default database by setting SQLALCHEMY_DATABASE_URI and alternative binds in SQLALCHEMY_BINDS. With this configuration, all the above databases will be available to us. Next, we set __bind_key__ of a table to refer to one of the binds, so that whenever we interact with this particular table, SQLAlchemy will know which database to connect to.

If, however, you need to connect to multiple databases with the same tables/schema, you can use multiple engines and sessions - one for each database - and switch between them as you wish, like so:

engine_emea   = create_engine(...)
engine_ap     = create_engine(...)
engine_la     = create_engine(...)

# sessionmaker() returns a session factory; call it to open an actual session
session_emea  = sessionmaker(bind=engine_emea)
session_ap    = sessionmaker(bind=engine_ap)
session_la    = sessionmaker(bind=engine_la)
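The switching itself can be sketched like this - each session only ever talks to its own engine (here, two in-memory SQLite databases stand in for the regional databases):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    username = Column(String(80))

# Two in-memory databases stand in for two regional databases
engine_emea = create_engine('sqlite://')
engine_ap = create_engine('sqlite://')
# Same schema has to be created in each database
Base.metadata.create_all(engine_emea)
Base.metadata.create_all(engine_ap)

session_emea = sessionmaker(bind=engine_emea)()
session_ap = sessionmaker(bind=engine_ap)()

session_emea.add(User(username="john-emea"))
session_emea.commit()

# The write is only visible through the session bound to that engine
print(session_emea.query(User).count())
# 1
print(session_ap.query(User).count())
# 0
```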

Conclusion

Hopefully at least a few of the tips and tricks shown here will be useful to you and will make your life just a little bit easier next time you need to work with SQLAlchemy. This article is definitely not an exhaustive list of all the cool things you can do with SQLAlchemy, and you can find a bunch of useful things just by scrolling through the SQLAlchemy API reference.
