As a company that relies on data-driven intelligence and insights, 4Degrees naturally has a complex database. In a previous post we provided an introductory SQLAlchemy guide to help new interns and employees understand the basics of the SQLAlchemy ORM.
In this post we are going to build on that understanding and dive deeper into how multiple connected tables are mapped to python classes and how one might go about querying data that spans multiple tables.
Table Relationships
The most common table relationship is known as a 1:n relationship. As an example, a user
may post to a blog and have multiple blog posts which are stored in the user_blog_post
table. In order to accomplish this we define two python classes and add a column called user_id
to the blog post table which serves as a link between the two tables.
class user(db.Model):
id = db.Column(db.Integer, primary_key=True)
username = db.Column(db.String)
email = db.Column(db.String)
class user_blog_post(db.Model):
id = db.Column(db.Integer, primary_key=True)
title = db.Column(db.String)
body = db.Column(db.String)
user_id = db.Column(db.Integer)
An astute reader may note that we could technically stop at this point. We have two tables and a column that links the two. While this is technically true, it leaves room for error. What if we accidentally created a user_blog_post
table for a user that doesn’t exist?
To solve this problem, we introduce the main tool used in defining table relationships: the ForeignKey. The ForeignKey is an argument accepted by the Column object, which indicates that the value of this column must be constrained to an existing value in another table. In essence, if we intend for our two tables to be related, we ought to be sure that our user_id
reference doesn’t refer to a record that doesn’t exist.
user_id = db.Column(db.Integer, ForeignKey('user.id'))
If we now tried to create a row in the user_blog_post
table using a user_id
that does not exist as an id value in the user
table (eg. 55), we get an error and the row is not created:
psycopg2.errors.ForeignKeyViolation: insert or update on table "user_blog_post" violates foreign key constraint "user_blog_post_user_id_fkey"
DETAIL: Key (user_id)=(55) is not present in table "user".
The existence of a ForeignKey in a table implies a many-to-one relationship between itself and its parent table. Put another way, if we choose to place the constrained user_id
value in the user_blog_post
class such that there must exist a matching id column in the user
class, there can be multiple blog posts for each user, but only one user for each blog post.
This brings us to a second useful tool in SQLAlchemy for table relationships: the db.relationship() method. It is important to understand that this method is not truly necessary for defining table relationships. In fact, as we mentioned above, the only truly necessary item for a table relationship is the column that links two tables (ie. user_id
).
Rather, the db.relationship() method leverages the implicit many-to-one relationship created by a ForeignKey column and exposes this linkage between the two tables to the mapped python classes:
blog_posts = db.relationship("user_blog_post", backref='user')
By placing this line within the user
class, SQLAlchemy ORM establishes user.blog_posts
and user_blog_posts.user
attributes that can be accessed in our python environment. What’s more, our python environment understands the many-to-one relationship:
newBlogPost = user_blog_post(title='first', body='first post body', user_id=1)
print(newBlogPost.user)
→ None
newUser = user(username='Moshe', email='moshe@4degrees.ai')
print(newUser.blog_posts)
→ [ ]
Note, that the user
table does not actually have a blog_post column and the user_blog_post
table does not actually have a user column. This is clear in the SQLAlchemy code below and the SQL that it outputs:
newUser = user(username='Moshe', email='moshe@4degrees.ai')
newUser.blog_posts = [
user_blog_post(title='first', body='first post body'),
user_blog_post(title='second', body='second post body')
]
db.session.add(newUser)
db.session.commit()
SQL
INSERT INTO “user” (username, email) VALUES (?, ?)
('Moshe', 'moshe@4degrees.ai')
INSERT INTO “user_blog_post” (title, body, user_id) VALUES (?, ?, ?)
('first', 'first post body', 1)
INSERT INTO “user_blog_post (title, body, user_id) VALUES (?, ?, ?)
('second', 'second post body', 1)
Querying Table Relationships
Now that we have an understanding of how to define new table relationships, let’s quickly dive into an exercise that will demonstrate how to query across multiple related tables. Suppose we want to email all users who have written a blog post which contain the words “machine learning” in the body of their blog posts. We will then send them all an invitation to an upcoming machine learning conference. How do we get this list?
The data we need is email
in the user
table, but the data we need to filter by is body
in the user_blog_post
table. In order to accomplish this we can use the Query.join() method:
theEmails = user.query.with_entities(user.email).join(user_blog_post).\
filter(user_blog_post.body.like('%machine learning%')).all()
Interestingly, SQLAlchemy ORM implicitly knows how to join the user
and user_blog_post
tables, because there is only one ForeignKey between them. In a more complicated situation, that would not be the case and we would explicitly describe the ForeignKey relationship:
theEmails = user.query.with_entities(user.email).join(user_blog_post, user.id == user_blog_post.user_id).filter(user_blog_post.body.like('%machine learning%')).all()
Top comments (0)