As a company that relies on data-driven intelligence and insights, 4Degrees naturally has a complex database. In a previous post we provided an introductory SQLAlchemy guide to help new interns and employees understand the basics of the SQLAlchemy ORM.
In this post we are going to build on that understanding and dive deeper into how multiple connected tables are mapped to python classes and how one might go about querying data that spans multiple tables.
The most common table relationship is known as a 1:n relationship. As an example, a
user may post to a blog and have multiple blog posts which are stored in the
user_blog_post table. In order to accomplish this we define two python classes and add a column called
user_id to the blog post table which serves as a link between the two tables.
class user(db.Model): id = db.Column(db.Integer, primary_key=True) username = db.Column(db.String) email = db.Column(db.String) class user_blog_post(db.Model): id = db.Column(db.Integer, primary_key=True) title = db.Column(db.String) body = db.Column(db.String) user_id = db.Column(db.Integer)
An astute reader may note that we could technically stop at this point. We have two tables and a column that links the two. While this is technically true, it leaves room for error. What if we accidentally created a
user_blog_post table for a user that doesn’t exist?
To solve this problem, we introduce the main tool used in defining table relationships: the ForeignKey. The ForeignKey is an argument accepted by the Column object, which indicates that the value of this column must be constrained to an existing value in another table. In essence, if we intend for our two tables to be related, we ought to be sure that our
user_id reference doesn’t refer to a record that doesn’t exist.
user_id = db.Column(db.Integer, ForeignKey('user.id'))
If we now tried to create a row in the
user_blog_post table using a
user_id that does not exist as an id value in the
user table (eg. 55), we get an error and the row is not created:
psycopg2.errors.ForeignKeyViolation: insert or update on table "user_blog_post" violates foreign key constraint "user_blog_post_user_id_fkey" DETAIL: Key (user_id)=(55) is not present in table "user".
The existence of a ForeignKey in a table implies a many-to-one relationship between itself and its parent table. Put another way, if we choose to place the constrained
user_id value in the
user_blog_post class such that there must exist a matching id column in the
user class, there can be multiple blog posts for each user, but only one user for each blog post.
This brings us to a second useful tool in SQLAlchemy for table relationships: the db.relationship() method. It is important to understand that this method is not truly necessary for defining table relationships. In fact, as we mentioned above, the only truly necessary item for a table relationship is the column that links two tables (ie.
Rather, the db.relationship() method leverages the implicit many-to-one relationship created by a ForeignKey column and exposes this linkage between the two tables to the mapped python classes:
blog_posts = db.relationship("user_blog_post", backref='user')
By placing this line within the
user class, SQLAlchemy ORM establishes
user_blog_posts.user attributes that can be accessed in our python environment. What’s more, our python environment understands the many-to-one relationship:
newBlogPost = user_blog_post(title='first', body='first post body', user_id=1) print(newBlogPost.user) → None newUser = user(username='Moshe', firstname.lastname@example.org') print(newUser.blog_posts) → [ ]
Note, that the
user table does not actually have a blog_post column and the
user_blog_post table does not actually have a user column. This is clear in the SQLAlchemy code below and the SQL that it outputs:
newUser = user(username='Moshe', email@example.com') newUser.blog_posts = [ user_blog_post(title='first', body='first post body'), user_blog_post(title='second', body='second post body') ] db.session.add(newUser) db.session.commit() SQL INSERT INTO “user” (username, email) VALUES (?, ?) ('Moshe', 'firstname.lastname@example.org') INSERT INTO “user_blog_post” (title, body, user_id) VALUES (?, ?, ?) ('first', 'first post body', 1) INSERT INTO “user_blog_post (title, body, user_id) VALUES (?, ?, ?) ('second', 'second post body', 1)
Querying Table Relationships
Now that we have an understanding of how to define new table relationships, let’s quickly dive into an exercise that will demonstrate how to query across multiple related tables. Suppose we want to email all users who have written a blog post which contain the words “machine learning” in the body of their blog posts. We will then send them all an invitation to an upcoming machine learning conference. How do we get this list?
The data we need is
user table, but the data we need to filter by is
body in the
user_blog_post table. In order to accomplish this we can use the Query.join() method:
theEmails = user.query.with_entities(user.email).join(user_blog_post).\ filter(user_blog_post.body.like('%machine learning%')).all()
Interestingly, SQLAlchemy ORM implicitly knows how to join the
user_blog_post tables, because there is only one ForeignKey between them. In a more complicated situation, that would not be the case and we would explicitly describe the ForeignKey relationship:
theEmails = user.query.with_entities(user.email).join(user_blog_post, user.id == user_blog_post.user_id).filter(user_blog_post.body.like('%machine learning%')).all()