Databases are at the heart of every web application. Their design, or schema, is literally the blueprint for how all information is stored, updated, and accessed. However learning about databases is difficult. There are many long, complicated books on database theory, but few simple explanations for beginners.
This is a tutorial on database design for beginners. It is the database primer I wished existed when I was starting out.
What is a Database?
A database is a program that allows for the storage and retrieval of data. Broadly speaking there are two types of databases: relational and non-relational.
Relational databases, which are far more common, consist of tables which are structured in columns and rows similar to an Excel spreadsheet. Popular database options including MySQL, PostgreSQL, Microsoft Access, Microsoft SQL, and Oracle are relational. They use SQL (Structured Query Language), which is a programming language just for managing data in a relational database.
Non-relational databases, by contrast, do not have a strict column and row schema. They are viewed by proponents as “more flexible” than their relational counterparts and are increasingly used by organizations like Facebook, Google, and the NSA for big data situations.
This tutorial will cover relational databases. If you’re new to databases, this is the easiest place to start.
Tables and primary keys
A database is made up of (often multiple) tables. A simple table, let’s call it customers, would look like this:
The columns are “customer_id”, “username”, “email”, “password,” and “birthday.” Each row is known as a record. We could eventually add hundreds or even millions of records as our number of users grew, but the defined columns would stay the same.
The “id” column is what’s known as a primary key: a unique and non-null number that refers to each record. Every record/row in a relational table has a primary key.
To understand why, let’s imagine that the user “william” wants to change his “username” to “bill.” How do we know which “password” and “birthday” to associate with this user? Without a primary key, we don't.
Tables also become much, much more complicated over time. A primary key gives us a unique way to identify and manage each record.
Data Redundancy and Input Constraints
A key tenant of good database design is that each data item, for example a username, should only be stored once, in one location. This avoids have to maintain and update multiple locations, which is very error-prone.
A second tenant is to use input constraints to increase the reliability of your data. For example, in our “email” column we know that a proper email record should contain the “@“ symbol, so we can say that only data with the “@“ symbol can be entered in the email column. And in the birthday column, we want a date datatype here so only, say, birthdays entered with numbers will be accepted.
Linking tables with Foreign Keys
When designing a database, the key decisions are what data do you want to store and what relationship exists between them. Right now we just have a customers table, but let’s say we also want to store orders because we’re building an e-commerce website.
It’s tempting to think we can just add a column for orders to our existing table. But an order has multiple pieces of information we want to track. For example, at a minimum we’d want to know:
- the order_id (primary key) so we can keep track of each order
- the date and time the order was placed
- the customer the order was placed by
This means we need a separate table for Orders. In order to associate a specific order with a specific customer, we can use a foreign key relationship through the customer_id field to link the two tables.
If we look at the Orders table schema we can see the first order was placed by customer_id 2, “john” in the Customers table. The next order, order_id 2, was made by "william." And so on. Foreign keys let us link multiple database tables together.
One-to-many, one-to-one, and many-to-many relationships
There are three types of foreign key relationships. The first, one-to-many, is what we have with the customers and orders tables. One customer can place multiple orders. By using the customer primary key as our foreign key in the orders table, we can keep track of this.
An example of a one-to-one relationship would be a database tracking people and passports. Each person can only have one passport, and vice versa, so those two tables would have a one-to-one relationship.
The third option is known as a many-to-many relationship. Let's imagine we have a database tracking authors and books. Each author can write multiple books and each book can have multiple authors. By defining a many-to-many relationship via foreign keys, we can link these two tables together.
Database Normalization
Designing a database is both an art and a science. As needs change over time, a database will undergo frequent updates to its table and column layout. Unfortunately redundant data can often sneak in, which causes multiple problems:
- inefficient - the database engine will need to process more data for each query or update
- bloated - storage requirements increase due to redundant data
- errors - redundant data must be manually inputted, which is error-prone
Normalization is the formal term for the process of eliminating redundant data from database tables. There are 5 normal forms but most database designers stop at levels 2 or 3 because while normalization reduces data dependency, it comes with added complexity that decreases performance.
Next Steps
Now that you've learned the basics of database design, why not try building a few database-driven website yourself with Django?
Top comments (12)
Not to nitpick...but lots of people have more than one passport. There are countries that issue you multiple passports to allow you to use one to go to country A, and another to go to country B, which won't let you in with a stamp from country A in your passport.
Also the more common case where a person has passports from multiple countries (eg citizen of multiple countries).
Also, technically, some counties allow infants to travel on a parent's passport (or they used to), although I don't think that data would be stored anywhere, I'd only seen it written in by hand by the passport owner.
^^^^^^^^^^^^
This really shows how hard databases/user data can be, and that many assumptions (eg name/date of birth format/nature) have a lot of exceptions!
True, I appreciate the perspective as this is a blindspot for me as an American where "most" people have a single passport. Not that we can fly anywhere these days anyway :)
Ha. From what I understand, most Americans don't have a single passport. Nope, we can't, but neither can anyone where I come from either...they're trying to keep their zero cases zero.
Great article! Not only great for django, also it's beneficial to everyone in general.
Thanks!
Very insightful article. Though I was looking forward to the example of the one to one and many to many relationships, but the article ended without this.
Overall, this is a well simplified article on database designs
Hello Vincent,
Thanks for your content. Can we talk in private?
Keep up the good writing :)
You're welcome to email me at will@learndjango.com
Thank you! :)
Hi William! I just read the free chapters of your Django book and was instantly hooked! I bought it!
Awesome! If you have any feedback as you go through, feel free to email me at will@learndjango.com.
Super useful information, very concise and to the point. Thanks!