In this article, we will meet Bob, the proud owner of a promising retail store, and see his journey that led him to discover the Relational Databases π―! Let's begin.
This article is part of my Data Analytics Made Simpler series. On its website, I'm giving more details about what exactly this series is, and what's my motivation to do something like that, and how I plan to release its content!
Table of Contents:
- Meet Bob! Our store owner
- Bob's first attempt
- How Bob improved his data model
- The introduction of Keys
- Relationships Cardinality
- Enter: RDBMS
- Conclusion
Meet Bob! Our store owner:
Bob has taken responsibility for his family business, a retail store in his home town! He is very creative and business-oriented; he managed to expand his customer base threefold over the last year alone! In order to retain his current customers and attract new ones, he had the idea of implementing a "Loyality Program", where he awards his "loyal" customers with discounts based on their number of store visits and the amount they spent.
The problem is, he doesn't have any database knowledge, and he thought that he could pull this off using a spreadsheet application that he is very proficient with.
Bob's first attempt:
As Bob is a spreadsheet guru, he created one with the following columns:
- Date
- Customer Name
- Customer Phone
- Customer Email
- First Purchase
- Item Description
- Quantity
- Unit Price
- Total Price
From his point of view, this spreadsheet captures everything he needs! The date of the purchase, the customer data, the purchased items, and the amount spent by the customers.
It worked fine for a couple of weeks until he was faced with some challenges that made it very hard to continue this way:
- It was very daunting to enter the same date and the customer's data with each item he/she purchases, knowing that one customer is more likely to purchase several items at once.
- Customers started complaining as it takes a long time for their entire purchase to be registered, especially if there are a lot of purchased items.
- He noticed that the data for the same customer might be entered differently for separate purchases on different days (or even on the same day!). It may be due to typos; the data is entirely different or completely missing.
- Similar to the customer data, the Items data might suffer from similar inconsistencies, like description changes or price changes, which make something like tracking the performance of a single item hard to track.
- As he kept entering the same customer and item data over and over again, the spreadsheet file's size exploded very quickly!Β
How Bob improved his data model:
Bob has realized that his current data model isn't scalable! Like when the data has accumulated for a couple of weeks, it becomes difficult to maintain and get accurate insights from it.
The first thing he really wanted to change was entering the same customer data with each purchased item. As this is redundant and makes modifying any piece of data, like the customer's phone, very challenging, as he must check all the previous purchases for this particular customer and change the phone number in each of his/her related records!
He thought of separating the Customers in their own "worksheet" (table) and renaming the original worksheet to Purchases and removing all the customer data columns from it, leaving only the "Customer Name". He first sketched it out as follows before he performed the actual change to his current file:
Bob has unknowingly created his own version of an "Entity Relationship Diagram" (ERD) which still missing some key ERD features but is close enough!
What he did is that he has created two Entities, "Customers" and "Purchases", and he has used the "Customer Name" as the Link AKA "relationship" between the two entities.
Now his model has become more maintainable and far less in size as if he needed to do the customer phone number modification we discussed earlier, he only needed to do it once in one record in the "Customers" table. Also, as he now doesn't include the full customer's data at each purchase record and only include the Customer Name, the size of his workbook has become substantially smaller.
But soon he encountered another problem!
He noticed, that some customers might share the same name! So this will become problimatic in his "Loyality Program" implementation as for such cases, he wouldn't be able to tell wich is the correct customer to award the discounts to!
The introduction of keys:
So, he realized that seting the "Customer Name" as the "Key" to identify the relationship between the "Customers" and "Purchases" entities might not be ideal as it might not be Unique. He inspected the columns in the Customers table to identify another column that is 100% unique for each customer, he found some good candidates like phone number or email address. He thought that those should be unique for every customer! As one phone number can be owned by one customer and the same goes for the email address, right?
Then he discoverd that if he used those columns, he might introduce further problems down the road! Customers are allowed to change their phone numbers and emails. So, if he had used the phone number for example as the identifying field for the customers and later customers changed their numbers, he then would have different purchase records with diffirent phone numbers that could be expired and not refering to any customers in his "Customers" table!
He decided to introduce an new column to his "Customers" table to act as the unique ID for every customer he has. He though that this column should be unique and can't be empty. He wanted it to be simple, so he chose it to be an incrementing number starting at 1. Then instead of the "Customer Name" to link the two table, he would use this newly introduced "Key" column. He updated his model diagram adding a "Purchase ID" key column as he felt he could use it later:
Again without knowing, Bob has created something called a "Primary Key" by introducing the "Customer ID" column in the "Customers" table. The main function of a "Primary Key" in a table is to be able to uniquely identify each record in this table. Also, by using the same column in the "Purchases" table to define the relationship between the "Customers" table, he used it this time as a "Foreign Key" (foreign to the table). In contrast to the "Primary Keys" that must be unique in a table, "Foreign Keys" can be non-unique in the same table.
In a real ERD, "Primary Keys" (PK) and "Foreign Keys" (FK) should be annotated on the diagram and the "relationship" should be named, conventionally with a "verb". Also, something called relationship "cardinality" must be annotated too on the line related to the relationship.
This way, Bob managed to enhanced his data model a lot. All he has to do to register a new purchase, is to lookup the customer's record, retreive its PK (Primary Key), and write it in front of each purchased item.
But he felt that there is more to be done, so he invited his friend Ben, a database designer, for checking what Bob has done so far with his model.
Relationships Cardinality:
Ben has really admired what Bob has done so far without any prior database knowledge! He saw the diagram that Bob has made and told him that it looked pretty much like a real ERD and explained how he could make it so. He also explained the concept of "Primary Keys" and "Foreing Keys" that Bob has used unknowingly.
He felt that he needed to explain to Bob, the concept of "cardinality" of the relationships between the entities as he saw room for improvements for the current data model.
Relationship Cardinalty is a fairly simple concept. In general, it defines how the records from one table are related to the records from the other table at both ends of a relationship. It can be "One-to-One", "One-to-Many", or "Many-to-Many".
For "One-to-One", it means that one record for the first table is linked to only one record from the second table. For example, if we split the "Customers" table into two table, with "Name" and "Customer ID" in one table and "Customer ID" and the rest of the columns in the other table. In that case, each record from any table will be linked to one and only one record from the other table using the "Customer ID". "One-to-One" has its application in databases but it isn't commonly used.
Similary for "One-to-Many", it indicates that one record for the "One" side of the relationship can be linked to many records from the "Many" side. The relationship between the "Customers" and the "Purchases" tables is "One-to-many" as "One" customer can perform "Many" purchases but one purchase can't be done by many customers.
"Many-to-Many" relationships might need some time to grasp but they aren't hard to understand! It means that "Many" records from the first table can be linked with many records from the second table. Think of "Students" and "Classes" tables. One Student can register to "Many" classes and one class can contain "Many" students.
Ben has recommended to Bob that he could seperate the "Items" entity similar to what he has done to the "Customers" entity (table). He also identified the cardinality between "Customers" and "Items" as "Many-to-Many" as one customer can purchase "Many" items and one Item can be purchased by "Many" customers.
Finally, he modified Bob's diagram to show PKs, FKs, and relationships cardinalities. He separated "Items" into their own table and used the "Purchases" table as an intermediate table to implement the "Many-to-Many" relationship between "Customers" and "Items". The final ERD became as follows:
Again, this is a real ERD but in its simpleset form. ERDs aren't in the scope of this post and I might discuss them in later posts.
Enter: RDBMS
Bob has thanked Ben for his advice! And indeed acknowledged that the overall size of the workbook file has become smaller and the whole model has become more maintainable than before those changes. But he rememebered that he still uses spreadsheets app to manage his data! He realized that there is a lot of manual work to be done like looking up the IDs of customers and items, ensuring that the IDs in the "Customers" and "Items" tables are unique and present, the "quantity" in the "Purchases" table is positive and present, and so much more "rules" that he must enforce at data entry time! Also, he noticed that the "Total Price" column is not straight forward to calculate as before he splitted the "Items" from "Purchases" and the "Loyality Program" analysis now requires a lot more work!
He told his friend Ben all of his concerns. Ben smiled and told Bob, "Why don't you use a Relational Database Management System (RDBMS)?"
Ben continued to list all the benifits of using a dedicated RDBMS software like MySQL, Postgresql, MariaDB, or many others over the use of spreadsheets app:
- With RDBMSs, you won't manage any files. The RDBMS will do it for you.
- The "rules" that you want to enforce are called in the RDBMS world, "Constraints" and the RDBMS can enforce them for you. For example, for a "Primary Key" column you can enforce the
UNIQUE
andNOT NULL
constaints. "Foreign Keys" are a from of constraint too. RDBMSs check the existance of the key used as a foreing key in a table as a primary keyin the other table in a "One-to-Many" relationship. - Some RDBMS has the "Calculated Columns" feature and you can calculate the "Total Price" with relative ease like in a spreadsheet application.
- You won't have to "join" the tables manualy like in a spreadsheet appplication as RDBMSs uses SQL (Structured Query Language) and "joining" all the tables can be joined with a statement as simple as:
SELECT
p.Date,
c.Name,
c.Phone,
c.Email,
c.FirstPurchase,
i.Description,
i.UnitPrice,
p.TotalPrice
FROM Purchases AS p
JOIN Customers AS c
ON p.CustomerID = c.CustomerID
JOIN Items AS i
ON i.ItemID = p.ItemID
This may look intemidating if you don't know SQL but trust me it is very simple and easy to understand! π
- Similar to joining tables together, you can have your "Loyality Program" analysis using a stored SQL Query with just a click of a button!
- You can host your RDBMS on a dedicated server and build an application on top of it leveraging SQL for more user friendly environment while entring the purchases data.
That and many other benefits!
Conclusion:
Bob was really happy with his discovery of Relational Databases and RDBMSs! He even hired a developer to build his purchases application and integrating his inventory systems with it!
He managed to keep his current customers happy and win more customers! He expanded his familly business into other districts and his store became a well known franchise! π
The bottom line, Relational Database makes handling data a very easy job! SQL is used by RDBMSs and it is a very powerfull tool to query and analyse your data. The model we have explored today is call Online Transactional Processing (OLTP) as it is optimized for "writes". Meaning, you can write data very quickly but reading it will take time as you will probably need to join several table to get a meaningfull analysis. And it is called "Transctional" as it supports transactions that can be one or more operation that are done together or not done at all. More on OLTP and ACID compliance (that describes transactions) in later posts. See you then π
Top comments (0)