Let's face it, building a database that works is not a challenge at all. The real challenge is to build a database that is fast, reliable, scalable, and well-engineered. In this article, let me show you some of the most important tips when building relational databases like MySQL, PostgreSQL, etc.
Consider a database table (customers) with the following header/columns:
| id | state | points |
// 10 million rows go here...
Here's how to query the database to get customers in 'NYC':
SELECT id FROM customers WHERE state = 'NYC';
The query will be slow because we've got 10 million rows and the DB engine needs to query all rows to get the relevant data. With indexing we can eliminate all irrelevant rows before querying the data so only rows with 'NYC' will be checked when 'NYC' is queried. Here's the same query with index:
CREATE INDEX idx_state ON customers (state); -- convention is to prefix index with "idx" followed by underscore and then column name. SELECT id FROM customers WHERE state = 'NYC';
This query will be so much faster because only rows with 'NYC' will be queried, not all rows, thanks to indexing. Think of indexing as educating the database engine on which data is where and how it is stored so that when you query data next time, it gives the relevant data more efficiently without wasting resources/time.
I can talk so much about indexing but I don't want to bore you. If you want to learn more about indexing, Google these: "multiple-column indexing", "indexing and sort operation", "composite indexes", etc.
You need to only store that data that you need or the business needs. Don't try to solve a problem which might or might not happen in future. If you need to store additional data, you can always create a new table and add relationship(s).
Blob stands for Binary Large Object and this basically means images, audios, and other large files converted to binary format. If I could impose a rule in the tech world, I'd forbid blob as a data in database tables, because it's incredibly bad to do so. Blobs increase the size of database like crazy and this impacts the performance.
If a table has way too much columns, it never means the table is well-engineered. If that's the case, consider splitting your table to two or more tables. In contrast, if there are a whole lot of tables with very few columns (say 1 to 2) and they're connected with relationships, consider combining them (and be okay with duplicating some data). A lot of beginners think that denormalizing data is something bad because it has the prefix "de" but I can assure you it's not true, lol.
Just because a data type works well with your use case, it doesn't mean it's the best one. For example, to store age of users, use TINYINT data type (an unsigned TINYINT stores values from 0 to 255) instead of INT. INT data type stores up to 2 billion (or something like that). This takes more space and we don't need it at all. The age of users will never go beyond 255. If anyone does reach 254 years old, we still have 1 year to change the data type.
Just kidding, or am I? Just kidding... or am I?