Arctype Team for Arctype

Posted on • Updated on • Originally published at arctype.com

MySQL Schema Design: Back to The Future?

Preface

Designing a MySQL database schema is an inevitable part of the career of every MySQL database administrator, and of many developers. Both usually turn to schema design to improve query performance, normalize their databases, add or drop indexes, or modify columns.

Generally, what one database administrator considers to be a "good" database design might not seem so perfect to another DBA. Everyone has their own preferences, shaped by experience, but even though MySQL moves forward at a rapid pace, MySQL schema design practices do not change much.

We are going to start with some basics that you may already know but that might not be as readily apparent to other developers.

The Basics of MySQL Schema Design

​Some of the general advice regarding MySQL schema design sounds something like this:

  1. Make sure MySQL databases only store data that is needed.
  2. Choose optimal data types for certain use cases.
  3. If we store variable-length character values, we use VARCHAR. If we store longer text, we should use TINYTEXT, TEXT, MEDIUMTEXT, or LONGTEXT. If we store integers, we should look into TINYINT, SMALLINT, MEDIUMINT, INT, or BIGINT, depending on the range of values we need.
  4. We should give our data types a specific length: for example, we should avoid using VARCHAR(255) when VARCHAR(100) would suffice. VARCHAR values are stored at their actual length on disk, but a tighter declared length enforces the expected size and can reduce the memory MySQL reserves for the column in in-memory temporary tables.
  5. If we have a lot of data, we should use indexes to improve performance, but we should avoid redundant or duplicate indexes.
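As a rough illustration of points 2 to 5, here is a minimal sketch of a table definition. The table, its columns, and the index name are hypothetical and exist only for this example; the chosen lengths are assumptions about what the data might look like.

```sql
-- Hypothetical table; all names and lengths are illustrative assumptions.
CREATE TABLE users (
    id         INT UNSIGNED     NOT NULL AUTO_INCREMENT,
    username   VARCHAR(50)      NOT NULL,  -- short, variable-length string
    country    CHAR(2)          NOT NULL,  -- fixed-length two-letter country code
    age        TINYINT UNSIGNED NULL,      -- 0 to 255 is enough for an age
    bio        TEXT             NULL,      -- longer free-form text
    created_at DATETIME         NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uq_users_username (username)
    -- Adding KEY idx_username (username) here would duplicate the UNIQUE key above.
) ENGINE=InnoDB;
```

The UNIQUE key already serves lookups by username, so a second plain index on the same column would only add write overhead.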

The general consensus among DBAs is that MySQL schema design should be based around a few core points:

  1. Define the requirements for the database.
  2. Gather, organize and refine the data. If needed, add indexes to improve performance.
  3. Create relationships among tables.
  4. Refine and normalize the database design. ​

Defining the Requirements for Database Design

​To properly define the requirements for a database design, we should answer these questions:

What Kind of Data Are We Storing?

​An answer to this question will help decide what kind of data types our database will need. Each data type consumes a different amount of storage space, so as the data grows, the importance of the answer to this question will probably become more and more apparent.

Do We Deal with Big Data Sets?

An answer to this question will determine how best to approach MySQL. It influences which storage engine we should use, which flush method (innodb_flush_method) we should configure, and so on. If you want to learn more about how big data interacts with MySQL, consider having a look at an earlier blog post.

What Kind of MySQL Queries Do We Run?

An answer to this question will determine whether we need indexes, which kinds of indexes we need, and when to use them. It will also influence which storage engine we should use, and if we run a web application on top of MySQL, it might even affect how that application is developed.

What Kind of Queries Are Run Most Frequently?

An answer to this question should determine whether it's appropriate to use indexes for our use case (remember, indexes slow down INSERT performance while speeding up SELECT queries). It will also determine whether we need to normalize (or denormalize) our database schemas, and it might shape how we further develop the applications that interact with MySQL.

Do We Need Indexes?

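An answer to this question will heavily depend on what kind of queries our database is running. If our workload consists mostly of SELECT queries with few INSERT queries, indexing is usually worthwhile. For write-heavy workloads, every additional index adds overhead to each INSERT, UPDATE, and DELETE, so indexes should be added more sparingly.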

What Kind of Indexes Do We Need?

While answering this question, we must remember that MySQL has different kinds of indexes that suit different purposes: we can choose from B-Tree indexes, hash indexes (for example, in the MEMORY storage engine), spatial indexes, and FULLTEXT indexes. Indexes are mostly used to let the database server navigate rapidly to a specific position in a table. They reduce the amount of data our server has to examine, help it avoid sorting and temporary tables, and turn random I/O into sequential I/O. You can also refer to this comprehensive index guide.
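The index types mentioned above could be declared roughly as follows. This is only a sketch: the tables, columns, and index names are hypothetical, and the SRID clause assumes MySQL 8.0 or later.

```sql
-- Hypothetical tables; all names are illustrative assumptions.
CREATE TABLE articles (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    title VARCHAR(200) NOT NULL,
    body  TEXT         NOT NULL,
    PRIMARY KEY (id),
    KEY idx_articles_title (title),        -- B-Tree index (the InnoDB default)
    FULLTEXT KEY ft_articles_body (body)   -- FULLTEXT index for MATCH ... AGAINST
) ENGINE=InnoDB;

CREATE TABLE places (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    location POINT NOT NULL SRID 4326,     -- SRID clause requires MySQL 8.0+
    PRIMARY KEY (id),
    SPATIAL KEY sp_places_location (location)
) ENGINE=InnoDB;

CREATE TABLE sessions (
    token   CHAR(32)     NOT NULL,
    user_id INT UNSIGNED NOT NULL,
    KEY idx_sessions_token (token) USING HASH  -- hash index, MEMORY engine
) ENGINE=MEMORY;

-- A query that can use the FULLTEXT index:
SELECT id, title
FROM articles
WHERE MATCH(body) AGAINST('schema design' IN NATURAL LANGUAGE MODE);
```

Hash indexes only help with equality lookups, which is why the sketch uses one for an exact-match token column rather than for a range query.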

Answer these questions and keep the basics of MySQL schema design in mind at all times: knowing them should put you (and your database) on a good path. However, if we want to push our MySQL instances to the next level, we should also look into designing MySQL schemas for performance.

Designing MySQL Schemas for Performance

​To design our MySQL database schema for performance, we should keep the following things in mind:

  1. Database schemas are not just about columns and indexes: in a nutshell, a database schema is the way a given database is organized or structured.
  2. When designing MySQL schemas, avoid using reserved keywords as table or column names: a query that looks correct can still return a syntax error because an identifier collides with a reserved keyword and is not quoted with backticks.
  3. Keep an eye on the data types of your columns: an improperly chosen data type on even one column can cause problems later. Use VARCHAR to store a few words or sentences and TEXT to store longer text. Keep in mind that a VARCHAR column can be indexed in full (within the index key length limit), while a TEXT column can only be indexed by a prefix, so you need to specify an index prefix length. Use CHAR for very short, fixed-length string values. For integers, use TINYINT, SMALLINT, MEDIUMINT, INT, or BIGINT; use DECIMAL for exact values such as monetary amounts, or for whole numbers too large to fit in BIGINT.
  4. Keep the data types simple: simpler data types usually require fewer CPU cycles to process.
  5. Use shorter column lengths if there is no need to store longer values: the larger our columns are, the more memory they will use.
  6. Be deliberate with indexes: only create the indexes we need, avoid indexing the same column twice, and learn how MySQL indexes affect performance.
  7. Avoid storing passwords in plain text: this one should come as a no-brainer, since doing so is a terrible security practice. We should avoid weak hashing algorithms (like MD5) too. Preferably, use a purpose-built password hashing algorithm such as bcrypt (which is based on the Blowfish cipher).
  8. Consider normalizing database schemas: in a normalized database schema, each fact is only represented once. Normalizing a MySQL schema reduces data redundancy and improves data integrity. This step can be crucial if we expect our data sets to keep growing, because a well-designed schema is much easier to scale.
  9. Keep convenience and usefulness in mind: it is probably wise to design our database schema so that the people who use it can access the data in useful and convenient ways.
  10. Design MySQL schemas for performance by analyzing use cases: if we want to take this one step further, we can analyze user behavior and design the schema around it without compromising on performance.
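Several of the points above (reserved keywords, TEXT prefix indexes, DECIMAL, and normalization) can be sketched in one small pair of tables. Every table, column, index, and constraint name here is hypothetical, and the lengths are assumptions chosen for the example.

```sql
-- All table and column names below are hypothetical.

-- `order` is a reserved word, so this statement fails with a syntax error:
--   CREATE TABLE order (id INT);
-- Quoting it with backticks works, but a non-reserved name is easier to live with.
CREATE TABLE customers (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    email VARCHAR(100) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uq_customers_email (email)
) ENGINE=InnoDB;

-- Normalized: each order refers to its customer by id instead of copying the email.
CREATE TABLE customer_orders (
    id          INT UNSIGNED  NOT NULL AUTO_INCREMENT,
    customer_id INT UNSIGNED  NOT NULL,
    total       DECIMAL(12,2) NOT NULL,  -- exact numeric type for money
    notes       TEXT          NULL,
    PRIMARY KEY (id),
    KEY idx_orders_customer (customer_id),
    KEY idx_orders_notes (notes(50)),    -- a TEXT index needs a prefix length
    CONSTRAINT fk_orders_customer
        FOREIGN KEY (customer_id) REFERENCES customers (id)
) ENGINE=InnoDB;
```

Because the email lives only in customers, changing it means updating one row rather than every order that customer has placed.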

Designing MySQL Schemas for Performance: Gotchas

Once we have taken the advice above into account, we should also keep in mind that database schemas, broadly speaking, can be divided into two categories: physical and logical. A physical database schema describes how the data is stored physically (storage engine, row format, indexes), while a logical database schema describes the tables, columns, relationships, and constraints that model the data.

Also, keep in mind that our schema design plan should cover the use (or absence) of NULL values. In general, it is harder for MySQL to optimize queries on nullable columns because NULLs make value comparisons and indexes more complicated.

Generally, a nullable column also requires slightly more storage: MySQL keeps a NULL flag of one bit per nullable column, rounded up to whole bytes per row. The exact cost depends on the data type of the column, whether the column is indexed, and the storage engine.

As there is no simple answer, remember that we should only use NULL values in our columns if we want to indicate that the data is missing, unknown, or not applicable. NULL values might also prove useful if we plan to run queries like WHERE column_name IS [NOT] NULL.
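A short sketch of a column where NULL carries meaning, using a hypothetical subscriptions table (all names are assumptions made for this example):

```sql
-- Hypothetical table; NULL in cancelled_at means "not cancelled yet".
CREATE TABLE subscriptions (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id      INT UNSIGNED NOT NULL,
    started_at   DATETIME     NOT NULL,  -- always known, so NOT NULL
    cancelled_at DATETIME     NULL,      -- unknown or not applicable until cancelled
    PRIMARY KEY (id),
    KEY idx_subscriptions_cancelled (cancelled_at)
) ENGINE=InnoDB;

-- Active subscriptions: NULL must be tested with IS NULL.
SELECT id FROM subscriptions WHERE cancelled_at IS NULL;

-- A comparison with = NULL evaluates to NULL for every row, so this matches nothing.
SELECT id FROM subscriptions WHERE cancelled_at = NULL;
```

Columns that always have a value, such as started_at here, are declared NOT NULL so that only the genuinely optional column is nullable.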

Also, keep in mind that MySQL has hard limits on columns and rows. As per its documentation, MySQL has a hard limit of 4,096 columns per table (InnoDB further limits a table to 1,017 columns). MySQL tables also have a maximum row size of 65,535 bytes, regardless of whether the storage engine supports larger rows.

A table can contain a maximum of 64 secondary indexes, and the index key prefix length limit is 3,072 bytes for InnoDB tables that use the DYNAMIC or COMPRESSED row format. If our tables use the REDUNDANT or COMPACT row format, however, the index key prefix length limit is 767 bytes.

Our data types should be as simple as possible. In general, to achieve high performance we should run InnoDB (which works well even with big data, as covered in my previous blog post), but that's not all.
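To make the prefix limit concrete, here is a sketch that assumes a hypothetical documents table using the utf8mb4 character set, where each character can take up to 4 bytes:

```sql
-- Hypothetical table; names and lengths are illustrative assumptions.
CREATE TABLE documents (
    id  INT UNSIGNED  NOT NULL AUTO_INCREMENT,
    url VARCHAR(2048) NOT NULL,
    PRIMARY KEY (id),
    -- 255 characters * 4 bytes = 1,020 bytes: within the 3,072-byte limit for DYNAMIC,
    -- but over the 767-byte limit that applies to REDUNDANT and COMPACT.
    KEY idx_documents_url (url(255))
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC DEFAULT CHARSET=utf8mb4;

-- Check which row format the tables in the current database use.
SELECT table_name, row_format
FROM information_schema.tables
WHERE table_schema = DATABASE();
```

Indexing the full 2,048-character column would need up to 8,192 bytes per key, which exceeds both limits, so a prefix is the only option here.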


To avoid another one of the gotchas related to MySQL schemas, we should consider taking a very close look into how we store data inside of our tables. You can find some valuable information regarding data storage requirements for InnoDB tables below:

  • INT and FLOAT require 4 bytes of storage
  • YEAR requires 1 byte of storage
  • DATE requires 3 bytes of storage
  • TIME requires 3 bytes + fractional seconds storage
  • TINYBLOB and TINYTEXT require the length of the stored value in bytes + 1 byte of storage
  • BLOB and TEXT require the length of the stored value in bytes + 2 bytes of storage
  • MEDIUMBLOB and MEDIUMTEXT require the length of the stored value in bytes + 3 bytes of storage
  • LONGBLOB and LONGTEXT require the length of the stored value in bytes + 4 bytes of storage
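To see how these per-column costs add up in practice, one option is to ask information_schema how much space each table occupies. This is a sketch only, and for InnoDB the reported sizes are estimates rather than exact figures.

```sql
-- Approximate data and index size per table in the current database.
SELECT table_name,
       ROUND(data_length  / 1024 / 1024, 2) AS data_mb,
       ROUND(index_length / 1024 / 1024, 2) AS index_mb
FROM information_schema.tables
WHERE table_schema = DATABASE()
ORDER BY data_length + index_length DESC;
```

Comparing these numbers before and after a data type change on a copy of the table gives a rough sense of whether the change is worth it.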

Testing the MySQL Schema Design

​Everything that you built should be tested. That's true in almost every scenario, but even more so in the tech world. Once you have finished designing your MySQL schema, it's time to take it for a spin.

Database schema design can be tested by simply running queries that are specific to your environment and seeing how they respond. Are they as quick as you expect? If not, why? Could you further refine (normalize, add indexes, etc.) your schema? If they are, congratulations! A proper schema design is one of the key factors in the performance of your MySQL instances.

Arctype can be of great help in this scenario. Simply log in, define your data source (the database that you want to connect to), then run your SQL queries inside the client and see how they respond:
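One way to find out why a query is slower than expected is to look at its execution plan. The sketch below uses a hypothetical demo_users table created only for this illustration.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE IF NOT EXISTS demo_users (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    username VARCHAR(50)  NOT NULL,
    PRIMARY KEY (id),
    KEY idx_demo_users_username (username)
) ENGINE=InnoDB;

-- Show the execution plan without running the query.
EXPLAIN SELECT id, username FROM demo_users WHERE username = 'demo';

-- On MySQL 8.0.18 and later, EXPLAIN ANALYZE runs the query and reports actual timings.
EXPLAIN ANALYZE SELECT id, username FROM demo_users WHERE username = 'demo';
```

In the EXPLAIN output, the key column shows which index (if any) was chosen and the rows column shows how many rows MySQL estimates it has to examine; a missing key on a large table is usually the first thing to investigate.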

(Screenshot: a query running in the Arctype client.)

If you're feeling fancy, you can even define a query variable inside of the query like so (the {{demo_variable}} can have any name):

(Screenshot: a query that uses the {{demo_variable}} query variable.)

If the response time of the queries you use in your application satisfies you, congratulations: your database is in good shape! If it does not, see whether a better schema can be created to improve performance.

Summary

To summarize, MySQL schema design principles do not change much over time. To ensure that the schemas in your database instances are optimized for high performance, follow the advice outlined in this blog post, and don't forget to test every enhancement you make to your database instances.

About The Author

Lukas is an ethical hacker, a MySQL database administrator, and a frequent conference speaker. Since 2014 Lukas has found and responsibly disclosed security flaws in some of the most visited websites in Lithuania and abroad, including advertising, gift-buying, gaming, and hosting websites as well as some websites of government institutions. Lukas runs one of the biggest and fastest data breach search engines in the world, BreachDirectory.com, and frequently blogs in multiple places, educating people about information security and other topics. He also runs his own blog over at lukasvileikis.com.
