DEV Community

Cover image for A complete guide to generated columns in MySQL
Antonello Zanini for Writech

Posted on • Updated on

A complete guide to generated columns in MySQL

Generated columns allow you to store automatically generated data in a table without using the INSERT and UPDATE clauses. This useful feature has been part of the MySQL language since the 5.7 version, and it represents an alternative approach to triggers when it comes to generating data. Also, generated columns can help you make your query easier and more efficient.

Let's now learn everything you need to know to master generated columns in MySQL.

What is a MySQL generated column?

A generated column is similar to a normal column, but you cannot change its value manually. This is because an expression defines how to generate the value of a generated column based on the other values read from the other columns of the same row. So, a generated column works within the domain of a table, and its definition cannot involve JOIN statements.

In other words, you can think of a generated column as a sort of view but limited to columns. Notice that generated columns are different from SQL triggers, and you can define them only in CREATE TABLE or ALTER TABLE statements with the syntax below:

generate_column_ame column_type [GENERATED ALWAYS] AS (generation_expression)
[VIRTUAL | STORED] [NOT NULL | NULL]
[UNIQUE [KEY]] [[PRIMARY] KEY]
[COMMENT 'string']
Enter fullscreen mode Exit fullscreen mode

The AS (generated_column_expression) clause specifies that the column you are adding or updating to a table is a generated column. The generation_expression defines the expression that MySQL will use to compute the column values, and it cannot reference another generated column or anything other than the columns of the current table. Also, notice that the generation expression can only involve immutable functions. For example, you cannot use the CURDATE() function that returns the current date in a generated column expression definition because it is a mutable function.

You can also precede AS with the GENERATED ALWAYS keywords to make the generated nature of the column more explicit, but this is optional. Then, you can indicate whether the type of the generated column is VIRTUAL or STORED. You will learn the difference between the two types in the chapter below. By default, if not explicitly specified in the query, MySQL marks a generated column as VIRTUAL.

Let's now see the generated column syntax in action in a CREATE TABLE query:

CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(100) NOT NULL,
    first_name VARCHAR(60) NOT NULL,
    last_name VARCHAR(60) NOT NULL,
    full_name varchar(120) GENERATED ALWAYS AS (CONCAT(first_name, ' ', last_name))
);
Enter fullscreen mode Exit fullscreen mode

In this example, the full_name column will automatically store the concatenation of the first_name and last_name columns.

What types of generated columns exist in MySQL?

As explained earlier, you can define a generated column as VIRTUAL or STORED.  Let's now take a closer look at these two types.

Virtual generated columns

MySQL does not store a generated column marked as VIRTUAL. This means that MySQL evaluates its value on the fly when required, immediately after any BEFORE triggers. In other terms, a virtual generated takes no storage space.

Stored generated columns

MySQL stores any generated column marked as STORED. This means that MySQL takes care of evaluating its value and storing it on the disk every time you insert or update a row. In other terms, a stored column requires storage space as if it were a normal column.

Virtual generated columns vs Stored generated columns

Let's now learn more about the pros and cons of virtual and stored generated columns.

Virtual generated columns

Pros

  • Their creation is instantaneous because MySQL only has to change the table metadata

  • They take no disk space

  • INSERT and UPDATE come with no overhead because MySQL does not need to generate them 

Cons

  • MySQL has to evaluate them when reading a table, making SELECT queries involving them slower

Stored generated columns

Pros

  • MySQL can read them as if they were normal columns, which assures a fast retrieval with no overhead

Cons

  • When added to a new table, MySQL has to rebuild the entire table

  • INSERT or UPDATE comes with an overhead because MySQL has to generate them

  • They take disk space

Also, notice that you can mix VIRTUAL and STORED columns within a table, and they both support MySQL indexes and secondary indexes. However, as explained in the official documentation, secondary indexes on virtual columns take up less space and memory compared to stored generated columns. So, virtual generated columns are more efficient when it comes to secondary indexes. 

Why adopt generated columns?

There are several reasons to adopt generated columns, but the three below are the most important ones.

  • They provide you with cache capabilities to make your queries faster: generated columns give you the possibility to define columns containing useful data you can then efficiently retrieve whenever you need.

  • They allow you to simplify query expressions: instead of making your queries complex, you can spread the complexity over the generated columns and then use them in simple filter operations.

  • They empower you to define functional indexes: MySQL implements functional indexes as hidden virtual generated columns. In other terms, generated columns give you the possibility to define efficient and advanced indexes involving MySQL functions.

MySQL generated columns in action in real-world examples

Let's now see generated columns in action in some examples coming from my experience as a backend developer collaborating with data-driven startups.  

Using a generated column to concatenate columns for consistency reasons

When developing the frontend of your application, you may notice certain patterns in data representation. For example, in sports, players on a team are generally identified with the following format:

`first_name [first_carachter_middle_name.] last_name [(jersey_number)]`
Enter fullscreen mode Exit fullscreen mode

As you can see, you can easily generate this data field by aggregating the four columns with the following generated column:

string_identifier VARCHAR(255) GENERATED ALWAYS AS (CONCAT(first_name, ' ', IF(middle_name IS NULL, '', CONCAT(LEFT(middle_name, 1), '. ')), last_name, IF(jersey_number IS NULL, '', CONCAT(' (', jersey_number, ')')))) STORED
Enter fullscreen mode Exit fullscreen mode

This would produce:

Cristiano Ronaldo (7)
Lionel A. Messi (10)
Enter fullscreen mode Exit fullscreen mode

In this case, a generated column empowers you to standardize the data filed format directly at the database level. In addition, a stored generated column avoids the inevitable overhead of constructing this field every time you need it.

Employing a generated column to automatically generate hashed IDs

Typically, you use the IDs of your resources in the URL of your website or REST APIs to retrieve the data you need. But publicly exposing your IDs could pose a security problem. Especially, when using autoincremental IDs, which are easy to predict and make scraping or bot attacks easier.

To avoid this, you can think about hiding your original IDs through the use of automatically generated, random, more secure public IDs. You can achieve this with a virtual generated column by hashing your IDs as follows:

public_id VARCHAR(40) GENERATED ALWAYS AS SHA1(CONCAT("PLAYER", id)) VIRTUAL
Enter fullscreen mode Exit fullscreen mode

Notice that to avoid generating known hash values, you can concatenate your ID with a special keyword. Learn more about MySQL encryption and compression functions here.

Defining a generated column to simplify data filtering

When filtering data, some columns are more useful than others. Also, it is often necessary to change the representation of the data to make filtering simpler or more intuitive. Instead of doing this in every filter query, you can define a helpful generated column storing the information you need to perform the filtering, in the desired format. 

For example, you could define a generated column to more easily find the players on a basketball team as follows:

filter_string VARCHAR(255) GENERATED ALWAYS AS (CONCAT(first_name, ' ', last_name, IF(birthdate IS NULL, '', CONCAT(' ', DATE_FORMAT(birthdate, "%m-%d-%Y")))) STORED
Enter fullscreen mode Exit fullscreen mode

Such a column would produce:

LeBron James 12-30-1984
Stephen Curry 03-14-1988
Enter fullscreen mode Exit fullscreen mode

This contains data useful for filtering and with the player's birthdate in the US format.

Generated columns vs. Triggers

As explained previously, you can only use generated columns within a table. Also, they can only involve immutable functions, and MySQL generates their value in response to an INSERT or UPDATE query.

On the other hand, a trigger is a stored program MySQL automatically executes whenever an INSERT, UPDATE or DELETE event associated with a particular table occurs. In other terms, triggers can involve several tables and all MySQL functions. This makes them a complete solution compared to generated columns. At the same time, MySQL triggers are inherently more complex to use and define and also slower than generated columns. 

Final Thoughts

MySQL generated columns are undoubtedly a great tool that can serve many uses and bring numerous benefits. Adopting them is easy and effective, although not many database developers use them. Either way, knowing how to take advantage of them may become essential to making your backend application faster, cleaner, and less complex. Explaining what they are and how to use them through real-world examples was why I wrote this article!


The post "A complete guide to generated columns in MySQL" appeared first on Writech.

Top comments (0)