Header generated with Microsoft Bing's Image Creator (AI).
Hello, coders! 💻
Intro
I'd like to talk about SQL and how we can have fun with the many functions it has to offer.
Today we'll have a look at how we can leverage JSON functions to help us build queries.
What we want to retrieve from our database
Let's say we log actions our users perform on some app. Actions can be start
or stop
for simplicity's sake.
You're tasked to craft a query to retrieve ID of users whose latest actions is start
.
Database structure and data
I'm using MySQL for this tutorial and the table we'll be working with is as follows:
CREATE TABLE actions (
action_name varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
user_id tinyint NOT NULL,
created_at datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
And some data to play with:
INSERT INTO actions (action_name, user_id, created_at) VALUES
('start', '1', '2023-11-24 10:48:00'),
('stop', '1', '2023-11-24 10:49:00'),
('stop', '1', '2023-11-24 10:49:00'),
('start', '1', '2023-11-24 10:50:00'),
('stop', '1', '2023-11-24 10:51:00'),
('start', '2', '2023-11-24 10:52:00'),
('stop', '2', '2023-11-24 10:53:00'),
('start', '2', '2023-11-24 10:54:00'),
('stop', '2', '2023-11-24 10:55:00'),
('start', '2', '2023-11-24 10:56:00'),
('start', '3', '2023-11-24 10:48:00'),
('stop', '3', '2023-11-24 10:49:00'),
('start', '4', '2023-11-24 10:50:00')
;
Expectations
Based on the above data users with ID 2 and 4 are the ones we're looking for.
Challenge / rule
I tried to do that in a single query, no joins or sub queries. Why? Ya know, for fun and as learning experience!
Query build up
GROUP_CONCAT immediately came to mind. It allows us to get a list of actions, grouped by user, ordered by created date in descending order so the latest action is the first from left to right.
SELECT user_id, GROUP_CONCAT(action_name ORDER BY created_at DESC) as list_of_actions
FROM actions
GROUP BY user_id;
Which results in:
| user_id | list_of_actions |
| ------- | --------------------------- |
| 1 | stop,start,stop,stop,start |
| 2 | start,stop,start,stop,start |
| 3 | stop,start |
| 4 | start |
We can easily see that users with ID 2 and 4 are the ones we want but how can we check that the leftmost value is start
?
How about we convert list_of_actions
into an array, grab the first element and make sure it is start
?
Sounds fun, let's do this.
What does a JSON array look like?
It's a list of comma separated values, each surrounded by double quotes.
Example: ["first_value","second_value"]
Values, double quotes and commas
Let's start by surrounding each value with double quotes.
GROUP_CONCAT
's default separator is ,
but we can pass any string we want.
What would we get if we joined with ","
instead?
SELECT user_id, GROUP_CONCAT(action_name ORDER BY created_at DESC SEPARATOR '","') as list_of_actions
FROM actions
GROUP BY user_id;
which gives us:
| user_id | list_of_actions |
| ------- | ----------------------------------- |
| 1 | stop","start","stop","stop","start |
| 2 | start","stop","start","stop","start |
| 3 | stop","start |
| 4 | start |
Almost! We're just missing a "
both at the start and end of list_of_actions
.
Let's fix this and, while we're at it, let's also add [
at the start and ]
at the end using CONCAT!
SELECT user_id, CONCAT('["', GROUP_CONCAT(action_name ORDER BY created_at DESC SEPARATOR '","'), '"]') as list_of_actions
FROM actions
GROUP BY user_id;
which results in:
| user_id | list_of_actions |
| ------- | --------------------------------------- |
| 1 | ["stop","start","stop","stop","start"] |
| 2 | ["start","stop","start","stop","start"] |
| 3 | ["stop","start"] |
| 4 | ["start"] |
Looking good!
list_of_actions
is now a valid JSON string.
Getting the first element of each array
Now we just need to tell MySQL to consider list_of_actions
as JSON and then take the first element.
Converting is done with CAST(value) AS JSON
.
And to extract the first value of an array we use JSON_EXTRACT.
Our query is now:
SELECT user_id, JSON_EXTRACT( CAST( CONCAT( '["', GROUP_CONCAT( action_name ORDER BY created_at DESC SEPARATOR '","' ), '"]' ) AS JSON ), '$[0]' ) as latest_action
FROM actions
GROUP BY user_id;
which results in:
| user_id | latest_action |
| ------- | ------------- |
| 1 | "stop" |
| 2 | "start" |
| 3 | "stop" |
| 4 | "start" |
Excellent!
Final query
Since we said we just wanted users whose latest action is start
then we can leave user_id
in the SELECT
clause and move our JSON manipulating functions to an HAVING
clause:
SELECT user_id
FROM actions
GROUP BY user_id
HAVING JSON_EXTRACT( CAST( CONCAT( '["', GROUP_CONCAT( action_name ORDER BY created_at DESC SEPARATOR '","' ), '"]' ) AS JSON ), '$[0]' ) = 'start';
and boom:
| user_id |
| ------- |
| 2 |
| 4 |
Thank you very much indeed.
Conclusion
I hope you've learned a thing or two while using functions such as GROUP_CONCAT
, CONCAT
, CAST
or JSON_EXTRACT
.
You probably have better alternatives so please don't hesitate to share your methods! That's how we get better.
Would you be interested in other articles where we experiment more with JSON (or other) functions? Let me know in the comments.
Till next time.
Happy coding! ⌨️
Top comments (4)
That is interesting. You can also store the action name and created at date in an array for each user. The data is denormalized. You can use the NoSQL Document Store or just the JSON data type.
Hi, thanks for your comment.
You mean to have an array of
action, date
tuples?Can you share an example query which would achieve the requirements? Thanks!
I had tried your code, and its a simple concept well explained.
And the denormalized data can be like this:
Right.
In another table I have stored denormalized in an object with dates as keys.
Transposed to our actions example here we'd have an
actions
column like so:It's a topic for another article in the making!