On a client project I have to do a search in a structure of tables and return results including some child objects. A classical instance of the N+1 query problem, if I was using an ORM. I decided not to use the ORM for the feature because it will be one of the hottest paths of the application and I wanted more control over the queries (rather complex logic with LIKE
& sorting). But the filtering/sorting is not the topic today, so I will leave it out in the examples.
For illustration, let’s assume the following schema:
CREATE TABLE parents (
parent_id INTEGER PRIMARY KEY,
name VARCHAR
);
CREATE TABLE children (
child_id INTEGER PRIMARY KEY,
name VARCHAR,
birthdate DATE,
parent_id INTEGER,
FOREIGN KEY (parent_id) REFERENCES parents
);
Without any further thinking I would do a query like this and a bit of code that constructs parent objects with their respective children.
SELECT * FROM parents JOIN children USING (parent_id) ORDER BY parent_id;
parent_id | name | child_id | name | birthdate
-----------+-------+----------+---------+------------
1 | Jim | 1 | Tamara | 2017-02-01
1 | Jim | 3 | Tom | 2005-10-01
1 | Jim | 5 | Tonja | 2011-07-17
2 | Jenny | 2 | Tim | 2000-11-02
2 | Jenny | 4 | Theresa | 2017-04-30
Just before I started writing the logic to pull apart the result and put it into it’s object structure, I thought, “It would be nice to let the database put together all the children of one parent into an array.”
I already knewarray_agg
which aggregates all values into an array. After some reading I discoveredjson_build_object
which takes a sequence of keys and values and creates a JSON object from it.
So my final query looked like this:
SELECT
parent_id
, p.name
, array_agg(json_build_object(
'child_id', c.child_id
, 'name', c.name
, 'birthdate', c.birthdate
)) as children
FROM parents p
JOIN children c USING (parent_id)
GROUP BY 1, 2;
parent_id | name | children
-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | Jim | {"{\"child_id\" : 1, \"name\" : \"Tamara\", \"birthdate\" : \"2017-02-01\"}","{\"child_id\" : 3, \"name\" : \"Tom\", \"birthdate\" : \"2005-10-01\"}","{\"child_id\" : 5, \"name\" : \"Tonja\", \"birthdate\" : \"2011-07-17\"}"}
2 | Jenny | {"{\"child_id\" : 2, \"name\" : \"Tim\", \"birthdate\" : \"2000-11-02\"}","{\"child_id\" : 4, \"name\" : \"Theresa\", \"birthdate\" : \"2017-04-30\"}"}
When the query is executed with sqlalchemy, children
in the resulting rows is already correctly typed as a list of dictionaries.
The explain analyze
output for both queries (on a trivially small test set) shows that the aggregated version is 50-100% slower (150-200μs vs. 250-350μs), but I guess that will rarely be a real problem because - at least in my case
- the execution time of my query is dominated by filtering/sorting the parent rows.
If you would like to play with the example yourself, get the SQL file here.
Top comments (0)