DEV Community

Martin Vielsmaier
Martin Vielsmaier

Posted on • Originally published at moserei.de on

PostgreSQL: JSON aggregation

On a client project I have to do a search in a structure of tables and return results including some child objects. A classical instance of the N+1 query problem, if I was using an ORM. I decided not to use the ORM for the feature because it will be one of the hottest paths of the application and I wanted more control over the queries (rather complex logic with LIKE & sorting). But the filtering/sorting is not the topic today, so I will leave it out in the examples.

For illustration, let’s assume the following schema:

CREATE TABLE parents (
  parent_id INTEGER PRIMARY KEY,
  name VARCHAR
);

CREATE TABLE children (
  child_id INTEGER PRIMARY KEY,
  name VARCHAR,
  birthdate DATE,
  parent_id INTEGER,
  FOREIGN KEY (parent_id) REFERENCES parents
);

Without any further thinking I would do a query like this and a bit of code that constructs parent objects with their respective children.

SELECT * FROM parents JOIN children USING (parent_id) ORDER BY parent_id;
parent_id | name | child_id | name | birthdate  
-----------+-------+----------+---------+------------
         1 | Jim | 1 | Tamara | 2017-02-01
         1 | Jim | 3 | Tom | 2005-10-01
         1 | Jim | 5 | Tonja | 2011-07-17
         2 | Jenny | 2 | Tim | 2000-11-02
         2 | Jenny | 4 | Theresa | 2017-04-30

Just before I started writing the logic to pull apart the result and put it into it’s object structure, I thought, “It would be nice to let the database put together all the children of one parent into an array.”

I already knewarray_aggwhich aggregates all values into an array. After some reading I discoveredjson_build_objectwhich takes a sequence of keys and values and creates a JSON object from it.

So my final query looked like this:

SELECT
    parent_id
  , p.name
  , array_agg(json_build_object(
      'child_id', c.child_id
    , 'name', c.name
    , 'birthdate', c.birthdate
  )) as children
FROM parents p
JOIN children c USING (parent_id)
GROUP BY 1, 2;
parent_id | name | children                                                                                                              
-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
         1 | Jim | {"{\"child_id\" : 1, \"name\" : \"Tamara\", \"birthdate\" : \"2017-02-01\"}","{\"child_id\" : 3, \"name\" : \"Tom\", \"birthdate\" : \"2005-10-01\"}","{\"child_id\" : 5, \"name\" : \"Tonja\", \"birthdate\" : \"2011-07-17\"}"}
         2 | Jenny | {"{\"child_id\" : 2, \"name\" : \"Tim\", \"birthdate\" : \"2000-11-02\"}","{\"child_id\" : 4, \"name\" : \"Theresa\", \"birthdate\" : \"2017-04-30\"}"}

When the query is executed with sqlalchemy, children in the resulting rows is already correctly typed as a list of dictionaries.

The explain analyze output for both queries (on a trivially small test set) shows that the aggregated version is 50-100% slower (150-200μs vs. 250-350μs), but I guess that will rarely be a real problem because - at least in my case

  • the execution time of my query is dominated by filtering/sorting the parent rows.

If you would like to play with the example yourself, get the SQL file here.

Top comments (0)