re: How I do solve this impossible SQL query? VIEW POST

TOP OF THREAD FULL DISCUSSION
re: (I just noticed there's a bug in the GROUP BY function so I suggest using the subselects at the top of my comment in the meantime. I'll fix it late...
 

Found it, I missed the DISTINCT. So, the final queries:

With sub selects

PracticalDeveloper_development> SELECT articles.id,
        articles.title,
        articles.published_at,
        articles.positive_reactions_count AS total_reactions,
        articles.cached_tag_list,
        collections.slug AS series_title,

     (SELECT COUNT(reactions.id)
      from reactions
      where reactions.category = 'like'
          AND reactions.reactable_id = articles.id
          AND reactions.reactable_type = 'Article') AS total_likes,

     (SELECT COUNT(reactions.id)
      from reactions
      where reactions.category = 'unicorn'
          AND reactions.reactable_id = articles.id
          AND reactions.reactable_type = 'Article') AS total_unicorns,

     (SELECT COUNT(reactions.id)
      from reactions
      where reactions.category = 'readinglist'
          AND reactions.reactable_id = articles.id
          AND reactions.reactable_type = 'Article') AS total_bookmarks,

     (SELECT SUM(page_views.counts_for_number_of_views)
      from page_views
      where page_views.article_id = articles.id) AS total_views,

     (SELECT AVG(page_views.time_tracked_in_seconds)
      from page_views
      where page_views.article_id = articles.id) AS average_time_spent_on_article
 FROM articles
 LEFT JOIN collections ON collections.id = articles.collection_id
 WHERE articles.user_id = 13
     AND articles.published = TRUE;
+------+--------------+----------------------------+-------------------+-------------------+----------------+---------------+------------------+-------------------+---------------+---------------------------------+
| id   | title        | published_at               | total_reactions   | cached_tag_list   | series_title   | total_likes   | total_unicorns   | total_bookmarks   | total_views   | average_time_spent_on_article   |
|------+--------------+----------------------------+-------------------+-------------------+----------------+---------------+------------------+-------------------+---------------+---------------------------------|
| 26   | test title   | 2019-04-08 11:01:37.354794 | 0                 | tags              | <null>         | 1             | 0                | 0                 | 7             | 192.8571428571428571            |
| 27   | Test         | 2019-04-10 10:50:43.777291 | 0                 | test              | <null>         | 0             | 0                | 0                 | 2             | 97.5000000000000000             |
| 28   | Test with v2 | 2019-04-10 10:51:26.636466 | 0                 | test              | <null>         | 0             | 0                | 0                 | 2             | 15.0000000000000000             |
| 29   | title        | 2019-04-10 19:57:58.779827 | 0                 |                   | <null>         | 0             | 0                | 0                 | 3             | 15.0000000000000000             |
+------+--------------+----------------------------+-------------------+-------------------+----------------+---------------+------------------+-------------------+---------------+---------------------------------+
SELECT 4

With GROUP by

PracticalDeveloper_development> SELECT articles.id,
        articles.title,
        articles.published_at,
        articles.positive_reactions_count AS total_reactions,
        articles.cached_tag_list,
        collections.slug AS series_title,
        COUNT(DISTINCT reactions.id) FILTER (
                                             WHERE reactions.category = 'like') AS total_likes,
        COUNT(DISTINCT reactions.id) FILTER (
                                             WHERE reactions.category = 'unicorn') AS total_unicorns,
        COUNT(DISTINCT reactions.id) FILTER (
                                             WHERE reactions.category = 'readinglist') AS total_bookmarks,
        SUM(page_views.counts_for_number_of_views) AS total_views,
        AVG(page_views.time_tracked_in_seconds) AS average_time_spent_on_article
 FROM articles
 LEFT JOIN collections ON collections.id = articles.collection_id
 LEFT JOIN reactions ON reactions.reactable_id = articles.id
 AND reactions.reactable_type = 'Article'
 LEFT JOIN page_views ON page_views.article_id = articles.id
 WHERE articles.user_id = 13
     AND articles.published = TRUE
 GROUP BY articles.id,
          articles.title,
          articles.published_at,
          articles.positive_reactions_count,
          articles.cached_tag_list,
          collections.slug;
+------+--------------+----------------------------+-------------------+-------------------+----------------+---------------+------------------+-------------------+---------------+---------------------------------+
| id   | title        | published_at               | total_reactions   | cached_tag_list   | series_title   | total_likes   | total_unicorns   | total_bookmarks   | total_views   | average_time_spent_on_article   |
|------+--------------+----------------------------+-------------------+-------------------+----------------+---------------+------------------+-------------------+---------------+---------------------------------|
| 26   | test title   | 2019-04-08 11:01:37.354794 | 0                 | tags              | <null>         | 1             | 0                | 0                 | 7             | 192.8571428571428571            |
| 27   | Test         | 2019-04-10 10:50:43.777291 | 0                 | test              | <null>         | 0             | 0                | 0                 | 2             | 97.5000000000000000             |
| 28   | Test with v2 | 2019-04-10 10:51:26.636466 | 0                 | test              | <null>         | 0             | 0                | 0                 | 2             | 15.0000000000000000             |
| 29   | title        | 2019-04-10 19:57:58.779827 | 0                 |                   | <null>         | 0             | 0                | 0                 | 3             | 15.0000000000000000             |
+------+--------------+----------------------------+-------------------+-------------------+----------------+---------------+------------------+-------------------+---------------+---------------------------------+
SELECT 4

As you can see, same result.

Hope this helps Andy!

Wow Rhymes, thanks for such an excellent answer!!! The query plans really help to see how they are performing. Interesting that FILTER allows you to use aggregates much faster than a subquery. Definitely have to read a book or two about SQL now!

In case you're wondering, getting 119 rows took about 4 minutes to run the GROUP BY query. We're not looking to put in this app by any means, so the time is totally fine. Definitely helps me think about how to structure AnalyticsService.rb queries though.

Glad it helped!

Interesting that FILTER allows you to use aggregates much faster than a subquery.

I think it's because the DB is able to do it with less work by using multiple conditions on the same set of data (the joined table on the right of the join, reactions in this example), whereas with a subquery, it has to execute a different subquery for each row in the articles table. Not much to optimize there.

In case you're wondering, getting 119 rows took about 4 minutes to run the GROUP BY query.

Mmm not great though :D how long was it taking with the subqueries? Anyhow, it's something. I have a feeling it can be further optimized by checking if and which indexes are being used and the other idea I had (untested and unexplored) was to use window functions but I haven't tried, nor I know if it can be done or if it's faster :D

We're not looking to put in this app by any means, so the time is totally fine

Good :D

Definitely helps me think about how to structure AnalyticsService.rb queries though

Let me know if you find a different approach for that. After all you have the great advantage of having a working version that you can use as a comparison. You extract a set of data and then play with grouping until you get the same result. :D

code of conduct - report abuse