DEV Community

Cover image for How Metis Optimized Queries Executed by Sequelize - Part 2
Adam Furmanek for Metis

Posted on • Updated on • Originally published at metisdata.io

How Metis Optimized Queries Executed by Sequelize - Part 2

We continue our journey into how Metis can prevent, monitor, and troubleshoot our databases. In previous part we started playing with Sequelize, a feature-rich ORM for TypeScript. This time we’re going to see more challenging examples. We are going to see actual insights provided by Metis and how they improve the query performance.

Introduction

Sequelize is an ORM for Node.js and TypeScript. It can connect to PostgreSQL and other SQL engines. It supports transactions, relations, eager loading, lazy loading, functions, and other common operations. Previous part covered some of the queries from the first part of this series. This time we are going to implement more sophisticated scenarios to see how Sequelize can deal with them. Let’s go.

Given two people, list what movies they appeared in together

Let’s find common movies for two people. That’s the application code doing that:

return titleBasic
        .findAll({
          include: [
            {
              model: titlePrincipal,
              required: true,
              duplicating: false,
              as: 'titleBasicTitlePrincipal',
              where: {
                'nconst': actor1
              }
            },
            {
              model: titlePrincipal,
              required: true,
              duplicating: false,
              as: 'titleBasicTitlePrincipal2',
              where: {
                'nconst': actor2
              }
            },
          ]
        });
Enter fullscreen mode Exit fullscreen mode

The code generates the following query:

SELECT
  TitleBasic.tconst,
  TitleBasic.titletype,
  TitleBasic.primarytitle,
  TitleBasic.originaltitle,
  TitleBasic.isadult,
  TitleBasic.startyear,
  TitleBasic.endyear,
  TitleBasic.runtimeminutes,
  TitleBasic.genres,
  titleBasicTitlePrincipal.tconst AS titleBasicTitlePrincipal.tconst,
  titleBasicTitlePrincipal.ordering AS titleBasicTitlePrincipal.ordering,
  titleBasicTitlePrincipal.nconst AS titleBasicTitlePrincipal.nconst,
  titleBasicTitlePrincipal.category AS titleBasicTitlePrincipal.category,
  titleBasicTitlePrincipal.job AS titleBasicTitlePrincipal.job,
  titleBasicTitlePrincipal.characters AS titleBasicTitlePrincipal.characters,
  titleBasicTitlePrincipal2.tconst AS titleBasicTitlePrincipal2.tconst,
  titleBasicTitlePrincipal2.ordering AS titleBasicTitlePrincipal2.ordering,
  titleBasicTitlePrincipal2.nconst AS titleBasicTitlePrincipal2.nconst,
  titleBasicTitlePrincipal2.category AS titleBasicTitlePrincipal2.category,
  titleBasicTitlePrincipal2.job AS titleBasicTitlePrincipal2.job,
  titleBasicTitlePrincipal2.characters AS titleBasicTitlePrincipal2.characters
FROM
  imdb.title_basics AS TitleBasic
  INNER JOIN imdb.title_principals AS titleBasicTitlePrincipal ON TitleBasic.tconst = titleBasicTitlePrincipal.tconst
  AND titleBasicTitlePrincipal.nconst = 'nm0302368'
  INNER JOIN imdb.title_principals AS titleBasicTitlePrincipal2 ON TitleBasic.tconst = titleBasicTitlePrincipal2.tconst
  AND titleBasicTitlePrincipal2.nconst = 'nm0001908';
Enter fullscreen mode Exit fullscreen mode

Metis indicates a missing index:

Image description

The query takes 5 seconds to finish. We can now improve the query performance by adding an index for title_principals. This is the code to do that:

CREATE INDEX IF NOT EXISTS title_principals_nconst_idx ON imdb.title_principals(nconst) INCLUDE (tconst)
Enter fullscreen mode Exit fullscreen mode

Notice that we had to configure two associations in the model to make this work. This is very misleading and should be avoided if possible. What if we don’t like the second association? We can implement joins in the application code, like this:

const first = titlePrincipal
        .findAll({
          attributes: ['tconst'],
          where: {
            'nconst': actor1
          }
        }).then(titles => titles.map(t => t.tconst));


      const second = titlePrincipal
      .findAll({
        attributes: ['tconst'],
        where: {
          'nconst': actor2
        }
      }).then(titles => titles.map(t => t.tconst));


      return first.then(firstTitles => second.then(secondTitles => {
        return titleBasic
          .findAll({
            where: {
              tconst: {
                [Op.and]: [
                  {
                    [Op.in]: [...firstTitles]
                  },
                  {
                    [Op.in]: [...secondTitles]
                  }
                ]
              }
            }
        });
      }));
Enter fullscreen mode Exit fullscreen mode

We first load titles for the first person, then load titles for the second person. Finally, we process the lists together, and we send another query that would use both of them. This is the interaction with the database:

First query:

SELECT
  tconst
FROM
  imdb.title_principals AS TitlePrincipal
WHERE
  TitlePrincipal.nconst = 'nm0302368';
Enter fullscreen mode Exit fullscreen mode

Second query:

SELECT
  tconst
FROM
  imdb.title_principals AS TitlePrincipal
WHERE
  TitlePrincipal.nconst = 'nm0001908';
Enter fullscreen mode Exit fullscreen mode

These two queries are rather straightforward. We could then intersect the datasets in the application code, or we can ask the database to do so. No matter what we do, we end up with a query similar to this one:

SELECT
  tconst,
  titletype,
  primarytitle,
  originaltitle,
  isadult,
  startyear,
  endyear,
  runtimeminutes,
  genres
FROM
  imdb.title_basics AS TitleBasic
WHERE
  (
    TitleBasic.tconst IN (
'tt0000439','tt0014222','tt0021457','tt0029541','tt0447157','tt0020221','tt0020275','tt0460244','tt0829197','tt0008572','tt0009046','tt0040991','tt0155701'
    )
    AND TitleBasic.tconst IN (
'tt0000439','tt0000452','tt0000501','tt0000528','tt0000578',...
    )
  );
Enter fullscreen mode Exit fullscreen mode

This scenario is now slower. Processing results in application takes 9 seconds to complete, so it’s slower than plain SQL. We can’t tell that it’s always the case, as sometimes it’s faster to process data in the application, especially when we don’t extract many records from the SQL. Your mileage may vary, so always measure your solution.

List all of the cast and crew in a given movie

Let’s now find all the people involved in the movie. Here is the first solution. We can just send the raw query:

return sequelize.query(`
          SELECT DISTINCT NB.*
          FROM imdb.title_basics AS TB
          LEFT JOIN imdb.title_principals AS TP ON TP.tconst = TB.tconst
          LEFT JOIN imdb.title_crew AS TC ON TC.tconst = TB.tconst
          LEFT JOIN imdb.name_basics AS NB ON
                  NB.nconst = TP.nconst
                  OR TC.directors = NB.nconst
                  OR TC.directors LIKE NB.nconst || ',%'::text
                  OR TC.directors LIKE '%,'::text || NB.nconst || ',%'::text
                  OR TC.directors LIKE '%,'::text || NB.nconst
                  OR TC.writers = NB.nconst
                  OR TC.writers LIKE NB.nconst || ',%'::text
                  OR TC.writers LIKE '%,'::text || NB.nconst || ',%'::text
                  OR TC.writers LIKE '%,'::text || NB.nconst
          WHERE TB.tconst = :tconst
        `, {
          model: nameBasic,
          mapToModel: true,
          replacements: {
            tconst: tconst
          }
      });
Enter fullscreen mode Exit fullscreen mode

This works, but is inefficient as we identified in the previous part. We can do the query with union to speed things up:

return sequelize.query(`
        SELECT DISTINCT NB.*
        FROM imdb.title_principals AS TP
        JOIN (
          SELECT tconst, directors, writers
          FROM imdb.title_crew
          WHERE tconst = :tconst
        ) AS TC ON TC.tconst = TP.tconst
        LEFT JOIN imdb.name_basics AS NB ON NB.nconst = TP.nconst
      UNION
        SELECT DISTINCT NB.*
        FROM imdb.title_principals AS TP
        JOIN (
          SELECT tconst, directors, writers
          FROM imdb.title_crew
          WHERE tconst = :tconst
        ) AS TC ON TC.tconst = TP.tconst
        LEFT JOIN imdb.name_basics AS NB ON TC.directors LIKE NB.nconst || ',%'::text
      UNION
        SELECT DISTINCT NB.*
        FROM imdb.title_principals AS TP
        JOIN (
          SELECT tconst, directors, writers
          FROM imdb.title_crew
          WHERE tconst = :tconst
        ) AS TC ON TC.tconst = TP.tconst
        LEFT JOIN imdb.name_basics AS NB ON TC.directors LIKE '%,'::text || NB.nconst || ',%'::text
      UNION
        SELECT DISTINCT NB.*
        FROM imdb.title_principals AS TP
        JOIN (
          SELECT tconst, directors, writers
          FROM imdb.title_crew
          WHERE tconst = :tconst
        ) AS TC ON TC.tconst = TP.tconst
        LEFT JOIN imdb.name_basics AS NB ON TC.directors LIKE '%,'::text || NB.nconst
      UNION
        SELECT DISTINCT NB.*
        FROM imdb.title_principals AS TP
        JOIN (
          SELECT tconst, directors, writers
          FROM imdb.title_crew
          WHERE tconst = :tconst
        ) AS TC ON TC.tconst = TP.tconst
        LEFT JOIN imdb.name_basics AS NB ON TC.writers = NB.nconst
      UNION
        SELECT DISTINCT NB.*
        FROM imdb.title_principals AS TP
        JOIN (
          SELECT tconst, directors, writers
          FROM imdb.title_crew
          WHERE tconst = :tconst
        ) AS TC ON TC.tconst = TP.tconst
        LEFT JOIN imdb.name_basics AS NB ON TC.writers LIKE NB.nconst || ',%'::text
      UNION
        SELECT DISTINCT NB.*
        FROM imdb.title_principals AS TP
        JOIN (
          SELECT tconst, directors, writers
          FROM imdb.title_crew
          WHERE tconst = :tconst
        ) AS TC ON TC.tconst = TP.tconst
        LEFT JOIN imdb.name_basics AS NB ON TC.writers LIKE '%,'::text || NB.nconst || ',%'::text
      UNION
        SELECT DISTINCT NB.*
        FROM imdb.title_principals AS TP
        JOIN (
          SELECT tconst, directors, writers
          FROM imdb.title_crew
          WHERE tconst = :tconst
        ) AS TC ON TC.tconst = TP.tconst
        LEFT JOIN imdb.name_basics AS NB ON TC.writers LIKE '%,'::text || NB.nconst
      `, {
        model: nameBasic,
        mapToModel: true,
        replacements: {
          tconst: tconst
        }
      });
    }
Enter fullscreen mode Exit fullscreen mode

They take 100 seconds and 25 seconds respectively. That’s not very fast. Let’s try implementing the same in the application code:

const crewViaTitlePrincipals = titlePrincipal
        .findAll({
          attributes: ['nconst'],
          where: {
            'tconst': tconst
          }
        }).then(crew => crew.map(c => c.nconst));


      const crewViaTitleCrew = titleCrew
        .findAll({
          where: {
            'tconst': tconst
          }
        });


      const crewMatchingNames = crewViaTitleCrew.then(crew => crew.flatMap(c => [
          c.directors.split(','),
          c.writers.split(',')
        ].flat()));


      const allMatchingNames = crewViaTitlePrincipals.then(crew1 => crewMatchingNames.then(crew2 => new Set([crew1, crew2].flat())));


      return allMatchingNames.then(names => nameBasic
        .findAll({
          where: {
            'nconst': { [Op.in]: [...names] }
          }
        }));
Enter fullscreen mode Exit fullscreen mode

We first take the crew from the title_principals table. Next, we extract the crew from the title_crew. Finally, we split directors and writers, and we calculate the intersection of both datasets to get the names.

This generates the following trace in Metis:

Image description

It’s all green! We can see that we have three queries sent, but they are blazingly fast because of indexes. This time it was much better to do processing in the application instead on the SQL level.

As mentioned in the previous section, there is no silver bullet. You need to measure your code and see where the performance benefits are.

Find the most prolific actor in a given period

Let’s now find the person that starred in the most movies in a given period. Let’s do that using application code:

const titlesMatchingPeriod = titleBasic
        .findAll({
          attributes: ['tconst'],
          where: {
            startyear: {
              [Op.and]: [
                { [Op.gte]: startYear },
                { [Op.lte]: endYear }
              ]
            }
          }
        }).then(titles => titles.map(t => t.tconst));


      const principals = titlesMatchingPeriod.then(titles => titlePrincipal
        .findAll({
          attributes: ['nconst'],
          where: {
            tconst: { [Op.in]: [...new Set(titles)] }
          }
        }).then(principals => {
          const counts = principals
            .reduce(
              (entryMap, e) => {
                entryMap[e.nconst] = (entryMap[e.nconst] || 0) + 1;
                return entryMap;
              },
              {}
            );
          const keys = Object.keys(counts);
          const countsWithKeys = keys.map(k => [counts[k], k]);
          countsWithKeys.sort((pair1, pair2) => pair2[0] - pair1[0]);
          const topResults = countsWithKeys.splice(0,1);
          return topResults;
        })
      );


      return principals.then(countsWithKeys => nameBasic
        .findAll({
          where: {
            nconst: { [Op.in]: countsWithKeys.map(c => "" + c[1]) }
          }
        }).then(actors => actors.map(a => {
          a.movies_count = countsWithKeys.filter(c => c[1] == a.nconst)[0][0];
          return a;
        })));
Enter fullscreen mode Exit fullscreen mode

We first find titles in a given period. Next, we take the crew for each of them. Finally, we need to group the identifiers and get the top one, and get the details. This is a bit lengthy and is rather slow because we get all the movies in a given period. That’s a lot of data:

Image description

Metis indicates nearly 70 000 rows. That’s a lot to process. What’s more, we then take all those identifiers and send in another query which is nearly 100 000 characters long.

Can we do better? Yes, we can send a raw query:

return sequelize.query(`
        WITH best_actor AS (
                SELECT TP.nconst, COUNT(*) AS number_of_titles
                FROM imdb.title_basics AS TB
                LEFT JOIN imdb.title_principals AS TP ON TP.tconst = TB.tconst
                WHERE TB.startyear >= :startyear AND TB.startyear <= :endyear AND TP.nconst IS NOT NULL
                GROUP BY TP.nconst
                ORDER BY number_of_titles DESC
                LIMIT 1
        )
        SELECT BA.nconst, BA.number_of_titles, NB.primaryname, nb.birthyear, NB.deathyear, nb.primaryprofession
        FROM best_actor AS BA
        JOIN imdb.name_basics AS NB ON NB.nconst = BA.nconst
      `, {
        model: nameBasic,
        mapToModel: true,
        replacements: {
          startyear: startYear,
          endyear: endYear
        }
      });
Enter fullscreen mode Exit fullscreen mode

Metis shows the following:

Image description

There are two tables that we can configure indexes on: title_principals and title_basics. Once we do that, we get the best performance.

Find most prolific actors in a given genre

Let’s now find actors that did the most movies in a given genre. This is a very similar scenario to the previous one. Let’s start with the application code again:

const titlesMatchingGenre = titleBasic
        .findAll({
          attributes: ['tconst', 'genres'],
          where: {
            genres: { [Op.like]: '%' + genre + '%' }
          }
        }).then(titles => titles
          .filter(t => t.genres.split(',').indexOf(genre) >= 0)
          .map(t => t.tconst)
        );


      const principals = titlesMatchingGenre.then(titles => titlePrincipal
        .findAll({
          attributes: ['nconst'],
          where: {
            tconst: { [Op.in]: [...new Set(titles)] }
          }
        }).then(principals => {
          const counts = principals
            .reduce(
              (entryMap, e) => {
                entryMap[e.nconst] = (entryMap[e.nconst] || 0) + 1;
                return entryMap;
              },
              {}
            );
          const keys = Object.keys(counts);
          const countsWithKeys = keys.map(k => [counts[k], k]);
          countsWithKeys.sort((pair1, pair2) => pair2[0] - pair1[0]);
          const topResults = countsWithKeys.splice(0,10);
          return topResults;
        })
      );


      return principals.then(countsWithKeys => nameBasic
        .findAll({
          where: {
            nconst: { [Op.in]: countsWithKeys.map(c => "" + c[1]) }
          }
        }).then(actors => actors.map(a => {
          a.movies_count = countsWithKeys.filter(c => c[1] == a.nconst)[0][0];
          return a;
        })));
Enter fullscreen mode Exit fullscreen mode

This is nearly the same as before, only this time we query for genre. Metis shows the following:

Image description

This is even worse than the previous scenario. We now extract around 400 000 rows. While it takes 2.5 seconds to execute the query, sending the data over the wire takes much longer.

To fix that, we can use the raw query:

return sequelize.query(`
        WITH best_actors AS (
          SELECT TP.nconst, COUNT(*) AS movies_count
          FROM imdb.title_basics AS TB
          LEFT JOIN imdb.title_principals AS TP ON TP.tconst = TB.tconst
          WHERE TB.genres = :genre OR TB.genres LIKE (:genre || '%,') OR TB.genres LIKE ('%,' || :genre || ',%') OR TB.genres LIKE ('%,' || :genre)
          GROUP BY TP.nconst
          ORDER BY movies_count DESC
          LIMIT 10
        )
        SELECT BA.nconst, NB.primaryname, NB.birthyear, BA.movies_count
        FROM best_actors AS BA
        JOIN imdb.name_basics AS NB ON NB.nconst = BA.nconst
        ORDER BY movies_count DESC
      `, {
        model: nameBasic,
        mapToModel: true,
        replacements: {
          genre: genre
        }
      });
Enter fullscreen mode Exit fullscreen mode

This gives the following:

Image description

Finally, we can configure indexes and make this query much faster.

Finding most common coworkers

Let’s now try to find people that work together the most. We can do that with a raw query:

return sequelize.query(`
        WITH RECURSIVE numbers AS (
          SELECT 1 AS number
          UNION ALL
          SELECT number + 1 AS number FROM numbers WHERE number < 1500
        ),
        titles_for_person AS (
            SELECT TC.tconst
            FROM imdb.title_crew AS TC
            WHERE directors = :nconst OR directors LIKE :nconst || ',%' OR directors LIKE '%,' || :nconst || ',%' OR directors LIKE '%,' || :nconst
          UNION
            SELECT TC.tconst
            FROM imdb.title_crew AS TC
            WHERE writers = :nconst OR writers LIKE :nconst || ',%' OR writers LIKE '%,' || :nconst || ',%' OR writers LIKE '%,' || :nconst
          UNION
            SELECT tconst
            FROM imdb.title_principals
            WHERE nconst = :nconst
        ),
        titles_corresponding AS (
          SELECT TC.tconst, TC.directors, TC.writers
          FROM imdb.title_crew AS TC
          JOIN titles_for_person AS TFP ON TFP.tconst = TC.tconst
        ),
        split_associations AS (
            SELECT TC.tconst, SPLIT_PART(TC.directors, ',', N.number) AS nconst
            FROM titles_corresponding AS TC
            CROSS JOIN numbers AS N
            WHERE directors IS NOT NULL AND CHAR_LENGTH(directors) - CHAR_LENGTH(REPLACE(directors, ',', '')) + 1 >= N.number
          UNION
            SELECT TC.tconst, SPLIT_PART(TC.writers, ',', N.number) AS nconst
            FROM titles_corresponding AS TC
            CROSS JOIN numbers AS N
            WHERE writers IS NOT NULL AND CHAR_LENGTH(writers) - CHAR_LENGTH(REPLACE(writers, ',', '')) + 1 >= N.number
        ),
        all_associations AS (
            SELECT SA.tconst, SA.nconst
            FROM split_associations AS SA
          UNION
            SELECT TP.tconst, TP.nconst
            FROM imdb.title_principals AS TP
            JOIN titles_for_person AS TFP ON TFP.tconst = TP.tconst
        ),
        other_people AS (
          SELECT nconst
          FROM all_associations
          WHERE nconst != :nconst
        ),
        top_peers AS (
          SELECT OP.nconst, COUNT(*) as common_titles
          FROM other_people AS OP
          GROUP BY nconst
          ORDER BY common_titles DESC
          LIMIT 5
        )
        SELECT TP.nconst, TP.common_titles, NB.*
        FROM top_peers AS TP
        JOIN imdb.name_basics AS NB ON NB.nconst = TP.nconst
        ORDER BY TP.common_titles DESC
      `, {
        model: nameBasic,
        mapToModel: true,
        replacements: {
          nconst: nconst
        }
      });
Enter fullscreen mode Exit fullscreen mode

Metis shows the following:

Image description

We can see multiple table scans and millions of rows read. We can optimize that by adding indexes. Let’s see if we can make it faster just by running the code in application:

const titlesPrincipalMatchingPerson = titlePrincipal
        .findAll({
          attributes: ['tconst'],
          where: {
            nconst: nconst
          }
        }).then(titles => titles.map(t => t.tconst));


      const otherTitlePrincipals = titlesPrincipalMatchingPerson.then(titles => titlePrincipal
        .findAll({
          attributes: ['nconst'],
          where: {
            tconst: { [Op.in]: titles },
            nconst: { [Op.ne]: nconst }
          }
        })).then(titles => titles.map(t => t.nconst));


      const titleCrewMatchingPerson = titleCrew
        .findAll({
          where: {
            [Op.or]: [
              { directors: { [Op.like]: '%' + nconst + '%' } },
              { writers: { [Op.like]: '%' + nconst + '%' } }
            ]
          }
        }).then(titles => {
          return titles
            .filter(t => (t.directors || "").split(",").indexOf(nconst) >= 0 || (t.writers || "").split(",").indexOf(nconst) >= 0)
            .map(t => [...new Set([(t.directors || "").split(","), (t.writers || "").split(",")].flat())].filter(n => n != nconst && n != "" && n))
        });


      const allTeammates = Promise.all([otherTitlePrincipals, titleCrewMatchingPerson]).then(nconsts => {
        return nconsts.flat().filter(n => n && n != "");
      });


      const topTeammates = allTeammates.then(nconsts => {
        const counts = nconsts
          .reduce(
            (entryMap, e) => {
              entryMap[e] = (entryMap[e] || 0) + 1;
              return entryMap;
            },
            {}
          );
        const keys = Object.keys(counts);
        const countsWithKeys = keys.map(k => [counts[k], k]);
        countsWithKeys.sort((pair1, pair2) => pair2[0] - pair1[0]);
        const topResults = countsWithKeys.splice(0,5);
        return topResults;
      });


      return topTeammates.then(countsWithKeys => nameBasic
        .findAll({
          where: {
            nconst: { [Op.in]: countsWithKeys.map(c => "" + c[1]) }
          }
        }).then(actors => actors.map(a => {
          a.common_titles = countsWithKeys.filter(c => c[1] == a.nconst)[0][0];
          return a;
        })));
Enter fullscreen mode Exit fullscreen mode

First, we want to take that data from the title_principals table. We start with a known nconst, find all movies for a given person, and then for all the movies we find other people.
Next, we take similar data from title_crew. We then parse the data, aggregate it, and find top teammates. Finally, we get the data from the name_basics table.

Image description

We see that Metis identifies indexes that may improve the scenario. Even without these indexes, the query running in application finishes in 6 seconds vs 19 seconds with raw query. That’s a good starting point. Once again, we need to add improvements and measure them to see how things work in practice.

Summary

We examined various real scenarios with Sequelize to see how we can improve the performance. In previous parts of this series we relied on the database only to make it faster. This time we can do some processing with ORM and the application to change the performance characteristics. Metis shows good insights into all what we do, so we can track the optimizations and performance benefits as we proceed with the implementation changes. We can do the same with our actual applications even if we have very few rows in the database.

Top comments (0)