DEV Community

Cover image for How the N+1 query problem can burn your database
Hernan Reyes
Hernan Reyes

Posted on • Originally published at hernanreyes.dev

How the N+1 query problem can burn your database

Overview

Have you ever watched a movie/series or experience your self going to a hotel where they have room service?, let's put that you go to one of these hotels, where the hotel’s restaurant is at the first floor, and you’re hosteling at the 10th floor. Now, when you are there, you decide to order something to eat for lunch, and imaging that the waitress instead of bringing you all the food with the compliments (drinks, desserts, etc.) at once, bring you every meal, drink, dessert and that, one by one, that will be so inefficient, because the waitress will have to do many runs before bringing you everything you asked for, going back and forth from floor 1 to 10. This is what the N+1 problem is, getting everything you request in many runs.

The ideal will be to carry what you ordered in a bussing cart like this, so the waitress can bring all at once:
Bussing cart
In this article will see how this problem look in code and some solutions you can do to avoid this and ensure the best performance for your application.

Time to see some code

To show how the N+1 looks in your code, I’m going to build a simple console application that prints the available menus to order from a restaurant. For this, we’re going to have a database with a meals and drinks table, in the menu, each meal will come with a drink. Let's see the models for these tables:
Tables


// Meal model a record of meals table
type Meal struct {
    ID          uint
    Name        string
    Description string
    DrinkID     uint
    Drink       Drink
}

type Meals []Meal

// Drink model a record of drinks table
type Drink struct {
    ID          uint
    Name        string
    Description string
}

type Drinks []Drink
Enter fullscreen mode Exit fullscreen mode

Now let's see the N+1 in action, here we have some methods to query data from the database (you can find the full code on my repo)

type Waitress struct {
    db *sql.DB
}

// getMeals gets 20 meals from the db
func (b Waitress) getMeals() (Meals, error) {
    // this method has the logic to query from the db, if you want to see
    // the actual code, you can go the repo

    // SQL used in this method
    // `SELECT id, name, description, drink_id FROM meals LIMIT 20`

    return meals, nil
}

// getDrinkByID gets a drink from the db by an id
func (b Waitress) getDrinkByID(ID uint) (Drink, error) {
    // this method has the logic to query from the db, if you want to see
    // the actual code, you can go the repo

    // SQL used in this method
  // `SELECT id, name, description FROM drinks WHERE id = $1`

    return Drink{}, nil
}
Enter fullscreen mode Exit fullscreen mode

And finally the method with the N+1 query problem

func (b Waitress) ListMenu() (Meals, error) {
    // the `1` of our `N+1`
    meals, err := b.getMeals()
    if err != nil {
        return nil, err
    }

    // here is our waitress going back and forth 
    // from floor 1 to 10, this will be the `N` of our `N+1`
    for i, meal := range meals {
        drink, err := b.getDrinkByID(meal.DrinkID)
        if err != nil {
            return nil, err
        }

        meals[i].Drink = drink
    }

    return meals, nil
}
Enter fullscreen mode Exit fullscreen mode

Yes, this simple logic can burn your database, because you are going back and forth to add the drinks to each meal, which is not efficient.

The more records you have to query or the most users you have, the most this N+1 problem we’ll affect your application because the time complexity is O(N)/Linear time.

ℹ️ Here I gave you an example in the backend, but this problem can also be found in your frontend, where instead of calling your database directly, you’ll be calling an endpoint of your backend which at the same time would call the database.

Solution

Now let's see two solutions to our problem.

  1. Join the authors in the SQL query

    This maybe the easier solution, in here you’ll just have to write a query like the following:

    SELECT m.id, m.name, m.description, d.name, d.description
    FROM meals AS m
        INNER JOIN drinks AS d ON d.id = m.drink_id
    

    With this query, our code will look the following

    type Solution1 struct {
        db *sql.DB
    }
    
    func (s Solution1) ListMenu() (Meals, error) {
        // this method will execute the previous SQL
        // SELECT
        // m.id,
        //  m.name,
        //  m.description,
        //  d.name,
        //  d.description
        // FROM meals AS m
        // INNER JOIN drinks AS d ON d.id = m.drink_id
    
        // if you want to see the full code, refer to my repo
    
        return meals, nil
    }
    

    With this query, now we only have you query our database once and that’s it.

  2. Get the meals and then join the drinks with your programming language

    No, we won’t do the same as the example where we saw the N+1 problem, here, instead of querying the meals and then querying the drinks one by one, we’ll just do two queries to our database, let see how:

    
    func (s Solution2) ListMenu() (Meals, error) {
        meals, err := s.getMeals()
        if err != nil {
            return nil, err
        }
    
        // this is our waitress bringing everything in a bussing cart 
        drinks, err := s.getDrinksByIDsIn(meals.GetUniqueDrinkIDs())
        if err != nil {
            return nil, err
        }
    
        meals.JoinDrinks(drinks)
    
        return meals, nil
    }
    
    func (s Solution2) getMeals() (Meals, error) {
        // this method has the logic to query from the db, if you want to see
        // what's in these methods, you can go the repo
    
        // sql used in this method
        // `SELECT id, name, description, drink_id FROM meals LIMIT 20`
    
        return Meals{}, nil
    }
    
    func (s Solution2) getDrinksByIDsIn(ids []uint) (Drinks, error) {
        // this method has the logic to query from the db, if you want to see
        // what's in these methods, you can go the repo
    
        // sql used in this method
        // `SELECT id, name, description FROM drinks WHERE id IN (1, 2, 3)`
    
        return Drinks{}, nil
    }
    

    As you can see, we only have two queries to our database: s.getMeals() and s.getDrinksByIDsIn, and if you read the ListMenu method, you’ve noticed that we introduced to more methods, let see what they do and why we need them:

    // to query the drinks, we need to know their IDs, for that
    // we add this method to our `Meals` slice
    func (m Meals) GetUniqueDrinkIDs() []uint {
        var ids []uint
        drinks := make(map[uint]struct{}, 0)
    
        for _, v := range m {
            _, ok := drinks[v.DrinkID]
            if ok {
                continue
            }
    
            drinks[v.DrinkID] = struct{}{}
            ids = append(ids, v.DrinkID)
        }
    
        return ids
    }
    
    // once we've query our meals and drinks, we proceed to join the 
    // corresponding drink to our meals
    func (m Meals) JoinDrinks(drink Drinks) {
        for i, meal := range m {
            m[i].Drink = drink.GetByID(meal.DrinkID)
        }
    }
    
    // this methos is needed by the `JoinDrinks()` method
    // so it can find the drink for a meal 
    func (d Drinks) GetByID(ID uint) Drink {
        for _, drink := range d {
            if drink.ID == ID {
                return drink
            }
        }
    
        return Drink{}
    }
    

Now, you can see we don't query the database for every drink, instead, in one query we get all the meals, and in the other we query the drinks and then join them to the corresponding meals.

When to use one solution or the other?

Well, in this app, every meal includes just one drink, but what happened if a meal includes more than one drink?, in that scenario, the first solution can't help us, because the SQL query is going to repeat the record for every drink in a meal, so what we want to do, is use the second solution, when we first query the meals, and then get the drinks to join them to the corresponding meals

Personal experience

At work, we have a microservice that is in charge to cache a lot of data about the products we have, twice a day or at demand, and it used to take about 1 minute to cache all the data because of this problem, after we remove the N+1s, it went from 1 minute to 2 seconds!.

Conclusion

Don’t overestimate a simple logic like the N+1’s, you can fall into this problem easily, but also you can fix it easily, but if you don’t do it in time, your application performance will let you know overtime.

Something I didn’t mention is, the N+1 in ORMs like Gorm because I don’t have experience with these, but I will recommend if you are using ORMs, that you dig into the underlying code to see if you have this problem.

⚠️ The way I structure the code in this article is not meant to be a guide on how you should structure your code, it is made as simple as possible to focus on the problem and solution code.

Homework

If you are working on a project, or you already have projects in production, go check them out to get rid of any N+1 you find.

References

  1. N+1 in Laravel: Here you can see the N+1 in Eloquent (an ORM for Laravel)
  2. Figma Jam and IconDuckTo make the illustrations
  3. Article repositoryYou can find the examples in my GitHub

Top comments (3)

Collapse
 
aarone4 profile image
Aaron Reese

Hmmm.
Two things. Firstly in the function getDrinksByIDsIn you are passing a dynamic query to the database with a custom list of IDs each time. Although this is avoiding the N+1 issue there is likely to be a performance hit because the database cannot cache an optimised query plan.
Second, I was struggling to follow the code as I don't recognize the language and syntax but I am pretty sure the getDrinkById is doing an N+1 search as it has to internally iterate the array to find the correct drink Id.
Probably faster in memory than network latency to the database but certainly not free.
As for the comment about returning multiple meal records if a meal has more than one drink, a properly structured object with a collection of drinks would solve the issue. Modelling your data is the key to happy applications.

Collapse
 
volker_schukai profile image
Volker Schukai

Interesting article. ORMs often tempt you not to deal with the queries.

Collapse
 
aarone4 profile image
Aaron Reese

Agreed.
Naieve implementation of ORM pattern will say: GET Customers_Collection.
For each Customer in Customers_Collection, Append, GET Orders_for_customer (Customer.Customer_ID)
If you have 100 customers, that is 1 call to get the customer list (1) and 100 calls to get the orders (N), hence (N+1).
Developers tend to work with very small datasets on fast machines with local data stores so the performance issue is often not seen until regression testing or possibly UAT testing where the environment is much closer to what you would see in production. Worse, for fledgling apps then there are only 100 customers in the entire system the N+1 issue may not be apparrent, however once you are up to 10M customers, it is going to hurt!