DEV Community

Eduard Pochtar πŸ‘¨β€πŸ’»
Eduard Pochtar πŸ‘¨β€πŸ’»

Posted on

A better way to check big amount of data against database in Node.js

cover

While working on some projects there might be some obvious performance optimisation tasks and not so obvious as well. Sometimes it's even hard to find a good solution especially when a lot of sources all over the internet look like copy paste.

Unfortunately I couldn't find good solution for this issue all over internet. Fortunately I've got help from my colleague and would like to share it with you.

So how is it possible to check if some data already exists in database? The solution is very simple we just need to know some unique identifier which we will use to query data from database:

const pool = ... // Database connection pool
const product = {
    id: 1,
    title: "Best Product Ever",
    ...
}

pool.product.find({where: {title: product.title}}).then(result => {
    // do something...
});

Enter fullscreen mode Exit fullscreen mode

As you can see nothing complex. But what if we need to check if multiple products exist in our database? We can use loop:

const pool = ... // Database connection pool
const products = [{
    id: 1,
    title: "Best Product Ever",
    ...
}, {
    ...
}]

products.forEach(product =>{ 
    pool.product.find({where: {title: product.title}}).then(result => {
        // Do something...
    });
})
Enter fullscreen mode Exit fullscreen mode

This also doesn't look so complex. As you may already know one of the good practices of connecting to database is using pools. This gives ability to connect to database multiple times without need to close the connection.
If we take a closer look at our loop it wouldn't be hard to understand that as many items we have in array as many connections we might have. It might not sound like a problem when there are 5 or 6 items we have to check for existence in our database. But it can be a big performance issue when we would like to check hundreds, thousands or even hundreds of thousand of records.

Solution

So how can we solve this issue? Instead of one by one connecting to database and checking if the record exists it's possible to query based on data we have compare queried data with our income data.

const pool = ... // Database connection pool
const products = [
    {
        title: 'Some Product Title'...
        ...
    },
    {...},
    {...}
]

const productTitles = products.map(product => item.title)
const query = {
    where: {
        title: productTitles
    },
    attributes: ['title']
}

pool.product.find(query).then(result => {
    const resultTitles = result.map(item => item.title);

    const newProducts = products.filter(product => resultTitles.indexOf(product.title) === -1);
    // Do something...
});

Enter fullscreen mode Exit fullscreen mode

In the above code I've made a query based on list of products, selected all products form database based on query. If there is some title in my query that doesn't exist in database it just won't select it. Then I compared to lists with help of .map and .filter methods. As a result I've got new products in 'newProducts' variable.
In this case we will only one connection pool. Depending on project requirements there might be more connection pools for example one more connection pool to save the data little bit later.
This is useful when you would like to process only the data that doesn't exists in your database and save to database only after few additional process steps.

I hope this small trick will help you to make your project a little bit better.

Top comments (0)