Daniil Bazhenov

Posted on May 4, 2023

How Database Indexes Affect MongoDB and Application Performance

#beginners #tutorial #database #webdev

I will show with real examples and graphs how indexes can affect the performance of the database and your application.

This article was inspired by a lot of feedback from my friends saying that MongoDB is slow or crashing. Indeed, when I started working with MongoDB, I noticed very quickly that it starts working slowly, although there were few documents in the database. MySQL was faster in the same situation during development with no load. I just didn't add any indexes.

As an experiment, I will run an application that will make lots of queries to different collections in the database. In the experiment, I will show that it can run fast and slow and even crash entirely because of our mistakes, just like any tool. At the beginning of the experiment, all queries are executed using indexes. Then I'll remove the index for a while, show how it affects it. And I'll create it again to restore the performance.

About the application and the database

The application is developed in PHP 8 and uses the default MongoDB driver for database queries.

The application is deployed on a t2.micro (Free Tier) instance in AWS, with 1 vCPU and 1 GiB of Memory.

The application makes queries to several collections in a loop:

Pages - 3.2 million documents
Users - 130k documents
Docs - 100k documents

Percona Server for MongoDB is used as the database. MongoDB ReplicaSet consists of three nodes also deployed in AWS, each node on a separate t2.micro instance.

I use the free, open-source tool Percona Monitoring and Management (PMM) for monitoring and graphing.

These limited resources are available to everyone for free, and you will see how much you can get out of them.

Let's start the experiment

I ran the application to do the load on the database. All queries in this experiment were performed to the Primary node to simplify the demonstration. And also in the final section of the article I will make an experiment with the load on all three nodes.

The loop performed several FindOne queries

Get a document from a collection sorted by timestamp
Get an random document by id.

Queries were run against two different collections (with 3 million and 100k documents)

Example code

   $last_user = $app['db']->users->findOne([],
   [
       'sort' => [
           'timestamp' => -1
       ]
   ]);

   $user_id = rand(1, $last_user['user_id']);

   $user_data = $app['db']->users->findOne(
   [
       'user_id' => $user_id
   ]);

When I ran one process, I got about 1,000 queries per second and 30% of the CPU load of the database instance.

Then I started the second similar process. I got about 1.8k queries per second and just over 50% CPU load.

I was shocked by these results because these are very big numbers. For example, if you develop a website, you can get hundreds of requests per second (RPS) from online users on these resources and store millions of rows (documents) in a database with a disk of less than 10GB.

Queries to the database were very fast. About 2-100 ms to a collection with 100k documents and 3-300 ms to a table with 3 million documents. Not enough resources for 3 million documents, but it kept running at a speed acceptable to a live user.

Let's delete the index in the collection with 100k documents

I opened MongoDB Compass and removed the index on the timestamp field. One of the queries was sorting the collection by this field.

Performance has degraded dramatically:

The number of queries has dropped from 1.8k to 120 per second.
The load on the CPU has increased to 90%, to the limit.
The query time went from 100ms to 20+ seconds on average.

Yes, the app continued to work. If you don't have many users while the app is in development, you probably won't even notice it. But the app was very bad.

Always create indexes for all fields that are used in search and sorting.

Let's create the index

I just created an index that I deleted earlier.

Performance, CPU utilization, and execution time were immediately restored.

Now delete the index in the collection of 3 million documents

Performance has dropped to less than one operation per second.

The Primary node died after about 30 seconds. Then I lost the connection to the second Secondary node, after which the PHP application ended with an error.
It took me about 10 minutes to recover the instances, and I didn't repeat the experiment.

It is gratifying that simple Reboot instances through AWS control panel automatically started all database nodes; they connected to the monitoring and continued to work.

Conclusion one

Always add indexes.

To learn more about them, I recommend taking the excellent MongoDB performance course from MongoDB:
M201: MongoDB Performance

Let's try to use all the ReplicaSet nodes

In the experiment, I only queried the Primary node, but I have a ReplicaSet with three nodes.

I set Read Preference for read operations from Secondary Nodes.

I just used the parameter when initializing the database client in my application

 'readPreference' => 'secondaryPreferred'

As a result, the initialization looks like this

$app['db_client'] = new \MongoDB\Client(MONGODB_URLS,
   [
       'username' => MONGODB_USER,
       'password' => MONGODB_PASSWORD,
       'replicaSet' => 'MongoDB-RS',
       'authSource' => 'admin',
       'readPreference' => 'secondaryPreferred'
   ],
);

As a result, all write operations are automatically executed in Primary, and read operations in Secondary nodes.

So, without increasing the number of resources, using t2.micro instances I got:

1k ops/sec read operations on each node;
over 500 ops/s of writes and updates (insert, update, delete).

The CPUs of each instance were less than 50% loaded, and I just didn't need more performance for my application.

I repeated the experiment with deleting the index in the collection where the write was done. I had another Docs collection with 100k+ documents. Then I restored the index.

And also got a lot of performance degradation.

Then I loaded the Primary node a little bit more.

Conclusion two

Think about performance, experiment, and keep an eye on monitoring.

It's not complicated or time-consuming, and various free tools are available now.

Top comments (3)

Saravanan Gnanaguru • May 8 '23

Awesome insightful blog. Thanks for posting.

Vladas Saulis • May 23 '23

I couldn't imagine that someone would be working with database and not using indexes! Wow! This "wow" is not for article, but for that "someone" :)

Daniil Bazhenov • May 24 '23

I thought so, too, until I met two friends with this case. They are beginner developers and thought more about the app, installed the database, connected and focused on development.

In one case, the developer just added a new query, with a select field that had no index, and did not even notice the difference, at first.

DEV Community

How Database Indexes Affect MongoDB and Application Performance

About the application and the database

Let's start the experiment

Let's delete the index in the collection with 100k documents

Let's create the index

Now delete the index in the collection of 3 million documents

Conclusion one

Let's try to use all the ReplicaSet nodes

Conclusion two

Top comments (3)

Read next

Terraform vs AWS CDK: ¿Qué herramienta de infraestructura como código es mejor para tu proyecto?

SvelteKit VS Astro. laidback side by side

React 19: A New Era of Web Development

Essential Use Cases to Master API Development and Integration