I will show with real examples and graphs how indexes can affect the performance of the database and your application.
This article was inspired by feedback from friends claiming that MongoDB is slow or keeps crashing. Indeed, when I started working with MongoDB, I noticed very quickly that it became slow even though there were few documents in the database, while MySQL was faster in the same situation during development with no load. The reason was simple: I just hadn't added any indexes.
As an experiment, I will run an application that makes lots of queries to different collections in the database. I will show that, like any tool, it can run fast, run slow, or even crash entirely because of our mistakes. At the beginning of the experiment, all queries are executed using indexes. Then I will remove an index for a while, show how that affects performance, and finally recreate it to restore performance.
The application is developed in PHP 8 and uses the default MongoDB driver for database queries.
The application is deployed on a t2.micro (Free Tier) instance in AWS, with 1 vCPU and 1 GiB of Memory.
The application makes queries to several collections in a loop:
- Pages - 3.2 million documents
- Users - 130k documents
- Docs - 100k documents
Percona Server for MongoDB is used as the database. MongoDB ReplicaSet consists of three nodes also deployed in AWS, each node on a separate t2.micro instance.
I use the free, open-source tool Percona Monitoring and Management (PMM) for monitoring and graphing.
These limited resources are available to everyone for free, and you will see how much you can get out of them.
I ran the application to put load on the database. To simplify the demonstration, all queries in this experiment were sent to the Primary node. In the final section of the article, I also run an experiment with the load spread across all three nodes.
The loop performed several findOne queries:
- Get the latest document from a collection, sorted by timestamp.
- Get a random document by id.

Queries were run against two different collections (with 3 million and 100k documents).
```php
$last_user = $app['db']->users->findOne([], ['sort' => ['timestamp' => -1]]);
$user_id   = rand(1, $last_user['user_id']);
$user_data = $app['db']->users->findOne(['user_id' => $user_id]);
```
When I ran one process, I got about 1,000 queries per second and 30% of the CPU load of the database instance.
Then I started the second similar process. I got about 1.8k queries per second and just over 50% CPU load.
I was shocked by these results because these are very big numbers. For example, if you develop a website, you can get hundreds of requests per second (RPS) from online users on these resources and store millions of rows (documents) in a database with a disk of less than 10GB.
Queries to the database were very fast: about 2-100 ms against the collection with 100k documents and 3-300 ms against the collection with 3 million documents. The instance did not really have enough resources for 3 million documents, but queries still completed at a speed acceptable to a live user.
I opened MongoDB Compass and removed the index on the timestamp field, the field one of the queries used for sorting.
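For reference, the same thing can be done from code with the PHP library; a minimal sketch, assuming the index was created on `timestamp` in descending order and got the default auto-generated name `timestamp_-1` (check the collection's index list if yours differs):

```php
// Drop the index on the timestamp field (equivalent to removing it in Compass).
// 'timestamp_-1' is the default name for an index created with ['timestamp' => -1].
$app['db']->users->dropIndex('timestamp_-1');
```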
Performance degraded dramatically:
- The number of queries dropped from 1.8k to 120 per second.
- CPU load increased to 90%, close to the limit.
- The average query time went from under 100 ms to 20+ seconds.
Yes, the app continued to work. If you don't have many users while the app is in development, you probably won't even notice the problem. But the user experience was terrible.
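This is exactly the kind of regression the server's explain command catches before users do. A sketch of checking the query plan from PHP, mirroring the sorted query from the earlier example (one possible way to run explain is through the database's `command()` method):

```php
// Ask the server how it plans to execute the sorted findOne query.
$result = $app['db']->command([
    'explain' => [
        'find'  => 'users',
        'sort'  => ['timestamp' => -1],
        'limit' => 1,
    ],
])->toArray()[0];

// With the index: the winning plan uses IXSCAN (index scan).
// Without it: COLLSCAN plus an in-memory SORT over the whole collection.
var_dump($result['queryPlanner']['winningPlan']);
```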
Always create indexes for all fields that are used in search and sorting.
I simply recreated the index that I had deleted earlier.
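Recreating it takes a single call; a sketch with the PHP library (Compass or the mongo shell work just as well):

```php
// Recreate the descending index on timestamp used by the sort query.
$app['db']->users->createIndex(['timestamp' => -1]);
```

Keep in mind that building an index on a large collection takes time and consumes resources while it runs, so on a loaded production system it is worth scheduling carefully.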
Performance, CPU utilization, and execution time were immediately restored.
Then I repeated the experiment and removed the index on the larger collection (3 million documents). This time performance dropped to less than one operation per second.
The Primary node died after about 30 seconds. Then I lost the connection to one of the Secondary nodes as well, after which the PHP application exited with an error.
It took me about 10 minutes to recover the instances, and I didn't repeat the experiment.
It is gratifying that a simple reboot of the instances through the AWS console automatically started all the database nodes; they reconnected to the monitoring and continued to work.
Always add indexes.
To learn more about them, I recommend taking the excellent MongoDB performance course from MongoDB:
M201: MongoDB Performance
In the experiment, I only queried the Primary node, but I have a ReplicaSet with three nodes.
I set Read Preference for read operations from Secondary Nodes.
I just used the `readPreference` parameter when initializing the database client in my application:

```php
'readPreference' => 'secondaryPreferred'
```
As a result, the initialization looks like this:

```php
$app['db_client'] = new \MongoDB\Client(MONGODB_URLS, [
    'username'       => MONGODB_USER,
    'password'       => MONGODB_PASSWORD,
    'replicaSet'     => 'MongoDB-RS',
    'authSource'     => 'admin',
    'readPreference' => 'secondaryPreferred',
]);
```
As a result, all write operations are automatically executed on the Primary, and read operations on the Secondary nodes.
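If a particular read must see the very latest data, the client-level setting can be overridden per operation; a sketch, assuming the same `$app['db']` handle as in the earlier examples:

```php
use MongoDB\Driver\ReadPreference;

// Force this one query to the Primary, e.g. right after a write,
// while all other reads keep going to the Secondaries.
$user_data = $app['db']->users->findOne(
    ['user_id' => $user_id],
    ['readPreference' => new ReadPreference(ReadPreference::RP_PRIMARY)]
);
```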
So, without increasing the number of resources, using t2.micro instances I got:
- about 1k read ops/sec on each node;
- over 500 write ops/sec (insert, update, delete).
The CPUs of each instance were less than 50% loaded, and I just didn't need more performance for my application.
I repeated the experiment, this time deleting an index in a collection that received writes: another collection, Docs, with 100k+ documents. The result was again a severe performance degradation. Then I restored the index.
Then I loaded the Primary node a little bit more.
Think about performance, experiment, and keep an eye on monitoring.
It's not complicated or time-consuming, and various free tools are available now.