I will tell you the real story of using database monitoring tools when developing an application. I will show you an example of how I managed to detect and fix a problem in the application.
I am developing a PHP application using MongoDB as a database. The application is lightweight, and most load falls on the database. I have implemented functions at the application level to adjust the number of queries, as the application can quickly load the database to 100%.
For development, I use several small dev instances in AWS, use Percona Server for MongoDB with three nodes as a database, and have PMM installed to monitor the databases.
As a result, my development process consists of the following steps:
- I developed a new feature and ran it on the dev server for testing.
- I check the profiling on the PHP side, and there is no memory leak, and I am happy with the speed.
- I check the database monitoring to ensure everything works fine.
- I debug the feature, setting the number and types of queries in the function to balance the number of queries and the load on the database, if necessary.
So I started the application and got ready to run the new feature. The feature was getting information from open sources, processing it, and saving it to the database. The second part of the functionality went through all the saved documents and did some additional processing.
At this point, the application already had a lot of features that loaded the CPU of the Primary Node by 25-40%, and everything was running stably. I decided to have a performance reserve, as I planned to add new features.
I checked several dashboards, and there were no anomalies or changes. PMM has many dashboards and charts, and I will only show a few, just some.
I saved the changes with the new feature and pushed it to the dev server to make it work. Then I checked that the function started without errors, and the result was visible in the database. I use MongoDB Compass to check the result of a database entry.
I waited a few minutes and rechecked the dashboard. At first glance, the main screen was fine. However, I was alarmed by the speed of processing. The number of operations has mostly stayed the same.
I scrolled down through the various charts on the dashboard and saw an anomaly.
The latency increased, and the app loaded the instance to 100% CPU.
I have made a test run on the application side and checked the profiler there, too. The app worked poorly, and queries were slow.
I knew the reason was the new feature and immediately rolled back the last changes.
I had a rough idea of where the problem might be, made a few changes, and started again.
I did it several times, but the result was the same (the CPU was loaded at 100%).
I selected a period with a load and used the Query Analytics function built into the monitoring.
Query Analytics shows a list of queries sorted by load or execution speed. Some of the queries to the Pages collection gave 90% load, and the Query Time was more than 3 minutes.
In Query Analytics, you can find slow queries, see their details, and then debug them in the application.
I made a few changes that fixed the problem.
The first problem was the indexes. I create indexes from within the application using the command.
$app['db']->CollectionName->createIndex(['index_key' => 1]);
Since the application uses many different collections and queries with conditions on various fields and with or without sorting, I have a lot of indexes.
I made a typo in this case, and the index was not created correctly.
After the indexes were created correctly, I needed quick runs to debug the number of queries to adjust the CPU load to around 50%.
You can see the final chart after debugging and fixing the problem.
It is enough to install locally or on a server PMM Server and PMM Client on each database instance. Documentation
It also takes 60-360 minutes to get used to it, as it has many features and dashboards.
Don't forget to add indexes and make sure they work.
I am a simple developer who can make mistakes and do different experiments. Installing the monitoring was one of the experiments, and previously I just focused on the speed of the PHP script. From time to time, I have looked at the monitoring dashboard in the AWS control panel, but it gives less information, only about the instance itself, without being able to investigate in detail.
If you use MySQL, PostgreSQL or MongoDB, try installing PMM and see how your database works. Installing PMM for development purposes took a limited amount of resources and was straightforward. It benefited me because I discovered and fixed the problem before it went into production.
A small clarification, the story is real from my development practice, but for the article I took the graphs of the final debugging, so that the graphs show the right sequence and fit into the image available for explanation and demonstration. Just in reality I went to drink coffee several times and thought for a long time, what was reflected in the graphs of monitoring :)