Many interviews I passed (or not passed) included questions on the main solution language (say, Java) and about databases (often relational, SQL databases).
One of the popular and somewhat vague question is:
Have you ever met with query performance optimization tasks? How have you solved them?
As question is too broad, generally it is just important not to lose yourself and answer like this:
- identify the troublesome query (if it is not obvious)
- find which part of the query causes slowness (perhaps with
explainbut this may be not necessary)
- detect why it is slow (often it is just lack of index on certain field)
- fix it, if possible.
But such problems are rare as generally we don't create zounds of new queries everyday. Nevertheless few days ago I encountered problem on my own web-site. It is different case - performance regression with time.
I've noticed it takes over 10 seconds to load, though I definitely remember it worked all right in the past, and I haven't touched most of functionality for a year or more.
Firstly we need to identify the culprit. Page is just generated with PHP which loads some data and calculates some stats (on the right) from the MySQL database. As other pages work well, most probably it is about SQL query.
How to prove it? Here are two SQL queries made by web-page template (one for log, another for stats). I commented out second (it is more complicated), then first - replacing their results with empty arrays.
Funny. The problem disappears when I comment the easiest query - returning "recent actions" from log.
Second step Now we know the bad query. Let's look at it. We have two tables - one with
users, another with
actions. Like this:
create table user(id int primary key, username varchar, ...); create table actions(userid int, ts timestamp, message varchar);
The query simply joins them by user-id, so for each log message we get the username and link to user's profile. And takes latest 100 messages. Like this:
select username, ts, message from user join actions on id = userid order by ts desc limit 100;
Surely problem is not in join. What then? Do we need
explain? No, as I warned, these bugs though look mysterios first, generally are very simple.
How does this query with join work? Obviously it first applies
actions to get last 100 events - and only then does join.
Third step So problem can happen in two places:
- either we don't have index on
user.idand a lot of users, so join is browsing through whole user table (but this is unlikely as user ids are used in many other places);
- or we don't have index on
tstimestamp field, so ordering and taking top requires browsing whole
Of course it was the latter. I added index and everything is fine. (Fourth step).
Now to analyze - why did this happen? Was I so silly not to add index on timestamps? Perhaps... But I vaguely remember that initially the code was cleaning up messages older than 1 day automatically. So there was no need of such index. However later I removed this cleanup, deciding it is good to store activity for good - so that in case with problems of some other tables it may be easy to track which tasks user have solved and fix data at least partially (this happened once).
But at this point of course it didn't occurred to me to check that the index is missing and performance downgrade may occur after some years :)