We developers, most of the time, are having a battle with databases about performance. It is not rare that the root of a performance problem is ingrained deep inside your query. It is a fate we must fight against everyday. This week I had to fight against it and I had some interesting results and new experiences to talk about. That was when I discovered the magic called MySQL FULLTEXT index.
When you work in a small company we end up venturing yourself in fields that it is not necessarily your specialty, that's me as a developer dealing with databases. I'm not a DBA and for that reason I don't know all the intricacies of a database like MySQL. When a problem comes up everybody helps giving their suggestions, researching and etc. This time the problem was about performance in a search feature of an application.
Giving a similar example of the use case, we have a search feature that looks for products. As expected, there are many fields in a product where you need to search to make it as accurate as possible, some of them are the name of the product, its description, info fields, some keywords, the name of the manufacturer, etc. Only with this description we can have three tables:
- Products, self-explanatory
- Keywords, it has a separate table because we can reuse for multiple products.
- Manufacturer, also self-explanatory.
However, the problem here was not the number of joins (there are many, more than the ones I describe in the simpler example above), but the way the fields were searched for a given search input. Initially we were using a function on the database that made use of REGEXP to look for rows that contained the search input entirely, i.e, if the input was "discount" the column must have the entire word "discount" somewhere to be brought by the database. Okay, looks inoffensive, until the moment it is not.
I still don't know why, I think it would be a good reason to research about and make another post another time (if you have any idea why let me know!), but after divide every part of the query and test all of them we reached the conclusion that the use of this search method was slowing down the query enormously, like, they query was taking many seconds in a production environment, very bad news.
To give some more context so that it doesn't look like it was a big mistake from our part, the system is being rebuilt, it has been running for many years and now we are reshaping the layout and practically rewriting the backend with new technologies and solutions, but we are still using the same database and many queries were not touched yet. Besides, the problem appeared recently, maybe because of the increased traffic. During tests and early months of production it was not detected, the search was performing reasonably well although we had plans to speed it up even more.
Solve the problem was not that easy due to our time and budget limitations, it would be impossible to do something outside of what we already had of infrastructure, for example, add another database just for that: we thought about Elasticsearch or even any other NoSQL database like MongoDB, some tests indicated that it would give a good performance. However, add another database to the mix added complexity, not to mention cost. Since we are rebuilding we are willing to try new technologies, but we have added Redis for caching purposes and add another database would raise the costs too much.
With this limitation in mind the only thing we could was trying to make the query better in some way. The first thing we tried was to replace the REGEXP for LIKE and although the performance improved it didn't bring the search results we were expecting. At the time I didn't know FULLTEXT index existed and didn't think that MySQL had such feature, I always relate full-text search with Elasticsearch and other NoSQL databases which has this as main feature. It was when I looked for "mysql fulltext" on friend Google and it came with interesting results.
Like any other index you start by creating it, for example:
CREATE FULLTEXT INDEX product_search_index ON product(name, description)
And then the magic begins. To use the capabilities of the FULL TEXT INDEX you can use the
MATCH ... AGAINST ... function, there are some modes and you choose the one best suited for your needs. I'm still studying myself about all the modes and their advantages and disadvantages, but you can find all the explanation in the documentation, for now I will stick with
IN NATURAL LANGUAGE MODE, the default one and the one that best suited for my use case, coincidentally. Quoting the documentation this mode works like this:
A natural language search interprets the search string as a phrase in natural human language (a phrase in free text). There are no special operators, with the exception of double quote (") characters. The stopword list applies.
It looks for the words we are looking for as a phrase in free text. MySQL has a stopwords list that can be configured for your needs to improve the results. Then you can make a simple query to bring the results:
SELECT * FROM product WHERE MATCH(name, description) AGAINST ('notebook off discount')
The query, then, will bring the results of the search in order of relevance. Furthermore, it is fast, really fast comparing to what we have before. We brought down the query response time from many seconds to less than 1 second most of the time. Everybody was surprised, including me, about the results. It was like magic and this makes me think I need to study this more to understand how they do this.
It is impressive like everyday you can learn something new and sometimes it can turn out to be surprisingly the solution you were looking for. With that I was able to solve the problem with only the resources I already had and ultimately not overengineering the solution.