Luis Sena

Posted on Aug 19, 2021 • Originally published at luis-sena.Medium on Aug 16, 2021

Benchmarking Different Methods For Full-Text Search Using Elasticsearch

#development #softwaredevelopment #elasticsearch #programming

How do you choose between different analyzers and queries to get the best search performance? By benchmarking, of course!

Deploying a large-scale full-text search engine can be very hard. Elasticsearch makes the job much easier, but it is not one-size-fits-all; quite the contrary.

Elasticsearch has many configurations and features, but that also means there are many ways to achieve the same goal, and it is not always obvious which one is best for the product you’re building.

Let’s start by looking at the main ways to find users by their username or name, and measure their performance, advantages, and drawbacks.

Experiment Stats

Match Query

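This matches the analyzed terms of the search text and, with the fuzziness param, tolerates a small number of character edits between the search term and the indexed term.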

Pros

Simple to use
Doesn’t use much space
Allows fuzzy search

Cons

If the indexed word is longer than the search term plus the allowed fuzziness (edit distance), it will not match
Fuzzy search can slow things down
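
To make the first drawback concrete, here is a minimal sketch of the edit-distance check that fuzzy matching is based on. It assumes plain Levenshtein distance and a cap of 2 edits, which is the largest fuzziness value Elasticsearch accepts. The helper names and sample usernames are made up for illustration, and the real Lucene implementation is more involved (it also counts transpositions by default).

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


MAX_EDITS = 2  # assumption: the highest fuzziness Elasticsearch allows


def fuzzy_matches(search_term: str, indexed_term: str, max_edits: int = MAX_EDITS) -> bool:
    """True when the indexed term is within max_edits of the search term."""
    return edit_distance(search_term, indexed_term) <= max_edits


print(fuzzy_matches("luis", "luiz"))      # one substitution away
print(fuzzy_matches("luis", "luissena"))  # four insertions away
```

A search for "luis" can reach "luiz", but not "luissena", because the indexed term is four characters longer than the search term and only two edits are allowed.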

Prefix query

Pros

Simple to use
Potentially very fast (especially if you use the index_prefixes option)

Cons

It will only match if the indexed term starts with the searched term
If you use the index_prefixes option, it will use more space
No fuzzy search
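
As a rough illustration of the index_prefixes option, the sketch below builds an index body that enables it on a text field. The field name `username` is an assumption, and the `min_chars`/`max_chars` values shown are simply the documented defaults written out explicitly.

```python
import json

# Hypothetical field name "username"; min_chars/max_chars are the defaults.
prefix_index_body = {
    "mappings": {
        "properties": {
            "username": {
                "type": "text",
                # Index prefixes of 2 to 5 characters as extra terms so that
                # prefix queries can look them up directly.
                "index_prefixes": {"min_chars": 2, "max_chars": 5},
            }
        }
    }
}

# This dict can be sent as the JSON body of the index creation request.
print(json.dumps(prefix_index_body, indent=2))
```

The extra prefix terms are where the additional disk usage comes from.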

Wildcard query

Works much like “LIKE %term%” in a relational database SELECT.

Pros

Easy to implement and debug

Cons

Usually the slowest option, especially if the wildcard is placed at the start or very few characters are used

Match query + ngram analyzer

Pros

Will match even if the search term is in the middle of a word
Good search performance
Allows having a “fuzzy” search since it will match segments of each word

Cons

Requires a specialized analyzer
Uses more disk space
Only matches if the search term is at least the size of the smallest “gram”
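
The pros and cons above follow directly from how ngrams are generated. Here is a small pure-Python sketch of the idea, assuming grams of 3 to 4 characters and a made-up username; it only approximates what the ngram tokenizer emits and does not talk to Elasticsearch.

```python
def ngrams(word: str, min_gram: int, max_gram: int) -> set[str]:
    """Every substring of word whose length is between min_gram and max_gram."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(word) - size + 1):
            grams.add(word[start:start + size])
    return grams


indexed = ngrams("luissena", min_gram=3, max_gram=4)

print("sse" in indexed)  # a term from the middle of the word is indexed
print("ss" in indexed)   # a term shorter than min_gram is never indexed
```

A three-letter search term from the middle of the word lines up with an indexed gram, while a two-letter term has nothing to match against.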

Mappings

Standard
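
A minimal sketch of a mapping that relies on the standard analyzer, assuming a single text field named `username` (the field name is hypothetical and not necessarily the one used in the benchmark):

```python
import json

# Hypothetical field name "username". The standard analyzer is already the
# default for text fields; it is spelled out here only for clarity.
standard_index_body = {
    "mappings": {
        "properties": {
            "username": {"type": "text", "analyzer": "standard"}
        }
    }
}

# This dict can be sent as the JSON body of the index creation request.
print(json.dumps(standard_index_body, indent=2))
```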

Ngram
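
A minimal sketch of an ngram mapping, assuming a text field named `username` and a custom analyzer named `username_ngram` with grams of 3 to 4 characters. All of these names and sizes are illustrative choices, not necessarily the ones used in the benchmark.

```python
import json

# Hypothetical names: field "username", analyzer "username_ngram",
# tokenizer "username_ngram_tokenizer".
ngram_index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "username_ngram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 3,
                    # A difference of 1 stays within the default
                    # index.max_ngram_diff limit.
                    "max_gram": 4,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "username_ngram": {
                    "type": "custom",
                    "tokenizer": "username_ngram_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "username": {"type": "text", "analyzer": "username_ngram"}
        }
    },
}

# This dict can be sent as the JSON body of the index creation request.
print(json.dumps(ngram_index_body, indent=2))
```

Because no separate search analyzer is set in this sketch, the search text is split into grams with the same analyzer at query time.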

Queries

Match query
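
A sketch of a match query with fuzziness, assuming a `username` field and a made-up search term:

```python
import json

# Hypothetical field name and search term.
match_query = {
    "query": {
        "match": {
            "username": {
                "query": "luis",
                # AUTO picks the allowed edit distance from the term length.
                "fuzziness": "AUTO",
            }
        }
    }
}

# This dict can be sent as the JSON body of a _search request.
print(json.dumps(match_query, indent=2))
```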

Prefix query
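
A sketch of a prefix query, assuming a `username` field and a made-up search term:

```python
import json

# Hypothetical field name and search term.
prefix_query = {
    "query": {
        "prefix": {
            # Matches indexed terms that start with "lui".
            "username": {"value": "lui"}
        }
    }
}

# This dict can be sent as the JSON body of a _search request.
print(json.dumps(prefix_query, indent=2))
```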

Wildcard query
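
A sketch of a wildcard query with a wildcard on both sides of the term, assuming a `username` field and a made-up search term:

```python
import json

# Hypothetical field name and search term.
wildcard_query = {
    "query": {
        "wildcard": {
            # "*" matches zero or more characters, so this finds "uis"
            # anywhere inside an indexed term.
            "username": {"value": "*uis*"}
        }
    }
}

# This dict can be sent as the JSON body of a _search request.
print(json.dumps(wildcard_query, indent=2))
```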

Match query + Ngram Analyzer
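
A sketch of the match query against an ngram-analyzed field. It assumes the `username` field is mapped with an ngram analyzer, so the query body itself stays a plain match and the difference lies in the mapping.

```python
import json

# Hypothetical field name and search term; the field is assumed to be
# mapped with an ngram analyzer.
ngram_match_query = {
    "query": {
        "match": {
            # No fuzziness param: the search text is split into grams and
            # matched against the grams stored in the index.
            "username": {"query": "uis"}
        }
    }
}

# This dict can be sent as the JSON body of a _search request.
print(json.dumps(ngram_match_query, indent=2))
```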

Query Benchmarks

To run the benchmarks, I created a small Python script that uses 4 parallel processes, each running 1000 consecutive queries.

It repeats this for each kind of query.

The main objective is not to know how long each query takes but to compare their execution time under the same conditions.

Time in seconds is calculated by summing the time of the 1000 runs in each process and then averaging across the 4 parallel processes.
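
The measurement procedure can be sketched roughly as follows. This is not the original script: `run_query` is a hypothetical placeholder that does a bit of CPU work instead of calling Elasticsearch, and the other function names are made up as well. Swap the placeholder for a real search request to reproduce the setup.

```python
import time
from multiprocessing import Pool

PROCESSES = 4
QUERIES_PER_PROCESS = 1000


def run_query() -> None:
    """Hypothetical placeholder for one search request."""
    sum(range(100))


def time_worker(n_queries: int) -> float:
    """Run n_queries consecutive queries and return the total elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n_queries):
        run_query()
    return time.perf_counter() - start


def benchmark(processes: int = PROCESSES, n_queries: int = QUERIES_PER_PROCESS) -> float:
    """Average the per-process totals across all parallel processes."""
    with Pool(processes) as pool:
        totals = pool.map(time_worker, [n_queries] * processes)
    return sum(totals) / len(totals)


if __name__ == "__main__":
    print(f"average total time: {benchmark():.4f}s")
```

Each process reports the summed time of its own consecutive queries, and the final figure is the mean of those sums.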

Conclusions

Avoid the wildcard query at all costs: I see the wildcard query being recommended everywhere, but as we saw, it is the slowest option and you can get better results with the other options.
If you can live with matching only the beginning of a word: The prefix query can do this job, and it can do it really fast. If your use case fits this, it’s a good choice. You can also use the index_prefixes option to speed things up even more at the cost of disk space.
If you want to save on disk space: Using the standard analyzer with a match query and the fuzziness param should do the trick.
If you want to match even when the search term is in the middle of a word and really need it to be fast: ngram seems to be the choice in this case. It can be “dangerous” to use at times, though.

When using the ngram analyzer, avoid a large gap between the min and max gram sizes, and avoid very small gram sizes such as 1 just to show results for a single-letter search.

A wide range of gram sizes multiplies the number of tokens indexed for every word, which becomes very expensive disk-wise and can degrade your performance.
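
A quick back-of-the-envelope sketch shows how fast the token count grows with the gram range. It assumes a single word with no repeated grams, and the function name is made up for illustration.

```python
def gram_count(word_length: int, min_gram: int, max_gram: int) -> int:
    """Number of grams emitted for one word of the given length."""
    return sum(
        max(word_length - size + 1, 0)  # grams of this size that fit in the word
        for size in range(min_gram, max_gram + 1)
    )


# An 8-character word such as "luissena":
narrow = gram_count(8, min_gram=3, max_gram=4)
wide = gram_count(8, min_gram=1, max_gram=8)
print(narrow, wide)
```

For an 8-character word, a 3 to 4 range produces 11 grams while a 1 to 8 range produces 36, and the gap widens as words get longer.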

For search terms shorter than the minimum gram size, you could instead fall back to the fields that use the standard analyzer and run a simple match or prefix query.
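
One way to sketch that fallback is a small helper that picks the query based on the length of the search term. The field names `username` and `username.ngram`, the default `min_gram` of 3, and the function name are all assumptions made for this example.

```python
def build_query(search_term: str,
                min_gram: int = 3,
                standard_field: str = "username",
                ngram_field: str = "username.ngram") -> dict:
    """Use a prefix query for short terms and the ngram field otherwise."""
    if len(search_term) < min_gram:
        # Too short to line up with any indexed gram. Lowercased defensively,
        # since term-level queries are not run through the full analyzer.
        return {"query": {"prefix": {standard_field: {"value": search_term.lower()}}}}
    return {"query": {"match": {ngram_field: {"query": search_term}}}}


print(build_query("lu"))
print(build_query("luis"))
```

Short terms still return results through the prefix query, without having to index 1- or 2-character grams.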


How does this all sound? Is there anything you’d like me to expand on? Let me know your thoughts in the comments section below (and hit the clap if this was useful)!

Stay tuned for the next post. Follow so you won’t miss it!
