Elasticsearch already does a lot to improve performance without us ever noticing. So what can we do to improve it even further? This is what I asked myself when looking into the performance of some heavy aggregations we are using. In this post, I give a basic explanation of caching in Elasticsearch, followed by two experiments that verify how caching and queries interact.
How does Elasticsearch cache?
Elasticsearch has different levels of caching that all work together to make sure it responds as fast as possible. All caching levels have the same promise: near real-time responses. That means that the response you get is both fast and matches (or almost matches) the data as it is currently present in the index.
Request cache
Elasticsearch has its own intelligent request cache. It updates this cache based on updates to the underlying index, which makes sure the cache is always accurate. Of course there are some gotchas, for example:
Requests where size is greater than 0 will not be cached even if the request cache is enabled in the index settings.
Another reason the request cache might not work is when your response contains a value that changes on every request. For example, if your response contains the current date or some randomly generated number, the response becomes uncacheable.
If you want to learn more about how you can tweak the request cache, take a look at the documentation.
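For example, you can enable the request cache in the index settings and explicitly opt in for a specific search. A minimal sketch, where my-index and the colors field are hypothetical:

PUT /my-index/_settings
{
  "index.requests.cache.enable": true
}

GET /my-index/_search?request_cache=true
{
  "size": 0,
  "aggregations": {
    "popular_colors": {
      "terms": {
        "field": "colors"
      }
    }
  }
}

Note that the size of 0 matters here: as mentioned above, the same request with a larger size would not be cached.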
Query cache
On a deeper level, the results of filter-type queries can be cached as a binary representation called a bitset. Just like the request cache, this cache is updated automatically whenever something relevant in the index gets updated. Elasticsearch only caches queries that apply to a large number of documents:
Only segments that hold more than 10,000 documents (or 3% of the total documents, whichever is larger) will cache the bitset.
It does this because, for smaller segments, it's probably faster to just evaluate the query. Given how much Elasticsearch has already optimized performance without caching, it's quite easy to make things slower by adding a cache. You can find more information about query cache here.
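If you want to verify that the query cache is actually being used, the stats APIs expose its hit and miss counts. A minimal sketch, where my-index is a hypothetical index name:

GET /_nodes/stats/indices/query_cache

GET /my-index/_stats/query_cache

The response contains values like memory_size_in_bytes, hit_count and miss_count, which you can compare before and after running your query.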
Field data cache
Field data cache is very relevant for aggregations. As the documentation puts it:
It loads all the field values to memory in order to provide fast document based access to those values.
Not having enough memory reserved for the field data cache will make your aggregations slow. You can monitor the usage of the field data cache and tweak it to your needs. Go here to learn more about it.
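For example, you can check how much memory the field data cache is using per field. A minimal sketch, where my_field is a hypothetical field name:

GET /_nodes/stats/indices/fielddata?fields=my_field

If the cache grows too large, it can be capped with the static indices.fielddata.cache.size setting (for example 30%) in elasticsearch.yml.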
Does it help to extract common query elements?
The query cache seems like it would be very beneficial for a lot of real-world aggregations. It is very common to perform some aggregations on a filtered subset of your index. Can Elasticsearch re-use the filtering in this case? Or can we help it do that?
Let's compare the performance of the following queries. The first query has the same filter specified for both aggregations separately:
{
  "size": 0,
  "aggregations": {
    "1": {
      "filter": {
        "match": {
          "search_field": "text"
        }
      },
      "aggregations": {
        "items": {
          "top_hits": {
            "size": 100,
            "_source": {
              "includes": "field1"
            }
          }
        }
      }
    },
    "2": {
      "filter": {
        "match": {
          "search_field": "text"
        }
      },
      "aggregations": {
        "items": {
          "top_hits": {
            "size": 100,
            "_source": {
              "includes": "field2"
            }
          }
        }
      }
    }
  }
}
The second query has this filter extracted to a higher level, which should make the aggregations share the results. We need to wrap the filter in a bool.filter to make sure that scoring is the same:
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "search_field": "text"
          }
        }
      ]
    }
  },
  "size": 0,
  "aggregations": {
    "1": {
      "top_hits": {
        "size": 100,
        "_source": {
          "includes": "field1"
        }
      }
    },
    "2": {
      "top_hits": {
        "size": 100,
        "_source": {
          "includes": "field2"
        }
      }
    }
  }
}
We disabled the request cache for this test, but the query cache and field data cache could still do their jobs. We made sure the segments targeted by the filter query actually hold more than 10,000 documents. This means the query cache should kick in, and there should be no difference in query times between these two queries. That's exactly what we see:
There's no performance difference between these two solutions. It looks like the query cache is doing its job well. Keeping in mind the requirements for the query cache, you would probably still prefer the second variation. The field data cache is also working equally well in both cases.
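As a side note: disabling the request cache for a single request, like we did for this test, only takes a query parameter. A minimal sketch, with a hypothetical index name:

GET /my-index/_search?request_cache=false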
Do aggregations run in parallel?
In day-to-day work, I see a lot of cases where we put a large number of aggregations in one query. This got me wondering: do aggregations actually run in parallel? Or could we improve response time by, for example, doing an msearch with one query per aggregation?
In this test we run the same query as before, with 1, 2, 5 and 10 aggregations in one query. We compare that to splitting up the aggregations, so each aggregation gets its own query.
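Splitting up means going through the multi search API, where every aggregation gets its own search within one request. A minimal sketch of what such a request could look like, assuming the index is called my-index and reusing the filter and top_hits aggregations from above:

GET /my-index/_msearch
{}
{"query":{"bool":{"filter":[{"match":{"search_field":"text"}}]}},"size":0,"aggregations":{"1":{"top_hits":{"size":100,"_source":{"includes":"field1"}}}}}
{}
{"query":{"bool":{"filter":[{"match":{"search_field":"text"}}]}},"size":0,"aggregations":{"2":{"top_hits":{"size":100,"_source":{"includes":"field2"}}}}}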
We see a significant performance increase when we give each aggregation its own query and do an msearch. At 10 aggregations, the speedup was close to a factor of 2. In this test, the Elasticsearch instance ran inside a Docker container with 2 available CPUs, so this speedup is about the best you could expect to get.
It's clear that aggregations don't just run in parallel by default. Splitting up your aggregations into multiple queries can therefore make sense if you want to improve response time. This only applies when CPU is not a bottleneck, because by splitting your query up you'll be using more CPU time in total.
Conclusion
So, does it help to extract common query elements? You may still prefer it for readability, but you don't need it for performance, because Elasticsearch can optimize for these cases. In cases where your filter isn't eligible for the query cache, moving common query elements higher up in your aggregation might still improve performance a bit.
And do aggregations run in parallel? They don't by default. Splitting them up using an msearch might be smart, as long as you're not CPU-limited yet. On a cluster that's not fully utilized, this can improve response times significantly.
Comments
Does query complexity make a huge impact?
I see you ran a simple text search within the filter scope. How about cases where you have bool queries within bool queries, or even nested ones?
This was a really nice article though, I never considered running aggregations using msearch.
Yes actually, from some of the real world queries I've seen, the complexity of the query has a significant impact on aggregation performance.
We once enhanced one of our queries, which at first had a couple of match clauses. We added another clause, which matched on a field with shingles and trigrams applied. This made the aggregations applied to that subset more than 2 times slower. How much impact this would have for you depends a lot on what you're actually doing in your query, of course.
Thanks, glad you liked it!
Hey Raoul, thanks for the article! It's really helpful.
May I ask two questions?
Glad to hear it's helpful!
For your questions:
I measured the time it took in a small script, then put the results into a Google spreadsheet to make the graphs. I'm not sure if you could do something similar with Kibana; it could definitely be that there is an easier way to do it.
I tried two different variations of queries, which both looked something like this:
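(A sketch, reusing the top_hits aggregations from the post; the field names are just examples.)

{
  "size": 0,
  "aggregations": {
    "firstAggregation": {
      "top_hits": {
        "size": 100,
        "_source": {
          "includes": "field1"
        }
      }
    },
    "secondAggregation": {
      "top_hits": {
        "size": 100,
        "_source": {
          "includes": "field2"
        }
      }
    }
  }
}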
I then put 1, 2, 5 and 10 aggregations in one query like this, and compared it to using only "firstAggregation" in every query while sending 1, 2, 5 and 10 separate queries.
I hope that helps, let me know if you have any questions or if anything is still unclear!
Thank you for an insightful article!