At work, our centralized logging Elasticsearch cluster recently experienced a "mapping explosion" that led to performance issues and ultimately an outage. This happened because we left dynamic field mapping enabled on our logging indices with no field count limit. In response, we decided that disabling dynamic field mapping was the right course of action. But now we need to know if there are any "legit" fields being used that weren't part of the Index Template.
What is a mapping explosion?
Normally our application logs have anywhere from 6-12 fields on them. Between our various applications we have around 25 fields in our standard Index Template. We also had dynamic field mapping enabled. This means that if a document/log line gets ingested with a field that isn't defined in the Index Template, Elasticsearch will pick the field type based on its defaults or the dynamic mapping rules on the index.
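For illustration, this is what dynamic mapping does when a document arrives with a brand-new field (the index and field names below are made up):
POST logs-example/_doc
{
  "message": "user login failed",
  "retry_count": 3
}

GET logs-example/_mapping
With dynamic mapping enabled, Elasticsearch would typically guess long for retry_count and text for message, and those guesses become part of the index mapping from then on.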
For an unknown reason, our Production logging Elasticsearch cluster started receiving logs with log text values as field names. This led to the log indices getting overloaded with fields, some growing to over 600 fields. The first symptom we responded to was Elasticsearch data nodes repeatedly crashing. The nodes were already configured with 32GB of Java heap space, and they were exhausting it within minutes. With the help of Elastic Support it was determined that the "explosion" of fields on the indices was causing the excessive heap usage.
Here are some index field counts from before and during the issue:
{"index_name":"logs-20200427","index_field_count":25}
{"index_name":"logs-20200428","index_field_count":27}
{"index_name":"logs-20200505","index_field_count":626}
{"index_name":"logs-20200506","index_field_count":533}
In order to get the data nodes to stay running for more than a few minutes we ended up having to delete the worst offending indices. We also disabled dynamic field mapping to prevent this from being an issue again in the future.
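For reference, here's a sketch of what disabling dynamic mapping looks like, both on an existing index and in a (legacy) index template; the names are placeholders and the template's existing field definitions are omitted for brevity. The total fields limit is an extra safeguard along the same lines, not something we cover in detail here:
PUT logs-20200507/_mapping
{
  "dynamic": false
}

PUT _template/logs
{
  "index_patterns": ["logs-*"],
  "settings": {
    "index.mapping.total_fields.limit": 100
  },
  "mappings": {
    "dynamic": false
  }
}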
Dynamic mapping is disabled, so are we now missing data?
Not every field being logged was included in the index template. So now that dynamic mapping is disabled, those fields are no longer searchable. I did some searching and couldn't find an easy answer for getting a list of all fields, including unmapped ones, directly from Elasticsearch. Asking on discuss.elastic.co led me down the path of using a scripted query and scripted terms aggregation. The end result is a list of all fields on the index, mapped or not.
Accessing unmapped fields
The "raw" document sent to Elasticsearch is kept under the _source
field when indexed. _source
maintains every field, whether or not it was mapped into a searchable field type. The data in _source
can't be queried with the usual Elasticsearch query types, like terms
or match
. But it can be queried using a script.
In a script field or script in an aggregation _source
can be accessed via the params
object. To find out where to go from here, I used the built-in Painless debugging capability. Here's an example:
GET _search
{
  "script_fields": {
    "test": {
      "script": {
        "source": "Debug.explain(params._source)"
      }
    }
  }
}
An abbreviated response:
{
  "shard" : 0,
  "index" : "logs",
  "reason" : {
    "type" : "script_exception",
    "reason" : "runtime error",
    "java_class" : "org.elasticsearch.search.lookup.SourceLookup"
  }
}
This shows that params._source is of type org.elasticsearch.search.lookup.SourceLookup, and SourceLookup doesn't appear to be documented in the official Painless API Reference. But I was able to find more information using javadocs.io. SourceLookup is a wrapper around java.util.Map, so using the keySet() method returns a list of all field names (keys) in _source. This is exactly what I wanted!
Getting the list of fields
The first step I took was writing a query using a script field to get the list of all field names from _source. Now each document that matches the query will also have the list of field names.
Here's what that query looks like:
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-10m",
        "lte": "now"
      }
    }
  },
  "script_fields": {
    "field_names": {
      "script": {
        "source": "params._source.keySet()"
      }
    }
  }
}
I'm using a range filter to limit the search to documents from the past 10 minutes, because scripted queries and fields can be expensive to run across large numbers of documents.
This is great, but it gives a list of field names per document. It would be even more useful to get a single list of all field names across all documents. Using the same script with a terms aggregation does the job.
Here's what I used:
{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-10m",
        "lte": "now"
      }
    }
  },
  "aggs": {
    "fields": {
      "terms": {
        "size": 100,
        "script": {
          "source": "params._source.keySet()"
        }
      }
    }
  }
}
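The bucket keys in the response are the field names themselves. A hypothetical, heavily abbreviated response (the field names and counts here are invented for illustration) looks something like:
{
  "aggregations": {
    "fields": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "@timestamp", "doc_count": 151234 },
        { "key": "message", "doc_count": 151234 },
        { "key": "trace_id", "doc_count": 8790 }
      ]
    }
  }
}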
Now I have a list of the top 100 field names used in the past 10 minutes. I can compare this against the list of fields we already have defined in our Index Template and add any that are missing. Problem solved!
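For the other side of that comparison, the fields that are already mapped can be pulled straight from the template or from one of the indices; the template name below is just a placeholder for ours:
GET _template/logs

GET logs-20200506/_mapping
Any bucket key from the aggregation that doesn't appear under properties there is a candidate to add to the template.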
Huge thanks to my teammates for their review and feedback on this article!
Top comments (4)
Field mapping explosion is a bad problem to have!
Glad you were able to move past it :).
Painless scripting is CRAZY powerful.
If you find yourself using Painless more in the future, I suggest looking into the Painless Lab in Kibana.
That looks awesome, thanks for sharing! We're not on 7.8 yet but it's one more reason to upgrade.
Interesting... thanks for the post. I would have thought 600 fields would be quite manageable. The Elastic Common Schema for consolidated logging has this sort of quantity, if I recall correctly. Do you think the issue was the overall quantity, or that there were so many dynamic ones?
To be honest, I'm not entirely sure. Some of the field names themselves were also quite long, up to the 255 character max. So it may have been a combination of that and the number of fields that led Elastic Support to determine that it was the cause of the heap space use.