Jessica Garson

Posted on Jun 17, 2024

Choosing between an index-level API, a query, an aggregation, or ES|QL in Elasticsearch

#elasticsearch #beginners

When getting started with Elasticsearch, something that needs to be clarified is figuring out when to use an index-level API, a query, an aggregation, or ES|QL can be tricky. This blog post aims to walk you through when to use which.

At a high level, you can think about the differences as follows:

Index-level APIs help manage your index. They allow you to create, delete, and modify your index settings, mappings, and aliases.
Queries help retrieve data that meets specified criteria using a JSON-based query language (Query DSL).
Aggregations perform functions such as calculations and grouping data. They are accommodating for data analysis.
ES|QL is a procedural piped query language with SQL-like syntax, useful for data filtering and analytics.

You can run the examples from this blog post inside Elastic’s Dev Tools Console. In the Dev Tools Console, you can directly invoke Elasticsearch’s REST APIs without needing to supply additional authentication parameters inside the Dev Tools Console.

When to use an index-level API

An Elasticsearch index is a data structure containing a set of documents. Each document in an index contains key-value pairs that store your data. An index-level API works with the index as a whole instead of individual documents or a cluster. Index-level APIs enable you to manage your index, settings, aliases, mappings, or templates. The documentation on the subject provides a complete list of index-level APIs.

Examples of times when you would use an index-level API include:

Creating a new index.
Deleting an index.
Cloning an index.
Creating an alias for an index.

Creating a new index

To create a new index you would run the following command:

PUT /new_index

The response that gets returned lets you know that an index called new_index has been created.

{
 "acknowledged": true,
 "shards_acknowledged": true,
 "index": "new_index"
}

Deleting an index

For testing purposes, it's common to create multiple indexes. Having the ability to delete these indexes can be very useful in cleaning up these indexes.

To delete an index, you would use the following syntax:

DELETE /new_index

The output confirms that the index has been deleted successfully.

{
 "acknowledged": true
}

Cloning an index

Cloning an index can be helpful for backup and recovery or data archiving purposes. Our documentation on the subject provides more information on the clone index API.

First, you will want to prevent write operations on the index using the add index block API.

PUT /new_index/_block/write

The output confirms that you have added a block to prevent further write operations on your index.

{
 "acknowledged": true,
 "shards_acknowledged": false,
 "indices": []
}

To clone an index called new_index_cloned from new_index, you would use the following syntax:

POST /new_index/_clone/new_index_cloned

The output indicates that you have created a new index called new_index_cloned.

{
 "acknowledged": true,
 "shards_acknowledged": true,
 "index": "new_index_cloned"
}

Creating aliases

Creating aliases can be helpful for index management. They allow you to refer to an index by a more intuitive and usually shorter name. The following snippet creates an alias for new_index called new.

POST /_aliases
{
 "actions": [
   {
     "add": {
       "index": "new_index",
       "alias": "new"
     }
   }
 ]
}

The output confirms that an alias has been created.

{
 "acknowledged": true
}

When to use queries

While index-level queries help manage your index as a whole, queries help search and retrieve documents that meet the criteria you define. The language used for creating queries in Elasticsearch is called Query DSL (Domain-Specific Language). Query DSL employs JSON. It is beneficial for getting back documents that match the specifications you create inside your query. Some of the most used queries include match, term, and range queries. You also can combine queries to reach a greater granularity using a boolean query.

Match query

A match query in Elasticsearch retrieves documents that correlate with a given value. Match queries are handy for full-text searches since they return text containing a specific phrase or value.

If you had an index containing information on the Boston Celtics games and were looking for games in which the Celtics had a plus-minus score of -19, you would use the following query.

GET /celtics/_search
{
 "query": {
   "match": {
     "PLUS_MINUS": "-19"
   }
 }
}

A basketball team's plus-minus score is a statistic that measures the point differential when a specific team is on the court. It calculates the difference between the team's points and those scored by its opponents while that team is playing. A positive plus-minus indicates the team outscored its opponents, while a negative plus-minus indicates they were outscored.

The result that would get returned contains the one game where the Boston Celtics had a plus-minus score of -19:

{
 "took": 1,
 "timed_out": false,
 "_shards": {
   "total": 1,
   "successful": 1,
   "skipped": 0,
   "failed": 0
 },
 "hits": {
   "total": {
     "value": 1,
     "relation": "eq"
   },
   "max_score": 1,
   "hits": [
     {
       "_index": "celtics",
       "_id": "0022300646",
       "_score": 1,
       "_source": {
         "SEASON_ID": "22023",
         "TEAM_ID": 1610612738,
         "TEAM_ABBREVIATION": "BOS",
         "TEAM_NAME": "Boston Celtics",
         "GAME_ID": "0022300646",
         "GAME_DATE": "2024-01-27",
         "MATCHUP": "BOS vs. LAC",
         "WL": "L",
         "MIN": 240,
         "PTS": 96,
         "FGM": 36,
         "FGA": 100,
         "FG_PCT": 0.36,
         "FG3M": 10,
         "FG3A": 40,
         "FG3_PCT": 0.25,
         "FTM": 14,
         "FTA": 16,
         "FT_PCT": 0.875,
         "OREB": 18,
         "DREB": 34,
         "REB": 52,
         "AST": 21,
         "STL": 2,
         "BLK": 9,
         "TOV": 11,
         "PF": 13,
         "PLUS_MINUS": -19
       }
     }
   ]
 }
}

Term query

A term query returns an exact match of a specific term, which can be helpful when working with structured data. The following example searches for information about a game on a particular date, April 29th, 2024.

GET /celtics/_search
{
 "query": {
   "term": {
     "GAME_DATE": "2024-04-29"
   }
 }
}

You will get back information about the game that took place on 2024-04-29.

{
 "took": 2,
 "timed_out": false,
 "_shards": {
   "total": 1,
   "successful": 1,
   "skipped": 0,
   "failed": 0
 },
 "hits": {
   "total": {
     "value": 1,
     "relation": "eq"
   },
   "max_score": 1,
   "hits": [
     {
       "_index": "celtics",
       "_id": "0042300104",
       "_score": 1,
       "_source": {
         "SEASON_ID": "42023",
         "TEAM_ID": 1610612738,
         "TEAM_ABBREVIATION": "BOS",
         "TEAM_NAME": "Boston Celtics",
         "GAME_ID": "0042300104",
         "GAME_DATE": "2024-04-29",
         "MATCHUP": "BOS @ MIA",
         "WL": "W",
         "MIN": 240,
         "PTS": 102,
         "FGM": 36,
         "FGA": 86,
         "FG_PCT": 0.419,
         "FG3M": 14,
         "FG3A": 37,
         "FG3_PCT": 0.378,
         "FTM": 16,
         "FTA": 18,
         "FT_PCT": 0.889,
         "OREB": 11,
         "DREB": 35,
         "REB": 46,
         "AST": 21,
         "STL": 5,
         "BLK": 3,
         "TOV": 10,
         "PF": 20,
         "PLUS_MINUS": 14
       }
     }
   ]
 }
}

Range query

A range query in Elasticsearch retrieves documents containing terms within a specified range. While working with range queries, it is helpful to note that gte stands for greater than or equal to, and lte stands for less than or equal to.

The following example looks for Celtics games with total points between 145 and 150.

GET /celtics/_search
{
 "query": {
   "range": {
     "PTS": {
       "gte": 145,
       "lte": 150
     }
   }
 }
}

Two games that fit into this range get returned:

{
 "took": 1,
 "timed_out": false,
 "_shards": {
   "total": 1,
   "successful": 1,
   "skipped": 0,
   "failed": 0
 },
 "hits": {
   "total": {
     "value": 2,
     "relation": "eq"
   },
   "max_score": 1,
   "hits": [
     {
       "_index": "celtics",
       "_id": "0022300542",
       "_score": 1,
       "_source": {
         "SEASON_ID": "22023",
         "TEAM_ID": 1610612738,
         "TEAM_ABBREVIATION": "BOS",
         "TEAM_NAME": "Boston Celtics",
         "GAME_ID": "0022300542",
         "GAME_DATE": "2024-01-13",
         "MATCHUP": "BOS vs. HOU",
         "WL": "W",
         "MIN": 239,
         "PTS": 145,
         "FGM": 51,
         "FGA": 95,
         "FG_PCT": 0.537,
         "FG3M": 24,
         "FG3A": 47,
         "FG3_PCT": 0.511,
         "FTM": 19,
         "FTA": 25,
         "FT_PCT": 0.76,
         "OREB": 10,
         "DREB": 40,
         "REB": 50,
         "AST": 26,
         "STL": 7,
         "BLK": 8,
         "TOV": 11,
         "PF": 21,
         "PLUS_MINUS": 32
       }
     },
     {
       "_index": "celtics",
       "_id": "0022300389",
       "_score": 1,
       "_source": {
         "SEASON_ID": "22023",
         "TEAM_ID": 1610612738,
         "TEAM_ABBREVIATION": "BOS",
         "TEAM_NAME": "Boston Celtics",
         "GAME_ID": "0022300389",
         "GAME_DATE": "2023-12-23",
         "MATCHUP": "BOS @ LAC",
         "WL": "W",
         "MIN": 241,
         "PTS": 145,
         "FGM": 49,
         "FGA": 94,
         "FG_PCT": 0.521,
         "FG3M": 25,
         "FG3A": 53,
         "FG3_PCT": 0.472,
         "FTM": 22,
         "FTA": 28,
         "FT_PCT": 0.786,
         "OREB": 15,
         "DREB": 36,
         "REB": 51,
         "AST": 33,
         "STL": 4,
         "BLK": 5,
         "TOV": 9,
         "PF": 19,
         "PLUS_MINUS": 37
       }
     }
   ]
 }
}

To combine different queries, you can use a boolean query. It returns documents that match the boolean combinations of other queries. A boolean query must contain at least one of the conditions of must, filter, should, or must not.

The following example searches for Celtics games in which they had a plus-minus score of 10, scored between 100 and 130 points, and did not lose the game.

GET /celtics/_search
{
 "query": {
   "bool": {
     "must": [
       {
         "match": {
           "PLUS_MINUS": "10"
         }
       },
       {
         "range": {
           "PTS": {
             "gte": 100,
             "lte": 130
           }
         }
       }
     ],
     "must_not": [
       {
         "term": {
           "WL": "L"
         }
       }
     ]
   }
 }

Four games meet the parameters of the above query will return the following result:

{
 "took": 0,
 "timed_out": false,
 "_shards": {
   "total": 1,
   "successful": 1,
   "skipped": 0,
   "failed": 0
 },
 "hits": {
   "total": {
     "value": 4,
     "relation": "eq"
   },
   "max_score": 2,
   "hits": [
     {
       "_index": "celtics",
       "_id": "0022300920",
       "_score": 2,
       "_source": {
         "SEASON_ID": "22023",
         "TEAM_ID": 1610612738,
         "TEAM_ABBREVIATION": "BOS",
         "TEAM_NAME": "Boston Celtics",
         "GAME_ID": "0022300920",
         "GAME_DATE": "2024-03-09",
         "MATCHUP": "BOS @ PHX",
         "WL": "W",
         "MIN": 240,
         "PTS": 117,
         "FGM": 46,
         "FGA": 94,
         "FG_PCT": 0.489,
         "FG3M": 15,
         "FG3A": 39,
         "FG3_PCT": 0.385,
         "FTM": 10,
         "FTA": 13,
         "FT_PCT": 0.769,
         "OREB": 13,
         "DREB": 30,
         "REB": 43,
         "AST": 29,
         "STL": 7,
         "BLK": 4,
         "TOV": 12,
         "PF": 12,
         "PLUS_MINUS": 10
       }
     },
     {
       "_index": "celtics",
       "_id": "0022300246",
       "_score": 2,
       "_source": {
         "SEASON_ID": "22023",
         "TEAM_ID": 1610612738,
         "TEAM_ABBREVIATION": "BOS",
         "TEAM_NAME": "Boston Celtics",
         "GAME_ID": "0022300246",
         "GAME_DATE": "2023-11-26",
         "MATCHUP": "BOS vs. ATL",
         "WL": "W",
         "MIN": 239,
         "PTS": 113,
         "FGM": 42,
         "FGA": 95,
         "FG_PCT": 0.442,
         "FG3M": 13,
         "FG3A": 47,
         "FG3_PCT": 0.277,
         "FTM": 16,
         "FTA": 20,
         "FT_PCT": 0.8,
         "OREB": 18,
         "DREB": 40,
         "REB": 58,
         "AST": 24,
         "STL": 9,
         "BLK": 3,
         "TOV": 12,
         "PF": 19,
         "PLUS_MINUS": 10
       }
     },
     {
       "_index": "celtics",
       "_id": "0022300194",
       "_score": 2,
       "_source": {
         "SEASON_ID": "22023",
         "TEAM_ID": 1610612738,
         "TEAM_ABBREVIATION": "BOS",
         "TEAM_NAME": "Boston Celtics",
         "GAME_ID": "0022300194",
         "GAME_DATE": "2023-11-15",
         "MATCHUP": "BOS @ PHI",
         "WL": "W",
         "MIN": 239,
         "PTS": 117,
         "FGM": 42,
         "FGA": 88,
         "FG_PCT": 0.477,
         "FG3M": 18,
         "FG3A": 50,
         "FG3_PCT": 0.36,
         "FTM": 15,
         "FTA": 19,
         "FT_PCT": 0.789,
         "OREB": 12,
         "DREB": 33,
         "REB": 45,
         "AST": 23,
         "STL": 7,
         "BLK": 8,
         "TOV": 9,
         "PF": 15,
         "PLUS_MINUS": 10
       }
     },
     {
       "_index": "celtics",
       "_id": "0022300136",
       "_score": 2,
       "_source": {
         "SEASON_ID": "22023",
         "TEAM_ID": 1610612738,
         "TEAM_ABBREVIATION": "BOS",
         "TEAM_NAME": "Boston Celtics",
         "GAME_ID": "0022300136",
         "GAME_DATE": "2023-11-04",
         "MATCHUP": "BOS @ BKN",
         "WL": "W",
         "MIN": 240,
         "PTS": 124,
         "FGM": 43,
         "FGA": 90,
         "FG_PCT": 0.478,
         "FG3M": 15,
         "FG3A": 45,
         "FG3_PCT": 0.333,
         "FTM": 23,
         "FTA": 27,
         "FT_PCT": 0.852,
         "OREB": 10,
         "DREB": 40,
         "REB": 50,
         "AST": 22,
         "STL": 4,
         "BLK": 6,
         "TOV": 11,
         "PF": 17,
         "PLUS_MINUS": 10
       }
     }
   ]
 }
}

When to use aggregations

Aggregations in Elasticsearch allow you to summarize data by creating metrics and using summary statistics. They are beneficial for analytics. There are three types of aggregations in Elasticsearch: metric, bucket, and pipeline. You can also nest aggregations as well.

Metric aggregation

A metric aggregation performs calculations such as a sum or average on a field value.

The following query calculates the average of the total number of points contained in the index.

GET /celtics/_search
{
 "size": 0,
 "aggs": {
   "total_points": {
     "avg": {
       "field": "PTS"
     }
   }
 }
}

A result containing a section titled aggregations, which includes the average number of points, will be returned.

{
 "took": 0,
 "timed_out": false,
 "_shards": {
   "total": 1,
   "successful": 1,
   "skipped": 0,
   "failed": 0
 },
 "hits": {
   "total": {
     "value": 85,
     "relation": "eq"
   },
   "max_score": null,
   "hits": []
 },
 "aggregations": {
   "total_points": {
     "value": 120.2
   }
 }
}

Bucket aggregations

A bucket aggregation groups documents into buckets according to specified criteria. These buckets, or bins, categorize data based on field values, ranges, or other criteria.

To group the Celtics games by months you would use the following query:

GET /celtics/_search
{
 "size": 0,
 "aggs": {
   "games_over_time": {
     "date_histogram": {
       "field": "GAME_DATE",
       "calendar_interval": "month"
     }
   }
 }
}

In the aggregations section of the JSON response, there is a date histogram called games_over_time, grouping documents by month. doc_count indicates the number of documents per month.

{
 "took": 5,
 "timed_out": false,
 "_shards": {
   "total": 1,
   "successful": 1,
   "skipped": 0,
   "failed": 0
 },
 "hits": {
   "total": {
     "value": 85,
     "relation": "eq"
   },
   "max_score": null,
   "hits": []
 },
 "aggregations": {
   "games_over_time": {
     "buckets": [
       {
         "key_as_string": "2023-10-01T00:00:00.000Z",
         "key": 1696118400000,
         "doc_count": 3
       },
       {
         "key_as_string": "2023-11-01T00:00:00.000Z",
         "key": 1698796800000,
         "doc_count": 15
       },
       {
         "key_as_string": "2023-12-01T00:00:00.000Z",
         "key": 1701388800000,
         "doc_count": 14
       },
       {
         "key_as_string": "2024-01-01T00:00:00.000Z",
         "key": 1704067200000,
         "doc_count": 16
       },
       {
         "key_as_string": "2024-02-01T00:00:00.000Z",
         "key": 1706745600000,
         "doc_count": 10
       },
       {
         "key_as_string": "2024-03-01T00:00:00.000Z",
         "key": 1709251200000,
         "doc_count": 16
       },
       {
         "key_as_string": "2024-04-01T00:00:00.000Z",
         "key": 1711929600000,
         "doc_count": 11
       }
     ]
   }
 }
}

Pipeline aggregations

Pipeline aggregations in Elasticsearch perform calculations on the results of other aggregations, allowing for complex data processing and analytics.

You can create a query that calculates the cumulative sum of points scored. First, you will need a date histogram to bucket the documents by date, then a cumulative sum pipeline aggregation to calculate the cumulative total. The following query uses a pipeline aggregation to view the total number of points scored by the Celtics per month.

GET /celtics/_search
{
 "size": 0,
 "aggs": {
   "games_over_time": {
     "date_histogram": {
       "field": "GAME_DATE",
       "calendar_interval": "month"
     },
     "aggs": {
       "total_points": {
         "sum": {
           "field": "PTS"
         }
       },
       "cumulative_points": {
         "cumulative_sum": {
           "buckets_path": "total_points"
         }
       }
     }
   }
 }
}

The aggregations section of the response contains groupings of the total number of points scored by the Celtics organized by month. You can also see the total number of games per month as well.

{
 "took": 2,
 "timed_out": false,
 "_shards": {
   "total": 1,
   "successful": 1,
   "skipped": 0,
   "failed": 0
 },
 "hits": {
   "total": {
     "value": 85,
     "relation": "eq"
   },
   "max_score": null,
   "hits": []
 },
 "aggregations": {
   "games_over_time": {
     "buckets": [
       {
         "key_as_string": "2023-10-01T00:00:00.000Z",
         "key": 1696118400000,
         "doc_count": 3,
         "total_points": {
           "value": 353
         },
         "cumulative_points": {
           "value": 353
         }
       },
       {
         "key_as_string": "2023-11-01T00:00:00.000Z",
         "key": 1698796800000,
         "doc_count": 15,
         "total_points": {
           "value": 1740
         },
         "cumulative_points": {
           "value": 2093
         }
       },
       {
         "key_as_string": "2023-12-01T00:00:00.000Z",
         "key": 1701388800000,
         "doc_count": 14,
         "total_points": {
           "value": 1771
         },
         "cumulative_points": {
           "value": 3864
         }
       },
       {
         "key_as_string": "2024-01-01T00:00:00.000Z",
         "key": 1704067200000,
         "doc_count": 16,
         "total_points": {
           "value": 1915
         },
         "cumulative_points": {
           "value": 5779
         }
       },
       {
         "key_as_string": "2024-02-01T00:00:00.000Z",
         "key": 1706745600000,
         "doc_count": 10,
         "total_points": {
           "value": 1220
         },
         "cumulative_points": {
           "value": 6999
         }
       },
       {
         "key_as_string": "2024-03-01T00:00:00.000Z",
         "key": 1709251200000,
         "doc_count": 16,
         "total_points": {
           "value": 1947
         },
         "cumulative_points": {
           "value": 8946
         }
       },
       {
         "key_as_string": "2024-04-01T00:00:00.000Z",
         "key": 1711929600000,
         "doc_count": 11,
         "total_points": {
           "value": 1271
         },
         "cumulative_points": {
           "value": 10217
         }
       }
     ]
   }
 }
}

Nested aggregations

One advanced feature of aggregations is the ability to nest them, which allows you to perform multilevel data analysis. For example, you can first group your data by a particular field and perform further aggregations within those groups.

An example of a nested aggregation is an aggregation that first calculates the average of points scored and groups by game month and then by the game result (whether they won or lost).

Since the WL field is a text field, field data is disabled for text fields by default because text fields are not optimized for operations that require per-document field data like aggregations and sorting. To fix this, you can use a keyword field instead. Since WL is already a text field, you can add a .keyword subfield.

GET /celtics/_search
{
  "size": 0,
  "aggs": {
    "games_by_month": {
      "date_histogram": {
        "field": "GAME_DATE",
        "calendar_interval": "month"
      },
      "aggs": {
        "results": {
          "terms": {
            "field": "WL.keyword"
          },
          "aggs": {
            "average_points": {
              "avg": {
                "field": "PTS"
              }
            }
          }
        }
      }
    }
  }
}

The response will provide the average points scored each month, further broken down by wins and losses, allowing you to analyze performance trends over time.

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 85,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "games_by_month": {
      "buckets": [
        {
          "key_as_string": "2023-10-01T00:00:00.000Z",
          "key": 1696118400000,
          "doc_count": 3,
          "results": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "W",
                "doc_count": 3,
                "average_points": {
                  "value": 117.66666666666667
                }
              }
            ]
          }
        },
        {
          "key_as_string": "2023-11-01T00:00:00.000Z",
          "key": 1698796800000,
          "doc_count": 15,
          "results": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "W",
                "doc_count": 11,
                "average_points": {
                  "value": 119.45454545454545
                }
              },
              {
                "key": "L",
                "doc_count": 4,
                "average_points": {
                  "value": 106.5
                }
              }
            ]
          }
        },
        {
          "key_as_string": "2023-12-01T00:00:00.000Z",
          "key": 1701388800000,
          "doc_count": 14,
          "results": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "W",
                "doc_count": 12,
                "average_points": {
                  "value": 127.75
                }
              },
              {
                "key": "L",
                "doc_count": 2,
                "average_points": {
                  "value": 119
                }
              }
            ]
          }
        },
        {
          "key_as_string": "2024-01-01T00:00:00.000Z",
          "key": 1704067200000,
          "doc_count": 16,
          "results": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "W",
                "doc_count": 11,
                "average_points": {
                  "value": 123.9090909090909
                }
              },
              {
                "key": "L",
                "doc_count": 5,
                "average_points": {
                  "value": 110.4
                }
              }
            ]
          }
        },
        {
          "key_as_string": "2024-02-01T00:00:00.000Z",
          "key": 1706745600000,
          "doc_count": 10,
          "results": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "W",
                "doc_count": 9,
                "average_points": {
                  "value": 123.88888888888889
                }
              },
              {
                "key": "L",
                "doc_count": 1,
                "average_points": {
                  "value": 105
                }
              }
            ]
          }
        },
        {
          "key_as_string": "2024-03-01T00:00:00.000Z",
          "key": 1709251200000,
          "doc_count": 16,
          "results": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "W",
                "doc_count": 12,
                "average_points": {
                  "value": 124.5
                }
              },
              {
                "key": "L",
                "doc_count": 4,
                "average_points": {
                  "value": 113.25
                }
              }
            ]
          }
        },
        {
          "key_as_string": "2024-04-01T00:00:00.000Z",
          "key": 1711929600000,
          "doc_count": 11,
          "results": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "W",
                "doc_count": 9,
                "average_points": {
                  "value": 117.88888888888889
                }
              },
              {
                "key": "L",
                "doc_count": 2,
                "average_points": {
                  "value": 105
                }
              }
            ]
          }
        }
      ]
    }
  }
}

When does ES|QL come in?

While the structure of a JSON query language is something that you get used to over time, it can be challenging at first. ES|QL is a procedural piped query language with SQL-like syntax. One key feature is that you can define the order in which you wish to return data. The pipes in ES|QL look like |, enabling you to manipulate and transform data step by step. The main uses of ES|QL are for analytics, data manipulation, and other transformations. It also can help work with visualizations in general.

Filtering data with ES|QL

ES|QL allows you to quickly filter data from a dataset to return the results you are looking for. The following query first specifies the desired format for the returned data. The default is JSON, but you can define formats such as txt or csv. For readability, txt was chosen. The FROM field contains an index called celtics. The query then narrows the index down to fields containing the date of the game and whether the team won or lost. It also limits the results to the last ten games from the dataset.

POST _query?format=txt
{
 "query": """
   FROM celtics
   | KEEP GAME_DATE, WL
   | LIMIT 10
 """

The results you get back contain a table-like structure that includes the date of the game and the result, with W being a win and L being a loss.

      GAME_DATE        |      WL      
------------------------+---------------
2024-04-29T00:00:00.000Z|W             
2024-04-27T00:00:00.000Z|W             
2024-04-24T00:00:00.000Z|L             
2024-04-21T00:00:00.000Z|W             
2024-04-14T00:00:00.000Z|W             
2024-04-12T00:00:00.000Z|W             
2024-04-11T00:00:00.000Z|L             
2024-04-07T00:00:00.000Z|W             
2024-04-05T00:00:00.000Z|W             
2024-04-03T00:00:00.000Z|W

SQL-like syntax

You can also utilize the SQL-like syntax for additional filtering capabilities. The following query narrows the index down to games that the Celtics won and returns the fields for the date of the game and the matchup, which includes the two teams that played. It also limits it to the last ten results contained in the index.

POST _query?format=txt
{
 "query": """
   FROM celtics
   | WHERE WL == "W"
   | KEEP GAME_DATE, MATCHUP
   | LIMIT 10
 """
}

The results you get back contain a table-like structure that includes the date of the game and the matchup, which consists of a summary of the two teams that played the given game.

      GAME_DATE        |    MATCHUP   
------------------------+---------------
2024-04-29T00:00:00.000Z|BOS @ MIA     
2024-04-27T00:00:00.000Z|BOS @ MIA     
2024-04-21T00:00:00.000Z|BOS vs. MIA   
2024-04-14T00:00:00.000Z|BOS vs. WAS   
2024-04-12T00:00:00.000Z|BOS vs. CHA   
2024-04-07T00:00:00.000Z|BOS vs. POR   
2024-04-05T00:00:00.000Z|BOS vs. SAC   
2024-04-03T00:00:00.000Z|BOS vs. OKC   
2024-04-01T00:00:00.000Z|BOS @ CHA     
2024-03-30T00:00:00.000Z|BOS @ NOP

Aggregations in ES|QL

You can use ES|QL to quickly find statistics about a given field. The following query uses the exact data for the Celtics to find the average field goal and field goal percentage from the three-point line. It limits the average to include one line to summarize the data contained in the index.

In basketball, the field goal percentage is a key statistic that measures a player or team's efficiency in making shots. It is calculated by dividing the number of successful field goals by the total number of field goal attempts. The three-point field goal percentage is a version of this metric, but it focuses instead only on shots taken from beyond the three-point line.

POST _query?format=txt
{
 "query": """
   FROM celtics
   | STATS AVG(FG_PCT), AVG(FG3_PCT)
   | LIMIT 1
 """
}

The result contains a table-like structure that includes the index's average field goal percentage and field goal percentage from the three-point line.

     AVG(FG_PCT)    |   AVG(FG3_PCT)   
-------------------+-------------------
0.48734117781414704|0.38770588215659646

Working with visualizations

One of ES|QL's main advantages is its ability to work with visualization in Kibana and reduce the need to switch between different tools. You can use it to explore your data while creating or modifying visualizations seamlessly. Additionally, you can create alerts based on conditions defined with ES|QL.

Conclusion

Typically, an index-level API helps manage your index, a query to find specific data, and an aggregation to perform calculations or obtain statistics about your data. ES|QL, a piped query language, allows you to filter, transform, and analyze structured and unstructured data in Elasticsearch more intuitively than with a JSON query. Let us know if you built anything based on this blog or if you have questions on our Discuss forums and the community Slack channel.

Additional resources

If you are getting started with Elastic, these resources may be helpful.

Top comments (2)

Ivan Isaac • Jun 24 '24

This is a very thorough guide! Could you clarify the main differences between using a term query and an ES|QL query for filtering data? Would love to see more examples of aggregations in future posts.

Jessica Garson • Jun 25 '24 • Edited

Thanks, @ivis1. It depends a lot on the data you are working with, but ES|QL is typically better for working with analytics, and a term query is better suited for searching data. I like this post, which explains the differences between types of aggregations. This video on ES|QL may also interest you. For a future post, I'm considering a comparison guide between Query DSL and ES|QL.