Israel Parra

Posted on Jun 24, 2023

Elastic search

Hands-on Important concepts

In this article, I will show you some of the main concepts that we should know when we start to work with Elastic Search.

I’ll group them as questions & answers to make them more readable and easy to understand.

What is elastic search?

Elasticsearch is a distributed search engine based on Lucene that is used to index and search large amounts of data in real-time. It is designed to be highly scalable and allows for parallel data processing across multiple nodes in a cluster.

Elasticsearch also offers a wide range of additional functionalities, such as data aggregation, result segmentation, geospatial search, multi-language search, and cluster management. Additionally, it integrates with a variety of data analysis and visualization tools, making it very versatile and easy to use.

Is a powerful and scalable platform for indexing and searching large amounts of data in real time. It uses a distributed architecture and offers a wide variety of functionalities to facilitate indexing, searching, and visualizing the data

What is an index?

In Elasticsearch, an index is a structure that stores data in a format optimized for search and analysis. Indexes contain documents, which are the basic information objects in Elasticsearch. Each document has a unique identifier and contains one or more fields that contain data.

The indexing process in Elasticsearch begins when a document is sent to the search engine via a REST API (you can see the answer related to index EP to more information). When elastic search gets the request, the document is stored in one or more shards and indexed so that it can be searched.

Indexing involves analyzing the document to extract tokens (keywords) and storing them in a structure optimized for searching. The tokens are mapped to terms, which are indexed in an inverted term tree, which points to the documents that contain each term.

Indexing example:

Let’s say we want to index a collection of documents containing information about products in an online store. Each document contains information such as the product name, description, price, and category.

The first step is to create an index in Elasticsearch to store the product-related documents. The index is divided into multiple shards for greater efficiency and scalability.

Then, we send each document to Elasticsearch using the REST API. Elasticsearch going to analyze the document in order to extract tokens and stores them in its search structure.

Taking the online store example, we send a document that contains information about blue jeans, and is indexed by the following tokens: “jean”, “blue”, “man”, “casual”, “clothing”, etc. That will help us to make searches using by example: “man jean”, “blue clothing”, etc.

How to use indexes on elastic search?

Indexes are used to store and search data in Elasticsearch. Each index has settings that include text parsing, storage settings, and search settings. Specific settings can also be assigned to each field in an index, allowing for a more precise and personalized search.

Here is a simple example of how to create an index in Elasticsearch:

Log in to Elasticsearch and create an index called “my_index”:

PUT /mi_indice
Add a document to the “my_index” index with a unique identifier and some fields:

PUT /mi_indice/_doc/1
{
"name": "Juan Perez",
"age": 30,
"address": "Calle 123, Ciudad de México"
}
Find the document we just added using the Elasticsearch API:

GET /mi_indice/_search
{
"query": {
"match": {
"name": "Juan"
}
}
}

This example searches for the document that contains the name “John” in the “name” field of the “my_index” index.

In short, indices in Elasticsearch are structures that are used to store and search data. They can be customized to suit your specific application requirements and can hold millions of documents with fast and efficient search.

What are the endpoints related to indexes?

These are some of the more common index-related endpoints in Elasticsearch, but there are many others available. Each of these endpoints can be customized with additional parameters based on your specific application requirements.

Create an index:

**Endpoint signature: **PUT /.
The request body contains the data of the document that is going to be created or updated in the specified index. The format of the request body should be JSON and it should comply with the mapping structure defined for the index.

Example of request body:

{
  "title": "Example Document",
  "description": "This is an example document for Elasticsearch",
  "tags": ["example", "elasticsearch"],
  "date": "2022-03-28T10:00:00Z"
}

Delete an index:

Endpoint signature: **DELETE /.
**This endpoint doesn’t not requires of request body, you only need to provide the index to delete. This endpoint will delete the index and all the documents associated to it.

Get information about an index

Endpoint signature: **GET /.
**This endpoint doesn’t requires request object and will response with the information related to the provided index.

Get statistics about an index

Endpoint signature: **GET //_stats.
**The endpoint returns a JSON object that contains detailed statistics about the provided index, including the total number of documents, the size of the index in bytes, the number of shards and replicas, and other important details such as the number of indexing, deletion, and search operations that have been performed on the index.

Example response:

{
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "_all": {
    "primaries": {
      "docs": {
        "count": 10000,
        "deleted": 0
      },
      "store": {
        "size_in_bytes": 2000000
      },
      "indexing": {
        "index_total": 20000,
        "index_time_in_millis": 10000
      }
    },
    "total": {
      "docs": {
        "count": 20000,
        "deleted": 0
      },
      "store": {
        "size_in_bytes": 4000000
      },
      "indexing": {
        "index_total": 40000,
        "index_time_in_millis": 20000
      }
    }
  }
}

Find documents in an index

Endpoint signature **GET //_search.
**This endpoint doesn’t require a request object. It is used to search for documents in a specific index using a search query, and the query parameters are specified in the URL.

The possible query params for this endpoint are the following:

**q: **specifies the search query as a simple query string. For example, q=elasticsearch will search for documents containing the word “elasticsearch” in any field.
**size: specifies the maximum number of documents to return in the response.
**from: specifies the index of the first document to return in the response (useful for pagination).
**sort: specifies the field or fields to sort the search results by.
**_source: specifies the fields to include or exclude from the response.
**aggregations: allows aggregations to be performed on the search results.

Response example:

{
  "took": 15,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "field1": "elasticsearch",
          "field2": "kibana",
          "field3": "logstash"
        }
      },
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.5,
        "_source": {
          "field1": "elasticsearch",
          "field2": "logstash",
          "field3": "kibana"
        }
      }
    ]
  }
}

Update a document in an index

Endpoint signature: **POST //_update/.
This endpoint is used to update a specific document in an index by providing a partial document that contains only the fields to be updated.

Parameters:

**index: The name of the index that contains the document to be updated.
**identifier: The unique identifier of the document to be updated.
**wait_for_active_shards: (Optional) The number of active shards that must be available before the update operation returns.
**routing: (Optional) The routing value that is used to route the update request to a specific shard.

This endpoint needs a request object with the document information to update, for example: To change the name property.

{
  "doc": {
    "name": "New Name"
  }
}

How many documents could be related to elastic search indexes?

The number of documents that can belong to the same index in Elasticsearch depends on several factors, such as the size of the documents, available hardware, amount of memory, and Elasticsearch configuration.

In theory, Elasticsearch can handle billions of documents in a single index. However, it’s a good idea to split your indexes into manageable sizes for ease of administration and improved performance.

Also, it’s important to note that the larger the number of documents in an index, the more resources Elasticsearch will need to index and search those documents. If the index is too large for the available hardware, there may be performance and responsiveness issues.

In general, it’s a good idea to break your indexes down to manageable sizes and tune your Elasticsearch configuration to optimize performance and scalability based on your application requirements.

What is a reindexing process and how it works?

When you perform a reindex in Elasticsearch, you are copying data from an existing index to a new index. The reindexing process is done for various reasons, such as optimizing performance, changing the index structure, and making configuration adjustments, among others.

When doing a reindex, the documents from the old index are copied to the new index. In some cases, this can take time and consume considerable resources, especially for large data sets. However, once the reindexing process is complete, you will have a new index that reflects the necessary changes.

Some of the common reasons for reindexing in Elasticsearch are:

Change index structure: Can be reindexed to change field mapping or add new fields to an existing index.
Performance optimization: Reindexing can improve search and indexing performance by removing unnecessary documents or removing outdated indexes.
Elasticsearch version upgrade: A reindex may be required when upgrading Elasticsearch to a later version.
Index consolidation — Multiple indexes can be combined into a single index, which can make administration easier and improve query performance.

By performing a reindex in Elasticsearch, you can update and optimize the data in an index to improve performance and tailor it to specific application requirements. However, it is important to carefully plan and test the reindexing process in a test environment before running it in production.

How is the reindexing process?

Reindexing in Elasticsearch is a process that involves creating a new index and copying data from an existing index into the new index with possible modifications to the index structure. This is done to reorganize the data, change the index structure, and optimize performance, among other reasons.

The reindexing process is done in several steps:

Creation of a new index: A new index is created with the desired structure.
Data source configuration: The data source is configured, which can be an existing index or a query that returns documents.
Configuration of the reindexing process: The reindexing process is configured, which includes the data source and the new index.
Execution of the reindexing process: The reindexing process starts. Elasticsearch reads the documents from the data source and writes them to the new index.
Verification of the reindexing process: It is verified that all documents have been reindexed correctly.

It is important to note that the reindexing process can be costly process in terms of resources and time, especially on large data sets. Therefore, it is recommended to carefully plan and test the process in a test environment before reindexing in production.

There is an endpoint that helps us to make that process, and I going to define and show you how it is used:

Endpoint signature: POST /_reindex

**Request body:
**To reindex an index, you’ll need to provide a request body that specifies the source and destination indices, as well as any other options you want to use. The request body can be quite complex, but here’s a simple example that reindexes all documents from an index called “my_index” to a new index called “my_index_v2”:

{
  "source": {
    "index": "my_index"
  },
  "dest": {
    "index": "my_index_v2"
  }

}

Additional params:

wait_for_completion: If set to false, the request will return immediately and the reindexing process will continue in the background. Otherwise, the request will block until the reindexing is complete.
refresh: If set to true, the destination index will be refreshed after each batch of documents is indexed. This can be useful if you want to perform searches on the new index while it’s being reindexed.
requests_per_second: The maximum number of requests per second to send to Elasticsearch while reindexing. This can be used to limit the impact of the reindexing process on your Elasticsearch cluster.

Example response:

{

  "task": {

    "node": "ABC123",

    "id": 12345,

    "type": "transport",

    "action": "indices:data/write/reindex",

    "start_time_in_millis": 1621234567890,

    "running_time_in_nanos": 1234567890,

    "cancellable": true

  }

}

When reindexing in elastic search, does the search results change with the _search endpoint?

The short answer is yes.

when performing a reindex in Elasticsearch, it is possible that the search results will change when using the _search endpoint.

This is because the reindexing process involves creating a new index and copying the documents from the original index to the new index with possible modifications in the process. Therefore, if changes have been made to the data structure, such as removing or adding new fields, or modifications have been made to existing data, the search results may change after reindexing.

Also, it is important to note that the reindexing process can take time depending on the size of the index and the configuration of Elasticsearch. During the reindexing process, some documents may not be available for search until all steps in the process are completed.

Therefore, it is recommended to perform extensive testing after reindexing to ensure that the search results are consistent and match the expectations of the application.

Could be there some issues after the reindexing process?

After reindexing in Elasticsearch, search errors may occur in the new index. Here are some suggestions found that could be helpful for troubleshooting search issues after reindexing:

Verify that documents have been indexed correctly: Some documents may not have been indexed correctly during the re-indexing process. Verify that documents have been indexed correctly and are available for search.
Verify that fields have been indexed correctly: If you added new fields during the reindexing process, make sure that they have been indexed correctly and are available for search.
Verify search query syntax: If you changed the data structure during the reindexing process, you may need to adjust the syntax of search queries to match the new data structure.
Verify analysis settings: If you added new text fields during the reindexing process, you may need to adjust the analysis settings to match the new text fields. Analysis settings affect how Elasticsearch indexes and searches text in the index.
Run search queries on multiple nodes: If you are using an Elasticsearch cluster, there may be data inconsistencies between nodes during the re-indexing process. Run search queries on multiple nodes to verify that results are consistent across the cluster.
Check Elasticsearch logs: If you still have search issues after verifying the above steps, check Elasticsearch logs for detailed information about any index or search issues. Elasticsearch logs can provide valuable information for troubleshooting search issues after reindexing.

What are aliases?

The alias, on the other hand, is a way to refer to one or more indices with a single name. It is useful in situations where you need to access multiple indices simultaneously, as it allows you to refer to them in a simpler and more organized way. Aliases can also be used to rename an existing index, reindex an index, and change the number of shards and/or replicas, without affecting the applications that use the index.

The use of indices and aliases in Elasticsearch is essential for organizing and accessing data efficiently. Indices are the basic unit of storage, while aliases allow you to refer to one or more indices with a single name and perform actions on them jointly.

What are the endpoints related to aliases?

Create alias for existing index: PUT /<index>/_aliases/<alias>

Get information about an alias: GET /_aliases/<alias> 

List Aliases: GET /_alias

Create or remove multiple aliases at the same time: POST /_aliases

Delete alias: DELETE /<index>/_alias/<alias>

Final notes:

As I mentioned many times, this article doesn’t show you all the needed to work with the elastic search but could be helpful to help you to solve some of the basic doubts that you have when learning this interesting technology.

If you want to learn more I recommend going to the official documentation or visiting this very good site Opster there you will find real-world examples.