Scaling Elasticsearch is crucial for handling increasing data volumes and search workloads efficiently. Elasticsearch can be scaled in two primary ways: vertical scaling (adding more resources to existing nodes) and horizontal scaling (adding more nodes to a cluster). Below are the strategies for scaling Elasticsearch:
1. Horizontal Scaling (Cluster Expansion)
a. Add More Nodes
Elasticsearch is designed to scale horizontally by adding more nodes to a cluster. Nodes are individual instances of Elasticsearch that store data and handle search requests. Adding nodes can help distribute data and workloads, improving performance and fault tolerance.
- Master Node: Manages the cluster and makes decisions like creating or deleting indices.
- Data Node: Stores data and processes search requests.
- Ingest Node: Handles pre-processing of documents before they are indexed.
- Coordinating Node: Routes client requests to the appropriate nodes.
You can add different node types to optimize your cluster for specific workloads.
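As a sketch of how roles are assigned, each node declares its roles in its elasticsearch.yml via the node.roles setting (available since Elasticsearch 7.9); the comments below describe hypothetical nodes:

# elasticsearch.yml on a dedicated master-eligible node
node.roles: [ master ]

# elasticsearch.yml on a node that stores data and runs ingest pipelines
node.roles: [ data, ingest ]

# elasticsearch.yml on a coordinating-only node (empty roles list)
node.roles: [ ]

A node with an empty roles list acts as a pure coordinator: it routes requests and merges results without storing any data.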
b. Sharding
Each index in Elasticsearch can be divided into smaller pieces called shards. Shards allow Elasticsearch to split the dataset across multiple nodes, ensuring that no single node becomes a bottleneck.
- Primary Shards: Store actual data.
- Replica Shards: Provide redundancy and high availability by storing copies of the primary shards.
Since Elasticsearch 7.0, each index is assigned one primary shard by default (earlier versions defaulted to five). You can configure this number at index creation based on your data volume and scaling needs; note that the primary shard count cannot be changed on an existing index without reindexing, shrinking, or splitting.
To change the number of shards during index creation:
PUT /my-index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
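Because the primary shard count is fixed at creation, growing it later requires the split API (or a full reindex). A hedged sketch, using a hypothetical target index name; the source index must first be made read-only, and the target shard count must be a multiple of the source's:

PUT /my-index/_settings
{
  "index.blocks.write": true
}

POST /my-index/_split/my-index-split
{
  "settings": {
    "index.number_of_shards": 6
  }
}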
c. Replication
Replication ensures fault tolerance by creating copies of shards (replica shards). When you scale horizontally, Elasticsearch can distribute replica shards to different nodes. If one node fails, another node with the replica shard can take over the workload.
To set the number of replicas:
PUT /my-index/_settings
{
  "index": {
    "number_of_replicas": 2
  }
}
2. Vertical Scaling (Improving Node Resources)
While horizontal scaling is preferred, vertical scaling can be beneficial for small deployments. Vertical scaling involves adding more CPU, memory, or disk space to existing Elasticsearch nodes.
a. Heap Size
Elasticsearch runs on the Java Virtual Machine (JVM), so managing the JVM heap size is crucial for performance. Recent Elasticsearch versions size the heap automatically based on the node's roles and total memory; as a rule of thumb, the heap should stay at or below 50% of system RAM. You can set it explicitly in the jvm.options file:
-Xms8g
-Xmx8g
Ensure that the heap size does not exceed half of the available RAM, leaving the rest for the filesystem cache, and keep it below roughly 32 GB so the JVM can continue to use compressed object pointers.
b. Storage (SSD)
For better I/O performance, use SSD storage instead of HDD. Elasticsearch benefits significantly from faster disk access, especially when dealing with large datasets and heavy search workloads.
c. CPU
Elasticsearch benefits from more cores, especially for query execution and indexing. Adding more CPUs can improve query throughput and indexing speed.
3. Load Balancing and Query Routing
As you scale horizontally, you need to ensure proper load distribution. Elasticsearch automatically routes search queries to the appropriate shards and nodes, but you can use coordinating nodes or external load balancers to distribute requests evenly across the cluster.
- Coordinating Nodes: Nodes that do not store data but route requests to data nodes. These help balance the workload without overwhelming the data nodes.
- Load Balancers: External load balancers (e.g., NGINX, HAProxy) can distribute incoming requests across Elasticsearch nodes.
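For illustration, a minimal NGINX reverse-proxy sketch that round-robins HTTP requests across three Elasticsearch nodes; the hostnames, listen port, and upstream name are placeholders:

upstream elasticsearch {
    server es-node1:9200;
    server es-node2:9200;
    server es-node3:9200;
}

server {
    listen 8080;
    location / {
        proxy_pass http://elasticsearch;
    }
}

Clients then send requests to the proxy on port 8080 instead of targeting a single node directly.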
4. Index Lifecycle Management (ILM)
Managing indices effectively is crucial for scaling. Index Lifecycle Management (ILM) helps automate the lifecycle of indices, from creation to deletion. ILM policies define how indices should transition through different phases, such as:
- Hot Phase: Frequent indexing and querying.
- Warm Phase: Less frequent querying.
- Cold Phase: Data rarely queried but still retained.
- Delete Phase: Old indices are deleted to free up resources.
Here's an example of an ILM policy:
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "30d"
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
5. Monitoring and Optimization
Regular monitoring helps identify performance bottlenecks. Elasticsearch provides several built-in tools:
- Kibana: Offers visualization and monitoring of cluster health, index usage, and query performance.
- Elastic Stack: Use the complete Elastic Stack (ELK stack) for centralized logging, monitoring, and alerting.
- Elasticsearch API: Monitor the cluster using the _cluster/health and _nodes/stats APIs to get insights into shard allocation, node health, and resource usage.
GET _cluster/health
GET _nodes/stats
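An abridged, illustrative _cluster/health response (the values below are made up): status is green when all shards are allocated, yellow when some replicas are unassigned, and red when primary shards are missing.

{
  "cluster_name": "my-cluster",
  "status": "green",
  "number_of_nodes": 3,
  "active_primary_shards": 10,
  "active_shards": 20,
  "unassigned_shards": 0
}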
6. Cross-Cluster Replication and Search
For geographically distributed applications or disaster recovery, Elasticsearch supports Cross-Cluster Replication (CCR) and Cross-Cluster Search (CCS):
- CCR: Replicate indices from a primary cluster to a secondary cluster, ensuring data availability even if the primary cluster goes down.
- CCS: Perform searches across multiple Elasticsearch clusters, enabling distributed data querying.
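As a sketch of CCS, once a remote cluster has been registered (here under the hypothetical alias cluster_two), queries address remote indices by prefixing the index name with the cluster alias:

GET /cluster_two:my-index/_search
{
  "query": {
    "match": { "message": "error" }
  }
}

Searching /my-index,cluster_two:my-index/_search would query the local and remote indices together in one request.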
7. Use Case-Based Optimizations
Different use cases require different optimizations:
- High Write Workloads: Optimize for faster writes by increasing (or temporarily disabling) the refresh interval and by using larger bulk indexing requests.
- High Read Workloads: Increase the number of replicas and ensure good shard distribution for faster search performance.
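For example, a write-heavy tuning sketch: raising the index's refresh interval before a large bulk load reduces segment-refresh overhead (the default is 1s; setting it to -1 disables automatic refresh entirely). Remember to restore it afterwards so new documents become searchable promptly.

PUT /my-index/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}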
Conclusion
Scaling Elasticsearch involves adding more nodes, configuring shards and replicas, and optimizing hardware resources. Horizontal scaling is generally preferred for Elasticsearch clusters, but vertical scaling can provide short-term performance boosts. Proper monitoring, index management, and load balancing are essential to ensure smooth operations as you scale.