DEV Community

Sandeep Kanabar
Sizing shards using time-based indices

This post lists a few advantages of using time-based indices (as well as data streams) in Elasticsearch.

  1. Increasing / Decreasing the number of shards becomes easy
  2. Helps to plan cluster capacity and growth size
  3. Easily determine optimum number of shards

1. Increasing / Decreasing the number of shards becomes easy

Say an index template for day-wise indices is configured with 1 shard in its index settings. If indexing is slow or the shard grows too large (> 40-50 GB), the template can easily be modified to raise number_of_shards to 3, 5, or n. The change takes effect from the next day's index. Similarly, if a day-wise index pattern is configured with more shards than required (oversharded), reducing the number of shards is just as easy: change the template and the new setting applies from the next day. Indices that already exist keep their shard count unless they are re-indexed.
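As a rough sketch of what such a template change might look like, the Python snippet below builds the request body for a composable index template. The pattern name `logs-*` and the helper `build_daily_index_template` are hypothetical, chosen only for illustration. The snippet only constructs and prints the JSON; it does not talk to a cluster.

```python
import json


def build_daily_index_template(pattern: str, shards: int, replicas: int = 1) -> dict:
    """Build a composable index template body for day-wise indices.

    Hypothetical helper: it only assembles the JSON body and sends nothing.
    """
    if shards < 1:
        raise ValueError("number_of_shards must be at least 1")
    return {
        "index_patterns": [pattern],
        "template": {
            "settings": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        },
    }


# "logs-*" is an assumed pattern that would match indices such as logs-2024.01.15.
body = build_daily_index_template("logs-*", shards=3)
print(json.dumps(body, indent=2))
```

The printed body could then be sent with `PUT _index_template/<template-name>`. Only indices created after the update would pick up the new shard count.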

2. Helps to plan cluster capacity and growth size

Let's say 100 events per second flow into an Elasticsearch cluster and each event averages about 1 KB in size. Thus, per day, there would be:
86400 seconds * 100 events/second = 8,640,000 events.

Since each event averages about 1 KB, the total size of 8,640,000 events is 8,640,000 * 1 KB = 8,640,000 KB. Dividing by 1024 * 1024 to convert KB to GB gives ~8.24 GB.

Thus, a day-wise index would hold ~8.24 GB per day, or ~9 GB when rounded up, without any replicas. With 1 replica, the size per day would be ~18 GB, and the size for 30 days would be ~540 GB. This helps with capacity planning and estimating the cluster growth rate.
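The arithmetic above can be wrapped in a small Python sketch so the numbers are easy to redo for a different ingestion rate. The function names are made up for this example, and the 30-day retention and the rounding up to 9 GB simply mirror the figures used in this post.

```python
SECONDS_PER_DAY = 86_400
KB_PER_GB = 1024 * 1024


def daily_primary_gb(events_per_second: float, avg_event_kb: float) -> float:
    """Raw data volume per day in GB, before replicas."""
    return events_per_second * SECONDS_PER_DAY * avg_event_kb / KB_PER_GB


def retained_gb(daily_gb: float, replicas: int, retention_days: int) -> float:
    """Total storage for the retention window, counting primaries and replicas."""
    return daily_gb * (1 + replicas) * retention_days


raw = daily_primary_gb(events_per_second=100, avg_event_kb=1)
print(f"{raw:.2f} GB per day")  # prints "8.24 GB per day"

planned_daily_gb = 9  # rounded up, as in the post
print(retained_gb(planned_daily_gb, replicas=1, retention_days=30))  # prints 540
```

This only estimates raw event volume. Actual on-disk size depends on mappings and compression, so it is worth comparing the estimate against real index sizes once data starts flowing.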

3. Easily determine optimum number of shards

With a data set of about 9 GB per day, for a day-wise index, we could start by setting "number_of_shards" : 1 in the index template, since each primary shard would be about 9 GB, which is pretty reasonable for a single shard. Shards for time-based indices can be in the range of 10-50 GB as mentioned here. With a bit of trial and error based on the daily ingestion rate, we can arrive at an optimum shard size that helps in stabilizing the cluster and boosting performance.
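One way to pick a starting point for that trial and error is to divide the expected daily index size by a target shard size. The sketch below does this in Python. The 40 GB default target is an assumption picked from within the 10-50 GB range, and `primary_shards_needed` is a hypothetical helper, not part of any Elasticsearch client.

```python
import math


def primary_shards_needed(daily_index_gb: float, target_shard_gb: float = 40.0) -> int:
    """Estimate a starting number_of_shards for one day-wise index.

    target_shard_gb is an assumed target; tune it after observing the cluster.
    """
    if daily_index_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(daily_index_gb / target_shard_gb))


print(primary_shards_needed(9))    # prints 1
print(primary_shards_needed(120))  # prints 3
```

For the ~9 GB per day example this gives a single primary shard, which matches the starting point suggested above. The result is only a first guess to refine against observed indexing and search performance.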
