DEV Community

Cover image for Optimum Sharding strategy in OpenSearch
Sandeep Kanabar for AWS Community Builders

Posted on

Optimum Sharding strategy in OpenSearch

This article explores a few tips on optimum sharding strategy in OpenSearch.

  1. Using time-based indices wherever possible. There are a number of advantages of using time-based indices as mentioned in this article.

  2. If unsure, begin with 1 shard. With time-based indices, it offers the flexibility of modifying the number of shards anytime.
    E.g.

if the event count per second is 100 and each event is 1KB, then per day
number of events = 100 per sec * 86400 secs in day = 86,40,000
approx size of each event = 1KB
size of all events per day = 1 KB * 86,40,000 = 86,40,000 KB = ~9 GB per day
Enter fullscreen mode Exit fullscreen mode

Each shard is good enough to hold around 30-50 GB data. In the above scenario, with a daily dataset size of 9 GB, a single shard should suffice in case of day-wise indices.

  1. Consider another scenario -
If the event count per second is 200 and each event is 2KB, then per day
number of events = 200 per sec * 86400 secs in day = 1,72,80,000
approx size of each event = 2KB
size of all events per day = 2 KB * 1,72,80,000 = 3,45,60,000 KB = ~34 GB per day

Enter fullscreen mode Exit fullscreen mode

Here also, a single shard might suffice but it would impact indexing making it slower. Opting for 3 primary shards would mean each shard would be ~12 GB.

  1. For scenario discussed in point 3, the shards of size ~12 GB might look too smaller but then past indices being read-only could be force-merged to 1 segment. Alternatively, the no of shards could be reduced for past indices by re-indexing them, e.g. say reindex day-wise indices to monthly indices and then force-merge them. This could lead to 30 day-wise indices with each index have 1 shard (thereby total 30 shards for 30 indices) become a single monthly index with say 9 or 12 shards depending on the size of shards.

  2. The best way is to experiment and find out what works best. Day-wise indices offer scope to experiment as the template could be easily modified to vary the no of shards for newly created indices.

  3. Keep shards EVEN-sized even for different types of indices. Eg. say twitter index has 5 shards each of 10 GB, then design posts index such that the shard size for posts index is also approx around 10-15 GB or 10-20 GB. The reason being, if twitter index shard is 10 GB and posts index shard is say 50 GB, then it might lead to un-even disk space.

Feel free to add your questions / thoughts in the comments below.

Latest comments (0)