This article explores a few tips on optimum
sharding strategy in OpenSearch.
Using time-based indices wherever possible. There are a number of advantages of using time-based indices as mentioned in this article.
If unsure, begin with
1shard. With time-based indices, it offers the flexibility of modifying the number of shards anytime.
if the event count per second is 100 and each event is 1KB, then per day number of events = 100 per sec * 86400 secs in day = 86,40,000 approx size of each event = 1KB size of all events per day = 1 KB * 86,40,000 = 86,40,000 KB = ~9 GB per day
Each shard is good enough to hold around
30-50 GB data. In the above scenario, with a daily dataset size of
9 GB, a
single shard should suffice in case of day-wise indices.
- Consider another scenario -
If the event count per second is 200 and each event is 2KB, then per day number of events = 200 per sec * 86400 secs in day = 1,72,80,000 approx size of each event = 2KB size of all events per day = 2 KB * 1,72,80,000 = 3,45,60,000 KB = ~34 GB per day
Here also, a single shard might suffice but it would impact indexing making it slower. Opting for 3 primary shards would mean each shard would be ~12 GB.
For scenario discussed in point 3, the shards of size ~12 GB might look too smaller but then past indices being read-only could be force-merged to 1 segment. Alternatively, the no of shards could be reduced for past indices by re-indexing them, e.g. say reindex day-wise indices to monthly indices and then force-merge them. This could lead to 30 day-wise indices with each index have 1 shard (thereby total 30 shards for 30 indices) become a single monthly index with say 9 or 12 shards depending on the size of shards.
The best way is to experiment and find out what works best. Day-wise indices offer scope to experiment as the template could be easily modified to vary the no of shards for newly created indices.
Keep shards EVEN-sized even for different types of indices. Eg. say
5 shardseach of
10 GB, then design
postsindex such that the shard size for posts index is also approx around
10-20 GB. The reason being, if
postsindex shard is say
50 GB, then it might lead to un-even disk space.
Feel free to add your questions / thoughts in the comments below.
Top comments (0)