DEV Community

ChunTing Wu
ChunTing Wu

Posted on

How to choose a MongoDB shard key

In this article, I will show you what is the ideal pattern of a MongoDB shard key. Although there is a good page on the MongoDB official manual, it still not provides a formula to choose a shard key.

TL;DR

The formula is

{coarselyAscending : 1, search : 1}

I will explain the reason in the following sections.

User Scenario

In order to well-describe the formula, I will use an example to illustrate the scenario. There is a collection within application logs, and the format is like:

{
    "id": "4df16cf0-2699-410f-a07e-ca0bc3d3e153",
    "type": "app",
    "level": "high",
    "ts": 1635132899,
    "msg": "Database crash"
}
Enter fullscreen mode Exit fullscreen mode

Each log has the same template, id is a UUID, ts is an epoch, and both type and level are a finite enumeration. I will leverage the terminologies in the official manual to explain some incorrect designs.

Low Cardinality Shard Key

From the mentioned example, we usually choose type at first sight. Because, we always use type to identify the logging scope. However, if we choose the type as the shard key, it must encounter a hot-spot problem. Hot-spot problem means there is a shard size much larger than others. For example, there are 3 shards corresponding to 3 types of logs, app, web and admin, the most popular user is on app. Therefore, the shard size with app log will be very large. Furthermore, due to the low-cardinality shard key, the shards cannot be rebalanced anymore.

Ascending Shard Key

Alright, if type cannot be the shard key, how about ts? We always search for the most recently logs, and ts are fully uniform distributed, it should be a proper choice. Actually, no. When the shard key is an ascending data, it works at the very first time. Nevertheless, it will result in a performance impact soon. The reason is ts is always ascending, so the data will always insert into the last shard. The last shard will be rebalanced frequently. Worst of all, the query pattern used to search from the last shard as well, i.e. the search will often be the rebalance period.

Random Shard Key

Based on the previous sections, we know type, level and ts all are not good shard key candidates. Thus, we can use id as the shard key, so that we can spread the data evenly without frequent changes. This approach will work fine when the data set is limited. After the data set becomes huge, the overhead of rebalance will be very high. Because the data is random, MongoDB has to random access the data while rebalancing. On the other hand, if the data is ascending, MongoDB can retrieve the data chunks via the sequential access.

Solution

A good MongoDB shard key should be like:

{coarselyAscending : 1, search : 1}

In order to prevent the random access, we choose the coarsely ascending data be the former. This pick also won't meet the problem of frequently rebalancing. And we put a search pattern on the latter to ensure the related data can be located at the same shard as much as possible. In our example, I will not only choose the shard key but also redesign our search pattern. The ts is fine to address the log at the specific time; however, it is a bit inefficient for a time range query like from 3 month ago til now. Hence, I will add one more key, month, in the document, so we therefore can leverage the MongoDB date type and make a proper shard key. The collection will be:

{
    "id": "4df16cf0-2699-410f-a07e-ca0bc3d3e153",
    "type": "app",
    "level": "high",
    "ts": 1635132899,
    "msg": "Database crash",
    "month": new Date(2021, 10) // only month
}
Enter fullscreen mode Exit fullscreen mode

And, the shard key is {month: 1, type: 1}.

The key point here is we use month instead of ts to avoid frequently rebalaning. The month is not made just for the shard key; on the contrary, we also use it for our search pattern. Instead of calculating the relationship between timestamp and the date, we can use getMonth to find results faster. For instance,

var d = new Date();
d.setMonth(d.getMonth() - 1); //1 month ago
db.data.find({month:{$gte: d}});
Enter fullscreen mode Exit fullscreen mode

To sum up, this article provides the concepts of designing MongoDB shard key. You might not have a coarsely ascending data so, but you can refer to the concepts and find out a proper key design for your applications.

Discussion (1)

Collapse
suhakim profile image
sadiul hakim

Good Job 👍