DEV Community

Poorshad Shaddel
Poorshad Shaddel

Posted on • Originally published at levelup.gitconnected.com on

MongoDB Index Challenge: Learn by Doing!

In this article, you will learn how indexing in MongoDB works!

What you will see in this article?

  • Introduction to Indexes on MongoDB
  • A list of challenges that you should choose the right index

Introduction to Indexes on MongoDB

If you feel totally comfortable with indexes you can skip this part and get to the challenges.

Why do we need indexes?

The answer is quite simple and indexes on the database exist for the reason that you have indexes on your phone notebook(I know that probably you do not have it now! but it was a useful thing in the past). Take a look at this notebook and say if it makes sense to start checking the notebook from the beginning to the end to find last_name of someone like “John Smith”? The answer is NO and you are directly going to the “S” section and searching there and it saves a lot of time. This is the same thing on the database.


Old Phone Number Notebooks

Generally, if we do not have any index on any field, for each query we have to fetch all the documents from the Disk(which is a generally expensive operation, especially when there are a lot of documents) and then filter the desired results and return them. In MongoDB, it is called a COLLSCAN which means a collection scan.

For example, if this is our documents in a collection called users:

const users = [
  {
  "_id": "1",
  "name": "John",
  "last_name": "Smith"
  },
  {
  "_id": "2",
  "name": "Jason",
  "last_name": "Sterling"
  }
];
Enter fullscreen mode Exit fullscreen mode

and this is our query:

db.users.find({ last_name: "Smith" });
Enter fullscreen mode Exit fullscreen mode

This query has to perform a Collection Scan and fetch all the documents to check if they have this last_name or not.

On the other hand, if we create an index

db.users.createIndex({ last_name: 1 });
Enter fullscreen mode Exit fullscreen mode

Now we have an index exactly like our phone book. Now instead of fetching all the documents from the disk we are just fetching 1 document. In a MongoDB index, the Record_ID is stored which is the address of the document on the Disk.

If you do not know a lot about indexes focus on the answers and useful links which are provided!

Index Challenges

In these challenges try to avoid Collscan and also in-memory sort!

In each challenge, you get the :

  • document structure
  • query(queries) that we want to perform on this collection
  • extra information about the data if needed

You can provide one or more indexes on the collection

Challenge Number 1

// Documents
const users = [
  {
  "_id": "1",
  "name": "John",
  "last_name": "Smith"
  },
  {
  "_id": "2",
  "name": "Jason",
  "last_name": "Sterling"
  }
];

// Queries
db.users.find({ last_name: "Smith" });

db.users.find({ }).sort({ last_name: 1 });

Enter fullscreen mode Exit fullscreen mode

This is for warming you up!

Challenge 2 Answer :

As you probably know we can use the same index for searching and sorting at the same time! so as easy as that:

db.users.createIndex({ last_name: 1 });
Enter fullscreen mode Exit fullscreen mode

Challenge Number 2

// Documents
const users = [
  {
  "_id": "1",
  "name": "John",
  "last_name": "Smith"
  },
  {
  "_id": "2",
  "name": "Jason",
  "last_name": "Sterling"
  }
];

// Queries
db.users.find({ last_name: "Smith", name: "John" }); // 1
db.users.find({ last_name: "Sterling" }); // 2
db.users.find({ name: "John" }); // 3
Enter fullscreen mode Exit fullscreen mode

Challenge 2 Answer :

db.users.createIndex({ last_name: 1, name: 1 });
db.users.createIndex({ name: 1});
Enter fullscreen mode Exit fullscreen mode

When we are matching two different fields that we are searching for we can use compound indexes:

db.users.createIndex({ last_name: 1, name: 1 });
Enter fullscreen mode Exit fullscreen mode

This is like a phone book in that we have the indexes of family names but on each page, we are adding each person also sorted by their name.

Last_Name - First_Name
Anderson - Amy
Anderson - Jim
Davis - Amber
Smith - Adam 
Smith - Bruce
Smith - Jane
Enter fullscreen mode Exit fullscreen mode

Pay attention that the order in the index is important!

But if you answered with this index: db.users.createIndex({ last_name: 1 }) this is also correct but it is not the best option. With this index, we will get all the documents and then Filter the desired name.

But do we need another index for the Second Query? no, we do not need it! If we set this index it will be a redundant index.

Let’s take a look at one index and supported queries by that index to understand prefix:

db.users.createIndex({ last_name: 1, name: 1, age: 1 });

db.findOne({ last_name: "Smith" });//Not needed index: ({ last_name: 1 })
db.findOne({ last_name: "Smith", name: "John" }); // not needed ({ last_name: 1, name: 1 })
db.findOne({ last_name: "Smith", name: "John", age: 21 });
Enter fullscreen mode Exit fullscreen mode

All of the above queries are supported by the compound index.

Let’s get back to our challenge. Query 3: db.users.find({ name: "John" }) . If you think this could be supported by the existing index we had( ({ last_name: 1, name: 1})) You need to take a look at the Phone Book:

Last_Name - First_Name
Anderson - Amy
Anderson - Jim
Davis - Amber
Smith - Adam 
Smith - Bruce
Smith - Jane
Enter fullscreen mode Exit fullscreen mode

As you can see there is no order for finding first names. So we have to put a single field index: db.users.createIndex({ name: 1});

Challenge Number 3

// Documents
const products = [
  {
  "_id": "1",
  "rating": 3,
  "createdAt": ISO_DATE("2023-05-11")
  },
  {
  "_id": "2",
  "rating": 3,
  "createdAt": ISO_DATE("2023-05-11")
  }
];

// Queries
db.products.find({ }).sort({ rating: -1, createdAt: -1 }); // Best Newest
db.products.find({ }).sort({ rating: -1 }); // Best
db.products.find({ }).sort({ rating: 1, createdAt: 1 }) // Worst Oldest
Enter fullscreen mode Exit fullscreen mode

Challenge 3 Answer :

db.users.createIndex({ rating: -1, createdAt: -1 });
Enter fullscreen mode Exit fullscreen mode

This compound index will be enough since the second one is a prefix of this index. about the third query, we must know that when we have a sorted list we can traverse it “forward” or “backward” and still use the index!

Challenge Number 4

// LeaderBoards
const leaderBoards = [
  {
  "_id": "1",
  "userId": "0",
  "competition": "Python Hackaton",
  "score": 100,
  "number_of_problems_solved": 8
  },
  {
  "_id": "2",
  "userId": "1",
  "competition": "Python Hackaton",
  "score": 98,
  "number_of_problems_solved": 6
  },
  ...
];
Enter fullscreen mode Exit fullscreen mode

Here we have a leaderboard. Each document has a “competition field” which is the name of the competition and each person has a “score”. We also keep a number of problems that are solved and resulted in this score.

This is the query we want to perform.

Challenge 4 Answer :

// Queries
db.leaderBoards.find({ 
   competition: "Python Hackaton", 
   number_of_problems_solved: { $lte: 10 }  
}).sort({ score: -1 });
Enter fullscreen mode Exit fullscreen mode

In this query, we want to get userId of the people who have the best score, but we expect them to have this score for solving less than 10 problems.

Our suggestion for the index is this one:

db.leaderBoards.createIndex({ 
  competition: 1, 
  score: -1, 
  number_of_problems_solved: 1 
});
Enter fullscreen mode Exit fullscreen mode

But Why? There is a simple rule called: ESR (Equality, Sort, Range)

The first field we are indexing is competition. Since equality comparison usually can filter lots of documents it makes sense to put this as the first field in our compound index. (if for example, 95% of the documents have this field competetion with value of Python Hackaton then it might not make sense to use it as the first field — in other words, our equality filter is not Selective enough).

The second field in the compound is score. But why are we using the sort field as the second? The intention is to prevent In Memory Sort. If we use our range in the second field of the compound index then we cannot use the sort index anymore.

This is the result of the current index:


With the current index — no in-memory search

If we use this index:

{ competition: 1, number_of_problems_solved: 1, score: -1 }
Enter fullscreen mode Exit fullscreen mode

This will be the result:


without ESR Rule we face in-memory sort

As you can see we had to do a in-memory sort.

Do we have to always follow ESR Rule to prevent in-memory sorting?

The answer is NO! If the situation is somehow that the range field is really selective and eliminates most of the documents and you do not have a lot of documents left to sort, then it makes more sense to put this field as the second field of the index. In this case, we are sorting in memory but we might have better results. If you are still not sure, you can have both indexes for a period of time and see which one is chosen by the query optimizer(the query optimizer runs queries with these indexes at the same time to check which one is better — By default, it avoids the in-memory sort).

Challenge Number 5

// Documents
const products = [
  {
  "_id": "1",
  "title": "Desk",
  "weight": 1,
  "createdAt": ISO_DATE("2023-05-11")
  },
  {
  "_id": "2",
  "title": "IPhone 14",
  "createdAt": ISO_DATE("2023-05-11")
  },
  {
  "_id": "3",
  "title": "IPhone 14",
  "createdAt": ISO_DATE("2023-05-11")
  },
  {
  "_id": "4",
  "title": "IPhone 14",
  "createdAt": ISO_DATE("2023-05-11")
  },
];

// Queries
db.products.find({ weight: { $lte: 2 } }); // Filter products with less than 2 kg weight
Enter fullscreen mode Exit fullscreen mode

Extra Information : Field weight does not exist on more than half of the products(but we do not want to have these documents in the result)

Challenge 5 Answer :

// Partial Index
db.products.createIndex(
  { weight: 1 }, 
  { partialFilterExpression: { weight: { $exists: true } } }
);
// Alternative Solution
db.products.createIndex(
  { wight: 1 }, 
  { sparse: true }
);
Enter fullscreen mode Exit fullscreen mode

By using a partial index and adding a partial filter expression we can ask MongoDB to not index documents that we do not want. If you have a collection with a few million documents and you are querying on a specific field that only a minority of the documents have this field, Partial Index can be helpful with two benefits: 1- Smaller Index 2- Faster Look Up

Another alternative is using the sparse option. Only indexes if the field exists.

Another usage of Partial Indexes is in a situation where you are running a range query. For example, you never filter products that have a rating of less than 2(worst products). So why index them? we can use this index on rating:

db.products.createIndex(
  { ...ourIndex }, 
  { partialFilterExpression: { rating: { $gte: 2 } } }
);
Enter fullscreen mode Exit fullscreen mode

But keep in mind that if now you run a query like this one:

db.products.find({ rating: 1 });
Enter fullscreen mode Exit fullscreen mode

This results in a collection index!

Challenge Number 6

// Documents
const products = [
  {
  "_id": "1",
  "title": "Desk",
  "tags": [
    { name: "office" }, 
    { name: "wood" }, 
    { name: "laptop_desk"}
   ],
  "createdAt": ISO_DATE("2023-05-11")
  },
  {
  "_id": "2",
  "title": "IPhone 14",
  "tags": [
    { name: "phone" }, 
    { name: "14" }, 
    { name: "apple"}
   ],
  "createdAt": ISO_DATE("2023-05-11")
  },
];

// Queries
db.products.find({ "tags.name": "apple" }); // Filter apple products
Enter fullscreen mode Exit fullscreen mode

Challenge 6 Answer :

// Multikey Index
db.products.createIndex({ tags.name: 1 });
Enter fullscreen mode Exit fullscreen mode

This index will index each element of the array! this works on fields that are arrays like: [1, 2, 3] and also fields that are documents or nested documents .

Limitations

  • You cannot have 2 multikey indexes in a compound index(the number of indexes will be n*m)
  • Cannot use covered queries. (Covered queries are the queries that all needed fields that exist in the index you do not need to fetch documents from the disk)

There are some other limitations too that you can check here.

Challenge Number 7

// Documents
const products = [
  {
  "_id": "1",
  "title": "Desk",
  "description": "a brown desk for usage in office or home"
  "createdAt": ISO_DATE("2023-05-11")
  },
  {
  "_id": "2",
  "title": "IPhone 14",
  "description": "Hight quality product",
  "createdAt": ISO_DATE("2023-05-11")
  },
];

// Queries
db.products.find( { $text: { $search: "brown desk" } });
Enter fullscreen mode Exit fullscreen mode

Challenge 7 Answer :

// Text Index
db.products.createIndex({ description: "text" })
// Text Index with another collation to support lowercase uppercase
db.products.createIndex({ description: "text" }, { collation: {
  local: "en",
  strength: 2,
});
Enter fullscreen mode Exit fullscreen mode

The text index is also supported by MongoDB and we can use it. But keep in mind that it works similarly to the multikey index! It means that it is kind of similar to having an array of strings and putting a multikey index on this array.

And the limitation is that you can use it on only a field in your collection. But if you are using Atlas Search is a much much better option to use.

Challenge Number 8

// Documents
const users = [
  {
  "_id": "1",
  "first_name": "Poorshad",
  "last_name": "Shaddel"
  "userMetadata" : { "likes" : ["dogs", "cats"] }
  },
  {
  "_id": "2",
  "first_name": "John",
  "last_name": "Smith",
  "userMetadata" : { "dislikes" : "pickles", "inactive": false }
  },
  ...
];

// Queries
db.users.find( { userMetadata.likes: 'cats'});
db.users.find( { userMetadata.dislikes: 'pickles'});
db.users.find( { userMetadata.age: 45});
db.users.find( { userMetadata.inactive: false});
Enter fullscreen mode Exit fullscreen mode

Extra Information : userMetadata has an arbitrary structure but we want to have an index on it.

Challenge 8 Answer :

// Wild Card Index
db.users.createIndex( { "userMetadata.$**" : 1 } )
Enter fullscreen mode Exit fullscreen mode

For this case, we have to use a Wild Card Index. We can index all the fields that fit in this pattern. If you are putting a lot of stuff into this object and you are not querying most of them, it might not be the best idea to use userMetadata.

Another pattern we could use to store this arbitrary data in the index was to Attribute Pattern.

userMetadata: [
  { "attribute": "likes", value: ["cats"] },
  { "attribute": "age", value: 45 }  
]
Enter fullscreen mode Exit fullscreen mode

And we can use an index like this to support the queries on this:

db.users.createIndex({ userMetadata.attribute: 1, userMetadata.value: 1 })
Enter fullscreen mode Exit fullscreen mode

Summary Of Challenges

Challenge Number 1(Single Index)

Challenge Number 2(Compound Index)

Challenge Number 3(Sort Direction)

Challenge Number 4(ESR)

Challenge Number 5(Partial Index)

Challenge Number 6(Multikey Index)

Challenge Number 7(Text Index)

Challenge Number 8(Wildcard index)

Next Steps to Explore More

Hashed Indexes

2D Indexes


Top comments (0)