Ganesh Yadav

Aggregation Pipeline in MongoDB and the use of $match and $group operator (Part 2)

Hello and welcome back, readers. Sorry for the delay in this second part of the MongoDB Aggregation Pipeline series, in which we explore the power of the Aggregation Pipeline provided by MongoDB to make a developer's life easier.

If you are new to this series, I would like you to check out the first part by clicking here. For those who have been following along, let's first revise what we have learned so far.

So far, we have a clear understanding of what aggregation in MongoDB is and of the different types of aggregation MongoDB provides, such as:

  1. Map Reduce Function

  2. Single Purpose Aggregation

  3. Aggregation Pipeline

In this article, we will explore in depth the power of the aggregation pipeline using the $match and $group operators.

Aggregation Pipeline using $match operator:

The $match operator filters the documents and passes only those that match the specified conditions to the next pipeline stage.

In general, when you apply the $match operator to a set of documents with a query on one or more fields, it checks each document against that query and returns only the documents that satisfy it. All other documents are dropped from the pipeline.

//Syntax

{ $match: { <query> } }


Let us understand the $match operator with a basic example, where we have a collection of student data.

//Collection Students
{ "_id": ObjectId("512bc95fe835e68f199c8686"), "student": "Dave Smith", "score": 80}
{ "_id": ObjectId("512bc962e835e68f199c8687"), "student": "Dave Smith", "score": 85}
{ "_id": ObjectId("55f5a192d4bede9ac365b257"), "student": "Ahn ben", "score": 60}
{ "_id": ObjectId("55f5a192d4bede9ac365b258"), "student": "li xin", "score": 55}
{ "_id": ObjectId("55f5a1d3d4bede9ac365b259"), "student": "Ben hue", "score": 60}
{ "_id": ObjectId("55f5a1d3d4bede9ac365b25a"), "student": "li xin", "score": 94}
{ "_id": ObjectId("55f5a1d3d4bede9ac365b25b"), "student": "Peter Parker", "score": 95}

//Now if we apply the $match operator to the above collection with the student's name
//"Dave Smith", we get only the documents in which the student is Dave Smith
//(the comparison is case-sensitive by default, so the name must be spelled exactly as stored)

db.students.aggregate([{$match:{student:"Dave Smith"}}])
//We will get the results shown below
{ "_id": ObjectId("512bc95fe835e68f199c8686"), "student": "Dave Smith", "score": 80}
{ "_id": ObjectId("512bc962e835e68f199c8687"), "student": "Dave Smith", "score": 85}
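
To make the filtering behaviour concrete, here is a minimal sketch in plain JavaScript that mimics an equality-only $match on an in-memory array. It is an analogy, not MongoDB code: `matchEquals` is a hypothetical helper introduced only for illustration, and the `_id` fields are left out for brevity. It assumes the default behaviour, where no case-insensitive collation is configured.

```javascript
// Plain JavaScript analogy of { $match: { student: "Dave Smith" } }.
// Illustration only: this runs on an array, not on a MongoDB collection.
const students = [
  { student: "Dave Smith", score: 80 },
  { student: "Dave Smith", score: 85 },
  { student: "Ahn ben", score: 60 },
  { student: "li xin", score: 55 },
  { student: "Ben hue", score: 60 },
  { student: "li xin", score: 94 },
  { student: "Peter Parker", score: 95 },
];

// Hypothetical helper: keeps the documents whose fields are strictly equal
// to every field/value pair in the query object.
function matchEquals(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );
}

const exact = matchEquals(students, { student: "Dave Smith" });
const wrongCase = matchEquals(students, { student: "Dave smith" });

console.log(exact.length);     // 2
console.log(wrongCase.length); // 0, because "smith" is not equal to "Smith"
```

The second call shows why the spelling in the query matters: a name that differs only in letter case matches nothing.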


Aggregation Pipeline using $group operator:

Now we can move on to the next operator, $group. As we know, an aggregation pipeline is a series of stages that we can combine to extract a certain kind of data as per our requirements.

The $group stage separates documents into groups according to a "group key". The output is one document for each unique group key.

syntax:     { $group: { _id: <expression>, // Group key
              <field1>: { <accumulator1> : <expression1> },
                  ...
                      }
             }


In other words, when the $group stage is applied to a set of documents, it returns one document per group. Each output document contains the _id field, which holds the group key, followed by any fields computed by the accumulators for that group.

For example, the $group operator is applied to the students collection as shown below.


//Our student's collection with the name of the student as field "student"
{ "_id": ObjectId("512bc95fe835e68f199c8686"), "student": "Dave Smith", "score": 80}
{ "_id": ObjectId("512bc962e835e68f199c8687"), "student": "Dave Smith", "score": 85}
{ "_id": ObjectId("55f5a192d4bede9ac365b257"), "student": "Ahn ben", "score": 60}
{ "_id": ObjectId("55f5a192d4bede9ac365b258"), "student": "li xin", "score": 55}
{ "_id": ObjectId("55f5a1d3d4bede9ac365b259"), "student": "Ben hue", "score": 60}
{ "_id": ObjectId("55f5a1d3d4bede9ac365b25a"), "student": "li xin", "score": 94}
{ "_id": ObjectId("55f5a1d3d4bede9ac365b25b"), "student": "Peter Parker", "score": 95}

//Now we group the documents of the collection, using the student name as _id
db.students.aggregate([{$group:{_id:"$student"}}])

//_id is a mandatory field in $group; it holds the expression (here the field "$student")
//by which you want the documents to be grouped
//The stage returns one document per distinct name present in the collection
//(MongoDB does not guarantee the order of these output documents)
{"_id": "Dave Smith"}
{"_id": "Ahn ben"}
{"_id": "li xin"}
{"_id": "Ben hue"}
{"_id": "Peter Parker"}
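
The syntax above also allows accumulator fields, which the example does not use. As a rough sketch, the plain JavaScript below mimics what a $group stage with the `$sum` and `$avg` accumulators would compute per student. The helper `groupByStudent` and the output field names `attempts` and `avgScore` are assumptions chosen for illustration, and the code works on an in-memory array rather than a real collection.

```javascript
// Plain JavaScript analogy of the pipeline:
//   db.students.aggregate([
//     { $group: { _id: "$student", attempts: { $sum: 1 }, avgScore: { $avg: "$score" } } }
//   ])
// Illustration only: "attempts" and "avgScore" are hypothetical field names.
const students = [
  { student: "Dave Smith", score: 80 },
  { student: "Dave Smith", score: 85 },
  { student: "Ahn ben", score: 60 },
  { student: "li xin", score: 55 },
  { student: "Ben hue", score: 60 },
  { student: "li xin", score: 94 },
  { student: "Peter Parker", score: 95 },
];

// Hypothetical helper: builds one output document per distinct student name.
function groupByStudent(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Reuse the running totals for this group key, or start a new group.
    const group = groups.get(doc.student) || { _id: doc.student, attempts: 0, total: 0 };
    group.attempts += 1;       // plays the role of { $sum: 1 }
    group.total += doc.score;  // running total used to compute the average
    groups.set(doc.student, group);
  }
  return [...groups.values()].map(({ _id, attempts, total }) => ({
    _id,
    attempts,
    avgScore: total / attempts, // plays the role of { $avg: "$score" }
  }));
}

const result = groupByStudent(students);
console.log(result);
```

With this data, the sketch produces five documents, for example `{ _id: "Dave Smith", attempts: 2, avgScore: 82.5 }`. The array version keeps first-seen order, whereas a real $group stage makes no promise about the order of its output.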



From the above two operators, it is clear that $match can be used when you want to filter documents based on a certain field, and $group can be used to group the documents of a collection based on the "group key".

As we know, an aggregation pipeline can contain multiple stages, and we can add as many stages as we need. In our next example, we will combine these two stages in a pipeline that uses both the $match and $group operators.

Problem Statement for Two-stage Pipeline:

We want to find the names of the students who have scored 80 or more.

From the above problem statement, it is clear that we can use the $match operator to find the documents with a score greater than or equal to 80. However, a student can appear more than once in our collection, so the matched documents may contain duplicate names. We therefore also use the $group operator to return each distinct name only once.


//Our student's collection with the name of the student as field "student"
{ "_id": ObjectId("512bc95fe835e68f199c8686"), "student": "Dave Smith", "score": 80}
{ "_id": ObjectId("512bc962e835e68f199c8687"), "student": "Dave Smith", "score": 85}
{ "_id": ObjectId("55f5a192d4bede9ac365b257"), "student": "Ahn ben", "score": 60}
{ "_id": ObjectId("55f5a192d4bede9ac365b258"), "student": "li xin", "score": 55}
{ "_id": ObjectId("55f5a1d3d4bede9ac365b259"), "student": "Ben hue", "score": 60}
{ "_id": ObjectId("55f5a1d3d4bede9ac365b25a"), "student": "li xin", "score": 94}
{ "_id": ObjectId("55f5a1d3d4bede9ac365b25b"), "student": "Peter Parker", "score": 95}

//We will be using two stages here to extract the data
db.students.aggregate([
  { //The first stage finds the documents with a score
    //greater than or equal to 80
    $match: { "score": { $gte: 80 } }
  },
  { //The second stage groups the documents from the first stage by distinct student name
    $group: { _id: "$student" }
  }
])

//The result of this two-stage pipeline:
//The first stage returns four documents, two of which belong to Dave Smith
//The second stage groups those four documents by distinct student name
{"_id": "Dave Smith"}
{"_id": "li xin"}
{"_id": "Peter Parker"}
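
The idea that each stage hands its output to the next one can be sketched in plain JavaScript as a chain of functions. This is only an analogy under stated assumptions: `matchScoreGte80` and `groupBestScore` are hypothetical helpers, and the `bestScore` field (computed like a `$max` accumulator) is an addition for illustration that the pipeline above does not have.

```javascript
// Plain JavaScript analogy of the pipeline:
//   db.students.aggregate([
//     { $match: { score: { $gte: 80 } } },
//     { $group: { _id: "$student", bestScore: { $max: "$score" } } }
//   ])
// Illustration only: "bestScore" is a hypothetical field name.
const students = [
  { student: "Dave Smith", score: 80 },
  { student: "Dave Smith", score: 85 },
  { student: "Ahn ben", score: 60 },
  { student: "li xin", score: 55 },
  { student: "Ben hue", score: 60 },
  { student: "li xin", score: 94 },
  { student: "Peter Parker", score: 95 },
];

// Stage 1: keep only the documents with a score of 80 or more.
const matchScoreGte80 = (docs) => docs.filter((doc) => doc.score >= 80);

// Stage 2: one output document per student, carrying that student's highest score.
const groupBestScore = (docs) => {
  const best = new Map();
  for (const { student, score } of docs) {
    const current = best.has(student) ? best.get(student) : -Infinity;
    best.set(student, Math.max(current, score));
  }
  return [...best].map(([_id, bestScore]) => ({ _id, bestScore }));
};

// Run the stages in order: the output of one stage is the input of the next.
const stages = [matchScoreGte80, groupBestScore];
const output = stages.reduce((docs, stage) => stage(docs), students);

console.log(output);
// [ { _id: 'Dave Smith', bestScore: 85 },
//   { _id: 'li xin', bestScore: 94 },
//   { _id: 'Peter Parker', bestScore: 95 } ]
```

Placing the filter first means the grouping step only sees the four matching documents instead of all seven, which is the usual reason to put $match as early in a pipeline as possible.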



Conclusion:

In this article, we gained a basic understanding of what the $match and $group operators are, why they are used, and how filtering on the backend avoids tedious filtering work on the front end.

MongoDB's aggregation pipeline can extract exactly the required data with just two short stages, and that is the essence of the aggregation pipeline's power: the same approach can be extended to solve bigger problems.

I hope you liked this article and appreciate my work. I learned this from the internet as well, so do check the link below, and stay tuned for the next part of this series.

Do Like, Comment, Share and Subscribe to my Newsletter to get my work directly in your inbox.
