Shanu

What Is the Aggregation Pipeline? How Do You Write One? Learn with an Example

Have you ever wondered how data can be transformed and analyzed within a database? Imagine you have a large collection of data and you want to perform complex queries to get meaningful insights. This is where the Aggregation Pipeline in MongoDB comes into the picture!

What is the Aggregation Pipeline?

The Aggregation Pipeline is a framework for data aggregation in MongoDB. It consists of a series of stages where each stage transforms the documents as they pass through. Think of it as a conveyor belt in a factory, where each stage applies a specific operation to the items on the belt. By the end of the pipeline, you get the final transformed data ready for analysis.
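As a rough mental model only, the conveyor belt can be sketched in plain JavaScript: a pipeline is a list of functions, and each one receives the documents produced by the one before it. This is an analogy, not MongoDB code and not how MongoDB is implemented; the names docs, stages, and runPipeline are made up for illustration.

```javascript
// Plain-JavaScript analogy of a pipeline (NOT MongoDB code).
// Each "stage" takes an array of documents and returns a new array.
const docs = [
  { name: "a", score: 3 },
  { name: "b", score: 9 },
  { name: "c", score: 6 },
];

const stages = [
  (items) => items.filter((d) => d.score > 4),                    // acts like a filter stage
  (items) => [...items].sort((x, y) => y.score - x.score),        // acts like a sort stage
  (items) => items.map((d) => ({ label: d.name.toUpperCase() })), // acts like a reshaping stage
];

// Feed the output of each stage into the next one, like items moving along the belt.
function runPipeline(input, stageList) {
  return stageList.reduce((current, stage) => stage(current), input);
}

const result = runPipeline(docs, stages);
console.log(result); // [ { label: 'B' }, { label: 'C' } ]
```

The document with score 3 is dropped by the first function, the remaining two are ordered by score, and the last function reshapes them, so only the final shape reaches the caller.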

Scenario: Follower and Following Counts in an Instagram-like App

Let's use a real-world example that's easy to follow: an Instagram-like app. Imagine you want to find out how many followers a user has, how many accounts they follow, and how many posts they have made. Let's see how to do this step by step.

Why Use the Aggregation Pipeline?

The Aggregation Pipeline is incredibly powerful and flexible. It allows you to:

  • Filter data (the $match stage)
  • Sort data (the $sort stage)
  • Group data (the $group stage)
  • Transform data (the $project stage)
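To make the four capabilities concrete, here is a minimal sketch with one stage per bullet. It assumes the users collection introduced in the setup section of this post; the threshold of 5 posts and the output field names totalPosts and usernames are made up for illustration. The aggregate call is guarded so that it only runs inside mongosh, where a global db exists.

```javascript
// One stage per capability, written as a plain array so it can be inspected or reused.
const overviewPipeline = [
  // Filter: keep only users with at least 5 posts (the threshold is an arbitrary example).
  { $match: { posts: { $gte: 5 } } },
  // Sort: most posts first.
  { $sort: { posts: -1 } },
  // Group: collapse all remaining users into one summary document.
  {
    $group: {
      _id: null,
      totalPosts: { $sum: "$posts" },
      usernames: { $push: "$username" },
    },
  },
  // Transform: drop the placeholder _id and keep the two summary fields.
  { $project: { _id: 0, totalPosts: 1, usernames: 1 } },
];

// Runs only inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.users.aggregate(overviewPipeline).toArray());
}
```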

How to Write an Aggregation Pipeline?

Setting Up the Example

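Let’s say we have a MongoDB collection named users with the following documents. The followers and following arrays hold user _id values: in the first document, followers [2, 3] means userB and userC follow userA, and following [4, 5] means userA follows userD and userE.

json

[
  {
    "_id": 1,
    "username": "userA",
    "followers": [2, 3],
    "following": [4, 5],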
    "posts": 10
  },
  {
    "_id": 2,
    "username": "userB",
    "followers": [1],
    "following": [3, 4],
    "posts": 5
  },
  {
    "_id": 3,
    "username": "userC",
    "followers": [],
    "following": [1],
    "posts": 8
  },
  {
    "_id": 4,
    "username": "userD",
    "followers": [1, 2],
    "following": [],
    "posts": 3
  },
  {
    "_id": 5,
    "username": "userE",
    "followers": [1],
    "following": [1, 4],
    "posts": 7
  }
]
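If you want to follow along in mongosh, the documents above can be loaded with insertMany. This is a minimal sketch: the variable name sampleUsers is made up, and it assumes the users collection does not already contain documents with these _id values (otherwise the insert fails with a duplicate key error).

```javascript
// The same five documents as above, written as JavaScript objects.
const sampleUsers = [
  { _id: 1, username: "userA", followers: [2, 3], following: [4, 5], posts: 10 },
  { _id: 2, username: "userB", followers: [1], following: [3, 4], posts: 5 },
  { _id: 3, username: "userC", followers: [], following: [1], posts: 8 },
  { _id: 4, username: "userD", followers: [1, 2], following: [], posts: 3 },
  { _id: 5, username: "userE", followers: [1], following: [1, 4], posts: 7 },
];

// Runs only inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  db.users.insertMany(sampleUsers);
}
```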

Creating the Aggregation Pipeline

Step 1: Count Followers

We want to count how many followers each user has. Here’s how you can do it:

javascript

db.users.aggregate([
  {
    $project: {
      username: 1,
      followerCount: { $size: "$followers" }
    }
  }
])

Explanation

$project Stage: This stage reshapes each document. We keep the username field and add a new field, followerCount, which uses $size to count the number of elements in the followers array. The _id field is also included by default unless you exclude it with _id: 0.
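One caveat worth knowing: $size expects an array, so it raises an error for a document where followers is missing. Every sample document in this post has the field, so the pipeline above works as written. If your own data might lack it, a common guard is $ifNull, sketched below; the variable name safeFollowerCountPipeline is made up for illustration.

```javascript
// Variant of the follower count that tolerates documents without a followers field.
// $ifNull substitutes an empty array when the field is missing or null,
// so $size receives an array for those documents and returns 0.
const safeFollowerCountPipeline = [
  {
    $project: {
      username: 1,
      followerCount: { $size: { $ifNull: ["$followers", []] } },
    },
  },
];

// Runs only inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.users.aggregate(safeFollowerCountPipeline).toArray());
}
```

Note that $ifNull only covers missing and null values; a followers field holding some other non-array type would still make $size fail.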

Step 2: Count Following

Next, let’s count how many accounts each user is following:

javascript

db.users.aggregate([
  {
    $project: {
      username: 1,
      followingCount: { $size: "$following" }
    }
  }
])


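Step 3: Combine Everything in One Stage

Steps 1 and 2 each ran a separate pipeline. To get all the information in one go, we can compute both counts, plus the post count, in a single $project stage: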

javascript

db.users.aggregate([
  {
    $project: {
      username: 1,
      followerCount: { $size: "$followers" },
      followingCount: { $size: "$following" },
      postCount: "$posts"
    }
  }
])


Breakdown of the Combined Pipeline

$project Stage: We keep username, expose posts under the new name postCount, and add two new fields: followerCount and followingCount. Both new fields use $size to count the elements in the respective arrays.
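Because each stage receives the output of the previous one, a later stage can use fields that an earlier stage created. As a small sketch of a genuinely multi-stage pipeline, the version below adds a $sort after the $project to rank users by the computed followerCount. The variable name rankedPipeline and the choice of username as a tie-breaker are assumptions made for illustration.

```javascript
// Stage 1 computes the counts; stage 2 sorts on a field that stage 1 created.
const rankedPipeline = [
  {
    $project: {
      username: 1,
      followerCount: { $size: "$followers" },
      followingCount: { $size: "$following" },
      postCount: "$posts",
    },
  },
  // Most followers first; ties are broken alphabetically by username.
  { $sort: { followerCount: -1, username: 1 } },
];

// Runs only inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.users.aggregate(rankedPipeline).toArray());
}
```

With the sample documents from this post, the users should come back in the order userA, userD, userB, userE, userC.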

Running the Aggregation Pipeline

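When you run this pipeline, MongoDB processes each document and returns the result below. Each returned document also carries its _id, because $project includes that field by default; it is left out of the listing here for brevity.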

json

[
  {
    "username": "userA",
    "followerCount": 2,
    "followingCount": 2,
    "postCount": 10
  },
  {
    "username": "userB",
    "followerCount": 1,
    "followingCount": 2,
    "postCount": 5
  },
  {
    "username": "userC",
    "followerCount": 0,
    "followingCount": 1,
    "postCount": 8
  },
  {
    "username": "userD",
    "followerCount": 2,
    "followingCount": 0,
    "postCount": 3
  },
  {
    "username": "userE",
    "followerCount": 1,
    "followingCount": 2,
    "postCount": 7
  }
]


Conclusion

The Aggregation Pipeline is like having a magic wand for your data. It simplifies complex data processing and allows you to extract meaningful insights with ease. By breaking down the operations into stages, you can transform and analyze your data efficiently.

So next time you find yourself faced with a daunting data analysis task, remember the power of the Aggregation Pipeline. It’s your go-to tool for turning raw data into valuable information, making your work not only easier but also more enjoyable.
