DEV Community

Karthic Subramanian for MongoDB


To Aggregate or Not to Aggregate, That is the Query

Ah, Shakespeare would've been a fantastic MongoDB developer, don't you think?

Jokes aside, as I dive deeper and learn more about MongoDB, this is a question I ponder often: When should you go beyond the simple find command and step into the brave new world of MongoDB aggregations?

A rudimentary find might be adequate for a lot more than just simple queries, but there comes a moment in every developer's life when you're faced with complex data transformations and the find method starts looking like a Swiss Army knife at a demolition site — useful but not the most elegant solution.

So, let’s take a look at find and aggregate, their capabilities, and most importantly, when to switch from the find command to the aggregate command.

A tale of two queries

To set the stage, imagine you're building a social networking platform. Let's call it Leafyroots. You have a collection named posts that stores user posts with comments and likes.

Here's a simplified document example:

{
    "_id": "1",
    "content": "I love coding!",
    "user": "Jane",
    "likes": 200,
    "comments": [
        {"user": "Tom", "comment": "Me too!"},
        {"user": "Sara", "comment": "Showoff."}
    ]
}

The classic find method

A social networking platform is not of much use if it cannot show all posts by a specific user, all posts from everyone you follow, and so on. We might filter by a few rules, but we want the content as-is: query the database, get the posts, and display them.

To get all posts by a user named Jane, you’d use:

db.posts.find({"user": "Jane"})

Neat and clean.

You could, of course, sort by the number of likes and return only the comments (plus _id, which MongoDB includes by default).

db.posts.find({"user":"Jane"}, {"comments" : 1}).sort({"likes":-1})

Simple and straightforward.

For more powerful queries, you can use a range of query operators and expressions for filtering, declare variables with let, and much more; the MongoDB query documentation has plenty of examples.

So, find is great as you start building your application, all the way to your first MVP. It is powerful yet simple, allowing you to move fast. But eventually, you will want to add more functionality to your application. Let’s suppose we open Leafyroots to content creators, and one of the key metrics for content creators is the number of likes.

How would we know the average number of likes Jane gets on her posts? That’s where find starts feeling, well, limiting.

Enter the aggregate method

As we start adding personalization, intelligence, and event-driven features to Leafyroots, we don’t just want to display the data as-is. We want to derive some value from the many documents that exist in the database and deliver that value to our users.

For example, we want to show Jane, one of our up-and-coming content creators, the average number of likes on her posts.

Behold:

db.posts.aggregate([
    { $match: { "user": "Jane" }},
    { $group: { _id: "$user", avgLikes: { $avg: "$likes" }}}
])

This aggregation pipeline first filters all posts by Jane and then calculates her average likes.
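If you dropped the $match stage, the same $group stage would produce one average per user instead of just Jane's. To make the grouping step concrete, here is a plain JavaScript sketch of what $group with $avg computes over an in-memory array. It is a model for illustration, not how MongoDB executes the stage; the groupAverage helper and the sample posts are hypothetical. Like $avg, the sketch skips values that are missing or not numbers.

```javascript
// Toy model of { $group: { _id: "$<keyField>", <outputField>: { $avg: "$<valueField>" } } }.
// Hypothetical helper for illustration only; MongoDB runs the real stage server-side.
function groupAverage(docs, keyField, valueField, outputField) {
  const groups = new Map(); // group key -> running { sum, count }
  for (const doc of docs) {
    const key = doc[keyField];
    if (!groups.has(key)) groups.set(key, { sum: 0, count: 0 });
    const value = doc[valueField];
    // Like $avg, skip missing or non-numeric values
    if (typeof value === "number") {
      const g = groups.get(key);
      g.sum += value;
      g.count += 1;
    }
  }
  // One output document per group; null if a group had no numeric values
  return [...groups].map(([key, { sum, count }]) => ({
    _id: key,
    [outputField]: count > 0 ? sum / count : null,
  }));
}

// Hypothetical sample posts
const posts = [
  { _id: "1", user: "Jane", likes: 200 },
  { _id: "2", user: "Jane", likes: 100 },
  { _id: "3", user: "Tom", likes: 40 },
  { _id: "4", user: "Tom" }, // no likes field: ignored by the average
];

// Same grouping as the pipeline, but without the $match: one average per user
console.log(groupAverage(posts, "user", "likes", "avgLikes"));
```

MongoDB does not guarantee the order of $group output documents, so add a $sort stage if the order matters to your application.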

So, why learn aggregation?

Complexity and elegance

Let's be real: Data queries get complex. You'll find yourself needing to join collections, filter them, sort the results, and then probably transform this data into a whole new shape. While you can achieve all this with multiple find commands and some Python gymnastics, it often results in a performance hit and harder-to-maintain code. For example, you could calculate average likes by looping through the find results. But wouldn’t it be far more elegant to write a single query and let MongoDB’s optimizations do the work?
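For contrast, here is roughly what the loop-over-find approach looks like in application code. This is a plain JavaScript sketch: the janePosts array stands in for the documents a find({ user: "Jane" }) cursor would return, and averageLikes is a hypothetical helper. Every matching post has to reach the application before the loop can run.

```javascript
// Stand-in for the documents returned by db.posts.find({ user: "Jane" }).
// In a real application these would arrive over the network from MongoDB.
const janePosts = [
  { _id: "1", user: "Jane", likes: 200 },
  { _id: "2", user: "Jane", likes: 100 },
  { _id: "5", user: "Jane", likes: 60 },
];

// Hypothetical client-side helper: the "gymnastics" an aggregation avoids.
function averageLikes(posts) {
  let sum = 0;
  let count = 0;
  for (const post of posts) {
    if (typeof post.likes === "number") {
      sum += post.likes;
      count += 1;
    }
  }
  return count > 0 ? sum / count : null;
}

console.log(averageLikes(janePosts)); // 120
```

The $match plus $group pipeline computes the same number inside the database and sends back a single document instead.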

Efficiency

Aggregation operations often lead to more efficient queries. MongoDB optimizes a pipeline before running it, and a $match or $sort stage at the start of the pipeline can use indexes just as a find query can, so you can often get results more quickly. In addition, pushing more operations to the database frees up your application server, keeping your users’ experience fast and smooth. Aggregations also reduce the number of documents transferred over the network. Even if Jane had 10,000 posts, the final average is a single, lightweight document, so your network bandwidth is spent only on the result you actually need.
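A rough way to see the network difference is to compare the serialized size of what each approach sends back. The sketch below uses JSON as a stand-in for BSON and generates 10,000 synthetic posts; the data is made up for illustration, and real payload sizes will differ.

```javascript
// 10,000 synthetic posts for Jane (made-up data, for size comparison only).
const posts = Array.from({ length: 10000 }, (_, i) => ({
  _id: String(i + 1),
  content: "I love coding!",
  user: "Jane",
  likes: i % 500,
  comments: [{ user: "Tom", comment: "Me too!" }],
}));

// What find({ user: "Jane" }) would ship to the client: every matching document.
const findPayload = JSON.stringify(posts);

// What the $match + $group pipeline would ship: one summary document.
// (The average is computed here only to build that document.)
const avgLikes = posts.reduce((sum, p) => sum + p.likes, 0) / posts.length;
const aggregatePayload = JSON.stringify([{ _id: "Jane", avgLikes }]);

console.log(findPayload.length, aggregatePayload.length);
```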

Flexibility

Aggregation gives you the power to reshape your data on the fly. For example, you can even reshape a document to look like a completely different model: transforming an apple into an orange, if you will. But the best part is that aggregation stages are modular. In the above example, the output from the $match stage is the input for the $group stage. If we were to add a $project stage after it, its input would be the output from the $group stage. This makes pipelines easier to test and debug: you can run just the first few stages and inspect what they produce before adding the next one.
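The chaining idea can be modeled in a few lines of plain JavaScript: each stage is a function from an array of documents to a new array, and the pipeline feeds each stage's output into the next. The runPipeline, match, groupAvg, and project helpers below are hypothetical and mimic only a tiny slice of the real stages; the point is the composition, not MongoDB's implementation.

```javascript
// Toy stages: each takes an array of documents and returns a new array.
const match = (predicate) => (docs) => docs.filter(predicate);

const groupAvg = (keyField, valueField, outField) => (docs) => {
  const groups = new Map(); // key -> { sum, count }
  for (const doc of docs) {
    const g = groups.get(doc[keyField]) ?? { sum: 0, count: 0 };
    g.sum += doc[valueField];
    g.count += 1;
    groups.set(doc[keyField], g);
  }
  return [...groups].map(([key, g]) => ({ _id: key, [outField]: g.sum / g.count }));
};

const project = (fields) => (docs) =>
  docs.map((doc) => {
    const out = { _id: doc._id }; // like $project, keep _id by default
    for (const f of fields) if (f in doc) out[f] = doc[f];
    return out;
  });

// The pipeline: the output of each stage becomes the input of the next.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

// Hypothetical sample posts
const posts = [
  { _id: "1", user: "Jane", likes: 200 },
  { _id: "2", user: "Jane", likes: 100 },
  { _id: "3", user: "Tom", likes: 40 },
];

const result = runPipeline(posts, [
  match((d) => d.user === "Jane"),
  groupAvg("user", "likes", "avgLikes"),
  project(["avgLikes"]),
]);
console.log(result); // one document: Jane with avgLikes 150
```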

Real-world use cases

Analytics dashboard

Imagine building an analytics dashboard for Leafyroots. Aggregations can easily help summarize user behavior, the most engaging posts, or monthly active users, all in a single query.
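As one illustration of the dashboard idea, here is a plain JavaScript model of a "most engaging posts" summary. The engagement score (likes plus number of comments) and the topEngagingPosts helper are assumptions made up for this sketch. In MongoDB you could express the same steps server-side, for example with $addFields (using $size on comments), $sort, and $limit.

```javascript
// Hypothetical engagement score: likes plus number of comments.
function topEngagingPosts(posts, n) {
  return posts
    .map((p) => ({
      _id: p._id,
      user: p.user,
      engagement: p.likes + (Array.isArray(p.comments) ? p.comments.length : 0),
    })) // ~ $addFields / $project
    .sort((a, b) => b.engagement - a.engagement) // ~ $sort: { engagement: -1 }
    .slice(0, n); // ~ $limit: n
}

// Hypothetical sample posts
const posts = [
  { _id: "1", user: "Jane", likes: 200, comments: [{ user: "Tom" }, { user: "Sara" }] },
  { _id: "2", user: "Tom", likes: 150, comments: [] },
  { _id: "3", user: "Sara", likes: 201 },
];

console.log(topEngagingPosts(posts, 2));
```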

Real-time reporting

In fintech applications where real-time reporting is a must, or in IoT use cases where monitoring over a stream of data points is necessary, aggregations can efficiently perform operations like calculating the average transaction amount for a given period.
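The "average transaction amount for a given period" case boils down to a date-range filter followed by an average. A plain JavaScript model follows; the amount and ts field names, the averageAmount helper, and the sample transactions are hypothetical. Server-side, the filter corresponds to a $match with $gte and $lt on the timestamp, and the average to a $group with $avg.

```javascript
// Hypothetical transactions with an amount and a timestamp.
const transactions = [
  { amount: 120, ts: new Date("2024-01-05T10:00:00Z") },
  { amount: 80, ts: new Date("2024-01-20T15:30:00Z") },
  { amount: 500, ts: new Date("2024-02-02T09:00:00Z") },
];

// Average amount for transactions with start <= ts < end (half-open range).
function averageAmount(txns, start, end) {
  const inRange = txns.filter((t) => t.ts >= start && t.ts < end); // ~ $match
  if (inRange.length === 0) return null;
  const total = inRange.reduce((sum, t) => sum + t.amount, 0);
  return total / inRange.length; // ~ $group with $avg
}

const january = averageAmount(
  transactions,
  new Date("2024-01-01T00:00:00Z"),
  new Date("2024-02-01T00:00:00Z"),
);
console.log(january); // 100
```

A half-open range (including the start, excluding the end) lets consecutive periods line up without counting a transaction twice.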

Hierarchical data

In content platforms like Medium, where you have a hierarchy of topics, authors, and articles, aggregation can help sort, filter, and derive insights about popular topics or high-performing authors. In fact, with the $graphLookup stage in MongoDB aggregations, you could implement a knowledge graph as well.
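To give a feel for what $graphLookup does, here is a plain JavaScript model of its recursive search: starting from a value, find documents whose connectToField equals it, then repeat with those documents' connectFromField values until nothing new turns up or maxDepth is reached. The topics data and the graphLookup helper are hypothetical, and the real stage has more options (such as restrictSearchWithMatch) and evaluates startWith as an expression rather than taking a plain value.

```javascript
// Toy model of $graphLookup over an in-memory "collection".
// startWith is a plain value here; maxDepth 0 means only direct matches, as in MongoDB.
function graphLookup(docs, startWith, connectFromField, connectToField, maxDepth = Infinity) {
  const results = [];
  const seen = new Set(); // documents already added, so cycles cannot loop forever
  let frontier = [startWith];
  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const value of frontier) {
      for (const doc of docs) {
        if (doc[connectToField] === value && !seen.has(doc)) {
          seen.add(doc);
          results.push({ ...doc, depth }); // similar to the depthField option
          if (doc[connectFromField] != null) next.push(doc[connectFromField]);
        }
      }
    }
    frontier = next;
  }
  return results;
}

// Hypothetical topic hierarchy: each topic points at its parent.
const topics = [
  { name: "Programming", parent: null },
  { name: "Databases", parent: "Programming" },
  { name: "MongoDB", parent: "Databases" },
  { name: "Aggregation", parent: "MongoDB" },
];

// Ancestors of "Aggregation": start from its parent and keep following parent links.
const ancestors = graphLookup(topics, "MongoDB", "parent", "name");
console.log(ancestors.map((t) => t.name)); // MongoDB, Databases, Programming
```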

More importantly, as you iterate on your application and add features — geospatial features, graph features, full-text search, vector search, time series workloads, etc. — the aggregation framework arms you with the required tools to support all of these with your existing MongoDB Atlas cluster.

The switch: when to make the move from find to aggregate

Here’s a simple rule of thumb: If you find yourself looping through find query results for additional filtering or calculations, it’s time to move to aggregation. Essentially, when your data querying logic becomes complex enough to require additional code outside MongoDB, consider the aggregate command your new best friend.
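Here is the kind of loop that rule of thumb is meant to catch: counting how many comments each user has left by walking every post a find returns. The sample data and the commentsPerUser helper are hypothetical. The comment at the top of the block shows one way to push the same work into MongoDB with $unwind and $group.

```javascript
// Server-side equivalent (mongosh), shown for reference:
//   db.posts.aggregate([
//     { $unwind: "$comments" },
//     { $group: { _id: "$comments.user", count: { $sum: 1 } } }
//   ])

// Stand-in for the documents returned by db.posts.find({}).
const posts = [
  { _id: "1", user: "Jane", comments: [{ user: "Tom", comment: "Me too!" }, { user: "Sara", comment: "Showoff." }] },
  { _id: "2", user: "Tom", comments: [{ user: "Sara", comment: "Nice." }] },
  { _id: "3", user: "Sara", comments: [] },
];

// The client-side version: a nested loop over every post and every comment.
function commentsPerUser(docs) {
  const counts = {};
  for (const post of docs) {
    for (const c of post.comments ?? []) {
      counts[c.user] = (counts[c.user] ?? 0) + 1;
    }
  }
  return counts;
}

console.log(commentsPerUser(posts)); // { Tom: 1, Sara: 2 }
```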

And the kicker — that original find query you had, even that can be expressed as an aggregate command.

So, to sort Jane’s posts by number of likes and show only the comments:

db.posts.aggregate([
    { $match: { "user": "Jane" } },
    { $sort: { "likes": -1 } },
    { $project: { "comments": 1 } }
]);
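One detail worth noticing in that pipeline is the stage order: $sort runs before $project, because once $project keeps only comments (and _id), the likes field is gone and a later sort would have nothing to order by. The plain JavaScript sketch below models the three stages over sample data to make that visible; the helper names and data are hypothetical.

```javascript
// Hypothetical sample posts
const posts = [
  { _id: "1", user: "Jane", likes: 200, comments: [{ user: "Tom", comment: "Me too!" }] },
  { _id: "2", user: "Jane", likes: 350, comments: [] },
  { _id: "3", user: "Tom", likes: 999, comments: [] },
];

const matchJane = (docs) => docs.filter((d) => d.user === "Jane"); // { $match: { user: "Jane" } }
const sortByLikesDesc = (docs) => [...docs].sort((a, b) => b.likes - a.likes); // { $sort: { likes: -1 } }
const projectComments = (docs) => docs.map((d) => ({ _id: d._id, comments: d.comments })); // { $project: { comments: 1 } }

// $match -> $sort -> $project, in the same order as the pipeline
const result = projectComments(sortByLikesDesc(matchJane(posts)));
console.log(result.map((d) => d._id)); // 2, then 1

// After projecting, no document has likes any more, so sorting afterwards could not use it.
console.log(projectComments(matchJane(posts)).every((d) => d.likes === undefined)); // true
```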

You might never go back to find!

To aggregate or not to aggregate — remember that question?

It’s not just a question but a rite of passage. Embrace the aggregation framework, for it will give you the tools to turn even the most complex queries into elegant, optimized code. Don't just be a coder, be a craftsman — or craftswoman, or craftsperson — because knowing when and how to use the right tools is the hallmark of true craftsmanship. Happy coding! 🚀

Have you used aggregations recently? Have you been wondering what the best way to structure a query is? Participate in our community discussions to dive deeper on aggregations and queries!
