DEV Community

Krishna Bose
Krishna Bose

Posted on

MongoDB lookup on an array of objects which contains (foreign) ObjectId as a key

If you have worked with MongoDB it's highly probable that you have used aggregation pipelines. And its a very common use-case that you need to connect documents from one collection to another, aka 'The Join Query' from our SQL days.

Let's elaborate the problem statement first. We have 2 collections, orders and products.
Every order has some user related information and a list of products which the user has ordered along with its quantity.

sample order object

{
    ...
    _id: ObjectId(...),
    userId: ObjectId(...),
    createdAt: 1622928483,
    products: [
        {
            productId: ObjectId(...),
            quantity: 1,
        },
        {
            productId: ObjectId(...),
            quantity: 3,
        },
    ]
    ...
}
Enter fullscreen mode Exit fullscreen mode

sample product object

{
   ...
   _id: ObjectId(...),
   name: 'Macbook Pro M1',
   description: 'All systems pro.',
   price: 1200,
   ...
}
Enter fullscreen mode Exit fullscreen mode

The result we want from aggregation

{
    ...
    _id: ObjectId(...),
    userId: ObjectId(...),
    createdAt: 1622928483,
    products: [
        {
            productId: ObjectId(...),
            quantity: 1,
             // this got added
            product: {
               ...
               _id: ObjectId(...),
               name: 'Macbook Pro M1',
               description: 'All systems pro.',
               price: 1200,
               ...
            }
        },
        {
            productId: ObjectId(...),
            quantity: 3,
             // this too
            product: {
               ...
               _id: ObjectId(...),
               name: 'iPhone 12',
               description: 'Blast past fast.',
               price: 900,
               ...
            }
        },
    ]
    ...
}
Enter fullscreen mode Exit fullscreen mode

Here you can see that we have inflated the product details inside the products array.
At the time of writing this article, I couldn't find any sweet and easy way of achieving this, so I decided to dive in a little deeper.

A point to note here, if the product array would have been something like products: [ObjectId(...), ObjectId(...)] then it would be a breeze. But since we also need to accommodate quantity alongside the productId, it's not quite feasible (at least with the current schema).

So, let's get started with the solution.

Step 1: Split the array

Use the $unwind operator to split the products array so that each document formed will have only one product instead of the entire array.

{
    $unwind: {
        path: '$products'
    }
}
Enter fullscreen mode Exit fullscreen mode

Now each of our unwound orders would looks something like this

{
    ...
    _id: ObjectId(...),
    userId: ObjectId(...),
    createdAt: 1622928483,
    products: {
        productId: ObjectId(...),
        quantity: 1,
    }
    ...
}
Enter fullscreen mode Exit fullscreen mode

This looks simpler, but wait, where are the rest of the ordered products?
It turns out, using $unwind would split one document into multiple documents where all the data is identical except the one list on which $unwind is called.
Here, it will split our sample document into 2 documents like the following

#doc 1
{
    ...
    _id: ObjectId(...),
    userId: ObjectId(...),
    createdAt: 1622928483,
    products: {
        productId: ObjectId(...),
        quantity: 1,
    }
    ...
}
#doc 2
{
    ...
    _id: ObjectId(...),
    userId: ObjectId(...),
    createdAt: 1622928483,
    products: {
        productId: ObjectId(...),
        quantity: 3,
    }
    ...
}
Enter fullscreen mode Exit fullscreen mode

So, I hope it's clear that, if you have n orders with m products in each, it will generate a total m*n documents.

Step 2: Inflate product details

This is the important step, fetching product details from another collection

{
    $lookup: {
        from: 'products',
        localField: 'products.productId',
        foreignField: '_id',
        as: 'products.product'
    }
}
Enter fullscreen mode Exit fullscreen mode

this step inflates the data and makes it look something like the following

{
    ...
    _id: ObjectId(...),
    userId: ObjectId(...),
    createdAt: 1622928483,
    products: {
        productId: ObjectId(...),
        quantity: 1,
        product: [
            {
               ...
               _id: ObjectId(...),
               name: 'Macbook Pro M1',
               description: 'All systems pro.',
               price: 1200,
               ...
            }
        ]
    }
    ...
}
Enter fullscreen mode Exit fullscreen mode

If you notice one thing, the product is inflated, but its an array instead of an object. The reason being, mongodb expects that multiple documents in the products table may match with this criteria, and hence by default puts all the docs in an array. In our case we know its going to be unique, so we can just unwind it out.

Step 2a: Unwind the product inflated

{
    $unwind: {
        path: '$products.product'
    }
}
Enter fullscreen mode Exit fullscreen mode

Cool! we got the data we needed. Now we can merge those docs up to make it into original.

Step 3: Merging the docs

the key for this step is $group. we will push all the products into an array.

But, here is a small catch!
For grouping them together, you need to preserve all the other fields as well. like userId, createdAt and _id.
One way is, put everything in _id of your group, something like

{
    $group: {
        _id: {
            _id: '$_id',
            userId: '$userId',
            createdAt: '$createdAt'
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

BUT...
there are 3 shortcomings

  1. Difficult to execute if docs have different keys.
  2. High chances of forgetting/mistyping any field
  3. If the schema is altered even slightly, you need to go through the pipeline and rewrite this part again.

As a typical engineer who believes in writing 20 smart lines to avoid hardcoding 5 lines, I came up with a way to make this bit future proof.

The key idea is, merge on the basis of _id alone, and then lookup itself to get a fresh copy of the doc.

Step 3a: Join on basis of _id

{
    _id: '$_id',
    products: {
        $push: '$products'
    }
}
Enter fullscreen mode Exit fullscreen mode

Here, I am pushing elements to products after picking them from products, it might sound confusing, but what it does is that, it takes all of the products from existing docs, put them in an array and then call it products in the end.

Step 3b: Fetch a fresh copy of orders for regeneration

{
    $lookup: {
        from: 'orders',
        localField: '_id',
        foreignField: '_id',
        as: 'orderDetails'
    },
    $unwind: {
        path: '$orderDetails'
    }
}
Enter fullscreen mode Exit fullscreen mode

This is pretty straightforward, shouldn't need much explaining. At this point the doc looks something like this

{
    _id: ObjectId(...),
    products: [
        {
            productId: ObjectId(...),
            quantity: 1,
            product: {
                ...
            }
        },
        ...
    ],
    orderDetails: {
        _id: ObjectId(...),
        userId: ObjectId(...),
        createdAt: 1622928483,
        products: [
            {
                // No product: {...} here, since its original doc
                productId: ObjectId(...),
                quantity: 1
            }
        ]
    }
}
Enter fullscreen mode Exit fullscreen mode

I know it looks weird, don't worry. We have some beautification left.

Step 3c: Beautify and reshape the entire doc

All we need to do is put all the stuff in their proper places.
We shall do it in 2 steps.

  1. Put the products: {} from root to orderDetails.products.
  2. Make the orderDetails as the root doc

Something like this

{
    $addFields: {
        'orderDetails.products': '$products'
    }
}
Enter fullscreen mode Exit fullscreen mode

and then

{
    $replaceRoot: {
        newRoot: '$orderDetails'
    }
}
Enter fullscreen mode Exit fullscreen mode

This one takes the contents inside orderDetails and make it as the root document.

And here we have our required order object with inflated product details!

The complete aggregation looks something like this

[
    {
        $unwind: {
            path: '$products'
        }
    },
    {
        $lookup: {
            from: 'products',
            localField: 'products.productId',
            foreignField: '_id',
            as: 'products.product'
        }
    },
    {
        $unwind: {
            path: '$products.product'
        }
    },
    {
        $group: {
            _id: '$_id',
            products: {
                $push: '$products'
            }
        }
    },
    {
        $lookup: {
            from: 'orders',
            localField: '_id',
            foreignField: '_id',
            as: 'orderDetails'
        }
    },
    {
        $unwind: {
            path: '$orderDetails'
        }
    },
    {
        $addFields: {
            'orderDetails.products': '$products'
        }
    },
    {
        $replaceRoot: {
            newRoot: '$orderDetails'
        }
    }
]
Enter fullscreen mode Exit fullscreen mode

So, here we have it!
Inflating array of items using keys nested inside objects.

If there is a better way, let me know. I'd be happy to make corrections.
Plus this is my first Dev.to post, so everyone please let me know ways to improve my quality of contribution here.

Happy coding :D

Top comments (8)

Collapse
 
mr_ziqiang profile image
闪刀浪子

Maybe you can use three stage to do this:

  1. $lookup stage: join products.productId of collection products to collection orders.

  2. $addFields stage: add and overwrite the products field。The every element of old products merge a product whitch from productDetail.

  3. $unset stage: remove the productDetail filed.

db.orders.aggregate([
  {
    $lookup: {
      from: 'products',
      localField: 'products.productId',
      foreignField: '_id',
      as: 'productDetail'
    }
  },
  {
    $addFields: {
      products: {
        $map:{
          input:"$products",
          as:'product',
          in:{
            $mergeObjects:[
              '$$product',
              {product:{$arrayElemAt:['$productDetail',{$indexOfArray:['$products','$$product']}]}}
            ]
          }
        }
      }
    }
  },
  {
    $unset:["productDetail"]
  }
])
Enter fullscreen mode Exit fullscreen mode
Collapse
 
hulastone profile image
hulastone

Your solution is 1 case maybe wrong. that is {$indexOfArray:['$products','$$product']} . indexOfArray return -1 if $$product not in $products.
then $arrayElemAt will return the last item when received the value is -1. It mean your case is wrong. These above will return the last item instead NOT FOUND as normally.

Collapse
 
edrag profile image
Edrag

Thank you for this! Have not found it anywhere else!
If it helps - the answer in this post by Gary Wild allowed me to use objectIds as reference values: stackoverflow.com/questions/410936... .
To @phatlaunchdeck , don't know if this is what you asked, but if you repeat step 2 for as many times as you have different references (I have, for example, products AND inWarehouse refernces), and then move on to step 3, this aggregation still does the trick!

Collapse
 
phatlaunchdeck profile image
Hoang-Phat Pham

Man I spent countless days searching for a solution on my own project which has the exact model schema (with slightly different fields names), thank you so much!

One thing I would like to ask if you don't mind: in my schema I actually have 2 arrays that have the same structure with your "products". If I apply the above method, only the latter array would be populated, I guess that is because of the "re-fetch root document step" would override the first array that gets populated.

Any idea on this?

Collapse
 
krishnabose02 profile image
Krishna Bose

Hey, thats a good question. At this point I cannot find any good solution for the scenario, but one thing I've learnt lately is that if your resource is read intensive, you should consider having a structure of data that allows you to fetch the documents without multiple lookups. Even if this means duplication of data, as long as you have high read to write ratio (say 100:1) or even higher, then it still results in a optimal strategy.

Collapse
 
stealthadder profile image
HΛRIHΛRΛN PΛRTHIBΛN

Should replicate the same issue on stackoverflow :D noice job!. Thanx

Collapse
 
rahiashutosh profile image
Ashutosh Rahi

Thanks for writing about this approach, this approach helped me improve API latency by up to 50%, compared to the traditional populate approach.

Collapse
 
krishnabose02 profile image
Krishna Bose

Glad that it helped