oluseyeo

How to Create Relationships with Mongoose and Node.JS

FOCUS: One-to-many Relationships


Unlike SQL databases such as PostgreSQL and MySQL, which are built for managing relationships between data that is indexed and referenced across multiple tables, NoSQL databases have traditionally offered limited support for relationships in their JSON-like schemas. MongoDB, a popular NoSQL database, nonetheless has built-in features that developers can leverage to build relationships between multiple schemas.

[Figure: SQL vs NoSQL]

Relationships in MongoDB rely on join-like functionality: the $lookup aggregation stage on the server and, in the popular npm library Mongoose, the populate method, which resolves references by running additional queries. With these, developers can build complex relationships and, importantly, design efficient databases that avoid slow queries, much as they would when working with an SQL database.

[Figure: Mongoose, MongoDB, NodeJS relationship chart]

In this tutorial, I will cover the following in detail:

  • Types of relationships & object reference types in MongoDB
  • Mongoose Populate Method
  • Mongoose Virtuals

Prerequisites:

Readers are expected to have a basic grasp of ExpressJS, Mongoose, ES6+ JavaScript and Postman.

Also, the following should be available either as a service or installed and running locally on your PC:

  • MongoDB, or Atlas, the cloud-hosted version of MongoDB.
  • Mongoose, installed from npm. Simply run npm i mongoose at the root of your project folder.
  • Postman, to test the endpoints.


npm i mongoose



For the purpose of this write-up, I have built a small “Publishing House” project to walk you through each of the methods discussed. The Publishing House project treats Publishers as registered users, who can publish multiple books under their portfolio. The project uses:

  • MongoDB as the database.
  • The Mongoose library as the object data modeling (ODM) layer.
  • ExpressJS to create our routes, using async/await since we shall be dealing with promises.
  • Postman to test our endpoints for responses.

Relational data in MongoDB, and therefore in Mongoose, can be represented using two major design models. The choice of model when planning the database collections of any project hinges mainly on data size, data accuracy, and frequency of access. As a rule of thumb, the larger the documents stored, the slower queries are resolved and, ultimately, the less performant the database is.

The two models are as follows:

  1. Embedded Data Model [Denormalization]: This is the least recommended form of relationship when the number of related documents can grow large. Data is denormalized by embedding Child (related) documents right into the Parent (main) document. Using our “Publishing House” project as an example, this would mean Publishers store all published books and related information directly on each publisher’s object.
    In a typical One-to-Few relationship, this works perfectly, as the expected number of Child documents is not more than about 20. However, with a larger number of Child documents, the size of the Parent document heavily impairs database performance, causing lags and difficulty in keeping data synced, and ultimately a poor user experience.

  2. Referenced Data Model [Normalization]: When data is normalized, documents are separated into different collections and share references between each other. Because each piece of data lives in one place, a single update to a document is reflected everywhere that document is referenced. The rest of this tutorial focuses on the best use case of this method, and how best to organize our database collections and documents efficiently.
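To make the two models concrete, here is a minimal plain-JavaScript sketch (no database involved) of how the same publisher and books might look under each model. The ids, book names and authors are made up for illustration and are not taken from the project.

```javascript
// Embedded (denormalized): the books live inside the publisher document.
const embeddedPublisher = {
  _id: 'pub1', // illustrative id, not a real ObjectId
  name: 'Embedded Publishers',
  books: [
    { name: 'Book A', publishYear: 2019, author: 'Jane Doe' },
    { name: 'Book B', publishYear: 2020, author: 'John Doe' },
  ],
};

// Referenced (normalized): publishers and books are separate documents,
// linked only by an id.
const publishers = [{ _id: 'pub1', name: 'Embedded Publishers' }];
const books = [
  { _id: 'book1', name: 'Book A', publishYear: 2019, author: 'Jane Doe', publisher: 'pub1' },
  { _id: 'book2', name: 'Book B', publishYear: 2020, author: 'John Doe', publisher: 'pub1' },
];

// With embedding, one read returns everything, but the publisher document
// grows with every book added.
const embeddedCount = embeddedPublisher.books.length;

// With referencing, the publisher document stays small and the related
// books are looked up by id when needed.
const referencedCount = books.filter(
  (book) => book.publisher === publishers[0]._id
).length;

console.log(embeddedCount, referencedCount); // prints: 2 2
```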

Referencing documents between collections can be done in two ways:

  • Child Referencing: A document is considered Child referenced when the Parent document stores references to its Child documents, keeping their identifiers (in most situations, the _id) in an array on the Parent document. In our “Publishing House” project, this means Publishers store the book._id of each book created in an array of book ids, predefined on the Publisher's Schema, and fetch these Child documents when needed using the populate method.

From our Project, see the Publisher's schema below:



const mongoose = require('mongoose');
const {Schema} = require('mongoose');

const publisherSchema = new Schema({
   name: String,
   location: String,
   publishedBooks: [{
      type: Schema.Types.ObjectId,
      ref: 'Book'
   }]
},
{timestamps: true});

module.exports = mongoose.model('Publisher', publisherSchema);


Publisher Schema [Notice published books is an array]

Here is our Book Schema:



const mongoose= require('mongoose');
const {Schema} = require('mongoose');

const bookSchema = new Schema({
   name: String,
   publishYear: Number,
   author: String,
   publisher: {
      type: Schema.Types.ObjectId,
      ref: 'Publisher',
      required: true
   }
},
{timestamps: true});

module.exports = mongoose.model('Book', bookSchema);



Book Schema

The Mongoose “populate” method loads the details of each referenced Child document and returns them alongside each Publisher document fetched from the DB. Let’s see an example of this using our project.

We start by creating a new Publisher below:



/***
 * @action ADD A NEW PUBLISHER
 * @route http://localhost:3000/addPublisher
 * @method POST
*/
app.post('/addPublisher', async (req, res) => {
   try {
      //validate req.body data before saving
      const publisher = new Publisher(req.body);
      await publisher.save();
      res.status(201).json({success:true, data: publisher });

   } catch (err) {
      res.status(400).json({success: false, message:err.message});
   }
});


Create a new publisher


{
    "success": true,
    "data": {
        "publishedBooks": [],
        "_id": "5f5f8ac71edcc2122cb341c7",
        "name": "Embedded Publishers",
        "location": "Lagos, Nigeria",
        "createdAt": "2020-09-14T15:22:47.183Z",
        "updatedAt": "2020-09-14T15:22:47.183Z",
        "__v": 0
    }
}


A new publisher

Next, the newly created Publisher adds a book it is about to publish to its DB. The publisher’s _id is passed in as the value of the publisher key on the Book schema before saving. In the same request, right after calling the save method on the new book, the new book's reference MUST be pushed onto the publishedBooks array of the Publisher document, which is then saved. Mongoose stores only the book's _id in that array, and this is what records the book on the Publisher's document.

Here's the magic breakdown:



/***
 * @action ADD A NEW BOOK
 * @route http://localhost:3000/addBook
 * @method POST
*/

app.post('/addBook', async (req, res)=>{

   /**
    * @tutorial: steps
    * 1. Authenticate publisher and get user _id.
    * 2. Assign user id from signed in publisher to publisher key.
    * 3. Call save method on Book.
   */

   try {
      //validate data as required

      const book = new Book(req.body);
      // book.publisher = publisher._id; <=== Assign user id from signed in publisher to publisher key
      await book.save();

      /**
       * @tutorial: steps
       * 1. Find the publishing house by Publisher ID.
       * 2. Call push method on publishedBooks key of Publisher.
       * 3. Pass the new book's _id as value (only the reference is stored).
       * 4. Call save method.
      */
      const publisher = await Publisher.findById(book.publisher);
      if (!publisher) {
         return res.status(404).json({success: false, message: 'Publisher not found'});
      }
      publisher.publishedBooks.push(book._id);
      await publisher.save();

      //return new book object, after saving it to Publisher
      res.status(200).json({success:true, data: book })

   } catch (err) {
      res.status(400).json({success: false, message:err.message})
   }
})


A Publisher adding a new book to be published to its DB
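As an alternative sketch to the find, push and save sequence above, the reference can also be added in a single update, using MongoDB's $addToSet operator through Mongoose's findByIdAndUpdate. $addToSet also avoids storing the same id twice. The helper name addBookToPublisher is hypothetical and not part of the original project; the model is passed in as a parameter so the snippet does not assume a particular project layout.

```javascript
/**
 * Hypothetical helper (not part of the original project).
 * Adds a book's _id to the publisher's publishedBooks array in one update.
 *
 * @param {object} Publisher - the Mongoose Publisher model
 * @param {object} book - a saved Book document with _id and publisher set
 * @returns {Promise<object|null>} the updated publisher, or null if not found
 */
async function addBookToPublisher(Publisher, book) {
  return Publisher.findByIdAndUpdate(
    book.publisher,
    { $addToSet: { publishedBooks: book._id } }, // skips ids already present
    { new: true } // return the document as it is after the update
  );
}
```

In the /addBook route, a call such as await addBookToPublisher(Publisher, book) could stand in for the findById, push and save steps; a null result would mean no publisher matched book.publisher.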

This is how child document references (ids) are saved on the publisher’s document. On successful creation, the response below is returned when you query the Publisher by id and populate publishedBooks.

PS: The Publisher below created 3 new books.



{
    "publishedBooks": [
        {
            "_id": "5f5f8ced4021061030b0ab68",
            "name": "Learn to Populate virtuals Mongoose",
            "publishYear": 2019,
            "author": "Devangelist"
        },
        {
            "_id": "5f5f8d144021061030b0ab6a",
            "name": "Why GoLang gaining traction",
            "publishYear": 2020,
            "author": "John Doe"
        },
        {
            "_id": "5f5f8d3c4021061030b0ab6b",
            "name": "Developer Impostor syndrome",
            "publishYear": 2021,
            "author": "John Mark"
        }
    ],
    "_id": "5f5f8ac71edcc2122cb341c7",
    "name": "Embedded Publishers",
    "location": "Lagos, Nigeria",
    "createdAt": "2020-09-14T15:22:47.183Z",
    "updatedAt": "2020-09-14T15:33:16.449Z",
    "__v": 3
}


Saved object returns Child array
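The query that produced the response above is not shown in the project excerpt, so the following is only a sketch of how it might look, using Mongoose's findById and populate. The helper name getPublisherWithBooks and the selected fields are assumptions; the model is passed in as a parameter.

```javascript
/**
 * Hypothetical helper (assumed, not from the original project).
 * Loads one publisher and replaces the ids in publishedBooks
 * with the matching Book documents.
 *
 * @param {object} Publisher - the Mongoose Publisher model
 * @param {string} publisherId - the publisher's _id
 * @returns {Promise<object|null>} the populated publisher, or null if not found
 */
async function getPublisherWithBooks(Publisher, publisherId) {
  return Publisher.findById(publisherId).populate({
    path: 'publishedBooks', // the array of Book ids on the Publisher schema
    select: 'name publishYear author', // fields to return for each book
  });
}
```

Inside an Express route, the result of await getPublisherWithBooks(Publisher, req.params.id) could be passed straight to res.json(); the use of req.params.id assumes a route that carries the publisher id in its path.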

However, should the push and save methods not be called on the Publisher's document, the Publisher, although it exists and the new Book has been created, will return an empty publishedBooks array when queried, as seen below.



{
    "success": true,
    "data": {
        "publishedBooks": [],
        "_id": "5f5f8ac71edcc2122cb341c7",
        "name": "Embedded Publishers",
        "location": "Lagos, Nigeria",
        "createdAt": "2020-09-14T15:22:47.183Z",
        "updatedAt": "2020-09-14T15:22:47.183Z",
        "__v": 0
    }
}


Empty Array, when object isn't pushed and saved

Despite the success of the Child Referencing method, its limitation is that the array of ids can grow very large very quickly, so the database loses efficiency and performance over time as the array grows. MongoDB officially recognizes this as an anti-pattern, and strongly discourages its use for document relationships run at scale.


  • Parent Referencing: Parent referencing differs from Child Referencing, described earlier, in that ONLY Child documents keep a reference to their Parent document. This reference is kept singly on each Child document created, defined as an ObjectId on the Schema. Parent documents, conversely, keep no direct reference but build one at query time with the help of a Mongoose feature called virtuals.

Mongoose virtuals are a more sophisticated approach to fetching referenced Child documents and, importantly, take up less storage, because the field a virtual creates whenever a query is run doesn’t persist on the Parent document. Virtuals used this way are occasionally referred to as "reverse-populate", so when you hear people mention that, don't fret!

Enough with the talk, let's jump into our project code.
First, let's see what our Book Schema looks like below:



const mongoose= require('mongoose');
const {Schema} = require('mongoose');

const bookSchema = new Schema({
   name: String,
   publishYear: Number,
   author: String,
   publisher: {
      type: Schema.Types.ObjectId,
      ref: 'Publisher',
      required: true
   }
},
{timestamps: true})

module.exports = mongoose.model('Book', bookSchema);



Next, which is where the tricky part lies, is our Parent document. Pay attention to how the virtual is defined. A crucial part of this is the extra options we must set on the Schema, without which the virtual field is left out of the response. These are the toJSON and toObject options. Their virtuals setting defaults to false, and setting it to true ensures that whenever the Parent document is queried and populated, the virtual field is included when the result is passed to the .json() method on the response.



const mongoose = require('mongoose');
const {Schema} = require('mongoose');

const publisherSchema = new Schema({
   name: String,
   location: String
},
   {timestamps: true}
);

/**
 * @action Define Schema Virtual
 * @keys
 *    1. The first parameter can be named anything.
 *       It is the name of the virtual field on the Schema.
 *
 *    2. Options Object
 *       ref: Model name for the Child collection
 *       localField: Field on the Parent document (here _id) to match against.
 *       foreignField: Field on the Child document that stores the Parent's localField value.
 */
publisherSchema.virtual('booksPublished', {
   ref: 'Book', // The Model to use
   localField: '_id', // Find Books where the Publisher's localField
   foreignField: 'publisher', // is equal to the Book's foreignField
});

// Include virtuals in toObject() and toJSON() output. virtuals defaults to false
publisherSchema.set('toObject', { virtuals: true });
publisherSchema.set('toJSON', { virtuals: true });


module.exports = mongoose.model('Publisher', publisherSchema);


Notice that we don’t have a publishedBooks array anymore on the Schema

The best way to remember how to define the virtual (much easier if you’re from an SQL background) is:

SELECT “name for the virtual field” FROM “ref: Child model name” WHERE “localField: the Parent key, mostly _id” EQUALS “foreignField: the Child schema key that stores the Parent id as its value”.
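To see what that matching rule means in practice, here is a plain-JavaScript sketch (no Mongoose, no database) that attaches to each parent every child whose foreignField equals the parent's localField. It only illustrates the idea and is not how Mongoose implements virtuals; the function name reversePopulate and the short ids are made up.

```javascript
/**
 * Illustrative only: mimics the matching rule of a populated virtual
 * on plain in-memory objects.
 */
function reversePopulate(parents, children, { name, localField, foreignField }) {
  return parents.map((parent) => ({
    ...parent,
    // Keep every child whose foreignField equals this parent's localField.
    [name]: children.filter(
      (child) => String(child[foreignField]) === String(parent[localField])
    ),
  }));
}

const publishers = [
  { _id: 'p1', name: 'Random Publishers' },
  { _id: 'p2', name: 'Embedded Publishers' },
];
const books = [
  { _id: 'b1', name: 'Mastering Mongoose with Javascript', publisher: 'p1' },
  { _id: 'b2', name: 'Learning Mongoose Populate method', publisher: 'p1' },
];

const result = reversePopulate(publishers, books, {
  name: 'booksPublished',
  localField: '_id',
  foreignField: 'publisher',
});

console.log(result[0].booksPublished.length); // 2
console.log(result[1].booksPublished.length); // 0
```

Note that the original publisher objects are left untouched: the booksPublished field exists only on the returned copies, which mirrors the point that a virtual is never persisted on the Parent document.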


With the virtual and both options above defined, whenever we populate our Publishers in the GET route, we retrieve all books published by each publisher. And since not all the information about a book will be needed, we can select only the keys required from each book and return them in the response body.

See how it is done in our project below:



/***
 * @action GET ALL PUBLISHERS
 * @route http://localhost:3000/publishers
 * @method GET
 */
app.get('/publishers', async (req, res) => {
   try {
      const data = await Publisher.find()
                                 .populate({path: 'booksPublished', select: 'name publishYear author'});
      res.status(200).json({success: true, data});
   } catch (err) {
      res.status(400).json({success: false, message:err.message});
   }
})


Get all Publishers


{
    "success": true,
    "data": [
        {
            "_id": "5f5f546e190dff51041db304",
            "name": "Random Publishers",
            "location": "Kigali, Rwanda",
            "createdAt": "2020-09-14T11:30:54.768Z",
            "updatedAt": "2020-09-14T11:30:54.768Z",
            "__v": 0,
            "booksPublished": [
                {
                    "_id": "5f5f548e190dff51041db305",
                    "name": "Mastering Mongoose with Javascript",
                    "publishYear": 2020,
                    "author": "Devangelist",
                    "publisher": "5f5f546e190dff51041db304"
                },
                {
                    "_id": "5f5f55ca190dff51041db307",
                    "name": "Learning Mongoose Populate method",
                    "publishYear": 2019,
                    "author": "Devangelist",
                    "publisher": "5f5f546e190dff51041db304"
                }
            ],
            "id": "5f5f546e190dff51041db304"
        }
    ]
}


Query results from getting all publishers [Notice the booksPublished array]

In summary, Parent referencing is the best approach to referencing when using the normalized model and dealing with a large dataset.

If you made it to this point, thank you for reading through, and I hope you’ve learnt something new. I’m happy to chat further about new knowledge, opportunities and possible corrections. I can be reached on Twitter at @oluseyeo_ or via email at sodevangelist@gmail.com.

Happy Hacking 💥 💥


TL;DR

  1. There are two modelling approaches, Embedded and Referenced.
  2. Embed only when your data will be accessed less frequently and you’re mostly only reading data.
  3. For workloads with higher IOPS, use the referenced model.
  4. Referencing can be done in two ways, Child and Parent referencing.
  5. If the number of Child documents is small (under 100), use Child referencing. This stores each child reference key directly on the Parent document using the push method.
  6. If the number of Child documents is huge, use Parent referencing, reverse populating Parent documents using Mongoose virtuals.

Recommended further reading:
Data Access Patterns
Mongoose Documentation
Denormalization

Top comments (15)

Idiono-mfon Anthony

Oh, this is a wonderful explanation. Thank you for spending time to make justice to this subject. I was researching ways of handling data modeling effectively in mongoDb and behold I landed In this, and it has cleared the air.

Diego

Hi, How can I remove the id and _id repeated from the last image?

Giancarlo C

link

new Schema({ ... }, { id: false });

Sostene MUNEZERO BAGIRA

is it possible to use mongoose schema post hooks to add relation?

oluseyeo

What kind of relationship are you looking to add?

hemant-parmar

Great article. I am about to implement relations for my MEAN project.

A Couple of questions.
Q-1. Can I have multiple path in find().populate() method?
I have a Log collection, I was thinking to add 4 ObjectId type fields. i.e. client, service, executive, manager - of course apart from its own fields. Each of these are just one object. i.e. One Log entry will be associated with one Client, Service, Execute and Manager.

So I will need to add multiple path with their own select keys. Is that recommended?

And when on frontend, when the Log is displayed, I am planning to populate relevant fields from all these 4 (i.e. Client Name, ClientCategory, ClientSubCategory, ClientRating), (ServiceName, ServiceFreq), ExecName and ManagerName.

The Log display on frontend has Search and filter options on various fields. So when a user searches or applies filters the backend Mongoose query will run again, fetch the data and display.

Q-2: What would be the performance impact if the number of entries in Log collection is in the range to 5000 - 50,000, when I use Child Ref vs Parent Ref? Which one is recommended according to you in this case.

Thanks again
hemant

Abayomi Ogunnusi

Great post ....thanks for this.
Question: Please how can i fetch data from db using timestamps. Thanks

Amin

This was a good tutorial.Goodjob!👍

Kubiat Morgan

Nice write-up! You have done justice to the relationship options available to document-oriented DBs.
Easy to follow too.

oluseyeo

Thank you, Kubiat.

Konstantinos Anastasiadis

Yes this is something clear , at last. It is the backend.
I would like to have the code and adjust it , to my front-end.
Thanks.

Faouzi Mohamed

Thank you for this wonderfull explanation. Now i can implement my own relations for my express Web App.

hariszulfiqar054

Dude it's up to the mark. I just fall in love with your clear explanation. :)

oluseyeo

Thank you.