Shrihari Mohan

Posted on Aug 16, 2023 • Edited on Aug 20, 2023

MongoDB Basic design patterns and 6 rules of thumb - in 3 mins

Thanks to William Zola for making this possible.

This article is a bit more in-depth than mine, if you have the time. I took bits and peices from that.

One to Few (< couple 100s) - Embedding

Storing Devices against a Person. You can save all those devices inside an array. Since a person can own only few number of devices.

{
   _id : ObjectID('1234'),
  name: 'MKBHD',
  devices : [
     { name: 'Samsung', model: 'S23 Ultra' },
     { name: 'Apple', model: '14 Pro Max' }
  ]
}

Pros

No need second query since all of them are in one place.

Cons

Cannot treat devices as stand alone entities. Like grouping devices based on name will be tough to execute.

One to many (< couple 1000s) - Referencing

A Book and its Pages. There can several hundred pages but never more than couple 1000s.

Book
{
  _id : ObjectID('1111'),
  title : 'The Lord of the Rings' , 
  author : 'J.R.R. Tolkien' ,
  pages : [
  ObjectID('2222'),
  ObjectID('3333'),
  ObjectID('4444')
  ]  
}

// Fetch the Book document by id
> book = db.Book.findOne({id: 1111});
// Fetch all the pages that are linked to this Book
> bookPages = db.Pages.find({_id: { $in : book.pages } } );

Pros

Each pages can be queried efficiently.

Cons

Need a application level second query to get both book and pages, which with a right index doesn't look different than sql level join.

One to squillions - (indefinitely large)

Likes and comments of a Post can be arbitrarily large. You save the unique id of Post in the Likes.

Post 
{
  _id : ObjectID('1111') ,
  content : "This is a good day!",
  creatorId : "1234",
  createAt: 2023-08-19T17:07:24.426Z
}

Likes 
{
   name : "Elon",
   postId :  ObjectID('1111'), // Reference to the Post document
   createdAt: 2023-08-19T17:06:24.426Z
}
{
   name : "Musk",
   postId :  ObjectID('1111') // Reference to the Post document
   createdAt: 2023-08-19T17:07:24.426Z
}

This also needs a appication level join and we can use mongodb operators to get count , recent 500 comments.

> post = db.Post.findOne({id: ObjectID('1111') });  //index
> last500Likes = db.Likes.find({postId: post._id})
.sort({createdAt : -1}).limit(500) // postId is indexed

Two way referencing

A user and their posts. This way we can get user and all his posts. In some other scenario , we can get a list of posts recently posted.

User 
{
  _id : ObjectID('1111'),
  posts : [                 // Reference to Post
    ObjectID('2222'),
    ObjectID('3333'),
    ObjectID('4444')
  ]
}

Post 
{
  _id : ObjectID('2222') ,
  content : "This is a good day!"
  userId : ObjectID('1111') // Reference to User
}

{
  _id : ObjectID('3333') ,
  content : "This is a great day!"
  userId : ObjectID('1111') // Reference to User
}

{
  _id : ObjectID('4444') ,
  content : "This is a day!"
  userId : ObjectID('1111') // Reference to User
}

Denormailization

This is used to prevent application level joins to to read frequently at the expense of costlier update query. We will use the same example of Books and Pages with denormalizations.

In the each pages with its id attached with a context. This way when getting a Book we can get a context of what each Page looks like without application Join. However if you need more information about the page thats a seperate query.

This comes with a expense of update becuase whenever context changes we have to update Page and Book.

Book
{
  _id : ObjectID('1111'),
  title : 'The Lord of the Rings' , 
  author : 'J.R.R. Tolkien' ,
  pages : [
   { id :ObjectID('2222'), 
     context: "In a hole in the ground there lived a hobbit." 
   },
   { id: ObjectID('3333'), 
     context: "It had a perfectly round door like a porthole" 
   },
   { id: ObjectID('3333'), 
     context: "There were doors on either side"
   }
  ]  
}

6 Rules of thumb

One: Favor embedding unless there is a compelling reason not to.

Two: Needing to access an object on its own is a compelling reason not to embed it.

Three: If an array is likely to grow large, it is better to store it as a separate document or collection. This will improve performance and scalability.

Four: Don’t be afraid of application-level joins: If you index correctly and use the projection specifier, then application-level joins are barely more expensive than server-side joins in a relational database.

Five: Consider the read-to-write ratio with denormalization. A field that will mostly be read and only seldom updated is a good candidate for denormalization.

Six: How you model your data depends entirely on your particular application’s data access patterns.

DEV Community

MongoDB Basic design patterns and 6 rules of thumb - in 3 mins

One to Few (< couple 100s) - Embedding

Pros

Cons

One to many (< couple 1000s) - Referencing

Pros

Cons

One to squillions - (indefinitely large)

Two way referencing

Denormailization

6 Rules of thumb

Top comments (0)

Read next

Demystifying Algorithms: Modular Indexing

Time is a tool, not a toy

Rollback Kubernetes Deployment: Rolling Updates and Rollout Undos

Fine-Tuning LLMs: Comparison of Collab, Kaggle and 6 other Platforms