DEV Community

Shrihari Mohan
Shrihari Mohan

Posted on • Updated on

MongoDB Basic design patterns and 6 rules of thumb - in 3 mins

Thanks to William Zola for making this possible.

This article is a bit more in-depth than mine, if you have the time. I took bits and peices from that.

One to Few (< couple 100s) - Embedding

Storing Devices against a Person. You can save all those devices inside an array. Since a person can own only few number of devices.

{
   _id : ObjectID('1234'),
  name: 'MKBHD',
  devices : [
     { name: 'Samsung', model: 'S23 Ultra' },
     { name: 'Apple', model: '14 Pro Max' }
  ]
}
Enter fullscreen mode Exit fullscreen mode

Pros

  • No need second query since all of them are in one place.

Cons

  • Cannot treat devices as stand alone entities. Like grouping devices based on name will be tough to execute.

One to many (< couple 1000s) - Referencing

A Book and its Pages. There can several hundred pages but never more than couple 1000s.

Book
{
  _id : ObjectID('1111'),
  title : 'The Lord of the Rings' , 
  author : 'J.R.R. Tolkien' ,
  pages : [
  ObjectID('2222'),
  ObjectID('3333'),
  ObjectID('4444')
  ]  
}

// Fetch the Book document by id
> book = db.Book.findOne({id: 1111});
// Fetch all the pages that are linked to this Book
> bookPages = db.Pages.find({_id: { $in : book.pages } } );
Enter fullscreen mode Exit fullscreen mode

Pros

  • Each pages can be queried efficiently.

Cons

  • Need a application level second query to get both book and pages, which with a right index doesn't look different than sql level join.

One to squillions - (indefinitely large)

Likes and comments of a Post can be arbitrarily large. You save the unique id of Post in the Likes.

Post 
{
  _id : ObjectID('1111') ,
  content : "This is a good day!",
  creatorId : "1234",
  createAt: 2023-08-19T17:07:24.426Z
}

Likes 
{
   name : "Elon",
   postId :  ObjectID('1111'), // Reference to the Post document
   createdAt: 2023-08-19T17:06:24.426Z
}
{
   name : "Musk",
   postId :  ObjectID('1111') // Reference to the Post document
   createdAt: 2023-08-19T17:07:24.426Z
}
Enter fullscreen mode Exit fullscreen mode

This also needs a appication level join and we can use mongodb operators to get count , recent 500 comments.

> post = db.Post.findOne({id: ObjectID('1111') });  //index
> last500Likes = db.Likes.find({postId: post._id})
.sort({createdAt : -1}).limit(500) // postId is indexed
Enter fullscreen mode Exit fullscreen mode

Two way referencing

A user and their posts. This way we can get user and all his posts. In some other scenario , we can get a list of posts recently posted.

User 
{
  _id : ObjectID('1111'),
  posts : [                 // Reference to Post
    ObjectID('2222'),
    ObjectID('3333'),
    ObjectID('4444')
  ]
}

Post 
{
  _id : ObjectID('2222') ,
  content : "This is a good day!"
  userId : ObjectID('1111') // Reference to User
}

{
  _id : ObjectID('3333') ,
  content : "This is a great day!"
  userId : ObjectID('1111') // Reference to User
}

{
  _id : ObjectID('4444') ,
  content : "This is a day!"
  userId : ObjectID('1111') // Reference to User
}
Enter fullscreen mode Exit fullscreen mode

Denormailization

This is used to prevent application level joins to to read frequently at the expense of costlier update query. We will use the same example of Books and Pages with denormalizations.

In the each pages with its id attached with a context. This way when getting a Book we can get a context of what each Page looks like without application Join. However if you need more information about the page thats a seperate query.

This comes with a expense of update becuase whenever context changes we have to update Page and Book.

Book
{
  _id : ObjectID('1111'),
  title : 'The Lord of the Rings' , 
  author : 'J.R.R. Tolkien' ,
  pages : [
   { id :ObjectID('2222'), 
     context: "In a hole in the ground there lived a hobbit." 
   },
   { id: ObjectID('3333'), 
     context: "It had a perfectly round door like a porthole" 
   },
   { id: ObjectID('3333'), 
     context: "There were doors on either side"
   }
  ]  
}
Enter fullscreen mode Exit fullscreen mode

6 Rules of thumb

One: Favor embedding unless there is a compelling reason not to.

Two: Needing to access an object on its own is a compelling reason not to embed it.

Three: If an array is likely to grow large, it is better to store it as a separate document or collection. This will improve performance and scalability.

Four: Don’t be afraid of application-level joins: If you index correctly and use the projection specifier, then application-level joins are barely more expensive than server-side joins in a relational database.

Five: Consider the read-to-write ratio with denormalization. A field that will mostly be read and only seldom updated is a good candidate for denormalization.

Six: How you model your data depends entirely on your particular application’s data access patterns.


Top comments (0)