Thanks to William Zola for making this possible.
This article is a bit more in-depth than mine, if you have the time. I took bits and peices from that.
One to Few (< couple 100s) - Embedding
Storing Devices
against a Person
. You can save all those devices inside an array. Since a person
can own only few number of devices.
{
_id : ObjectID('1234'),
name: 'MKBHD',
devices : [
{ name: 'Samsung', model: 'S23 Ultra' },
{ name: 'Apple', model: '14 Pro Max' }
]
}
Pros
- No need second query since all of them are in one place.
Cons
- Cannot treat
devices
as stand alone entities. Like grouping devices based onname
will be tough to execute.
One to many (< couple 1000s) - Referencing
A Book
and its Pages
. There can several hundred pages but never more than couple 1000s.
Book
{
_id : ObjectID('1111'),
title : 'The Lord of the Rings' ,
author : 'J.R.R. Tolkien' ,
pages : [
ObjectID('2222'),
ObjectID('3333'),
ObjectID('4444')
]
}
// Fetch the Book document by id
> book = db.Book.findOne({id: 1111});
// Fetch all the pages that are linked to this Book
> bookPages = db.Pages.find({_id: { $in : book.pages } } );
Pros
- Each pages can be queried efficiently.
Cons
- Need a application level second query to get both book and pages, which with a right index doesn't look different than sql level join.
One to squillions - (indefinitely large)
Likes
and comments of a Post
can be arbitrarily large. You save the unique id of Post
in the Likes
.
Post
{
_id : ObjectID('1111') ,
content : "This is a good day!",
creatorId : "1234",
createAt: 2023-08-19T17:07:24.426Z
}
Likes
{
name : "Elon",
postId : ObjectID('1111'), // Reference to the Post document
createdAt: 2023-08-19T17:06:24.426Z
}
{
name : "Musk",
postId : ObjectID('1111') // Reference to the Post document
createdAt: 2023-08-19T17:07:24.426Z
}
This also needs a appication level join and we can use mongodb operators to get count , recent 500 comments.
> post = db.Post.findOne({id: ObjectID('1111') }); //index
> last500Likes = db.Likes.find({postId: post._id})
.sort({createdAt : -1}).limit(500) // postId is indexed
Two way referencing
A user and their posts. This way we can get user and all his posts. In some other scenario , we can get a list of posts recently posted.
User
{
_id : ObjectID('1111'),
posts : [ // Reference to Post
ObjectID('2222'),
ObjectID('3333'),
ObjectID('4444')
]
}
Post
{
_id : ObjectID('2222') ,
content : "This is a good day!"
userId : ObjectID('1111') // Reference to User
}
{
_id : ObjectID('3333') ,
content : "This is a great day!"
userId : ObjectID('1111') // Reference to User
}
{
_id : ObjectID('4444') ,
content : "This is a day!"
userId : ObjectID('1111') // Reference to User
}
Denormailization
This is used to prevent application level joins to to read frequently at the expense of costlier update query. We will use the same example of Books
and Pages
with denormalizations.
In the each pages with its id attached with a context
. This way when getting a Book
we can get a context of what each Page
looks like without application Join. However if you need more information about the page thats a seperate query.
This comes with a expense of update becuase whenever context changes we have to update Page
and Book
.
Book
{
_id : ObjectID('1111'),
title : 'The Lord of the Rings' ,
author : 'J.R.R. Tolkien' ,
pages : [
{ id :ObjectID('2222'),
context: "In a hole in the ground there lived a hobbit."
},
{ id: ObjectID('3333'),
context: "It had a perfectly round door like a porthole"
},
{ id: ObjectID('3333'),
context: "There were doors on either side"
}
]
}
6 Rules of thumb
One: Favor embedding unless there is a compelling reason not to.
Two: Needing to access an object on its own is a compelling reason not to embed it.
Three: If an array is likely to grow large, it is better to store it as a separate document or collection. This will improve performance and scalability.
Four: Don’t be afraid of application-level joins: If you index correctly and use the projection specifier, then application-level joins are barely more expensive than server-side joins in a relational database.
Five: Consider the read-to-write ratio with denormalization. A field that will mostly be read and only seldom updated is a good candidate for denormalization.
Six: How you model your data depends entirely on your particular application’s data access patterns.
Top comments (0)