NoSQL Database Design for E-Commerce Apps in 2021

Daniel Kolb
I write about e-commerce related code & design topics. Feel free to add me for a chat on LinkedIn!

Working with large amounts of data such as products, orders, categories, users and payments is central to building e-commerce applications. In this post you'll learn the basics of structuring your NoSQL schema so it stays fast and scalable in e-commerce scenarios.

Why you should consider using a NoSQL database in 2021

NoSQL databases like MongoDB are still pretty popular in modern app development projects and have evolved a lot in recent years. The community has released many decent packages that help you work with NoSQL databases at scale, like schema generators and battle-tested packages for combining them with JS frontends or GraphQL APIs.
Besides that, MongoDB in particular did an awesome job of providing a very good cloud solution (MongoDB Atlas) that fits well into a modern web tech stack.

Are NoSQL databases the right choice for an e-commerce application?

If you've dismissed NoSQL databases in recent years because you heard they aren't the right choice for apps that manage a lot of complex, transactional data, give them another shot. Things have changed quite a lot: MongoDB, for example, has supported multi-document ACID transactions since version 4.0. Reaction Commerce wrote a decent post about this more than 3 years ago, and the technology has improved even more since then.

To summarise: With MongoDB (and many other NoSQL databases) you can build safe, reliable, scalable, cloud-based databases that are fun and easy to code with.

What you should consider when you create schemas for e-commerce apps

When you build e-commerce experiences you typically have to cater for:

  • up to thousands of products
  • products that have CMS-like content like images and descriptions
  • products that are available in up to thousands of variants...
  • variants that are defined by many different attributes
  • probably thousands of orders...
  • orders might have multiple products (and their variants)
  • orders might have different states that might change multiple times a day
  • orders might be assigned to customers
  • customers might be users with a login name, password, address, etc.
  • there's usually also quite a lot of CMS content like blog posts, marketing landing pages, etc.
  • and much more

As you can see, you need to make sure that your database can handle a large number of data sets that are connected to each other.

Embedding Data: Work with data fast & easy

When you work with MongoDB, you create JSON-based entries (called "documents") in your database and group them in so-called "collections".

For example, the collection "products" contains multiple documents while each document contains the data for a product like this one:

// a document in the "products" collection
{
    "_id" : ObjectId("5e451f4cd249baf1d045a778"),
    "title" : "Hackathon T-Shirt",
    "price" : 3.99,
    "currency" : "USD",
    "description" : "My Awesome Shirt",
    "sku": "DEV1337",
    "createdAt" : ISODate("2021-01-02T10:05:00.610Z"),
    "stock" : 12,
    "sizes": ["xs","s","m","l","xl","xxl"],
    "colors": ["red","green","blue"],
    "vendor" : "DEVSHIRTS",
    "vendorSlug": "devshirts",
    "vendorDescription": "DEVSHIRTS is a fashion label that sells T-Shirts for devs."
}

Query embedded data

In this example I've embedded all data for this product in its document. Because the data lives directly in the document, I can get the product with all of its properties in a single query:

// findOne() returns a single document (or null), not a cursor
const product = db.products.findOne({"sku":"DEV1337"});
const productTitle = product.title; // "Hackathon T-Shirt"
const vendor = product.vendor; // "DEVSHIRTS"
const amountSizes = product.sizes.length; // 6

Super clean and easy to write, isn't it? However, keep in mind that a query for multiple products returns every product with all of its embedded properties, and a single MongoDB document can grow up to 16 MB.
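If a listing page only needs a few fields, a projection keeps the result small. The sketch below mimics an inclusion projection in plain JavaScript so it runs without a database. The helper name `project` and the selected fields are assumptions made for illustration. In mongosh you would instead pass the projection as the second argument of `find()`, for example `db.products.find({vendor: "DEVSHIRTS"}, {title: 1, price: 1, currency: 1, _id: 0})`.

```javascript
// Hypothetical helper that mimics a MongoDB inclusion projection in memory.
// Like MongoDB, it keeps "_id" unless the projection sets "_id: 0".
function project(doc, projection) {
  const result = {};
  if (projection._id !== 0 && "_id" in doc) {
    result._id = doc._id;
  }
  for (const [field, include] of Object.entries(projection)) {
    if (field !== "_id" && include === 1 && field in doc) {
      result[field] = doc[field];
    }
  }
  return result;
}

// Shortened copy of the embedded product document from above
const product = {
  _id: "5e451f4cd249baf1d045a778",
  title: "Hackathon T-Shirt",
  price: 3.99,
  currency: "USD",
  description: "My Awesome Shirt",
  sizes: ["xs", "s", "m", "l", "xl", "xxl"],
  vendorDescription: "DEVSHIRTS is a fashion label that sells T-Shirts for devs."
};

// Only the fields a product listing needs
const listingFields = { title: 1, price: 1, currency: 1, _id: 0 };
const listingItem = project(product, listingFields);

console.log(listingItem);
```

The full document stays untouched in the database; only the amount of data sent to your app shrinks.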

Modify embedded data

Let's say one vendor has 1000+ products and you decide to change that vendor's description. Now the "vendorDescription" field has to be changed in all 1000+ product documents with the same operation:

db.products.updateMany({vendor: "DEVSHIRTS"}, {$set: {vendorDescription: "DEVSHIRT's new description"}});

Such operations are easy to code but might need a lot of processing power when they need to be done multiple times per minute or per second in a huge database. One solution for this is referencing the data instead of embedding it into the object.

Referencing Data: Work with distributed data sets

You probably know this way of working with data from regular SQL-databases like MySQL. Instead of adding all data to the same document, you just add a reference ID to another entry in another collection like I do here for the vendor:

// a document in the "products" collection:
{
    "_id" : ObjectId("5e451f4cd249baf1d045a778"),
    "title" : "Hackathon T-Shirt",
    "price" : 3.99,
    "currency" : "USD",
    "description" : "My Awesome Shirt",
    "sku": "DEV1337",
    "createdAt" : ISODate("2021-01-02T10:05:00.610Z"),
    "stock" : 12,
    "sizes": ["xs","s","m","l","xl","xxl"],
    "colors": ["red","green","blue"],
    "vendor" : ObjectId("23ae11d117baf1d127c99efd") // reference to the vendor's _id
}

// a document in the "vendors" collection:
{
    "_id" : ObjectId("23ae11d117baf1d127c99efd"),
    "title" : "DEVSHIRTS",
    "slug": "devshirts",
    "description": "DEVSHIRTS is a fashion label that sells T-Shirts for devs."
}

Query referenced data

When you query referenced data, you need to write the first query to get the referenced id and a second query to get the object you're looking for:

// Want to get the vendor description of a product...
// findOne() returns a single document (or null), not a cursor
const product = db.products.findOne({"sku":"DEV1337"});
const vendorId = product.vendor; // reference to the vendor's _id

const vendor = db.vendors.findOne({"_id":vendorId});
const vendorDescription = vendor.description;

You can also do this at the database level with the $lookup aggregation stage, which joins the referenced document in a single query.
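A minimal sketch of such a $lookup pipeline, assuming the "products" and "vendors" collections from above and that `products.vendor` holds the same value and type as `vendors._id`. The function name `buildVendorLookupPipeline` and the output field `vendorData` are made up for this example. The function only builds the pipeline array, so it runs without a database.

```javascript
// Builds an aggregation pipeline that joins a product with its vendor.
// Assumption: products.vendor stores the vendor's _id (same type in both collections).
function buildVendorLookupPipeline(sku) {
  return [
    // 1. Select the product by its SKU
    { $match: { sku: sku } },
    // 2. Join the matching document from the "vendors" collection
    {
      $lookup: {
        from: "vendors",
        localField: "vendor",
        foreignField: "_id",
        as: "vendorData"
      }
    },
    // 3. $lookup always produces an array; unwind it into a single object
    { $unwind: "$vendorData" }
  ];
}

const pipeline = buildVendorLookupPipeline("DEV1337");
console.log(JSON.stringify(pipeline, null, 2));

// In mongosh the pipeline would be used roughly like this:
// const [product] = db.products.aggregate(pipeline).toArray();
// const vendorDescription = product.vendorData.description;
```

This trades two round trips for one, at the cost of a slightly more complex query.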

Modify referenced data

This is a big upside when working with referenced data instead of embedded data: You need to update way fewer documents, probably only one. For example, updating the description of a vendor can be done by a very performant single operation now:

db.vendors.updateOne({title: "DEVSHIRTS"}, {$set: {description: "DEVSHIRT's new description"}});

Even if you have thousands of products from this vendor in your database, MongoDB will only need to update one single entry.
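To make the difference in write volume tangible, here is a small in-memory simulation of both schema variants. It does not talk to MongoDB. The arrays, the helper `updateWhere` and the product count of 1000 are assumptions made for illustration.

```javascript
// In-memory stand-ins for the two schema variants (no database involved).
const PRODUCT_COUNT = 1000; // assumed number of products for one vendor

// Embedded: every product carries its own copy of the vendor description
const embeddedProducts = Array.from({ length: PRODUCT_COUNT }, (_, i) => ({
  sku: `DEV${i}`,
  vendor: "DEVSHIRTS",
  vendorDescription: "old description"
}));

// Referenced: the description lives once, in a single vendor document
const vendors = [
  { _id: "vendor-1", title: "DEVSHIRTS", description: "old description" }
];

// Applies `change` to every document that satisfies `matches`
// and returns how many documents were written.
function updateWhere(docs, matches, change) {
  let modified = 0;
  for (const doc of docs) {
    if (matches(doc)) {
      Object.assign(doc, change);
      modified += 1;
    }
  }
  return modified;
}

const embeddedWrites = updateWhere(
  embeddedProducts,
  (product) => product.vendor === "DEVSHIRTS",
  { vendorDescription: "DEVSHIRT's new description" }
);

const referencedWrites = updateWhere(
  vendors,
  (vendor) => vendor.title === "DEVSHIRTS",
  { description: "DEVSHIRT's new description" }
);

console.log({ embeddedWrites, referencedWrites });
```

The embedded variant writes once per product, while the referenced variant writes a single vendor document regardless of how many products point to it.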

When to use embedded vs. referenced data

Deciding whether to embed or reference your data is one of the most important aspects of building database schemas.

Referencing data often looks like the "clean" way, but if you create as many references as possible, the amount of code and queries you need to write increases tremendously. Think of my product example above with references to sizes, colors, currencies, etc.
Besides that, your database gets bombarded with queries and operations that wouldn't be needed if you'd just embedded the data.
On the other hand, too much embedded data can mean that you query and work with objects that are far bigger than needed, which also slows down your app.

Only reference when needed

From my experience, it's best practice to embed as much data as possible and only reference other documents if it's really needed and makes sense for your specific application.

For example, if you only need to attach a few variants with unique attributes to a product, there's no need to create a "variants" collection for that. On the other hand, if your variants are basically their own products with a lot of attributes (like title, images, sku, prices, etc.) and even get shared across multiple products, it's a better idea to put them in their own "variants" collection so you can query and modify them independently of the products.
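The two variant layouts could look roughly like this. It's a sketch in plain JavaScript, with the "variants" collection modelled as an array so it runs without a database. The field names `variants` and `variantIds`, the ids and the variant SKUs are hypothetical.

```javascript
// Option A: a few simple variants embedded in the product document
const productWithEmbeddedVariants = {
  sku: "DEV1337",
  title: "Hackathon T-Shirt",
  variants: [
    { size: "m", color: "red", stock: 4 },
    { size: "l", color: "blue", stock: 8 }
  ]
};

// Option B: rich variants stored in their own "variants" collection
// (modelled here as an array); the product only keeps their ids
const variantsCollection = [
  { _id: "var-1", sku: "DEV1337-M-RED", title: "Hackathon T-Shirt, M, red", price: 3.99 },
  { _id: "var-2", sku: "DEV1337-L-BLUE", title: "Hackathon T-Shirt, L, blue", price: 4.99 },
  { _id: "var-3", sku: "DEV4711-S-GREEN", title: "Another Shirt, S, green", price: 5.99 }
];
const productWithReferencedVariants = {
  sku: "DEV1337",
  title: "Hackathon T-Shirt",
  variantIds: ["var-1", "var-2"]
};

// Embedded: everything is available right after loading the product
const totalStock = productWithEmbeddedVariants.variants.reduce(
  (sum, variant) => sum + variant.stock,
  0
);

// Referenced: the ids must be resolved in a second step. In mongosh this
// would be roughly db.variants.find({ _id: { $in: product.variantIds } })
const resolvedVariants = variantsCollection.filter((variant) =>
  productWithReferencedVariants.variantIds.includes(variant._id)
);

console.log(totalStock, resolvedVariants.map((variant) => variant.sku));
```

Option A needs one read and no extra code, while Option B lets you edit or share a variant without touching any product document.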

Tip: data models can come in handy

Oftentimes it also helps to select the schema based on the data model. While embedded data is usually fine for "One-To-One" and "One-To-Few" relations, referenced data shines for "One-To-Many" and especially for "One-To-VeryMany" relations.

What's next

I hope this gave you a basic understanding of embedded and referenced data, and an idea of how to structure e-commerce data.
If you want to learn more, MongoDB has quite nice tutorials, guides, and presentations you can have a look at.
