This article was originally published a day earlier at https://maximorlov.com/why-storing-files-database-bad-practice/
You're already storing structured data in the database, so it seems only logical to store files in there as well.
So you look up how to save files in MongoDB (or PostgreSQL, MySQL, etc.), only to find out that storing files in the database is considered bad practice.
Other web developers are saying that you shouldn't store files in the database, and they'll frown upon you if you ask them how to.
But nobody tells you why...
Is it because it looks bad to have a very long base64 string in the database?
There has to be more to it than that.
By the end of this article, you'll find out why saving files in the database is considered a bad practice.
You will discover common downsides of storing files in the database. In addition, you will learn a different approach to storing files that's more in line with industry best practices.
Storing files in the database slows down overall query performance simply because there is more data transmitted between the application and the database.
Additionally, files use up RAM which is used internally by the database to improve performance. Frequently accessed data is stored in RAM because it provides much faster reads than disk storage.
Querying files likely means the database will end up caching those files in RAM. Servers typically have a limited amount of RAM, so the database ends up prioritising some data over other data. Data that isn't held in RAM has to be read from disk, which is much slower.
In other words, when your database is busy serving files, those resources can't be used to respond to other queries.
Database administrators will nod at this:
The level of knowledge required to maintain a database goes up in proportion to the size of the database.
A larger database is more difficult to maintain than a smaller database. A sure way to rapidly increase the size of your database is to store large files in it.
Backing up a larger database takes more time and results in larger backup files that are slower to move around. Other maintenance tasks also get slower over time and are more likely to fail — restoring from backup, adding indexes, database defragmentation, etc.
Maintenance tasks add a strain to the database while running. This in turn decreases database performance and availability. The longer it takes for a database maintenance task to run, the larger the impact will be on your application and end-users.
In a replica set, a larger database leads to slower replication times and bigger replication delays. This may cause race conditions and synchronisation problems if your application code doesn't handle that well.
To save a file in a database, it often needs to be converted into a format the database can store correctly.
If you choose to store the file as text, you might decide to store it in base64 format for example. You'll need to write some logic in your application to convert files to base64 before they're saved into the database. You also need to handle the reverse case — reading from the database and converting from base64 to binary before passing the file on to the client.
Note: Files stored in base64 end up using about 33% more space than their original size, because every 3 bytes of binary data become 4 characters of text.
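To make the conversion layer and its size overhead concrete, here is a minimal sketch using Node.js's built-in `Buffer`. The function names `encodeForTextColumn` and `decodeFromTextColumn` are hypothetical, and the random bytes stand in for a real file.

```javascript
const { randomBytes } = require('node:crypto');

// Convert binary data to a base64 string before saving it to a text column.
function encodeForTextColumn(fileBuffer) {
  return fileBuffer.toString('base64');
}

// Convert the stored base64 string back to binary before sending it to the client.
function decodeFromTextColumn(base64String) {
  return Buffer.from(base64String, 'base64');
}

// Stand-in for a real file: 3,000 random bytes (hypothetical data).
const original = randomBytes(3000);
const encoded = encodeForTextColumn(original);
const decoded = decodeFromTextColumn(encoded);

console.log(original.length); // 3000 bytes of binary data
console.log(encoded.length); // 4000 characters of base64 text
console.log(decoded.equals(original)); // true
```

The round trip is lossless, but the stored value is a third larger than the file itself, and both helper functions are code you now have to maintain.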
Storing files as binaries in the database has a similar downside — escaping/encoding binary data in your application before sending it off to the database.
In both cases, you end up with an additional layer in your application that needs to be maintained. None of this is difficult or time-consuming, but it adds complexity and becomes another point of failure.
The points above are, in my opinion, the most important downsides of storing files in the database. Below are a few more worth mentioning, though the list is by no means exhaustive.
Increased costs. Hard disk storage is generally cheap, but the equivalent amount of RAM is significantly more expensive. A larger database will use up more RAM to store indexes and commonly queried data to improve performance. On top of that, the database is often already the largest consumer of RAM in the application stack.
Database limits. MongoDB documents are limited to 16MB in size. If you want to store larger files in MongoDB, you have to use GridFS. In PostgreSQL, a single field value is limited to 1GB. If the file you want to store is larger, you have to use a dedicated table designed for storing large objects.
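Size limits like these are usually worked around by splitting a file into smaller pieces, which is the general idea behind GridFS. The sketch below only illustrates that idea with plain buffers. It is not the MongoDB driver's API. The `splitIntoChunks` helper and the chunk size are assumptions chosen for the example.

```javascript
// Illustrative chunk size (an assumption for this sketch).
const CHUNK_BYTES = 255 * 1024;

// Split a file into numbered chunks that each fit comfortably in one document.
function splitIntoChunks(fileBuffer, chunkBytes = CHUNK_BYTES) {
  const chunks = [];
  for (let offset = 0; offset < fileBuffer.length; offset += chunkBytes) {
    chunks.push({
      n: chunks.length, // position of this chunk within the file
      data: fileBuffer.subarray(offset, offset + chunkBytes),
    });
  }
  return chunks;
}

// Stand-in for a 20MB file, which is too large for a single 16MB document.
const file = Buffer.alloc(20 * 1024 * 1024);
const chunks = splitIntoChunks(file);
console.log(chunks.length); // 81

// Reading the file back means fetching every chunk and joining them in order.
const reassembled = Buffer.concat(chunks.map((chunk) => chunk.data));
console.log(reassembled.length === file.length); // true
```

Even in this simplified form, one file turns into dozens of records that have to be written, ordered and read back together.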
Aside from the database, you can store files locally or in the cloud. In both cases, it's common to store a path to the file's location in the database.
Storing files on the file system is a popular choice. It's convenient to have files stored locally close to your application code. It makes it easy to write/read files without having to send them over the network to an external provider.
Node.js has the built-in fs module (fs/promises with async/await) which makes it easy to interact with the file system. On top of that, there are a few convenient libraries such as fs-extra and findit that make common tasks straightforward.
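As a rough sketch of this approach, the example below writes a file to disk with `fs/promises` and returns the record you would save in the database instead of the file itself. The `saveUpload` function, the record's field names and the temporary upload directory are all hypothetical.

```javascript
const fs = require('node:fs/promises');
const os = require('node:os');
const path = require('node:path');
const { randomUUID } = require('node:crypto');

// Write the file to disk and return the metadata you would store in the database.
async function saveUpload(uploadDir, originalName, fileBuffer) {
  await fs.mkdir(uploadDir, { recursive: true });
  // Generate the on-disk name ourselves so user-supplied names can't collide.
  const storedName = `${randomUUID()}${path.extname(originalName)}`;
  const filePath = path.join(uploadDir, storedName);
  await fs.writeFile(filePath, fileBuffer);
  return { originalName, path: filePath, size: fileBuffer.length };
}

async function main() {
  // Hypothetical upload directory. A real application would use a persistent location.
  const uploadDir = path.join(os.tmpdir(), 'uploads-demo');
  const record = await saveUpload(uploadDir, 'avatar.png', Buffer.from('fake image bytes'));
  console.log(record); // this small object is what goes into the database

  // Later, read the file back using the stored path.
  const contents = await fs.readFile(record.path);
  console.log(contents.toString()); // fake image bytes
}

main().catch(console.error);
```

The database row stays small regardless of how large the file is, because it only holds the path and a few details about the file.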
Note: Keep in mind that Platform-as-a-Service providers such as Heroku, AWS Elastic Beanstalk and DigitalOcean App Platform have an ephemeral/short-lived file system. Any files saved locally will disappear after deploying or restarting the application. Therefore you can't rely on the file system to store files that need to persist for a longer time. Use a VPS if you want to store files locally, or store files in the cloud instead.
Cloud storage is the go-to solution for larger applications. It takes away the burden of backups, redundancy, delivery and access control. Small websites can benefit too by using media-focused storage solutions such as Cloudinary (more on that below).
Here are a few commonly used file/media storage solutions by Node.js developers:
AWS S3. Amazon Web Services (AWS) has the largest market share in the cloud industry and their storage solution is called S3. With AWS S3, files are called objects and they are stored in buckets which are unique URL namespaces. S3 doesn't know the concept of folders because the file system is abstracted away. Use the forward-slash (/) in your file names and you'll get folder-like navigation in the S3 dashboard.
Cloudinary is different from the others in this list because it operates at a higher level. Cloudinary is a media storage solution built on top of AWS S3 and optimised for image/video storage. They offer features such as dynamic media transformations, automatic compression and browser-specific media delivery. Cloudinary is great for when you're building a responsive website and want to serve different image sizes for different devices.
DigitalOcean Spaces. DigitalOcean is known for its vast collection of high-quality resources. Anything related to DevOps, they've got you covered. I like them for their clean and intuitive UI. DigitalOcean Spaces is compatible with AWS S3, so it has a similar concept of objects and buckets/spaces.
Backblaze B2 is similar to AWS S3 but several times cheaper. They're known for writing robust software centred on security and encryption. Backblaze can also continuously back up your laptop to the cloud, a service I've been happily using for many years.
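To illustrate the forward-slash naming convention mentioned for AWS S3, here is a small in-memory sketch. It doesn't call the AWS SDK or any storage provider. The `buildObjectKey` and `listUnderPrefix` helpers and the `users/<id>/uploads/` layout are assumptions made up for this example.

```javascript
// Build an object key that uses "/" to create a folder-like hierarchy.
function buildObjectKey(userId, fileName) {
  return ['users', String(userId), 'uploads', fileName].join('/');
}

// Imitate a folder view: group flat keys by the next "/" after the given prefix.
function listUnderPrefix(keys, prefix) {
  const folders = new Set();
  const files = [];
  for (const key of keys) {
    if (!key.startsWith(prefix)) continue;
    const rest = key.slice(prefix.length);
    const slashIndex = rest.indexOf('/');
    if (slashIndex === -1) {
      files.push(rest); // nothing nested below the prefix, so it looks like a file
    } else {
      folders.add(rest.slice(0, slashIndex + 1)); // looks like a sub-folder
    }
  }
  return { folders: [...folders], files };
}

// The bucket itself only ever holds a flat list of keys.
const keys = [
  buildObjectKey(42, 'avatar.png'), // users/42/uploads/avatar.png
  buildObjectKey(42, 'resume.pdf'), // users/42/uploads/resume.pdf
  buildObjectKey(7, 'avatar.png'), // users/7/uploads/avatar.png
];

console.log(listUnderPrefix(keys, 'users/'));
// { folders: [ '42/', '7/' ], files: [] }
console.log(listUnderPrefix(keys, 'users/42/uploads/'));
// { folders: [], files: [ 'avatar.png', 'resume.pdf' ] }
```

The "folders" exist only in how the keys are named and displayed, which is why a consistent key scheme is worth deciding on early.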
So is storing files in the database always a bad idea? No. The answer is, as always, it depends.
On a small scale, the downsides of storing files in the database may not be that important. When you're working on a quick prototype to show to a client, a side project for learning purposes, or an application with a slow growth projection, it's more important to deliver quickly and focus on the must-have features. The scale you're optimising for might never arrive.
You've learned what the common downsides of storing files in the database are:
- Slower database queries
- Database maintenance becomes more difficult
- Storing and serving files is more complex
Alternatively, you can store files on the file system or in the cloud which is more in line with industry best practices. AWS S3, DigitalOcean Spaces, Cloudinary and Backblaze B2 are popular cloud platforms in the Node.js community.
However, that isn't to say that you should never store files in the database. Sometimes you have to ship something quickly, and for small applications, these downsides can be manageable.
Now it's your turn — where will you store the files for your next application? In the cloud, on the file system or in the database?
Whatever you decide, you're now equipped with the knowledge to make a wiser decision than you were before.