DEV Community

Mahesh K



Explain Object Storage like I'm Five

Many hosting companies use the term object storage. What is object storage, and how is it different from a traditional database or file storage?

Latest comments (5)

Joshua Johnson

I've built an entire library around this problem.

"Like I'm 5"

I want to store my toys in a toy box.

Traditional databases require that I separate toys into components (tables) and join them back together later. I do not want to duplicate data. For example, a car has 4 tires that are all the same. I might create a tires table, store the tire there, and make sure I never DUPLICATE a tire. This is called normalization. When I take the car out of the toy box, the tires are put back on the car.
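To make the toy-box analogy concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`cars`, `tires`, `tire_id`, `tire_count`) are made up for illustration: the tire is described once in its own table, every car points at it, and a `JOIN` puts the tires back on when you ask for a car.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tires (
    id INTEGER PRIMARY KEY,
    model TEXT NOT NULL
);
CREATE TABLE cars (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    tire_id INTEGER NOT NULL REFERENCES tires(id),
    tire_count INTEGER NOT NULL
);
""")

# The tire is stored exactly once...
conn.execute("INSERT INTO tires (id, model) VALUES (1, 'rubber-small')")
# ...and each car refers to it instead of copying it.
conn.execute("INSERT INTO cars (name, tire_id, tire_count) VALUES ('red car', 1, 4)")
conn.execute("INSERT INTO cars (name, tire_id, tire_count) VALUES ('blue car', 1, 4)")

# Taking a car "out of the toy box": the JOIN adds the tires back.
rows = conn.execute("""
    SELECT cars.name, cars.tire_count, tires.model
    FROM cars JOIN tires ON cars.tire_id = tires.id
    ORDER BY cars.name
""").fetchall()
for row in rows:
    print(row)
```

Both cars report 4 `rubber-small` tires, yet the `tires` table holds a single row.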

Object storage databases don't care about duplication within the toy box, or what type of toy it is, for that matter. Toys are toys... What object storage databases do is index data about the objects they hold. That way you can recall all the toys that have tires.
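The indexing idea can be sketched in a few lines of Python. This is a hypothetical, in-memory illustration (the `ToyBox` class and the `has_tires` metadata key are invented here), not how any particular product implements it: each toy is stored whole, duplicated tires and all, and a separate index on metadata answers "which toys have tires?" without opening every object.

```python
from collections import defaultdict


class ToyBox:
    """Hypothetical object store: whole objects stored as-is, plus a metadata index."""

    def __init__(self):
        self.objects = {}              # object id -> the whole object, duplicates and all
        self.index = defaultdict(set)  # (metadata key, value) -> ids of matching objects

    def put(self, object_id, obj, metadata):
        self.objects[object_id] = obj
        for key, value in metadata.items():
            self.index[(key, value)].add(object_id)

    def find(self, key, value):
        # Consult the index instead of opening every object in the box.
        return sorted(self.index.get((key, value), set()))


box = ToyBox()
box.put("car-1", {"color": "red", "tires": ["tire"] * 4}, {"has_tires": True})
box.put("truck-1", {"color": "blue", "tires": ["tire"] * 6}, {"has_tires": True})
box.put("doll-1", {"name": "Ann"}, {"has_tires": False})

print(box.find("has_tires", True))  # ['car-1', 'truck-1']
```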

rhymes

I admit I can't find the right analogy to explain files, databases and object storage right now, so I'll drop the "like I'm five" part but still keep it simple.

A file storage (file system?) is an architecture in which information is stored in a hierarchy. A file system is responsible for keeping track of where every file is and which bytes on the disk make up each file (there are virtual file systems as well, but let's simplify :D). Files have a name, belong to a directory and have metadata (stored in the inode on Linux). The hierarchy is, for example, the reason why you can have different files with the same name on one file system, as long as they live in different directories.
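Here is a small sketch of the "same name, different directory" point, using only the Python standard library. The directory and file names are invented for the example; everything happens inside a temporary directory that is deleted afterwards.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Two different files that share the name "notes.txt" but live in different directories.
    work_notes = Path(root, "work", "notes.txt")
    home_notes = Path(root, "home", "notes.txt")
    for path, text in ((work_notes, "meeting at 10"), (home_notes, "buy milk")):
        path.parent.mkdir(parents=True)
        path.write_text(text)

    # The full path (directory + name) is what tells them apart.
    contents = {p.parent.name: p.read_text() for p in (work_notes, home_notes)}

    # Metadata the file system keeps for each file (on Linux, st_ino is the inode number).
    info = os.stat(work_notes)
    size_work = info.st_size
    inode_work = info.st_ino

print(contents)   # {'work': 'meeting at 10', 'home': 'buy milk'}
print(size_work)  # 13
```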

A "traditional" database (I guess you're referring to relational databases) is an abstraction on top of file storage (the data you store in the database lives somewhere in the file system, for obvious reasons) and, in most cases, RAM. The key part here is the relational model they are based on (there are many types of databases that are not relational, by the way). An RDBMS generally structures data in tables: each table groups a set of rows, and each column stores one attribute of the data. The data stored in tables is persisted somewhere in the file system. As a user of an RDBMS you don't concern yourself with how the data is stored on disk, just with how to manipulate and query it using a query language (SQL in most cases).
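A small sketch with SQLite shows both halves of that paragraph: you only talk to the database in SQL, yet the data ends up in an ordinary file. SQLite is chosen here only because it ships with Python; it is an embedded relational database, not a client-server RDBMS like the ones hosting companies usually offer. The `toys` table and file name are invented for the example.

```python
import sqlite3
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    db_path = Path(root, "toys.db")

    # The database engine decides how rows are laid out inside this one file.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE toys (name TEXT NOT NULL, price REAL NOT NULL)")
    conn.executemany(
        "INSERT INTO toys (name, price) VALUES (?, ?)",
        [("car", 4.5), ("doll", 7.0), ("kite", 3.25)],
    )
    conn.commit()

    # As a user you only speak SQL; you never touch the bytes in toys.db directly.
    cheap = conn.execute(
        "SELECT name FROM toys WHERE price < 5 ORDER BY name"
    ).fetchall()
    conn.close()

    # ...but the data really does live in the file system.
    db_exists = db_path.is_file()

print(cheap)      # [('car',), ('kite',)]
print(db_exists)  # True
```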

So far we've seen two abstractions on top of bits and bytes: files, each a container of (usually) a single document with a name and a place in the file system, and tables, which contain rows of data that can be related to each other to do operations that would be pretty hard using just the file system.

A third type of storage is object storage. Object storage lets the user store information as an object plus its metadata (and a unique identifier). The differences between file storage and object storage are that the latter is not hierarchical and that its metadata is defined by you, not by the file system, so it can be much richer (though most services still cap its size). This structure makes scaling and replication easier, so it is better suited than a file system to being distributed across multiple machines. Some object storage services also offer extra features like granular security (file systems do too :D), APIs to access the objects (file systems do too, but at a lower level) and easy replication (file systems do too :-D).
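The object-plus-metadata-plus-ID model can be sketched as follows. The class names and metadata keys are hypothetical, and real object stores expose this through their own (usually HTTP) APIs; the point is only that each object gets a unique identifier, the metadata is whatever you choose to attach, and there is no directory tree anywhere.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes                                   # the content itself, opaque to the store
    metadata: dict = field(default_factory=dict)  # key/value pairs chosen by the user


class FlatObjectStore:
    """Hypothetical in-memory object store: one flat namespace of unique IDs, no directories."""

    def __init__(self):
        self._objects = {}

    def put(self, data, **metadata):
        object_id = str(uuid.uuid4())  # unique identifier generated by the store
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id):
        return self._objects[object_id]


store = FlatObjectStore()
photo_id = store.put(b"fake image bytes", camera="phone", owner="me", tags="beach,summer")
obj = store.get(photo_id)
print(obj.metadata["camera"])  # phone
```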

On the surface, file and object storage have various things in common, but the key differences from a user's perspective are the hierarchical vs. flat architecture, how they handle metadata, and the APIs.

S3, for example, has a flat structure: it uses "paths" to give the illusion of a hierarchy, but in reality it's flat. In a file system, path/to/abc.pdf means that path and to are directories; in S3 the actual file name (key, in S3 lingo) is the whole string path/to/abc.pdf.
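The "illusion of folders" can be sketched in plain Python. This does not call S3; it only imitates, over an in-memory set of keys, the kind of prefix-and-delimiter listing that S3's ListObjectsV2 operation offers (the `Prefix` and `Delimiter` request parameters are real; the `list_level` function is invented for illustration).

```python
def list_level(keys, prefix="", delimiter="/"):
    """Simulate a 'folder' listing over flat keys, similar in spirit to S3's
    ListObjectsV2 with Prefix and Delimiter.

    Returns (keys directly under the prefix, 'sub-folder' prefixes).
    """
    objects, common_prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter looks like a sub-folder.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(common_prefixes)


# The bucket is just a flat set of keys; "path/to/" is part of each name.
bucket = {"path/to/abc.pdf", "path/to/def.pdf", "path/readme.txt", "logo.png"}

print(list_level(bucket))              # (['logo.png'], ['path/'])
print(list_level(bucket, "path/"))     # (['path/readme.txt'], ['path/to/'])
print(list_level(bucket, "path/to/"))  # (['path/to/abc.pdf', 'path/to/def.pdf'], [])
```

No directory object exists anywhere; the "folders" are computed on the fly from the key strings.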

Hope this helps a bit :-) If you have other questions, fire away!

Mahesh K

Thank you, I learned something new :)

So how do developers decide which type of storage is required at what context in project?

rhymes

If you have more than one instance of a server, you'll probably end up storing common data in object storage, because it's distributed and you can access it from anywhere. Think about it: if you store "filea.txt" on one server's file system, the only way to get to it is to connect to that machine somehow, download it to your computer, and then upload it wherever else you need it.

With distributed object storage, you just need the path and you can access it from anywhere.

In a cloud context, the file system is useful for temporary local data: say you're writing an image processing system and you only care about the final artifact, not the intermediate steps. Another use for a file system is to store the files of an application server you want to install manually, just like on your computer :-)
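The "temporary local data" pattern might look like this. It is a sketch with placeholder steps (slicing and upper-casing bytes stand in for real image operations, and `process_image` is a made-up name): the intermediate files live in a local temporary directory that disappears afterwards, and only the final result is kept, for example to be stored in object storage.

```python
import tempfile
from pathlib import Path


def process_image(original: bytes) -> bytes:
    """Hypothetical pipeline: intermediate steps go to local scratch space,
    and only the final artifact is returned."""
    with tempfile.TemporaryDirectory() as scratch:
        step1 = Path(scratch, "resized.tmp")
        step1.write_bytes(original[::2])               # placeholder for a real resize step

        step2 = Path(scratch, "filtered.tmp")
        step2.write_bytes(step1.read_bytes().upper())  # placeholder for a real filter step

        final = step2.read_bytes()
    # The scratch directory and both intermediate files are deleted at this point.
    return final


artifact = process_image(b"abcdefgh")
print(artifact)  # b'ACEG'
```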

A database gives you querying capabilities, the ability to join related and unrelated data, data typing and validation, the whole set of ACID guarantees, and so on. Not all databases have all of these features, and some are better than others depending on what you need.
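Two of those features, validation and atomicity (the A in ACID), show up in a short SQLite sketch. The `accounts` table and `transfer` function are hypothetical, and SQLite is used only because it ships with Python: a `CHECK` constraint rejects bad data, and a transaction makes sure a failed transfer leaves no half-applied change behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        name TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- validation rule enforced by the DB
    )
""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: either both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


print(transfer(conn, "alice", "bob", 30))   # True
print(transfer(conn, "alice", "bob", 500))  # False: CHECK fails, bob's credit is rolled back too
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# [('alice', 70), ('bob', 30)]
```

The second transfer credits bob before the debit fails, yet the final balances show no trace of it.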

Mahesh K

Thanks a lot for the explanation :)
