DEV Community

Peter Harrison


MongoDB Breaking the Chains

Some have been critical of MongoDB for having no schema. If you are developing applications the traditional way, binding your application directly to a fixed data structure burnt into the binary, such a critique makes sense. Why would you want any old thing being inserted into your rows? You want some certainty about what will be in each row after all.

Some may be under the impression that this is in fact the only way to do software, or that it is at least the proper or professional way.

I have made the case in the past that coding to the domain makes our software less flexible, less general, and less reusable.

Think about how flexible Excel is in terms of what can be achieved with it, despite its limitations. How have spreadsheets remained so popular in the face of more polished web applications with SQL databases to store their data?

The answer is that Excel gives the power to the user. The spreadsheet is defined by the user, not the developer. Often a business analyst will develop a spreadsheet and then distribute it with locked cells for others to use. There is no schema burnt into the Excel binary which limits it to storing only certain information.

MongoDB to the Rescue

MongoDB is a schemaless database system. Unlike SQL databases, where tables with specific fields must be defined, MongoDB allows you to store pretty much any field in any document. Where a SQL database has tables of identical rows, MongoDB has collections of documents, each of which can contain any data.

Where a SQL database has flat tables with specific fields, MongoDB's documents can contain structured data: lists and maps of elements nested in a tree structure.
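To make the contrast concrete, here is a minimal sketch in plain Python, with a list of dicts standing in for a collection. The collection name, the field names, and the field_paths helper are all invented for illustration; nothing here talks to a real database.

```python
# A plain Python list stands in for a MongoDB collection here;
# every field name below is hypothetical.
assets = [
    # A flat document, much like a SQL row.
    {"name": "Forklift", "serial": "FX-100"},
    # A document in the same collection with different fields,
    # including a nested map and a list of maps.
    {
        "name": "Delivery van",
        "registration": {"plate": "ABC123", "expires": "2025-06-30"},
        "services": [
            {"date": "2024-01-15", "odometer_km": 42000},
            {"date": "2024-07-02", "odometer_km": 51500},
        ],
    },
]


def field_paths(doc, prefix=""):
    """Return the sorted, de-duplicated dotted paths of every leaf value."""
    paths = []
    if isinstance(doc, dict):
        for key, value in doc.items():
            paths.extend(field_paths(value, f"{prefix}{key}."))
    elif isinstance(doc, list):
        for item in doc:
            paths.extend(field_paths(item, prefix))
    else:
        paths.append(prefix.rstrip("."))
    return sorted(set(paths))
```

Calling field_paths on each document shows that the two documents share only the "name" field, yet both sit happily in the same collection.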

The critical difference between SQL databases and MongoDB is this ability to store unstructured data. Like Excel, it lets us build applications in which the user defines data structures at runtime, rather than developers defining them at design time.
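One way to picture runtime-defined structures is to treat the structure itself as data. The sketch below is only an illustration under that assumption: form_definition, TYPE_CHECKS, and check_record are hypothetical names, and a real application would store the definition as a document alongside the records it describes.

```python
# Hypothetical sketch: the "schema" is ordinary data that a user could edit
# at runtime, rather than a class or table compiled into the application.
form_definition = {
    "name": "site_inspection",
    "fields": [
        {"key": "site", "type": "text", "required": True},
        {"key": "hazards", "type": "number", "required": False},
    ],
}

# Maps the user-facing type names above to Python types.
TYPE_CHECKS = {"text": str, "number": (int, float)}


def check_record(definition, record):
    """Return a list of problems found in record, judged against definition."""
    problems = []
    for field in definition["fields"]:
        key = field["key"]
        if key not in record:
            if field["required"]:
                problems.append(f"missing required field: {key}")
            continue
        if not isinstance(record[key], TYPE_CHECKS[field["type"]]):
            problems.append(f"wrong type for field: {key}")
    return problems


# A user adds a field at runtime; no code or table definition changes.
form_definition["fields"].append(
    {"key": "inspector", "type": "text", "required": True}
)
```

The checking code never mentions "site" or "inspector". It only knows how to interpret a definition, which is what lets the user rather than the developer decide what gets stored.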

What might seem an inconsequential or even useless feature to traditional SQL developers turns out to support capabilities that cleave the domain away from developers and put the power in the hands of users.

MongoDB isn't rescuing software developers; it is empowering the user, the manager, the business analyst. It means we can deliver software that breaks from domains and hands that power back to users.

But my database does that!

Some may be screaming at the screen "But my database does that as well!" And it is true. SQL databases have been introducing the ability to store structured data in the form of JSON. It is perfectly possible that the same thing can now be accomplished with other databases. So much the better.

My message here isn't that MongoDB is the exclusive solution to this kind of architectural approach, but rather that it is one I have put to the test and found to be an excellent fit. MongoDB is not the droid you are looking for if you just want a drop-in SQL database. It has its own query language, which has no one-to-one translation to SQL.

If your need is a high-speed transaction store where each record will always be identical, a SQL database is probably the best fit. As a mentor once told me, horses for courses, or the right tool for the job.

But for the kind of flexible data storage and querying that is now typical of the applications I build, MongoDB is my weapon of choice.

Magical Aggregations

Perhaps the most compelling feature of MongoDB for me is the Aggregation system. You could compare this feature to SQL queries, and you can certainly use it for queries and grouping. It is, however, far more flexible than SQL queries, being more of a data transformation language that allows you to translate data into the structure your application needs to consume.

This feature is used for almost all data access in the systems I write, from simple queries to get flat data to multi-dimensional grouping and complex statistical analysis.
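To give a feel for the pipeline idea, here is a small sketch. The pipeline itself uses real MongoDB stage and operator names ($match, $group, $sum), but the order documents are invented, and run_pipeline is a toy in-memory evaluator written only so the example can run without a server. It is not how MongoDB executes aggregations.

```python
from collections import defaultdict

# Hypothetical order documents; the field names are invented.
orders = [
    {"region": "north", "status": "paid", "total": 120},
    {"region": "north", "status": "paid", "total": 80},
    {"region": "south", "status": "paid", "total": 50},
    {"region": "south", "status": "cancelled", "total": 999},
]

# A pipeline is a list of stages; each stage reshapes the output of the last.
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$region", "revenue": {"$sum": "$total"}}},
]


def run_pipeline(docs, stages):
    """Toy evaluator for equality $match and $group with $sum only."""
    for stage in stages:
        if "$match" in stage:
            wanted = stage["$match"]
            docs = [
                d for d in docs
                if all(d.get(k) == v for k, v in wanted.items())
            ]
        elif "$group" in stage:
            spec = stage["$group"]
            id_field = spec["_id"].lstrip("$")
            groups = defaultdict(dict)
            for d in docs:
                out = groups[d[id_field]]
                out["_id"] = d[id_field]
                for name, acc in spec.items():
                    if name != "_id":
                        source = acc["$sum"].lstrip("$")
                        out[name] = out.get(name, 0) + d[source]
            docs = list(groups.values())
        else:
            raise ValueError(f"unsupported stage: {stage}")
    return docs


result = run_pipeline(orders, pipeline)
```

The point of the sketch is the shape: the output documents have a structure (one document per region with a revenue total) that appears nowhere in the stored data, which is the transformation role described above.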

In this respect MongoDB exceeded my expectations in terms of capabilities.

But MongoDB does have Schemas!

To satisfy the traditionalists who have failed to appreciate the power of MongoDB and need the training wheels, MongoDB has the ability to add schema validation to collections. This means that, if you really want, you can define what fields should be in each collection. If you end up defining schemas this way you should probably think about whether MongoDB is the right fit.
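For reference, such a validator is itself just a document. The sketch below writes one as a Python dict using MongoDB's $jsonSchema keywords (bsonType, required, properties); the field names are hypothetical, and the violations function is a toy stand-in that covers only those keywords so the example can run without a server.

```python
# The shape of a MongoDB $jsonSchema validator, written as a Python dict.
# The field names are hypothetical.
validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string"},
            "age": {"bsonType": "int"},
        },
    }
}

# Toy mapping from a few BSON type names to Python types.
BSON_TO_PYTHON = {"string": str, "int": int}


def violations(doc, schema):
    """Toy check covering only 'required' and per-field 'bsonType'."""
    found = [
        f"missing: {key}"
        for key in schema.get("required", [])
        if key not in doc
    ]
    for key, rule in schema.get("properties", {}).items():
        expected = BSON_TO_PYTHON[rule["bsonType"]]
        if key in doc and not isinstance(doc[key], expected):
            found.append(f"wrong type: {key}")
    return found
```

Note that fields not named in the schema still pass, so even a validated collection keeps some of the flexibility argued for above.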

Are there any downsides?

MongoDB sells itself as a scalable database. In theory you can run multiple nodes to scale up. However, there are some substantial fish hooks if you do.

Before I talk about scaling I should point out that replication, where one active server is backed by passive backup servers, can be set up with only three servers. If you want reliability and backup this is a pretty good approach. But I'm about to talk about scaling up active nodes.

When scaling up the number of active nodes you will need some controller servers in addition to the actual nodes. The number of machines in the cluster goes from one to a minimum of nine, probably more if you want replication for redundancy. There is therefore a huge gap between running, say, a single MongoDB instance, or even one active node and two passive servers, and running a cluster of multiple active servers.

The second issue is that the aggregations mentioned above won't necessarily run properly over data stored across multiple nodes. An application written and working on a single MongoDB instance may therefore not work when scaled up. Sadly I only found this out after extensive use of aggregations.

Of course, it may be possible to work around the limitations, but it was a bit of a surprise for us when we learned of it.

Conclusion

It is a mistake to think of MongoDB as just another type of database. If you play to its strengths it is an excellent system. Importantly, if you adopt the philosophy of low domain binding you will be able to deliver software which is far more reusable and extensible at runtime.
