Jon Calhoun

Posted on Jul 31, 2017 • Edited on May 6, 2018 • Originally published at calhoun.io

Subtle issues with ORMs, and how to avoid them

#go #database #sql #orm

This post was original posted at calhoun.io

One of the most common change requests I get with my Web Development course is to stop using GORM, and instead use the database/sql package that is part of Go's standard library.

When I receive feedback like this I often respond asking, "Why?"

I ask this not to be snarky, but because I truly want to know what people dislike about GORM. Do they have issue with ORMs, third party libraries, or something else? What is causing them to believe that the database/sql package is a better option?

What is shocking to me is that an overwhelming number of the people who request this change don't actually have a good reason, or provide reasons that don't actually apply to them.

The truth is, I can't really blame them. All it takes is a week or two hanging out on the Go subreddit and you will quickly start to believe that ORMs are clearly the spawn of Satan and should never be used.

In this post I hope to remedy that.

I don't expect to convince everyone who hates ORMs to change their mind, and I don't expect everyone to start using an ORM. Instead, I hope to educate everyone so that they can make an educated decision on their own, and for those that do use ORMs I hope to educate them on some of the issues they should work to avoid.

ORMs can hide complexity

ORMs can make it incredibly easy to write code that masks a great deal of complexity. Normally this is a good thing - we don't want to have to write complex queries on our own all the time - but when these things get chained together to create unexpected queries it can make it hard to design a database that scales efficiently.

Ruby on Rails' Active Record is likely the biggest culprit here, but truthfully this can happen with any ORM. In fact, it could even happen without an ORM, but ORMs tend to mask what is going on just enough to let bad things slip through the cracks.

First, let's discuss what I mean. Imagine you are writing an application and somewhere inside of your code you write some code like this:

# I'm using rails but this applies to Go as well
user.orders.each do |order|
  # ... use the order
end

When this code is run, it will likely end up executing some SQL behind the scenes. While this might not be problematic at first, over time this can become an issue. For example, if you start to chain together different clauses you could end up with something like:

user.orders.not_shipped.high_value.with_issue.each do |order|
  # ... use the order
end

As queries becomes more complex, they are likely going to become less performant. This becomes especially pronounced as we start to do more complex queries, like multiple joins with conditional clauses.

As this happens, the engineers designing the database will have a hard time making decisions because they won't have a clear picture of what is happening. They won't know which columns need indexed, or which joins need optimized unless they scan through the entire application digging for any code that generates SQL queries.

Sure, we could add in some profiling and start to measure which queries are taking the most time, but this will always be reactive. We can't be proactive and avoid potential major issues. We can't dictate what queries should be run versus which shouldn't because they will bring our database to a crawl.

As I said before, this isn't limited to ORMs but it is a bigger issue with ORMs because developers can write complex queries without actually seeing the SQL being generated and thus be ignorant to the fact that they are hurting the performance of the application.

Writing SQL on the other hand doesn't mask this, so when you write a complex query is it hard to say, "Oh, I didn't know!"

And that brings us to the other major issue with ORMs.

ORMs allow developers to remain ignorant

ORMs allow developers to remain ignorant of the tech they are using and how their decisions will impact the overall performance of the application.

In a small application this typically doesn't matter - an SQL database can operate on thousands of records with relative ease. But as an application scales, these issues become more pronounced. Iterating over thousands of records might be okay, but iterating over millions is going to start to take a toll.

One of the primary motivations behind writing pure SQL is that developers have to learn enough SQL to understand the complex queries they are writing, so they should realistically understand how they are going to affect the performance of the application. If they are writing queries that do joins or filter by specific attributes they likely also know enough to figure out which indexes need created to keep things performant.

Avoiding these issues

While both of these issues are possible with ORMs, the truth is neither of them are directly caused by the ORM. Instead, both are caused by poor code design and a lack of education.

The best way to mitigate these issues isn't to stop using an ORM, but is to instead learn better design patterns. To educate your team members and yourself, and to incorporate solid code review so that if a developer unfamiliar with SQL needs to update your code, a developer familiar with the ramifications of their changes can review any database-specific changes.

One way to do this is to create a database layer in your code and to isolate all of your database interactions to this code. This might be one package, or split into several. The important thing here is that the only code that ends up creating SQL statements is contained within the this layer of your application.

package database

type DB struct {
  // ...
}

func (db *DB) UserOrders(user *User) ([]Order, error) {
  // ...
}

By writing code this way, you can clearly separate the database interactions with the rest of your code. You can also easily incorporate better code reviews when code here is changed, allowing multiple developers who are familiar with SQL and your database design to review these changes before they get shipped to production. This also present an opportunity to educate developers who may be less familiar with SQL.

While this approach is more rigid because you need to expose functions for each query you want the rest of your application to have access to, it is significantly easier to test, maintain, and keep efficient over time.

The coolest thing about this approach is that it doesn't matter if you use an ORM or not. In the example above you got an idea of what we are doing, but I didn't have to write any code using the database/sql or gorm pacakge because it simply does not matter. An ORM might be used to aid in building SQL queries, but it doesn't have to. We could even move from using an ORM to using pure SQL without changing any other code in our application.

But ORMs slow down your app!

It is hard to talk about ORMs without someone jumping in and saying, "But ORMs will slow down your application! Use this other library for faster performance."

As Mark Bates puts it, "The Go community loves benchmarks. It is obsessed with them."

The problem with this obsession with speed and benchmarks, as Mark continues to explain in his post, is that very few applications actually need to be as fast as possible. Instead, they simply need to be "fast enough".

End users don't notice the difference between a 31ms and a 30ms response time. Now if ORMs were causing 100ms delays then sure, you could make this argument, but the actual speed cost of using an ORM is negligible in a real application. It will amount to less than 1% of your application's total latency.

Rather than spending too much time here, I suggest you check out Mark's post:

http://www.metabates.com/2017/03/03/youre-benchmarking-the-wrong-thing/

It isn't specifically directed at ORMs or web applications, but the point still stands. For most applications there are other, more important factors to consider than a very minor slowdown caused by using an ORM.

We shouldn't learn something new when we already know SQL!

This is probably the most common reason I hear for avoiding ORMs, and I agree with it. If you or your team already knows SQL and prefer it, then using an ORM is a bad idea.

The problem with this mindset is that it only applies to one group of developers - those that already know SQL very well and prefer to use it.

On the other hand, there is a large group of developers who either (a) DO NOT know SQL very well, or (b) prefer using an ORM or other SQL building library.

Most of the people I teach fall into the first category - they do not know SQL very well. In fact, many of them are learning about web development for the first time, so adding SQL to that already massive list of things to learn isn't likely to turn out well.

Instead, I find that an ORM (or SQL builder) that works similar to raw SQL is a better option. Not only does this help get beginners up and running faster, but it also helps aid them in learning SQL. For example, you can enable logging in GORM to see what SQL query ends up being executive for each piece of code you write, and the code looks very similar to SQL.

db.Where("email = ?", "jon@calhoun.io").First(&user)

Want to know (roughly) what the SQL generated by this is?

SELECT * FROM users WHERE email='jon@calhoun.io' LIMIT 1

As I said, they don't look that different and can be a great tool for learning.

So yes, if you already know SQL and prefer it you shouldn't use an ORM. But this doesn't prove that ORMs are a bad idea. It simply demonstrates that if you know and prefer SQL then you should clearly not use an ORM.

In summary...

If you are new to SQL, or you simply want to use an ORM then go for it. There isn't anything "bad" or "evil" about them. When others tell you they are bad, what they are really expressing is an opinion. A preference, created by their very different educational background or experiences with teams that may or may not have used an ORM effectively.

And if you dislike ORMs, then great. More power to you. But please stop telling everyone that ORMs are awful tools just because you don't prefer them. You are actively making it harder for beginners to get into development by making them believe they need to learn everything you know before they can even get started, and that simply is not possible.

Did you enjoy this article? Join my mailing list!

If you enjoyed this article, please consider joining my mailing list.

I will send you roughly one email every week letting you know about new articles or screencasts (like this one) that I am working on or have published recently. No spam. No selling your emails. Nothing shady - I'll treat your inbox like it was my own.

As a special thank you for joining, I'll also send you both screencast and ebook samples from my course, Web Development with Go.

Top comments (4)

Kasey Speakman • Jul 31 '17 • Edited

I use a "micro-ORM" (Dapper on the .NET platform). Literally the only thing it does for me is map objects to query parameters and query results to objects. It is an incredible help.

I have previously used full ORMs (NHibernate and Entity Framework), and I am not likely to go that route again. Hopefully I can explain my perspective in a cogent way. If I had to boil it down to 2 main reasons that ORMs have not fit for me, they would be: failure of one-true-model and impedance mismatch.

As I work on a system, I find that trying to use one-true-model for an entity (i.e. a single model represented by a set of database tables) does not work. The objects I use in code start no longer to map property-for-property to its storage structure (database tables). Simple example: Full-text search column... business logic could care less about it but UI needs it for queries. Resolving these differences in one-true-model becomes a source of ongoing complexity. At some point I need to mitigate that complexity. The usual way is to split the data model up into separate models organized around their specific concerns. I have to manually decide what data goes into what storage container (and in what shape). I could keep a separate set of storage objects, manually map from business object to storage objects, and then have the ORM auto map from storage object to SQL. But at this point it is more straight-forward to manually map from business object to query parameters without introducing and learning a new framework. The really tedious stuff can be avoided with a helper library with low conceptual overhead (like Dapper mentioned above).

Another case where an ORM seems handy is when you have some operational data that you usually load and save as a whole. Examples: limits, minimums, schedules around a particular entity. In this case the relational model introduces accidental complexity by making you split up sub-entities across multiple tables with parent-child relationships even though they are a conceptual whole. (Also called impedance mismatch.) Instead of using an ORM here, I find that using a document or key-value model often fits better. For instance in Postgres, I would store the whole object in one jsonb field. This is usually not so good for reporting (although Postgres specifically does offer indexing and searching on jsonb data). But I often find this is a lesser concern for the type of data where this pattern fits.

I am sure there exists a use case where a full ORM is a good fit. But most of my development (business systems) does not hit one of those cases. It is not so much that ORMs are bad. Its just that as my development habits have changed, they do not effectively solve a problem I have.

Jon Calhoun • Aug 1 '17 • Edited

It definitely sounds like you are making the right decision for your particular use case, and I mostly prefer ORMs that are fairly minimal. That is, they don't try to do everything for me. Instead they help make a few common use cases easier and get out of my way otherwise.

This post wasn't really intended to say situations like yours are wrong, but rather I think it is a bad idea to just blanket-label ORMs as bad. ORMs themselves aren't bad, but when they are used improperly I have definitely seen things get out of hand very quickly.

A similar metaphor would be a knife - a knife in itself isn't bad. We all use them in our kitchen when cooking and have no issues. But if someone accidentally cuts off a finger how are we all going to react to it?

I think it is safe to assume that most of us won't run around saying "All knives are bad! Stop using them!", yet this appears to be what is happening in some Go communities with regards to ORMs and it is mostly a result of people hurting themselves when using an ORM in a past language like Django, Rails, or something else.

This post was meant to help illustrate what some of those pains are. To point out exactly how people are hurting themselves with ORMs, and how those issues could be avoided.

Erebos Manannán • Jul 31 '17

Using some level of ORM has additional benefits as well, you can migrate and mock various features in an ORM layer much more easily than direct SQL queries scattered all over your application.

Jon Calhoun • Jul 31 '17

You aren't wrong for some apps, but when an app is designed well I tend to find this is less relevant.

Eg if you put your code behind a single isolated layer you can easily mock out that layer for testing. Take the following interface:

type UserDB interface {
  ByID(id uint) (*User, error)
  Create(user *User) error
  // and more
}

And all of your SQL code is hidden in an implementation of that interface, you can create a mock implementation for all of your tests and it doesn't matter if you used SQL or an ORM in development - your mock replaces either.

Where this tends to fall apart is when developers scatter ORM (or SQL) code all around in their application, as is more common in rails, and then you end up with something that is harder to test because literally any piece of code could be calling a method on the ORM directly.

In rails this isn't as bad because it isnt' a typed language, you can monkey patch, and you can inject stubs/mocks in your tests at runtime. In Go this doesn't work, so having a better design upfront is necessary.