DEV Community

Discussion on: 📊 Languages' Popularity on DEV

 
rhymes profile image
rhymes

The many-to-many relationships and pivot tables in SQLAlchemy just make so much more sense than in any other ORM I've used. Personally, I've never used Django so have never used its ORM, but I doubt its nearly as good a SQLAlchemy!

Django ORM is based on the Active Record pattern (the same as Rails's ActiveRecord). SQLAlchemy ORM is based on the Data Mapper pattern. ORMs (including the Flask-SQLAlchemy layer) tend to suffer from the object-relational impedence mismatch.

@dmfay explained it well here:

More fully, it's the "object-relational impedance mismatch". This refers to the central problem of using an object-relational mapper (O/RM) framework to handle data access and manipulation. O/RMs require you to implement a parallel data model in your application code: if you have a users table, you'll have to write a User class which represents records in that table, and so forth.

The issue is that relational databases work in a fundamentally different manner from application code. A relational database cares about two things: records, and relationships between records as denoted by foreign key constraints. Application code works with all manner of more complicated data structures and techniques: hierarchies, iteration, graph traversal. So when you design a second, sort-of-but-not-completely parallel data model that works for your application code, it's not so easy to translate that into the existing data model expressed by your database schema.

If your users table (and hence your User model) has a visits field and you realize that you'd been double-counting visits, you might want to amend the data you've collected so far. With SQL, you'd UPDATE users SET visits = visits / 2; -- fairly simple. However, if you're using an O/RM and don't want to write SQL, you'll likely have to load each User into memory, halve its visits, and persist it back. This takes much, much more time and makes two round-trips to the database for each record in the users table. It's much quicker to abandon the O/RM and go with the SQL query, and indeed most O/RMs offer the ability to run raw SQL because the impedance mismatch is not a problem you can beat head-on.

The impedance mismatch is a problem characteristic of O/RMs specifically. There's nothing special about NoSQL databases which eliminates it, and in fact it's quite possible to work with a relational database without running into it: you just have to choose a data access framework which isn't an O/RM. Note that the opposite is also true: if you use an "O/RM" for a NoSQL database, like Mongoose for MongoDB, you can still run into the impedance mismatch. Look into data mappers if you still want to use a relational database -- I highly recommend at least investigating it unless you know that your data architecture requires specialized storage, since relational databases are much more generally useful than NoSQL stores.

Ted Neward's blog post The Vietnam of Computer Science goes into more detail, and if you have a SkillsMatter account I talked about the impedance mismatch in the course of describing why my own project Massive.js isn't an O/RM at FullStack London last week.

Thread Thread
 
kaelscion profile image
kaelscion

So what I'm seeing is that there is a deep trade off with an ORM? Sacrificing scalability, performance, and efficiency for ease of use and convenience?

That seems like a common trade off. The more abstracted we get from what we are doing, the more complex the plumbing becomes. I suppose that should be expected (although admittedly I didn't even think about that). Your doing everything twice. Invoking a class, which mirrors closely, but not exactly, a database table that handles data in a simpler, more straightforward way than the application layer code above it. The class then translates, to the best of its ability, SQL queries that relate to that table. But first, must connect to them, which could have been done much more quickly by straight-up SQL and much earlier in the process. Then, as with all processes that are meant to be easy-to-use, the SQLAlchemy class uses an over-generalized method of connecting, aggregating, and updating data from the db, before then reorganizing it into the form we expect when the round trip completes. Main issue being that in order to account for the large set of potential use cases, error handling, etc that all "convenience based" software code is designed to account for, a fairly straightforward SQL request becomes bloated, slow, and overly complex simply because it is, for lack of a better metaphor, a "Python-SQL Compiler" that takes Python code, translates it into the SQL that SQLite, MySQL, Postges, etc can understand, speaking "broken SQL" at best, then takes the response, translates it back to the code that the Python interpreter is expecting so that the application can use it. All when this could have been done with a standard SQL query that we don't know because ORMs are easier.

Is that about what this is saying? I'm not trying to sound sarcastic, I've just never thought about what was going on under the hood with an ORM and want to make sure I understand what is being explained to me here :D.

Thread Thread
 
rhymes profile image
rhymes

The class then translates, to the best of its ability, SQL queries that relate to that table. But first, must connect to them, which could have been done much more quickly by straight-up SQL and much earlier in the process.

I feel this is not part of the impedence mismatch. ORMs are usually backed by connection pools and lazy connections are definitely a pro (most of the times you don't want your code to connect to the DB until the data is asked for)

Then, as with all processes that are meant to be easy-to-use, the SQLAlchemy class uses an over-generalized method of connecting, aggregating, and updating data from the db, before then reorganizing it into the form we expect when the round trip completes

Sure, there's some overhead, because data has to be translated and converted and objects have to be created and so on

A common (and easily solvable) issue with ORMs is the N+1 which roughly translates to something like select all users, and for each user select their comments, this can be solved by what ORMs call eager loading which translates to something like select all users, join with the comments table and load the comments at the same time

Another common issue with ORMs is the update example Dian gives. If you have to update a single column in bulk the naive and very slow way to do it is to load the objects in memory, iterate through them, change the column and save the object. A better way to do it, in SQLAlchemy, is through bulk methods which are much faster (but not as fast as using the core which basically generates a single SQL UPDATE). In Rails you probably have to use something like activerecord-import

Same goes for the combination of the two issues above (N+1 update? don't know the name :D)

I've just never thought about what was going on under the hood with an ORM and want to make sure I understand what is being explained to me here :D

:-)