
Imagining a better EntityFramework

jpeg729 ・ 5 min read

The quest for the perfect architecture (4 Part Series)

1) The quest for the perfect software architecture 2) Imagining a better EntityFramework 3) Using NoSQL as a cache for views 4) AutoMapper: converting entities to views at runtime

There is a lot to like in EntityFramework... and some less desirable parts.

The good

Automatic translation of C# Linq queries to SQL for efficient querying

For example, the following asks the database to filter the books rather than loading them all and discarding the books by other authors.

var books = ctx.Books
    .Where(b => b.Author == "John Smith").ToList();

Using anonymous objects to select only the data you need

For example, the following only fetches the Authors.Name and Books.Title columns.

var authorNamesAndBookTitles = ctx.Authors
    .Select(a => new 
    { 
        Name = a.Name, 
        Books = a.Books.Select(b => b.Title) 
    });

Using AutoMapper's ProjectTo to load only the data that you want

For example, if AuthorDto contains only a subset of the properties of Author, the following will only fetch the required columns.

Mapper.CreateMap<Author, AuthorDto>();

var authorsAndBooksDtos = ctx.Authors
    .ProjectTo<AuthorDto>().ToList();

Explicit joins for (mostly efficient) loading of related data

This will load all the authors along with the books each one has written.

var authorsWithBooks = ctx.Authors
    .Include(a => a.Books).ToList();

Automatic change tracking

Just load some data, modify it, and then call SaveChanges() and all the changes get saved.

var authorWithBooks = ctx.Authors
    .Include(a => a.Books)
    .Where(a => a.Id == authorId).First();

authorWithBooks.Name = "new name";
authorWithBooks.Books.Add(new Book { Title = "book title" });

ctx.SaveChanges();

Automated generation of code-first migrations

Just change your entity classes, run Add-Migration NameOfMigration in the console, and a new migration class gets generated.

The not so good

You can't mix returning anonymous objects and entity types

For example, the following won't work.

var authorNamesAndBooks = ctx.Authors
    .Select(a => new 
    { 
        Name = a.Name, 
        Books = a.Books
    });

Change tracking

Coding a feature usually involves loading some data, verifying the state of the data, modifying the data, and maybe sending an email or generating a document. Change tracking makes this pretty easy.

But if the verification part is done by a function written by a less experienced developer, you just have to hope that it doesn't modify the data. For example, the following is valid code...

if (author.IsActive = true) { ... }

but it mistakenly assigns true to author.IsActive, and this change will be persisted.
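One way to guard against this kind of slip is to hand verification code a read-only view of the entity. The sketch below is plain C# and assumes a hypothetical IAuthorView interface and AuthorChecks helper; neither is part of EntityFramework, and the Author class here is a stand-in for the real entity.

```csharp
using System;

// Hypothetical read-only view of an author (an assumption, not an EF type).
public interface IAuthorView
{
    string Name { get; }
    bool IsActive { get; }
}

// Stand-in for the entity class, with the settable properties the ORM needs.
public class Author : IAuthorView
{
    public string Name { get; set; }
    public bool IsActive { get; set; }
}

public static class AuthorChecks
{
    // The check only sees the getter-only interface, so writing
    // `if (author.IsActive = true)` in here does not compile.
    public static bool CanPublish(IAuthorView author)
    {
        return author.IsActive && !string.IsNullOrWhiteSpace(author.Name);
    }
}
```

The entity stays fully mutable for code that is meant to write to it. Only the verification functions are restricted, and a caller can still cast back to Author, so this is a guard rail rather than a guarantee.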

Explicit joins are inefficient when loading data from many related tables

Unfortunately, when you include data from many related tables the query that is generated is not particularly efficient. Loading the related entities separately can often be quicker.

EntityFramework.Plus provides IncludeOptimized to help with this scenario, but doesn't work with many-to-many relationships.

Related entities can be included unexpectedly even with AsNoTracking()

For example, this loads an author with his books and nothing else, right?

var authorAndBooks = ctx.Authors
    .Where(a => a.Name == "John Smith")
    .Include(a => a.Books).ToList();

So why is authorAndBooks.First().Books.First().Author not null?

My guess is that, since EntityFramework had already loaded the author into memory, it realised it could set the Author property of each book correctly from the data in its cache (EntityFramework calls this relationship fix-up).

This could be problematic when you want to send a minimal set of data over the wire.
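A common way to keep the payload minimal regardless of what the ORM fills in is to project to a dedicated type before serialising. The sketch below runs on in-memory objects with plain LINQ. The AuthorDto shape and the AuthorProjection helper are assumptions made for illustration, and Author and Book are stand-ins for the real entities.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Author
{
    public string Name { get; set; }
    public List<Book> Books { get; set; } = new List<Book>();
}

public class Book
{
    public string Title { get; set; }
    public Author Author { get; set; } // back-reference that the ORM may fill in
}

// Hypothetical wire format: only the fields the client needs, no back-references.
public class AuthorDto
{
    public string Name { get; set; }
    public List<string> BookTitles { get; set; }
}

public static class AuthorProjection
{
    // Copies just the wanted values, so nothing reachable through
    // Book.Author can leak into the serialised result.
    public static List<AuthorDto> ToDtos(IEnumerable<Author> authors)
    {
        return authors
            .Select(a => new AuthorDto
            {
                Name = a.Name,
                BookTitles = a.Books.Select(b => b.Title).ToList()
            })
            .ToList();
    }
}
```

Because the DTO has no navigation properties, it makes no difference whether the Author property of each book was populated or not.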

Migrations

EntityFramework's Add-Migration command compares the current state of the database, the snapshot of the database shape stored in the most recent migration, and the shape of your entity classes.

Hence, if you add migrations on two independent branches and then merge them, EntityFramework assumes that the changes in the older of the two migrations have been neither generated nor applied. The fix is easy but tedious: you have to generate a new migration using the -IgnoreChanges option in order to tell it that the current database state is good.

Saving untracked entities is fraught with danger

For example, the following code duplicates the user.

var untrackedUser = ctx.Users.AsNoTracking().First(u => u.Id == userId);
var book = new Book 
{ 
    Title = "new book title",
    Author = untrackedUser,
};

ctx.Books.Add(book);
ctx.SaveChanges();

Untracked entities can come from many sources: data sent from a client app, a NoSQL cache, a query with AsNoTracking, etc.
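A minimal sketch of one way to avoid the duplicate, assuming EntityFramework 6 (the System.Data.Entity namespace) and a hypothetical LibraryContext with the Users and Books sets used above. Attaching the untracked user first tells the context that the row already exists. The class shapes and the BookWriter helper are assumptions made for illustration, and the code needs the EntityFramework package and a database to actually run.

```csharp
using System.Data.Entity; // EntityFramework 6 NuGet package
using System.Linq;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Book
{
    public int Id { get; set; }
    public string Title { get; set; }
    public User Author { get; set; }
}

// Hypothetical context; the real one will have more sets and configuration.
public class LibraryContext : DbContext
{
    public DbSet<User> Users { get; set; }
    public DbSet<Book> Books { get; set; }
}

public static class BookWriter
{
    public static void AddBookFor(LibraryContext ctx, User untrackedUser)
    {
        // Attach starts tracking the user in the Unchanged state, so
        // SaveChanges inserts the book but does not insert the user again.
        ctx.Users.Attach(untrackedUser);

        ctx.Books.Add(new Book
        {
            Title = "new book title",
            Author = untrackedUser,
        });
        ctx.SaveChanges();
    }
}
```

Attach will throw if the context is already tracking another instance with the same key. If the Book class exposes a foreign key property, setting that instead of the navigation property sidesteps the problem as well.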

Might there be a better way...?

Here are a few ideas.

Migrations

An upgrade to EntityFramework Core would solve the problem, but would exchange it for another: many-to-many relationships are not supported without an explicit join entity. In EntityFramework 6 you simply need ICollection properties in both classes, for example...

class User 
{
    public string Name { get; set; }
    public ICollection<OnlineCourse> OnlineCourses { get; set; }
}

class OnlineCourse
{
    public string Title { get; set; }
    public ICollection<User> Users { get; set; }
}

whereas in EntityFramework Core you need to add an Enrolment entity in between them, and link Users to their Enrolments rather than directly to their OnlineCourses.

class Enrolment
{
    public User User { get; set; }
    public OnlineCourse OnlineCourse { get; set; }
}

This might not be such a bad thing as it makes things more explicit. On the other hand, merging migrations is an infrequent hassle and definitely not a showstopper for us.
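To make the extra hop through the Enrolment join entity concrete, here is a sketch in plain C# of how the three classes could link to each other, and how the direct user-to-courses view can be recovered. The Enrolments collections and the EnrolmentExtensions helpers are assumptions made for illustration, and the key and mapping configuration that EntityFramework Core would also need is left out.

```csharp
using System.Collections.Generic;
using System.Linq;

public class User
{
    public string Name { get; set; }
    public ICollection<Enrolment> Enrolments { get; set; } = new List<Enrolment>();
}

public class OnlineCourse
{
    public string Title { get; set; }
    public ICollection<Enrolment> Enrolments { get; set; } = new List<Enrolment>();
}

public class Enrolment
{
    public User User { get; set; }
    public OnlineCourse OnlineCourse { get; set; }
}

public static class EnrolmentExtensions
{
    // Recovers the direct User -> OnlineCourse view by hopping through the join entity.
    public static IEnumerable<OnlineCourse> Courses(this User user)
    {
        return user.Enrolments.Select(e => e.OnlineCourse);
    }

    // Creates the join row and registers it on both sides of the relationship.
    public static Enrolment Enrol(this User user, OnlineCourse course)
    {
        var enrolment = new Enrolment { User = user, OnlineCourse = course };
        user.Enrolments.Add(enrolment);
        course.Enrolments.Add(enrolment);
        return enrolment;
    }
}
```

With helpers like these, calling code can still write user.Courses() much as it would have read user.OnlineCourses before, while the join entity remains available for extra columns such as an enrolment date.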

Imagining a new API for writing data to the database

EntityFramework.Plus provides a bulk update API that allows you to modify properties of an entity without first loading it. Here is one of their examples...

// UPDATE all users inactive for 2 years
var date = DateTime.Now.AddYears(-2);
ctx.Users.Where(x => x.LastLoginDate < date)
         .Update(x => new User() { IsSoftDeleted = 1 });

However, this API does not support linked entities.

Could we create a similar API that did support linked entities? Here are a few use cases that I would like it to support...

  • Setting a singly linked property without modifying other properties of the linked entity
  • Setting a singly linked property and simultaneously modifying other properties of the linked entity
  • Creating a new entity and adding it to a list of linked entities on another entity
  • Adding an existing entity to a list of linked entities on another entity, with or without modifying some of its properties.
  • Removing an item from a list of linked entities without necessarily deleting the linked item

Maybe a call to our imaginary ExplicitUpdate method could return a list of modified properties with their old and new values...
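As a rough illustration of that return shape only, here is a sketch that works on an in-memory object rather than on the database. It reads the same kind of x => new User { ... } expression that the bulk update example uses, applies each assignment, and records the old and new values. Everything in it (PropertyChange, ExplicitUpdateSketch, the bool IsSoftDeleted property) is hypothetical, and a real version would have to generate SQL and handle linked entities.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical record of one modified property.
public class PropertyChange
{
    public string Property { get; set; }
    public object OldValue { get; set; }
    public object NewValue { get; set; }
}

public class User
{
    public string Name { get; set; }
    public bool IsSoftDeleted { get; set; }
}

public static class ExplicitUpdateSketch
{
    // Applies the assignments in `x => new T { ... }` to target and
    // returns one PropertyChange per property whose value actually changed.
    public static List<PropertyChange> ExplicitUpdate<T>(T target, Expression<Func<T, T>> update)
    {
        var init = update.Body as MemberInitExpression;
        if (init == null)
            throw new ArgumentException("Expected an expression of the form x => new T { ... }");

        var changes = new List<PropertyChange>();
        foreach (var binding in init.Bindings)
        {
            var assignment = binding as MemberAssignment;
            var property = binding.Member as PropertyInfo;
            if (assignment == null || property == null) continue;

            // Evaluate the right-hand side of the assignment against the target.
            var valueLambda = Expression.Lambda(assignment.Expression, update.Parameters);
            var newValue = valueLambda.Compile().DynamicInvoke(target);
            var oldValue = property.GetValue(target);

            if (!object.Equals(oldValue, newValue))
            {
                property.SetValue(target, newValue);
                changes.Add(new PropertyChange
                {
                    Property = property.Name,
                    OldValue = oldValue,
                    NewValue = newValue
                });
            }
        }
        return changes;
    }
}
```

A call such as ExplicitUpdateSketch.ExplicitUpdate(user, x => new User { IsSoftDeleted = true }) then returns the list of changes, which could feed an audit log or a test assertion.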

More details to follow...

What about other ORMs?

Dapper is a well-known alternative to EntityFramework with a reputation for being fast. But it is only a micro-ORM, which means that you don't get migrations and you have to write most of your SQL queries by hand. Now I know that I can write SQL queries that are less efficient than those that EntityFramework writes, and I am sure that I would spend weeks rewriting our queries if we decided to move to Dapper.

The thing micro-ORMs don't give you is easy migrations. So we would need to find another system for migrations and use T4 to regenerate our entity classes every time.

Micro-ORMs such as Dapper, PetaPoco, linq2db, and many others seem like decent projects, but I can't see us porting our big existing app to any of them in a hurry.

Insight.Database seems to be a slightly more interesting alternative.

Yet another option would be to hide the database access code in an F# project and use one of their SQL Server type providers. Instead of generating model classes using T4, the compiler connects to the database and generates the required classes on the fly. We could still use EntityFramework for legacy code and for migrations, and new code could use a new data access method with very little overhead.

But still, our best bet seems to be to put up with the migrations issue and to add a new API like bulk update on top of EntityFramework.

Conclusion

Immutable data and/or an explicit write API seem to be a sensible way forward for more reliable and testable software.


Discussion

 

I'm interested in understanding what you mean with migration problems with Dapper. I can't offer an opinion for Entity since I've never used it before, I've only used Dapper. Structuring your objects that will hold the database data and having the dapper functions in a .net standard library would allow you to migrate to a desktop,web,or any environment. Do you mind explaining what you mean by migrate?

 

In this case, migrations build/update a database instance based on a set of model classes in code. For example, if you added an Age property to a PersonModel.cs class, you could call 'Add-Migration AddPersonAge' in powershell, which would generate something like AddPersonAge_timestamp.cs to your project's Migrations directory. Calling Update-Database will apply the migration to your database (add an age field to the Person table).

 

EntityFramework supports migration-driven database delivery as described in this article:
enterprisecraftsmanship.com/2015/0...

 

EF is like a half way from a simple object mapper like Dapper to a full-fledged ORM like NHibernate. You should read this post:
enterprisecraftsmanship.com/2018/0...

Some of your points apply to a simple object mapper, but they don't hold from a DDD perspective. For example, (a as Author).Books[i].Author should be loaded and should be the same instance as 'a'. That's the correct behavior. You should separate 2 use cases here:

  1. Retrieve object to manipulate it during a business transaction: Object's properties should be fully-loaded or lazy-loaded in order to ensure the business constraints.
  2. Return the object to the presentation: it should be projected to a data transfer object to the outside world (docs.automapper.org/en/stable/Quer...).
 
  1. I disagree. In our code base we have disabled lazy loading for performance reasons, so we spend our time checking whether we have explicitly included everything we need. In this context the fact that EF sometimes includes entities that are not explicitly included is an unwelcome surprise.

  2. I believe that last time I tried using Automapper's ProjectTo I had a similar issue, even though I had correctly specified Explicit Expansion only. I am working on a minimal reproduction and if I can reproduce it I will file a bug.

 

Lazy loading works best when you work with a single object, i.e. in a business transaction. When you retrieve objects for the presentation layer, LINQ projection helps you avoid lazy loading, with or without Automapper. You don't need Automapper for this; it just reduces duplicated mapping code. If Automapper has a bug, you can report it (although I haven't seen one yet).

But in my experience, I also hit some glitches with EF Core lazy loading, so I didn't use it either. I find that NHibernate solves my problem better.

 

"For example, the following is valid code... but it mistakenly assigns..."

Huh. I always thought using = was a compile error but apparently it's just a warning for bools.

 

This article highlights a lot of issues that are caused by bad architecture rather than EF. You can overcome every issue here by designing your code base well.

 

Could you elaborate?
I am eager to learn anything and everything that might help on this journey.

 

Of course, more than happy to help. I'm out at the moment but I'll chip in tomorrow 😊

 

A great discussion, however what's the End Game's Objective Possibilities and Procedures Process!!!
Wiltel49@gmail.com
AAS ITI MICHIGAN C#2018!!!

 

Currently I work on a largish project that isn't very well architected, at least not any more.

I want to add a new architecture alongside the old one in order to start afresh without starting over. Then we can slowly port everything over from the old architecture.

As yet, I don't know what the new architecture might look like, so I am starting out by laying out the good and the bad in our current code base.

 

The code first mapping work is boring, tedious and confusing. Would be nice to bring back the mapping tool somehow.