Nelson Ciofi

The best way to store and read relational data with Dapper

To start, I need to make something very clear: I'm "lazy". Knowing that, I need to reveal a secret: if there is one thing I don't like to do, it's writing queries or commands in SQL. The structures are all beautiful and wonderful in my imagination, but when I need to put them into code, with lots of "wheres" and "joins", I get dizzy. Because of that, I have been a very assiduous user of ChatGPT, which has helped me a lot by doing the boring part.
And what does all this have to do with this article? I started to explore different ways of storing and retrieving data from complex objects or relationships using C#, Dapper, and SQL Server, and this is a very condensed summary of those experiments.

Objects and Relationships

To start these tests, we need some objects with at least a one-to-many relationship between them; after all, the goal of this study is to find the most efficient way to assemble that relationship at runtime. In my day-to-day work I deal a lot with finance, so I suggest an account and its installments. Like this:

public class Account
{
    public Guid Id { get; set; }
    public string Description { get; set; } 
    public decimal TotalValue { get; set; }
    public List<Installment> Installments { get; set; } 
}

public class Installment
{
    public Guid Id { get; set; }
    public DateTime DueDate { get; set; }
    public decimal Value { get; set; }
    public Guid AccountId { get; set; }
}

Objects like these are commonly stored in a relational database, each in its own table, with foreign keys creating the relationship. Here is the SQL I used:

 CREATE TABLE Accounts
  (
     Id          UNIQUEIDENTIFIER PRIMARY KEY,
     Description NVARCHAR(255) NOT NULL,
     TotalValue  DECIMAL(18, 2) NOT NULL
  ); 

CREATE TABLE Installments
  (
     Id        UNIQUEIDENTIFIER PRIMARY KEY,
     DueDate   DATETIME NOT NULL,
     Value     DECIMAL(18, 2) NOT NULL,
     AccountId UNIQUEIDENTIFIER NOT NULL,
     FOREIGN KEY(AccountId) REFERENCES Accounts(Id)
  ); 
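For completeness, writing to these tables with Dapper is straightforward. A minimal sketch, assuming an open SqlConnection named sqlConnection (the method name is mine):

public void SaveAccount(Account account)
{
    sqlConnection.Execute(
        @"INSERT INTO Accounts (Id, Description, TotalValue)
          VALUES (@Id, @Description, @TotalValue)",
        account);

    // Passing a collection makes Dapper run the command once per item.
    sqlConnection.Execute(
        @"INSERT INTO Installments (Id, DueDate, Value, AccountId)
          VALUES (@Id, @DueDate, @Value, @AccountId)",
        account.Installments);
}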

Most common approaches

The most common way I have come across to retrieve this kind of data is a single SQL query with a join, followed by a mapping step, which in my case goes from installments to their accounts.

public const string ClassicQuery = @"
    SELECT acc.*, i.*  FROM Accounts acc
    inner join Installments i ON i.AccountId = acc.Id";

public List<Account> GetAllAccounts()
{
    var lookup = new Dictionary<Guid, Account>();

    _ = sqlConnection.Query<Account, Installment, Account>(SqlConstants.ClassicQuery,
        (acc, ins) =>
        {
            if (!lookup.TryGetValue(acc.Id, out var accEntry))
            {
                accEntry = acc;
                accEntry.Installments ??= new List<Installment>();
                lookup.Add(acc.Id, accEntry);
            }

            accEntry.Installments.Add(ins);
            return accEntry;

        }, splitOn: "Id");

    return lookup.Values.AsList();
}

Brief explanations:

  1. For those already used to how SQL works: this type of query generates the Cartesian product of the tables, so a 'lookup' is needed during the mapping. Try it without one and you'll end up with duplicate accounts, each holding a single installment.
  2. I'm not absolutely sure, but I believe the C# compiler can treat a hardcoded string as a constant; I preferred to make it explicit so there is no doubt, and since we will have benchmark tests soon, I don't want string allocations to skew the results.
  3. The AsList() method comes from Dapper: if the enumerable is already a List<T>, it reuses that list instead of allocating another one, as ToList() would. (See the sketch just below.)
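
A minimal illustration of that difference (the values here are arbitrary):

using Dapper;
using System.Collections.Generic;
using System.Linq;

IEnumerable<int> source = new List<int> { 1, 2, 3 };

var reused = source.AsList(); // already a List<int>: Dapper returns the same instance, no copy
var copied = source.ToList(); // LINQ always allocates a fresh list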

Another very common way is to call .Read<T>() multiple times on the result of a .QueryMultiple(). Like this:

public const string MultipleQuery = @"
    SELECT * FROM Accounts;
    SELECT * FROM Installments;";

public List<Account> GetAllAccounts()
{
    var res = sqlConnection.QueryMultiple(SqlConstants.MultipleQuery);

    var acc = res.Read<Account>();
    var installments = res.Read<Installment>();

    foreach (var account in acc)
    {
        account.Installments = installments.Where(i => i.AccountId == account.Id).ToList();
    }

    return acc.AsList();
}

The code is very simple and straightforward; there's no magic in it. Until now I had never questioned its performance, usability, or readability, and in fact, in the real world, there has never been a requirement to improve anything along these lines. But since I'm 'lazy', I got sick of writing this kind of code over and over.

What if...Json?

For those who have already ventured into non-relational databases, such as MongoDB or Azure Cosmos, you know you can store this entire object as one big, beautiful json. And if we analyze it coldly, we can try to mix a little json into the relational database, why not, right? All mainstream engines, including Sql Server, already have facilities for dealing with json in tables: listing, inserting, filtering, and several other common operations. In the code snippet below, I represent the one-to-many relationship as json and use a custom Dapper type handler to read and write account installments in that format automatically.

public const string CreateAccountsJson = @"
    CREATE TABLE AccountsJson(
        Id UNIQUEIDENTIFIER PRIMARY KEY,
        Description NVARCHAR(255) NOT NULL,
        TotalValue DECIMAL(18,2) NOT NULL,
        Installments NVARCHAR(MAX) NULL
    );";


public class AccountJson
{
    public Guid Id { get; set; }
    public string Description { get; set; } 
    public decimal TotalValue { get; set; }
    public List<InstallmentJson> Installments { get; set; } 
}

public struct InstallmentJson
{
    public DateTime DueDate { get; set; }
    public decimal Value { get; set; }
}


public class InstallmentJsonTypeMapper 
    : SqlMapper.TypeHandler<List<InstallmentJson>>
{
    public override List<InstallmentJson> Parse(object value)
    {
        if (value is null) return new List<InstallmentJson>();

        var json = value.ToString();

        if (string.IsNullOrWhiteSpace(json)) return new List<InstallmentJson>();

        var res = JsonSerializer.Deserialize<List<InstallmentJson>>(json);

        if (res is null) return new List<InstallmentJson>();

        return res;
    }

    public override void SetValue(IDbDataParameter parameter, 
                                  List<InstallmentJson> value)
    {
        parameter.Value = value is null ? null : JsonSerializer.Serialize(value);
    }
}
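One thing the snippet above doesn't show: Dapper only applies a TypeHandler after it has been registered. A minimal sketch of the registration and a round trip, assuming the AccountsJson table above (the INSERT/SELECT strings are illustrative):

// Register once at startup; Dapper then uses the handler whenever a
// List<InstallmentJson> appears as a parameter or a mapped column.
SqlMapper.AddTypeHandler(new InstallmentJsonTypeMapper());

// Write: the handler serializes Installments into the NVARCHAR(MAX) column.
sqlConnection.Execute(
    @"INSERT INTO AccountsJson (Id, Description, TotalValue, Installments)
      VALUES (@Id, @Description, @TotalValue, @Installments)",
    accountJson);

// Read: the handler deserializes the json column back into the list.
var accounts = sqlConnection.Query<AccountJson>("SELECT * FROM AccountsJson").AsList();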

I know it works, but is it worth it? Can you use it in production and feel safe? Let's run a benchmark with BenchmarkDotNet. I seeded a local database with 10 accounts, varying the number of installments per account. The results were:

|     Method      | Accounts | Installments |   Mean    |  Gen0   |  Gen1  | Allocated  |
|-----------------|----------|--------------|----------:|--------:|-------:|-----------:|
|  ClassicQuery   |    10    |      1       | 153.5 us |  1.4648 |   -    |   9.38 KB  |
| MultipleQuery   |    10    |      1       | 135.7 us |  1.4648 |   -    |  10.16 KB  |
|    JsonQuery    |    10    |      1       | 146.0 us |  2.1973 |   -    |  14.51 KB  |
|  ClassicQuery   |    10    |      5       | 287.4 us |  3.9063 |   -    |  26.34 KB  |
| MultipleQuery   |    10    |      5       | 162.9 us |  2.9297 |   -    |  19.32 KB  |
|    JsonQuery    |    10    |      5       | 182.8 us |  5.1270 |   -    |  32.47 KB  |
|  ClassicQuery   |    10    |     10       | 376.6 us |  7.8125 |   -    |  48.88 KB  |
| MultipleQuery   |    10    |     10       | 204.1 us |  4.8828 |   -    |  31.19 KB  |
|    JsonQuery    |    10    |     10       | 214.3 us |  9.0332 | 0.2441 |  56.28 KB  |

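For reference, the harness behind a table like this can be wired up roughly as below. This is only a sketch: the repository wrapper, its method names, and the seeding helper are my assumptions, not the article's actual benchmark project.

using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class AccountQueryBenchmarks
{
    // Hypothetical wrapper exposing the query methods shown earlier.
    private readonly AccountRepository _repo = new();

    [Params(10)]
    public int Accounts { get; set; }

    [Params(1, 5, 10)]
    public int Installments { get; set; }

    // Hypothetical helper that truncates and reseeds the local database.
    [GlobalSetup]
    public void Seed() => _repo.SeedDatabase(Accounts, Installments);

    [Benchmark]
    public List<Account> ClassicQuery() => _repo.GetAllAccountsClassic();

    [Benchmark]
    public List<Account> MultipleQuery() => _repo.GetAllAccountsMultiple();

    [Benchmark]
    public List<AccountJson> JsonQuery() => _repo.GetAllAccountsJson();
}

// Run with: BenchmarkDotNet.Running.BenchmarkRunner.Run<AccountQueryBenchmarks>();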

Honestly, I was expecting something much worse from the json method. Its memory allocation was higher than the others, but its execution time matched QueryMultiple(). For me this is already becoming a good deal.
We can also stop considering the classic approach: weighing allocation against time, it was the worst.

What if... data structures?

Did you notice the catch? The MultipleQuery test uses Linq and a very simplistic way of mapping the installments to their accounts. We could use a lookup trick similar to the one from the classic approach. Look:

public List<Account> GetAllAccounts()
{
    var res = sqlConnection.QueryMultiple(SqlConstants.MultipleQuery);

    var accounts = res.Read<Account>(buffered: true).AsList();
    var installments = res.Read<Installment>();

    // Map each account id to its index in the list for O(1) access.
    var lookup = new Dictionary<Guid, int>();

    for (int i = 0; i < accounts.Count; i++)
    {
        accounts[i].Installments ??= new List<Installment>();
        lookup.Add(accounts[i].Id, i);
    }

    foreach (var installment in installments)
    {
        if (lookup.TryGetValue(installment.AccountId, out int i))
        {
            accounts[i].Installments.Add(installment);
        }
    }

    return accounts;
}

If data structures really are a thing, this version should be the best so far, so let's get to the tests.

|       Method       | Accounts | Installments |     Mean     |   Allocated  |
|:------------------:|:--------:|:------------:|------------:|------------:|
|   MultipleQuery    |     1    |      10      |    129.8 us |     6.18 KB  |
|     JsonQuery      |     1    |      10      |    121.9 us |     7.63 KB  |
|    MultipleLookup  |     1    |      10      |    121.8 us |     6.22 KB  |
|   MultipleQuery    |     1    |     100      |    188.2 us |     26.7 KB  |
|     JsonQuery      |     1    |     100      |    204.3 us |    48.57 KB  |
|    MultipleLookup  |     1    |     100      |    190.7 us |    26.74 KB  |
|   MultipleQuery    |    10    |      10      |    223.5 us |    31.19 KB  |
|     JsonQuery      |    10    |      10      |    236.4 us |    56.34 KB  |
|    MultipleLookup  |    10    |      10      |    197.2 us |    30.38 KB  |
|   MultipleQuery    |    10    |     100      |    720.0 us |   232.26 KB  |
|     JsonQuery      |    10    |     100      |  1,256.1 us |   465.83 KB  |
|    MultipleLookup  |    10    |     100      |    719.4 us |   231.43 KB  |
|   MultipleQuery    |   100    |      10      |  1,318.5 us |   276.86 KB  |
|     JsonQuery      |   100    |      10      |  1,244.9 us |   541.36 KB  |
|    MultipleLookup  |   100    |      10      |    751.1 us |   269.34 KB  |
|   MultipleQuery    |   100    |     100      | 13,786.4 us |  2386.42 KB  |
|     JsonQuery      |   100    |     100      | 12,036.7 us |  4635.98 KB  |
|    MultipleLookup  |   100    |     100      |  8,879.9 us |  2379.31 KB  |


To read these results fairly, keep in mind that although a hundred installments per account is not a very realistic case, in another context an object could easily have a hundred children in its hierarchy. That's what makes the test more interesting.
In general, the MultipleLookup method shined, and that brings me a certain sadness, because I have to accept that I need to worry more about sql to guarantee better performance.
But like all good 'lazy' people, I don't give up easily; I keep trying to always do as little as possible.

What if...SPANS?

For a while now, Span<T> has been used to optimize all kinds of routines, and what I'm going to propose here is a sinister, totally unusual approach that can pay off. Our case study's relational objects are represented by a collection, and are therefore eligible to become a Span<T>.
The question that remains is how to convert a set of installments into a span of installments and save it in the database, and the answer may lie in binary data. See:

public const string CreateAccountsSpan = @"
      CREATE TABLE AccountsSpan(
          Id UNIQUEIDENTIFIER PRIMARY KEY,
          Description NVARCHAR(255) NOT NULL,
          TotalValue DECIMAL(18,2) NOT NULL,
          Installments VARBINARY(MAX) NULL
      );";

public class InstallmentSpanTypeMapper 
    : SqlMapper.TypeHandler<List<InstallmentSpan>>
{
    public override List<InstallmentSpan> Parse(object value)
    {
        if (value is not byte[] bytes)
        {
            return new List<InstallmentSpan>();
        }

        var span = bytes.AsSpan();
        var structSpan = MemoryMarshal.Cast<byte, InstallmentSpan>(span);
        return structSpan.ToArray().ToList();
    }

    public override void SetValue(IDbDataParameter parameter,
                                  List<InstallmentSpan> value)
    {
        var s = CollectionsMarshal.AsSpan(value);
        Span<byte> span = MemoryMarshal.AsBytes(s);
        parameter.Value = span.ToArray();
    }
}

It's weird, to say the least, but it works. If the collection of installments on the account were an Installment[] instead of a List<Installment>, it would be possible to avoid the .ToArray().ToList(). There must be some way to convert directly to a list, but I'm lazy, remember?
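
Just for the record, a hedged sketch of that array-based variant; this hypothetical handler is not what the benchmarks below use:

// Hypothetical: if the property were InstallmentSpan[] instead of
// List<InstallmentSpan>, Parse could hand back the copied array directly
// and skip the extra List allocation.
public class InstallmentSpanArrayTypeMapper
    : SqlMapper.TypeHandler<InstallmentSpan[]>
{
    public override InstallmentSpan[] Parse(object value)
    {
        if (value is not byte[] bytes) return Array.Empty<InstallmentSpan>();

        return MemoryMarshal.Cast<byte, InstallmentSpan>(bytes).ToArray();
    }

    public override void SetValue(IDbDataParameter parameter,
                                  InstallmentSpan[] value)
    {
        parameter.Value = value is null
            ? Array.Empty<byte>()
            : MemoryMarshal.AsBytes(value.AsSpan()).ToArray();
    }
}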
Shall we go to the tests?

|         Method | Accounts | Installments |        Mean |  Allocated |
|--------------- |--------- |------------- |------------:|-----------:|
|      JsonQuery |        1 |           10 |    127.2 us |    7.64 KB |
|      SpanQuery |        1 |           10 |    107.7 us |    3.55 KB |
| MultipleLookup |        1 |           10 |    124.9 us |    6.23 KB |
|      JsonQuery |        1 |          100 |    205.9 us |   48.57 KB |
|      SpanQuery |        1 |          100 |    113.9 us |      12 KB |
| MultipleLookup |        1 |          100 |    188.0 us |   26.73 KB |
|      JsonQuery |       10 |           10 |    211.9 us |    56.2 KB |
|      SpanQuery |       10 |           10 |    124.5 us |   15.34 KB |
| MultipleLookup |       10 |           10 |    197.9 us |   30.45 KB |
|      JsonQuery |       10 |          100 |  1,237.3 us |  465.74 KB |
|      SpanQuery |       10 |          100 |    192.0 us |   99.84 KB |
| MultipleLookup |       10 |          100 |    674.6 us |  231.42 KB |
|      JsonQuery |      100 |           10 |  1,207.0 us |  541.52 KB |
|      SpanQuery |      100 |           10 |    268.6 us |  132.97 KB |
| MultipleLookup |      100 |           10 |    747.2 us |  269.28 KB |
|      JsonQuery |      100 |          100 | 11,859.5 us | 4636.01 KB |
|      SpanQuery |      100 |          100 |  1,234.6 us |  976.76 KB |
| MultipleLookup |      100 |          100 |  9,424.9 us | 2379.24 KB |

Yes! It is very fast and allocates much less memory than the others, but it has a disadvantage: if someone needs to inspect the database through a tool like Sql Server Management Studio, they won't be able to see what's inside the binary column, and filtering on it won't be possible either (lazy thinking).

Conclusion

I really liked the Span<T> version because it simplifies the database query. On the other hand, I don't know how useful it would be in a real-case scenario, with real objects that have many more properties or even a deeply nested hierarchy. It was worth investigating, because now it's an open door.

The json version is practical. In more recent versions of Sql Server it is already possible to query and manipulate json directly in the database, so I think it is valid when a very complex object needs to be stored, will always be used as a whole, and there is no appetite for adding a NoSql database to the team's stack.

In the end, we can reflect on the fact that we don't normally write a new table and a new repository every hour. So my suggestion is: stick with QueryMultiple, apply a good data structure to each mapping case, and if you really need to optimize something, try the version with spans stored as binary in the database. And don't make up your mind before running a benchmark.
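
As an aside on that json point, here is a sketch of what filtering inside the column can look like with OPENJSON (available since Sql Server 2016), written as a C# constant in the style of the other queries; the due-date filter is just an example:

public const string JsonFilterQuery = @"
    SELECT a.Id, a.Description, ins.DueDate, ins.[Value]
    FROM AccountsJson a
    CROSS APPLY OPENJSON(a.Installments)
         WITH (DueDate DATETIME      '$.DueDate',
               [Value] DECIMAL(18,2) '$.Value') ins
    WHERE ins.DueDate < GETDATE();";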

Want the code? Come here.

PS(s):

  1. The test environment used a local database connection to minimize network time, and had these settings:
     BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22621.1848/22H2/2022Update/SunValley2)
     Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
     .NET SDK=7.0.300-preview.23122.5
     [Host]     : .NET 6.0.14 (6.0.1423.7309), X64 RyuJIT AVX2
     DefaultJob : .NET 6.0.14 (6.0.1423.7309), X64 RyuJIT AVX2
     Microsoft SQL Server 2022 (RTM-GDR) (KB5021522) - 16.0.1050.5 (X64)
  2. I'm not really lazy; the fictional character in this article is. Dapper's performance has already been extensively tested elsewhere; what I really wanted to show was this workaround with spans in the database.
  3. It wasn't an AI that gave me the idea to use spans; I thought of it myself :)
