Functional programming in C#

#csharp #dotnet #functional

Functional programming relies on pure functions, which have no side effects and always return the same output for a given input. This paradigm has many benefits, but it can be difficult to achieve in C#, especially for people who are accustomed to writing imperative code. If, for example, you find yourself writing or using methods that return void, then your code is probably not functional, since a void method can only be useful through its side effects. This happens frequently when building complex data structures.
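As a small illustration of the distinction, compare a void method that mutates its argument with a pure function that returns a new value. The class and method names here are hypothetical, chosen only for this sketch:

```csharp
using System.Collections.Generic;
using System.Linq;

static class PurityExample
{
    // Impure: returns void and changes the list it is given (a side effect).
    public static void AddTaxInPlace(List<decimal> prices, decimal rate)
    {
        for (var i = 0; i < prices.Count; i++)
        {
            prices[i] = prices[i] * (1 + rate);
        }
    }

    // Pure: leaves its input untouched and returns a new list.
    // The same prices and rate always produce the same result.
    public static List<decimal> WithTax(IEnumerable<decimal> prices, decimal rate)
        => prices.Select(price => price * (1 + rate)).ToList();
}
```

A caller of WithTax can keep using the original prices afterwards, whereas a caller of AddTaxInPlace has to remember that the list has changed.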

Using JSON to write a report

Let's say we've stored customer orders in a JSON file called Orders.json, like this:

[
  {
    "OrderID": 10248,
    "OrderDate": "1996-07-04T00:00:00",
    "ShipCountry": "France"
  },
  {
    "OrderID": 10249,
    "OrderDate": "1996-07-05T00:00:00",
    "ShipCountry": "Germany"
  },
  ...
]

We can read this file into memory using an Order class:

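using System;
using System.IO;
using System.Text.Json;

class Order
{
    public int OrderID { get; set; }
    public DateTime OrderDate { get; set; }
    public string ShipCountry { get; set; }

    // Reads Orders.json from the current working directory and deserializes it.
    public static Order[] GetOrders()
    {
        var json = File.ReadAllText("Orders.json");
        return JsonSerializer.Deserialize<Order[]>(json);
    }
}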

GetOrders converts the contents of a JSON file into an array of Orders. Note that it is not a pure function, since the orders it returns depend on the contents of Orders.json (which can change over time).
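One way to shrink the impure part, sketched here on the assumption that System.Text.Json is the serializer in use, is to separate reading the file from parsing its text. ParseOrders is a hypothetical helper, not part of the Order class above; given the same JSON text, it always produces equivalent orders:

```csharp
using System;
using System.Text.Json;

class Order
{
    public int OrderID { get; set; }
    public DateTime OrderDate { get; set; }
    public string ShipCountry { get; set; }
}

static class OrderParsing
{
    // Hypothetical helper: its result depends only on the json argument.
    // Falls back to an empty array if the text is the JSON literal null.
    public static Order[] ParseOrders(string json)
        => JsonSerializer.Deserialize<Order[]>(json) ?? Array.Empty<Order>();
}
```

With this split, the file read is the only step whose result can change between calls.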

Our goal is to create "reports", where a report gathers order IDs by country. A report has the following type:¹

Dictionary<string /*country*/, List<int> /*order IDs*/>

In particular, we want to write a function that takes a list of years as input, and returns a list of reports - one report for each of the given years.
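To make the target shape concrete, here is a sketch of the result we would expect for the single year 1996 if Orders.json contained only the two sample orders shown above. That is an assumption, since the real file is longer, and ReportShape is a hypothetical name:

```csharp
using System.Collections.Generic;

static class ReportShape
{
    // One report per requested year; each report maps a country
    // to the IDs of the orders shipped there in that year.
    public static List<Dictionary<string, List<int>>> SampleReports()
        => new List<Dictionary<string, List<int>>>
        {
            new Dictionary<string, List<int>>
            {
                ["France"] = new List<int> { 10248 },
                ["Germany"] = new List<int> { 10249 }
            }
        };
}
```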

Imperative version

The traditional way to implement such a function in C# is with nested loops:

static List<Dictionary<string, List<int>>> GetReports(IList<int> years)
{
    var dicts = new List<Dictionary<string, List<int>>>();
    foreach (var year in years)
    {
        var dict = new Dictionary<string, List<int>>();
        foreach (var order in Order.GetOrders())
        {
            if (order.OrderDate.Year == year)
            {
                if (!dict.TryGetValue(order.ShipCountry, out List<int> orderIDs))
                {
                    dict[order.ShipCountry] = orderIDs = new List<int>();
                }
                orderIDs.Add(order.OrderID);
            }
        }
        dicts.Add(dict);
    }
    return dicts;
}

This approach does work, but it is error-prone because we're building each report one order at a time. The calls to List.Add and Dictionary.Item (i.e. setting the value of a key using square brackets) return void, so they exist only for their side effects - they're not pure functions. TryGetValue is also notoriously tricky in C# because of its horrible signature: when the key is missing, it returns false and leaves the default value (null, for a List<int>) in the out parameter, which must be handled carefully.
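The TryGetValue pitfall can be seen in isolation with a small sketch. The dictionary contents and the TryGetValueExample class are made up for illustration:

```csharp
using System.Collections.Generic;

static class TryGetValueExample
{
    // Returns true if a failed lookup leaves null in the out parameter.
    public static bool MissingKeyLeavesNull()
    {
        var dict = new Dictionary<string, List<int>>
        {
            ["France"] = new List<int> { 10248 }
        };

        // "Spain" is not a key, so TryGetValue returns false...
        var found = dict.TryGetValue("Spain", out var orderIDs);

        // ...and orderIDs is null here. Calling orderIDs.Add(...) at this
        // point would throw a NullReferenceException.
        return !found && orderIDs is null;
    }
}
```

This is why the imperative GetReports assigns a new list to both the dictionary entry and orderIDs before calling Add.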

Note that this version of GetReports isn't pure itself, either. Because it calls Order.GetOrders() directly, its behavior also depends on the contents of Orders.json.

Functional version

Fortunately, we can rewrite this so that it uses only pure functions, and is also a pure function itself. LINQ is a great example of a functional programming API, so let's use it to create our reports:

static List<Dictionary<string, List<int>>> GetReports(IList<Order> orders, IList<int> years)
    => years
        .Select(year =>
            orders
                .Where(order => order.OrderDate.Year == year)
                .GroupBy(
                    order => order.ShipCountry,
                    order => order.OrderID)
                .ToDictionary(
                    group => group.Key,
                    group => group.ToList()))
        .ToList();

This version is a big improvement because we're no longer building reports one order at a time. Instead, we can think about data flow, which is a much higher level of abstraction. For each year, we take a stream of orders, filter out the ones we don't want using Where, group them by country using GroupBy, and then convert the resulting stream into a dictionary with ToDictionary. The control flow is trivial, since it no longer contains nested loops or if statements.

Note that we've also added an explicit orders parameter to the function, so it is now guaranteed to always return the same reports for a given set of orders and years. This makes it a pure function, which we can test with code like this:

var years = new int[] { 1996, 1997, 1998 };
foreach (var dict in GetReports(Order.GetOrders(), years))
{
    Console.WriteLine();
    foreach (var pair in dict)
    {
        Console.WriteLine($"{pair.Key}: {pair.Value.Count}");
    }
}
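Because the function is pure, it can also be checked without touching the file system. The sketch below repeats the Order class and GetReports so that it stands alone; the three orders and the Reports and ReportsCheck classes are invented for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Order
{
    public int OrderID { get; set; }
    public DateTime OrderDate { get; set; }
    public string ShipCountry { get; set; }
}

static class Reports
{
    // Same LINQ pipeline as the functional GetReports above.
    public static List<Dictionary<string, List<int>>> GetReports(
        IList<Order> orders, IList<int> years)
        => years
            .Select(year =>
                orders
                    .Where(order => order.OrderDate.Year == year)
                    .GroupBy(order => order.ShipCountry, order => order.OrderID)
                    .ToDictionary(group => group.Key, group => group.ToList()))
            .ToList();
}

static class ReportsCheck
{
    // Throws if GetReports does not produce the expected reports.
    public static void Run()
    {
        // Made-up orders, so the check does not depend on Orders.json.
        var orders = new[]
        {
            new Order { OrderID = 1, OrderDate = new DateTime(1996, 7, 4), ShipCountry = "France" },
            new Order { OrderID = 2, OrderDate = new DateTime(1996, 7, 5), ShipCountry = "France" },
            new Order { OrderID = 3, OrderDate = new DateTime(1997, 1, 1), ShipCountry = "Germany" }
        };

        var reports = Reports.GetReports(orders, new[] { 1996, 1997 });

        if (reports.Count != 2)
            throw new Exception("expected one report per year");
        if (!reports[0]["France"].SequenceEqual(new[] { 1, 2 }))
            throw new Exception("unexpected 1996 report");
        if (!reports[1]["Germany"].SequenceEqual(new[] { 3 }))
            throw new Exception("unexpected 1997 report");
    }
}
```

The same inputs give the same reports on every run, which is what makes a check like this repeatable.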

This approach also makes it easier to optimize the implementation by grouping the orders by year at the start, so it's not necessary to re-iterate all the orders for each year. Anyone want to give that a try in the comments?
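Here is one possible take, offered as a sketch rather than the definitive answer. It assumes LINQ's ToLookup is an acceptable way to index the orders by year, and GetReportsGrouped is a hypothetical name chosen to keep it distinct from the version above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Order
{
    public int OrderID { get; set; }
    public DateTime OrderDate { get; set; }
    public string ShipCountry { get; set; }
}

static class GroupedReports
{
    public static List<Dictionary<string, List<int>>> GetReportsGrouped(
        IList<Order> orders, IList<int> years)
    {
        // Walk the orders once, indexing them by year.
        var ordersByYear = orders.ToLookup(order => order.OrderDate.Year);

        // Each year now only touches its own orders.
        return years
            .Select(year =>
                ordersByYear[year]
                    .GroupBy(order => order.ShipCountry, order => order.OrderID)
                    .ToDictionary(group => group.Key, group => group.ToList()))
            .ToList();
    }
}
```

A lookup returns an empty sequence for a key it doesn't contain, so a year with no orders still produces an empty report, just as in the original version.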

¹ One of the major missing features in C# is typedefs, so there's no easy way to create an abbreviation for this type. C# does have using aliases, but they're a poor substitute. sigh