Introduction
Code generation is a fascinating topic: instead of just writing code, you write code that writes code. It can happen at compile time (the newer source generators) or at runtime (expression trees, emitting IL). Creating methods and classes at runtime still sounds like magic to me, yet runtime code generation is used heavily under the hood of DI frameworks, ORMs, object mappers and similar libraries.
Now that I understand the topic well enough, I realize that some tasks I faced in the past could have been solved more efficiently and elegantly with code generation. Unfortunately, I knew nothing about it back then. The material I found online had a high entry threshold and did not give a complete picture of the feature, and most examples in articles are so trivial that it remains unclear how to apply them in practice.
In this post I first describe a particular problem that can be solved with metaprogramming and then give an overview of different code generation approaches. There will be a lot of code.
Task description
Let's imagine our application receives data from some source as an array of strings (for simplicity, only string, integer and datetime values are expected in the input array):
["John McClane", "1994-11-05T13:15:30", "4455"]
I need a generic way to parse this input into an instance of a particular class. This is the interface for creating a parser delegate (i.e. a delegate that accepts an array of strings as input and returns an instance of T as output):
public interface IParserFactory
{
Func<string[], T> GetParser<T>() where T : new();
}
I use ParserOutputAttribute to identify classes used as parser output, and ArrayIndexAttribute to specify which array element corresponds to each property:
[ParserOutput]
public class Data
{
[ArrayIndex(0)] public string Name { get; set; } // will be "John McClane"
[ArrayIndex(2)] public int Number { get; set; } // will be 4455
[ArrayIndex(1)] public DateTime Birthday { get; set; } // will be 1994-11-05T13:15:30
}
If an array element can't be parsed to the target type, it is ignored.
As a general idea, I don't want to limit the implementation to the Data class only. I want to produce a parser delegate for any type with the proper attributes.
Plain C#
First of all, I want to write plain C# code for a known type, without any code generation or reflection:
var data = new Data();
if (0 < inputArray.Length)
{
data.Name = inputArray[0];
}
if (1 < inputArray.Length && DateTime.TryParse(inputArray[1], out var bd))
{
data.Birthday = bd;
}
if (2 < inputArray.Length && int.TryParse(inputArray[2], out var n))
{
data.Number = n;
}
return data;
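Wrapped into a complete program, the snippet above could look like the sketch below. The ManualParser and ManualParserDemo names are made up for illustration, and the attributes are left off Data to keep the example self-contained. The method has exactly the Func<string[], Data> shape that every factory later in the post has to produce:

```csharp
using System;

public class Data
{
    public string Name { get; set; }
    public int Number { get; set; }
    public DateTime Birthday { get; set; }
}

public static class ManualParser
{
    // Hand-written equivalent of the delegate the factories should generate.
    public static Data Parse(string[] inputArray)
    {
        var data = new Data();
        if (0 < inputArray.Length)
        {
            data.Name = inputArray[0];
        }
        if (1 < inputArray.Length && DateTime.TryParse(inputArray[1], out var bd))
        {
            data.Birthday = bd;
        }
        if (2 < inputArray.Length && int.TryParse(inputArray[2], out var n))
        {
            data.Number = n;
        }
        return data;
    }
}

public static class ManualParserDemo
{
    public static void Run()
    {
        // A method group converts directly to the delegate type used by IParserFactory.
        Func<string[], Data> parser = ManualParser.Parse;
        var data = parser(new[] { "John McClane", "1994-11-05T13:15:30", "4455" });
        Console.WriteLine($"{data.Name}, {data.Number}, {data.Birthday:s}");

        // Missing or unparsable elements simply leave the defaults in place.
        var partial = parser(new[] { "Hans Gruber", "not a date" });
        Console.WriteLine($"{partial.Name}, {partial.Number}");
    }
}
```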
Quite simple, right? But now I want to generate the same code for an arbitrary type at runtime or compile time. Let's go!
Reflection
In the first approach, with reflection, I'm not going to generate a parser delegate. Instead I create an instance of the target type and set its properties using the reflection API.
public class ReflectionParserFactory : IParserFactory
{
public Func<string[], T> GetParser<T>() where T : new()
{
return ArrayIndexParse<T>;
}
private static T ArrayIndexParse<T>(string[] data) where T : new()
{
// create a new instance of target type
var instance = new T();
var props = typeof(T).GetProperties(BindingFlags.Instance | BindingFlags.Public);
//go through all public and non-static properties
//read and parse corresponding element in array and if success - set property value
for (int i = 0; i < props.Length; i++)
{
var attrs = props[i].GetCustomAttributes(typeof(ArrayIndexAttribute)).ToArray();
if (attrs.Length == 0) continue;
int order = ((ArrayIndexAttribute)attrs[0]).Order;
if (order < 0 || order >= data.Length) continue;
if (props[i].PropertyType == typeof(string))
{
props[i].SetValue(instance, data[order]);
continue;
}
if (props[i].PropertyType == typeof(int))
{
if (int.TryParse(data[order], out var intResult))
{
props[i].SetValue(instance, intResult);
}
continue;
}
if (props[i].PropertyType == typeof(DateTime))
{
if (DateTime.TryParse(data[order], out var dtResult))
{
props[i].SetValue(instance, dtResult);
}
}
}
return instance;
}
}
It works and it's quite readable, but it's slow (see the benchmarks section below). If you call this code very often, that could become an issue. I want to implement something more sophisticated using real code generation.
Code generation
Expression trees
From the official documentation:
Expression trees represent code in a tree-like data structure, where each node is an expression, for example, a method call or a binary operation such as x < y. You can compile and run code represented by expression trees.
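As a tiny illustration of the quoted definition, the x < y comparison can be built node by node and compiled into a delegate. This is only a sketch; the class and method names are invented for the example:

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeDemo
{
    public static Func<int, int, bool> BuildLessThan()
    {
        // Leaves of the tree: the two lambda parameters.
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        ParameterExpression y = Expression.Parameter(typeof(int), "y");

        // Inner node: the binary operation x < y.
        BinaryExpression body = Expression.LessThan(x, y);

        // Root: a lambda (x, y) => x < y, compiled into a delegate at runtime.
        return Expression.Lambda<Func<int, int, bool>>(body, x, y).Compile();
    }
}
```

Calling ExpressionTreeDemo.BuildLessThan()(1, 2) evaluates the compiled comparison just like a hand-written lambda would.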
Expression trees provide primitive building blocks such as Expression.Call to call a method or Expression.Loop to add repeating logic. Using these blocks we build the parser as a tree of instructions and finally compile it into a delegate at runtime.
public class ExpressionTreeParserFactory : IParserFactory
{
public Func<string[], T> GetParser<T>() where T : new()
{
var props = typeof(T).GetProperties(BindingFlags.Instance | BindingFlags.Public);
//declare an input parameter of the delegate
ParameterExpression inputArray = Expression.Parameter(typeof(string[]), "inputArray");
//declare a local variable that holds the resulting instance
ParameterExpression instance = Expression.Variable(typeof(T), "instance");
//create a new instance of target type using its parameterless constructor
var block = new List<Expression>
{
Expression.Assign(instance, Expression.New(typeof(T)))
};
var variables = new List<ParameterExpression> {instance};
//go through all public and non-static properties
foreach (var prop in props)
{
var attrs = prop.GetCustomAttributes(typeof(ArrayIndexAttribute)).ToArray();
if (attrs.Length == 0) continue;
int order = ((ArrayIndexAttribute)attrs[0]).Order;
if (order < 0) continue;
//validate an index from ArrayIndexAttribute
var orderConst = Expression.Constant(order);
var orderCheck = Expression.LessThan(orderConst, Expression.ArrayLength(inputArray));
if (prop.PropertyType == typeof(string))
{
//set string property
var stringPropertySet = Expression.Assign(
Expression.Property(instance, prop),
Expression.ArrayIndex(inputArray, orderConst));
block.Add(Expression.IfThen(orderCheck, stringPropertySet));
continue;
}
//get parser method from the list of available parsers (currently we parse only Int and DateTime)
if (!TypeParsers.Parsers.TryGetValue(prop.PropertyType, out var parser))
{
continue;
}
var parseResult = Expression.Variable(prop.PropertyType, "parseResult");
var parserCall = Expression.Call(parser, Expression.ArrayIndex(inputArray, orderConst), parseResult);
var propertySet = Expression.Assign(
Expression.Property(instance, prop),
parseResult);
//set property if an element of array is successfully parsed
var ifSet = Expression.IfThen(parserCall, propertySet);
block.Add(Expression.IfThen(orderCheck, ifSet));
variables.Add(parseResult);
}
block.Add(instance);
//compile lambda expression into delegate
return Expression.Lambda<Func<string[], T>>(
Expression.Block(variables.ToArray(), Expression.Block(block)),
inputArray).Compile();
}
}
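The factory above relies on TypeParsers.Parsers, which is not listed in the post. Judging by how it is used (looked up by property type and passed to Expression.Call together with a string and an out variable), it maps a type to the MethodInfo of its static TryParse(string, out T) method. A minimal sketch under that assumption; the actual class in the repository may differ:

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class TypeParsers
{
    // Assumed shape: target type -> static bool TryParse(string, out T).
    public static readonly IReadOnlyDictionary<Type, MethodInfo> Parsers =
        new Dictionary<Type, MethodInfo>
        {
            [typeof(int)] = GetTryParse(typeof(int)),
            [typeof(DateTime)] = GetTryParse(typeof(DateTime)),
        };

    private static MethodInfo GetTryParse(Type type)
    {
        // The out parameter is represented as a by-ref type (e.g. System.Int32&).
        var method = type.GetMethod(
            "TryParse",
            BindingFlags.Public | BindingFlags.Static,
            null,
            new[] { typeof(string), type.MakeByRefType() },
            null);

        return method ?? throw new InvalidOperationException(
            $"{type.FullName} has no TryParse(string, out {type.Name}) method.");
    }
}
```

Supporting another type with the same TryParse shape (for example decimal or Guid) would then only require one more dictionary entry.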
Emit IL
The .NET compiler transforms your C# code into an intermediate language (CIL, or just IL), and then the .NET runtime translates IL into machine instructions. With sharplab.io, for instance, you can easily check what the generated IL looks like.
Here we are going to write ("emit") IL instructions directly and then compile them into the delegate at runtime.
public class EmitIlParserFactory : IParserFactory
{
public Func<string[], T> GetParser<T>() where T : new()
{
var props = typeof(T).GetProperties(BindingFlags.Instance | BindingFlags.Public);
var dm = new DynamicMethod($"from_{typeof(string[]).FullName}_to_{typeof(T).FullName}",
typeof(T), new [] { typeof(string[]) }, typeof(EmitIlParserFactory).Module);
var il = dm.GetILGenerator();
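//create a new instance of target type via its public parameterless constructor
//(GetConstructors()[0] could pick a different overload when T declares several constructors)
var instance = il.DeclareLocal(typeof(T));
il.Emit(OpCodes.Newobj, typeof(T).GetConstructor(Type.EmptyTypes));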
il.Emit(OpCodes.Stloc, instance);
//go through all public and non-static properties
foreach (var prop in props)
{
var attrs = prop.GetCustomAttributes(typeof(ArrayIndexAttribute)).ToArray();
if (attrs.Length == 0) continue;
int order = ((ArrayIndexAttribute)attrs[0]).Order;
if (order < 0) continue;
var label = il.DefineLabel();
if (prop.PropertyType == typeof(string))
{
//check whether order from ArrayIndexAttribute is a valid index of the input array
il.Emit(OpCodes.Ldc_I4, order);
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldlen);
il.Emit(OpCodes.Bge_S, label);
//set string property
il.Emit(OpCodes.Ldloc, instance);
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldc_I4, order);
il.Emit(OpCodes.Ldelem_Ref);
il.Emit(OpCodes.Callvirt, prop.GetSetMethod());
il.MarkLabel(label);
continue;
}
//get parser method from the list of available parsers (currently we parse only Int and DateTime)
if (!TypeParsers.Parsers.TryGetValue(prop.PropertyType, out var parser))
{
continue;
}
//check whether order from ArrayIndexAttribute is a valid index of the input array
il.Emit(OpCodes.Ldc_I4, order);
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldlen);
il.Emit(OpCodes.Bge_S, label);
var parseResult = il.DeclareLocal(prop.PropertyType);
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldc_I4, order);
il.Emit(OpCodes.Ldelem_Ref);
il.Emit(OpCodes.Ldloca, parseResult);
il.EmitCall(OpCodes.Call, parser, null);
il.Emit(OpCodes.Brfalse_S, label);
//set property if an element of array is successfully parsed
il.Emit(OpCodes.Ldloc, instance);
il.Emit(OpCodes.Ldloc, parseResult);
il.Emit(OpCodes.Callvirt, prop.GetSetMethod());
il.MarkLabel(label);
}
il.Emit(OpCodes.Ldloc, instance);
il.Emit(OpCodes.Ret);
//create delegate from il instructions
return (Func<string[], T>)dm.CreateDelegate(typeof(Func<string[], T>));
}
}
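To see the moving parts of DynamicMethod in isolation, here is a much smaller sketch that emits the same "load the string, pass a local by address as the out argument, call TryParse" sequence used in the factory above. The EmitDemo class and the parse-or-zero behaviour are invented for illustration:

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class EmitDemo
{
    // Builds a delegate equivalent to: s => { int.TryParse(s, out var r); return r; }
    public static Func<string, int> BuildParseOrZero()
    {
        MethodInfo tryParse = typeof(int).GetMethod(
            "TryParse", new[] { typeof(string), typeof(int).MakeByRefType() })
            ?? throw new InvalidOperationException("int.TryParse(string, out int) not found.");

        var dm = new DynamicMethod("ParseOrZero", typeof(int), new[] { typeof(string) });
        ILGenerator il = dm.GetILGenerator();

        LocalBuilder result = il.DeclareLocal(typeof(int));

        il.Emit(OpCodes.Ldarg_0);          // push the input string
        il.Emit(OpCodes.Ldloca, result);   // push the address of the local (the out argument)
        il.Emit(OpCodes.Call, tryParse);   // call int.TryParse, leaves a bool on the stack
        il.Emit(OpCodes.Pop);              // discard the bool; the local is 0 on failure
        il.Emit(OpCodes.Ldloc, result);    // push the parsed value
        il.Emit(OpCodes.Ret);              // return it

        return (Func<string, int>)dm.CreateDelegate(typeof(Func<string, int>));
    }
}
```

The evaluation stack has to be balanced by hand, which is why the unused bool is popped explicitly before the result is loaded.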
Sigil
This approach is quite similar to the previous one, but now we use Sigil, which provides syntactic sugar and more understandable error messages.
public class SigilParserFactory : IParserFactory
{
public Func<string[], T> GetParser<T>() where T : new()
{
var props = typeof(T).GetProperties(BindingFlags.Instance | BindingFlags.Public);
var il = Emit<Func<string[], T>>.NewDynamicMethod($"from_{typeof(string[]).FullName}_to_{typeof(T).FullName}");
var instance = il.DeclareLocal<T>();
il.NewObject<T>();
il.StoreLocal(instance);
foreach (var prop in props)
{
var attrs = prop.GetCustomAttributes(typeof(ArrayIndexAttribute)).ToArray();
if (attrs.Length == 0) continue;
int order = ((ArrayIndexAttribute)attrs[0]).Order;
if (order < 0) continue;
var label = il.DefineLabel();
if (prop.PropertyType == typeof(string))
{
il.LoadConstant(order);
il.LoadArgument(0);
il.LoadLength<string>();
il.BranchIfGreaterOrEqual(label);
il.LoadLocal(instance);
il.LoadArgument(0);
il.LoadConstant(order);
il.LoadElement<string>();
il.CallVirtual(prop.GetSetMethod());
il.MarkLabel(label);
continue;
}
if (!TypeParsers.Parsers.TryGetValue(prop.PropertyType, out var parser))
{
continue;
}
il.LoadConstant(order);
il.LoadArgument(0);
il.LoadLength<string>();
il.BranchIfGreaterOrEqual(label);
var parseResult = il.DeclareLocal(prop.PropertyType);
il.LoadArgument(0);
il.LoadConstant(order);
il.LoadElement<string>();
il.LoadLocalAddress(parseResult);
il.Call(parser);
il.BranchIfFalse(label);
il.LoadLocal(instance);
il.LoadLocal(parseResult);
il.CallVirtual(prop.GetSetMethod());
il.MarkLabel(label);
}
il.LoadLocal(instance);
il.Return();
return il.CreateDelegate();
}
}
Cache compiled parsers
We have implemented three approaches to creating a parser delegate: expression trees, emitting IL, and Sigil. All of them share the same problem: IParserFactory.GetParser does the hard work (building an expression tree or emitting IL and then creating a delegate) every time you call it. The solution is quite simple, just cache the result:
public class CachedParserFactory : IParserFactory
{
private readonly IParserFactory _realParserFactory;
private readonly ConcurrentDictionary<string, Lazy<object>> _cache;
public CachedParserFactory(IParserFactory realParserFactory)
{
_realParserFactory = realParserFactory;
_cache = new ConcurrentDictionary<string, Lazy<object>>();
}
public Func<string[], T> GetParser<T>() where T : new()
{
return (Func<string[], T>)(_cache.GetOrAdd($"aip_{_realParserFactory.GetType().FullName}_{typeof(T).FullName}",
new Lazy<object>(() => _realParserFactory.GetParser<T>(), LazyThreadSafetyMode.ExecutionAndPublication)).Value);
}
}
Now the compiled delegates are reused, which is more efficient.
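One possible variation, sketched below, keys the cache by typeof(T) instead of a composed string. Each cache instance already wraps exactly one inner factory, and a Type key cannot collide for two types that share a full name in different assemblies. The CountingParserFactory is a hypothetical stand-in used only to show that the inner factory is asked to build a parser once:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public interface IParserFactory
{
    Func<string[], T> GetParser<T>() where T : new();
}

// Hypothetical stand-in for an expensive factory; counts how many parsers it builds.
public class CountingParserFactory : IParserFactory
{
    private int _builds;
    public int Builds => _builds;

    public Func<string[], T> GetParser<T>() where T : new()
    {
        Interlocked.Increment(ref _builds);
        return input => new T();
    }
}

// Same idea as CachedParserFactory, but keyed by the target type itself.
public class TypeKeyedParserCache : IParserFactory
{
    private readonly IParserFactory _inner;
    private readonly ConcurrentDictionary<Type, Lazy<Delegate>> _cache =
        new ConcurrentDictionary<Type, Lazy<Delegate>>();

    public TypeKeyedParserCache(IParserFactory inner)
    {
        _inner = inner;
    }

    public Func<string[], T> GetParser<T>() where T : new()
    {
        // Lazy guarantees the expensive build runs once even if GetOrAdd races.
        var lazy = _cache.GetOrAdd(
            typeof(T),
            key => new Lazy<Delegate>(
                () => _inner.GetParser<T>(),
                LazyThreadSafetyMode.ExecutionAndPublication));

        return (Func<string[], T>)lazy.Value;
    }
}

public static class CacheDemo
{
    // Returns how many times the inner factory was invoked for two lookups.
    public static int Run()
    {
        var counting = new CountingParserFactory();
        var cached = new TypeKeyedParserCache(counting);

        cached.GetParser<object>();
        cached.GetParser<object>();

        return counting.Builds;
    }
}
```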
Roslyn based approaches
Roslyn is the .NET compiler platform; besides compiling code, it provides APIs for syntax analysis and code generation.
Roslyn runtime code generation
The Roslyn approach is interesting because it lets us write plain C# (as a string, though) instead of emitting IL instructions or combining expression tree blocks:
public static class RoslynParserInitializer
{
public static IParserFactory CreateFactory()
{
//get all types marked with ParserOutputAttribute
var targetTypes =
(from a in AppDomain.CurrentDomain.GetAssemblies()
from t in a.GetTypes()
let attributes = t.GetCustomAttributes(typeof(ParserOutputAttribute), true)
where attributes != null && attributes.Length > 0
select t).ToArray();
var typeNames = new List<(string TargetTypeName, string TargetTypeFullName, string TargetTypeParserName)>();
var builder = new StringBuilder();
builder.AppendLine(@"
using System;
using Parsers.Common;
public class RoslynGeneratedParserFactory : IParserFactory
{");
//go through all types
foreach (var targetType in targetTypes)
{
var targetTypeName = targetType.Name;
var targetTypeFullName = targetType.FullName;
var targetTypeParserName = targetTypeName + "Parser";
typeNames.Add((targetTypeName, targetTypeFullName, targetTypeParserName));
//generate private parser method for each target type
builder.AppendLine($"private static T {targetTypeParserName}<T>(string[] input)");
builder.Append($@"
{{
var {targetTypeName}Instance = new {targetTypeFullName}();");
var props = targetType.GetProperties(BindingFlags.Instance | BindingFlags.Public);
//go through all properties of the target type
foreach (var prop in props)
{
var attrs = prop.GetCustomAttributes(typeof(ArrayIndexAttribute)).ToArray();
if (attrs.Length == 0) continue;
int order = ((ArrayIndexAttribute)attrs[0]).Order;
if (order < 0) continue;
if (prop.PropertyType == typeof(string))
{
builder.Append($@"
if({order} < input.Length)
{{
{targetTypeName}Instance.{prop.Name} = input[{order}];
}}
");
}
if (prop.PropertyType == typeof(int))
{
builder.Append($@"
if({order} < input.Length && int.TryParse(input[{order}], out var parsed{prop.Name}))
{{
{targetTypeName}Instance.{prop.Name} = parsed{prop.Name};
}}
");
}
if (prop.PropertyType == typeof(DateTime))
{
builder.Append($@"
if({order} < input.Length && DateTime.TryParse(input[{order}], out var parsed{prop.Name}))
{{
{targetTypeName}Instance.{prop.Name} = parsed{prop.Name};
}}
");
}
}
builder.Append($@"
object obj = {targetTypeName}Instance;
return (T)obj;
}}");
}
builder.AppendLine("public Func<string[], T> GetParser<T>() where T : new() {");
foreach (var typeName in typeNames)
{
builder.Append($@"
if (typeof(T) == typeof({typeName.TargetTypeFullName}))
{{
return {typeName.TargetTypeParserName}<T>;
}}
");
}
builder.AppendLine("throw new NotSupportedException();}");
builder.AppendLine("}");
var syntaxTree = CSharpSyntaxTree.ParseText(builder.ToString());
//reference assemblies
string assemblyName = Path.GetRandomFileName();
var refPaths = new List<string> {
typeof(Object).GetTypeInfo().Assembly.Location,
typeof(Enumerable).GetTypeInfo().Assembly.Location,
Path.Combine(Path.GetDirectoryName(typeof(GCSettings).GetTypeInfo().Assembly.Location), "System.Runtime.dll"),
typeof(RoslynParserInitializer).GetTypeInfo().Assembly.Location,
typeof(IParserFactory).GetTypeInfo().Assembly.Location,
Path.Combine(Path.GetDirectoryName(typeof(GCSettings).GetTypeInfo().Assembly.Location), "netstandard.dll"),
};
refPaths.AddRange(targetTypes.Select(x => x.Assembly.Location));
var references = refPaths.Select(r => MetadataReference.CreateFromFile(r)).ToArray();
// compile dynamic code
var compilation = CSharpCompilation.Create(
assemblyName,
syntaxTrees: new[] { syntaxTree },
references: references,
options: new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));
//compile assembly
using (var ms = new MemoryStream())
{
var result = compilation.Emit(ms);
//surface compilation errors in the exception message
if (!result.Success)
{
throw new Exception(string.Join(",", result.Diagnostics.Where(diagnostic =>
diagnostic.IsWarningAsError ||
diagnostic.Severity == DiagnosticSeverity.Error).Select(x => x.GetMessage())));
}
ms.Seek(0, SeekOrigin.Begin);
// load assembly from memory
var assembly = AssemblyLoadContext.Default.LoadFromStream(ms);
var factoryType = assembly.GetType("RoslynGeneratedParserFactory");
if (factoryType == null) throw new InvalidOperationException("Roslyn generated parser type not found");
//create an instance of freshly generated parser factory
return (IParserFactory)Activator.CreateInstance(factoryType);
}
}
}
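For the Data class from the task description, the string assembled above would look roughly like this once the whitespace is tidied by hand. The Demo namespace for Data and the stub declarations at the top are assumptions added so that the sketch compiles on its own; in the real project those types come from the referenced assemblies:

```csharp
using System;
using Parsers.Common;

// Stubs standing in for the referenced assemblies (assumed, not generated).
namespace Parsers.Common
{
    public interface IParserFactory
    {
        Func<string[], T> GetParser<T>() where T : new();
    }
}

namespace Demo
{
    public class Data
    {
        public string Name { get; set; }
        public int Number { get; set; }
        public DateTime Birthday { get; set; }
    }
}

// Approximately what the StringBuilder produces for a single target type.
public class RoslynGeneratedParserFactory : IParserFactory
{
    private static T DataParser<T>(string[] input)
    {
        var DataInstance = new Demo.Data();
        if (0 < input.Length)
        {
            DataInstance.Name = input[0];
        }
        if (2 < input.Length && int.TryParse(input[2], out var parsedNumber))
        {
            DataInstance.Number = parsedNumber;
        }
        if (1 < input.Length && DateTime.TryParse(input[1], out var parsedBirthday))
        {
            DataInstance.Birthday = parsedBirthday;
        }
        // Boxing to object lets the generic method return the concrete type as T.
        object obj = DataInstance;
        return (T)obj;
    }

    public Func<string[], T> GetParser<T>() where T : new()
    {
        if (typeof(T) == typeof(Demo.Data))
        {
            return DataParser<T>;
        }
        throw new NotSupportedException();
    }
}
```

Reading the generated text like this is also a convenient way to debug the generator: when compilation fails, dumping builder.ToString() shows the exact source the diagnostics refer to.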
Source generator
- Overview of source generators from the official documentation
A source generator offers the interesting ability to build the parser code during compilation, i.e. in advance. In that case there is no runtime overhead for building a parser delegate on first use:
[Generator]
public class ParserSourceGenerator : ISourceGenerator
{
public void Initialize(GeneratorInitializationContext context)
{
//uncomment to debug
//System.Diagnostics.Debugger.Launch();
}
public void Execute(GeneratorExecutionContext context)
{
var compilation = context.Compilation;
var parserOutputTypeSymbol = compilation.GetTypeByMetadataName("Parsers.Common.ParserOutputAttribute");
var attributeIndexTypeSymbol = compilation.GetTypeByMetadataName("Parsers.Common.ArrayIndexAttribute");
var typesToParse = new List<ITypeSymbol>();
foreach (var syntaxTree in compilation.SyntaxTrees)
{
var semanticModel = compilation.GetSemanticModel(syntaxTree);
//get all types marked with ParserOutputAttribute
typesToParse.AddRange(syntaxTree.GetRoot()
.DescendantNodesAndSelf()
.OfType<ClassDeclarationSyntax>()
.Select(x => semanticModel.GetDeclaredSymbol(x))
.OfType<ITypeSymbol>()
.Where(x => x.GetAttributes().Select(a => a.AttributeClass)
.Any(b => SymbolEqualityComparer.Default.Equals(b, parserOutputTypeSymbol))));
}
var typeNames = new List<(string TargetTypeName, string TargetTypeFullName, string TargetTypeParserName)>();
var builder = new StringBuilder();
builder.AppendLine(@"
using System;
using Parsers.Common;
namespace BySourceGenerator
{
public class Parser : IParserFactory
{");
//go through all types
foreach (var typeSymbol in typesToParse)
{
var targetTypeName = typeSymbol.Name;
var targetTypeFullName = GetFullName(typeSymbol);
var targetTypeParserName = targetTypeName + "Parser";
typeNames.Add((targetTypeName, targetTypeFullName, targetTypeParserName));
builder.AppendLine($"private static T {targetTypeParserName}<T>(string[] input)");
builder.Append($@"
{{
var {targetTypeName}Instance = new {targetTypeFullName}();");
var props = typeSymbol.GetMembers().OfType<IPropertySymbol>();
//go through all properties of the target type
foreach (var prop in props)
{
var attr = prop.GetAttributes().FirstOrDefault(x => SymbolEqualityComparer.Default.Equals(x.AttributeClass, attributeIndexTypeSymbol));
if (attr == null || !(attr.ConstructorArguments[0].Value is int)) continue;
int order = (int) attr.ConstructorArguments[0].Value;
if (order < 0) continue;
if (GetFullName(prop.Type) == "System.String")
{
builder.Append($@"
if({order} < input.Length)
{{
{targetTypeName}Instance.{prop.Name} = input[{order}];
}}
");
}
if (GetFullName(prop.Type) == "System.Int32")
{
builder.Append($@"
if({order} < input.Length && int.TryParse(input[{order}], out var parsed{prop.Name}))
{{
{targetTypeName}Instance.{prop.Name} = parsed{prop.Name};
}}
");
}
if (GetFullName(prop.Type) == "System.DateTime")
{
builder.Append($@"
if({order} < input.Length && DateTime.TryParse(input[{order}], out var parsed{prop.Name}))
{{
{targetTypeName}Instance.{prop.Name} = parsed{prop.Name};
}}
");
}
}
builder.Append($@"
object obj = {targetTypeName}Instance;
return (T)obj;
}}");
}
builder.AppendLine("public Func<string[], T> GetParser<T>() where T : new() {");
foreach (var typeName in typeNames)
{
builder.Append($@"
if (typeof(T) == typeof({typeName.TargetTypeFullName}))
{{
return {typeName.TargetTypeParserName}<T>;
}}
");
}
builder.AppendLine("throw new NotSupportedException();}");
builder.AppendLine("}}");
var src = builder.ToString();
context.AddSource(
"ParserGeneratedBySourceGenerator.cs",
SourceText.From(src, Encoding.UTF8)
);
}
private static string GetFullName(ITypeSymbol typeSymbol) =>
$"{typeSymbol.ContainingNamespace}.{typeSymbol.Name}";
}
Benchmarks
The post wouldn't be comprehensive without benchmarks. I would like to compare two things:
- warm up step, i.e. generation of parser;
- invocation of already generated parser.
Benchmarks are measured using BenchmarkDotNet. Units: μs is a microsecond and ns is a nanosecond; 1 μs = 1000 ns.
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1237 (21H1/May2021Update)
Intel Core i7-8550U CPU 1.80GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET SDK=5.0.401
[Host] : .NET 5.0.10 (5.0.1021.41214), X64 RyuJIT
DefaultJob : .NET 5.0.10 (5.0.1021.41214), X64 RyuJIT
Generation of parser
Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|
EmitIl | 22.02 μs | 0.495 μs | 1.429 μs | 1.2817 | 0.6409 | 0.0305 | 5 KB |
ExpressionTree | 683.68 μs | 13.609 μs | 31.268 μs | 2.9297 | 0.9766 | - | 14 KB |
Sigil | 642.63 μs | 12.305 μs | 29.243 μs | 112.3047 | - | - | 460 KB |
Roslyn | 71,605.64 μs | 2,533.732 μs | 7,350.817 μs | 1000.0000 | - | - | 5,826 KB |
Invocation of parser
Method | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Allocated |
---|---|---|---|---|---|---|---|
EmitIl | 374.7 ns | 7.75 ns | 22.36 ns | 1.02 | 0.08 | 0.0095 | 40 B |
ExpressionTree | 378.1 ns | 7.56 ns | 20.57 ns | 1.03 | 0.08 | 0.0095 | 40 B |
Reflection | 13,625.0 ns | 272.60 ns | 750.81 ns | 37.29 | 2.29 | 0.7782 | 3,256 B |
Sigil | 378.9 ns | 7.69 ns | 21.06 ns | 1.03 | 0.07 | 0.0095 | 40 B |
Roslyn | 404.2 ns | 7.55 ns | 17.80 ns | 1.10 | 0.07 | 0.0095 | 40 B |
SourceGenerator | 384.4 ns | 7.79 ns | 21.46 ns | 1.05 | 0.08 | 0.0095 | 40 B |
ManuallyWritten | 367.8 ns | 7.36 ns | 15.68 ns | 1.00 | 0.00 | 0.0095 | 40 B |
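All approaches except direct use of reflection perform almost identically to the manually written C# parser (ratios between 1.02 and 1.10), while reflection is about 37 times slower and allocates 3,256 B per call instead of 40 B.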
Source code
Here is the GitHub repository with the parser factories, unit tests and benchmarks.
Latest comments (2)
All that extra work just to read simple data seems overkill.
It's not a sample from a real application. At the same time it's not too simple, like Console.WriteLine("foo");, and it's demonstrative enough.