DEV Community

James Turner

Posted on • Originally published at turnerj.com

The pain points of C# source generators

I've recently completed my first foray into writing a C# source generator for Schema.NET. There is a lot to like about source generators; however, there are a few things I wish I had understood better before diving in.

For those who are unaware, source generators are a new feature added to C# whereby one can analyse existing source code and generate new source code, all from C# itself. One area where this is of interest is serialization - being able to generate an ideal serializer at compile time removes the need for reflection at runtime.
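To make "analyse and generate" a little more concrete, here is a minimal sketch using the `ISourceGenerator` interface from the `Microsoft.CodeAnalysis.CSharp` package. The `ClassListGenerator` name, the `Generated.ClassList` class and the `ClassList.g.cs` hint name are made up for illustration; this is not how Schema.NET's generator works.

```csharp
using System.Linq;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Text;

// Hypothetical generator, for illustration only.
[Generator]
public class ClassListGenerator : ISourceGenerator
{
    public void Initialize(GeneratorInitializationContext context)
    {
        // Nothing to set up for this example.
    }

    public void Execute(GeneratorExecutionContext context)
    {
        // Analyse: collect the name of every class declared in the code being compiled.
        var classNames = context.Compilation.SyntaxTrees
            .SelectMany(tree => tree.GetRoot().DescendantNodes())
            .OfType<ClassDeclarationSyntax>()
            .Select(c => c.Identifier.Text);

        // Generate: emit a new class that exposes those names.
        string names = string.Join(", ", classNames.Select(n => "\"" + n + "\""));
        string source =
            "namespace Generated\n{\n" +
            "    public static class ClassList\n    {\n" +
            "        public static readonly string[] Names = new string[] { " + names + " };\n" +
            "    }\n}\n";

        // The generated file only exists as part of the compilation output.
        context.AddSource("ClassList.g.cs", SourceText.From(source, Encoding.UTF8));
    }
}
```

The generated `ClassList` class never touches the disk or the repository; it is simply added to the compilation alongside the hand-written code.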

In Schema.NET, we had hundreds of classes and interfaces that mapped to Schema.org types. While we had our own tool to generate these, the generated files sat in our Git repository creating a lot of noise when trying to change our tooling behaviour. Source generators would allow us to remove these files and have them exist only as part of the compiled binary. The move to source generators was also a good time to refactor the generating logic itself, making it easier to add new features later.

Pain Point 1: Debugging Source Generators

Honestly I expected the debugging process to be:

  • Put a breakpoint in the source generator code
  • Press the "Debug" button in Visual Studio
  • Code stops at the breakpoint

Unfortunately, it isn't that simple. The source generator runs during compilation, but the debugging session only starts after compilation has finished, so our breakpoint would never be hit. After some research, it seems there are two different methods suggested.

Invoke the debugger from the source generator

I found this solution on Nick's .NET Travels. Inside our source generator, likely in the "Initialize" method, we can invoke the debugger to attach to the current process with the following:

#if DEBUG
if (!Debugger.IsAttached)
{
    Debugger.Launch();
}
#endif

What we are doing here is using the preprocessor directive #if to conditionally include this code if the build configuration is "Debug". When we are in the "Debug" configuration, we check if the debugger is already attached and if not, attach it via Debugger.Launch(). After the debugger launches, it comes up with a prompt about where to debug it (I chose a new instance of Visual Studio). From here, the code will be paused on the Debugger.Launch() line and this new instance of Visual Studio will listen for any breakpoints you may add.

I probably spent a good few hours using this method and, while it works, it is not a great experience. For starters, the prompt I mentioned appeared multiple times during a single debugging session. I'm not sure if the issue relates to different target frameworks building simultaneously or to some timeout logic in the build process. Additionally, Visual Studio crashed a few times, in either of the two instances I had open.

Don't take my word for it, others have had similar difficulties.

Run the source generator manually

A source generator itself is effectively like any other class - we can instantiate and call the initialization methods ourselves.
There is a detailed document in the Roslyn repo that covers all sorts of things with regards to source generators. One of the sections specifically covers testing source generators.

Here is a modified version of their example that shows the general gist:

using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

Compilation inputCompilation = CreateCompilation(@"
namespace MyCode
{
    public class Program
    {
        public static void Main(string[] args)
        {
        }
    }
}
");

CustomGenerator generator = new CustomGenerator();

// Create the driver that will control the generation, passing in our generator
GeneratorDriver driver = CSharpGeneratorDriver.Create(generator);

// Run the generation pass
driver.RunGeneratorsAndUpdateCompilation(inputCompilation, out var outputCompilation, out var diagnostics);

static Compilation CreateCompilation(string source)
    => CSharpCompilation.Create("compilation",
        new[] { CSharpSyntaxTree.ParseText(source) },
        new[] { MetadataReference.CreateFromFile(typeof(Binder).GetTypeInfo().Assembly.Location) },
        new CSharpCompilationOptions(OutputKind.ConsoleApplication));

Basically this creates a compilation that the source generator can run against. This method can be quite verbose as, depending on your source generator itself, you may require a lot of boilerplate source code for your generator to work upon.

In my case with Schema.NET, I'm generating hundreds of classes based on some JSON, so I need minimal boilerplate. I could have gone this route; however, I decided on a more direct approach:

var generator = new SchemaSourceGenerator();
generator.Initialize(new Microsoft.CodeAnalysis.GeneratorInitializationContext());
generator.Execute(new Microsoft.CodeAnalysis.GeneratorExecutionContext());

My generator didn't care about any existing syntax tree - its job was to just pump out new classes and interfaces.
This method does have a bit of a fatal flaw in that calling most (any?) of the methods on GeneratorInitializationContext or GeneratorExecutionContext may fail. These types are not instantiated with their different properties correctly configured, which is something the more verbose approach above takes care of. For my SchemaSourceGenerator, I needed to comment out context.AddSource(sourceName, sourceText) so it wouldn't throw an exception.

My recommendation is for anyone working on a source generator, either have a separate console application to debug your source generator or create a special unit test. Do it properly though and have the more verbose compilation code as shown in the earlier example so you don't need to modify your source generator to run it.
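As a rough sketch of what such a console harness could look like, the following runs a generator through `CSharpGeneratorDriver` and prints whatever it produced. `SampleGenerator` and `GeneratorHarness` are hypothetical names standing in for your own generator and project, and the example assumes a reference to the `Microsoft.CodeAnalysis.CSharp` package.

```csharp
using System;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Text;

// Hypothetical stand-in for your real generator.
[Generator]
public class SampleGenerator : ISourceGenerator
{
    public void Initialize(GeneratorInitializationContext context) { }

    public void Execute(GeneratorExecutionContext context) =>
        context.AddSource("Sample.g.cs",
            SourceText.From("namespace Generated { public static class Sample { } }", Encoding.UTF8));
}

public static class GeneratorHarness
{
    // Runs the generator against a small in-memory compilation and returns the driver's results.
    public static GeneratorDriverRunResult Run(ISourceGenerator generator, string source)
    {
        Compilation input = CSharpCompilation.Create("harness",
            new[] { CSharpSyntaxTree.ParseText(source) },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        GeneratorDriver driver = CSharpGeneratorDriver.Create(generator);
        driver = driver.RunGeneratorsAndUpdateCompilation(input, out _, out _);
        return driver.GetRunResult();
    }

    public static void Main()
    {
        // Breakpoints inside SampleGenerator.Execute are hit when this runs under the debugger,
        // because the generator now executes inside an ordinary process.
        GeneratorDriverRunResult result =
            Run(new SampleGenerator(), "namespace MyCode { public class Program { } }");

        foreach (SyntaxTree tree in result.GeneratedTrees)
        {
            Console.WriteLine(tree.FilePath);
            Console.WriteLine(tree.GetText());
        }
    }
}
```

Because the driver supplies properly configured contexts, calls such as `context.AddSource` work without modifying the generator, and `result.Diagnostics` can be checked in a unit test in the same way.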

Pain Point 2: No Async/Await

The methods exposed by source generators (Initialize and Execute) do not return tasks, so you can't await async APIs inside them.
According to the Roslyn team this is by design as the IO for reading/writing files is handled by the compiler.

For Schema.NET, we do an HTTP request to get the JSON we need to build. There are reasons this isn't a good idea but this is what we do and it works well for us. HttpClient has only had async APIs for a long while and, while that is changing, source generators must target .NET Standard 2.0 so we can't leverage that change.

My first iteration of getting the source generator to work was effectively wrapping my code in a Task.Run() call:

public void Initialize(GeneratorInitializationContext context) => Task.Run(async () =>
{
    ...

    SchemaObjects = await schemaService.GetObjectsAsync();
}).GetAwaiter().GetResult();

This admittedly did work but I really didn't like it - it felt like such a kludge solution. There is a lot of information available about when and where you should be using Task.Run() - Stephen Cleary has a good blog post or two about it.
While a source generator is likely a new special case where "it depends", I still decided to change it. I ended up calling .GetAwaiter().GetResult() directly on my async method instead.

public void Initialize(GeneratorInitializationContext context)
{
    ...

    SchemaObjects = schemaService.GetObjectsAsync().GetAwaiter().GetResult();
}

I'll be honest - I don't know if this is technically better in this scenario but I know it works.
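One difference that can be observed between blocking styles is how exceptions surface. The sketch below uses a hypothetical `GetObjectsAsync` as a stand-in for the real schema service call (the names and the `fail` flag are assumptions for illustration). It compares `.GetAwaiter().GetResult()` with the `.Result` property. Both still block the calling thread.

```csharp
using System;
using System.Threading.Tasks;

public static class SyncOverAsyncDemo
{
    // Hypothetical async method standing in for schemaService.GetObjectsAsync().
    public static async Task<string> GetObjectsAsync(bool fail)
    {
        // ConfigureAwait(false) avoids resuming on a captured synchronization context.
        await Task.Delay(10).ConfigureAwait(false);
        if (fail) throw new InvalidOperationException("download failed");
        return "schema";
    }

    // Blocks with GetAwaiter().GetResult() and reports which exception type surfaces.
    public static string ViaGetResult()
    {
        try { GetObjectsAsync(fail: true).GetAwaiter().GetResult(); return "none"; }
        catch (Exception ex) { return ex.GetType().Name; }
    }

    // Blocks with the Result property and reports which exception type surfaces.
    public static string ViaResultProperty()
    {
        try { _ = GetObjectsAsync(fail: true).Result; return "none"; }
        catch (Exception ex) { return ex.GetType().Name; }
    }

    public static void Main()
    {
        Console.WriteLine(GetObjectsAsync(fail: false).GetAwaiter().GetResult()); // schema
        Console.WriteLine(ViaGetResult());      // InvalidOperationException
        Console.WriteLine(ViaResultProperty()); // AggregateException
    }
}
```

With `.GetAwaiter().GetResult()` the original exception is rethrown as-is, which tends to give clearer errors when a download fails during a build, whereas `.Result` wraps it in an `AggregateException`.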

Pain Point 3: Transitive Dependencies

An issue with dependencies was something I wasn't expecting at all when I started with my source generator - why should it be?
Every other library and application I've written in C# in the last few years follows a fairly predictable pattern of using a <PackageReference> to define which package and version. The basics of including a package reference like that are still the same for source generators; it is all the other bits now required on top that cause the pain.

For Schema.NET, our source generator was parsing JSON so we needed a serializer. We were previously using Newtonsoft.Json for our tool however in this refactor, we were also moving to using System.Text.Json for the parsing of the initial schema data from Schema.org. This dependency needs to only exist for the generator, not the library the generator is creating classes etc for. Normally you can just specify PrivateAssets="all" on the package reference and that's it but for source generators, you need to specify a few more things:

<ItemGroup>
    <PackageReference Include="System.Text.Json" Version="5.0.1" GeneratePathProperty="true" PrivateAssets="all" />
</ItemGroup>

<PropertyGroup>
    <GetTargetPathDependsOn>$(GetTargetPathDependsOn);GetDependencyTargetPaths</GetTargetPathDependsOn>
</PropertyGroup>

<Target Name="GetDependencyTargetPaths">
    <ItemGroup>
        <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Text_Json)\lib\netstandard2.0\System.Text.Json.dll" IncludeRuntimeDependency="false" />
    </ItemGroup>
</Target>

Not too bad right? Well, what if I told you that you needed to do this for all dependencies. By that I mean every dependency in the dependency tree which for us was:

  • Microsoft.Bcl.AsyncInterfaces, 5.0.0
  • System.Buffers, 4.5.1
  • System.Memory, 4.5.4
    • System.Numerics.Vectors, 4.4.0
  • System.Numerics.Vectors, 4.5.0
  • System.Runtime.CompilerServices.Unsafe, 5.0.0
  • System.Text.Encodings.Web, 5.0.0
  • System.Threading.Tasks.Extensions, 4.5.4

Our example would look more like:

<ItemGroup>
    <PackageReference Include="System.Text.Json" Version="5.0.1" GeneratePathProperty="true" PrivateAssets="all" />
    <PackageReference Include="Microsoft.Bcl.AsyncInterfaces" Version="5.0.0" GeneratePathProperty="true" PrivateAssets="all" />
    <PackageReference Include="System.Runtime.CompilerServices.Unsafe" Version="5.0.0" GeneratePathProperty="true" PrivateAssets="all" />
    <PackageReference Include="System.Threading.Tasks.Extensions" Version="4.5.4" GeneratePathProperty="true" PrivateAssets="all" />
    <PackageReference Include="System.Text.Encodings.Web" Version="5.0.1" GeneratePathProperty="true" PrivateAssets="all" />
    <PackageReference Include="System.Buffers" Version="4.5.1" GeneratePathProperty="true" PrivateAssets="all" />
    <PackageReference Include="System.Memory" Version="4.5.4" GeneratePathProperty="true" PrivateAssets="all" />
    <PackageReference Include="System.Numerics.Vectors" Version="4.4.0" GeneratePathProperty="true" PrivateAssets="all" />
</ItemGroup>

<PropertyGroup>
    <GetTargetPathDependsOn>$(GetTargetPathDependsOn);GetDependencyTargetPaths</GetTargetPathDependsOn>
</PropertyGroup>

<Target Name="GetDependencyTargetPaths">
    <ItemGroup>
        <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Text_Json)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" />
        <TargetPathWithTargetPlatformMoniker Include="$(PKGMicrosoft_Bcl_AsyncInterfaces)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" />
        <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Runtime_CompilerServices_Unsafe)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" />
        <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Threading_Tasks_Extensions)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" />
        <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Buffers)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" />
        <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Memory)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" />
        <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Numerics_Vectors)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" />
        <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Text_Encodings_Web)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" />
    </ItemGroup>
</Target>

If any of these dependencies pick up any new dependencies themselves, they need to be included too - this can happen with patch version changes like between System.Text.Encodings.Web going from 5.0.0 to 5.0.1 where it picked up a few new dependencies.

Currently for Schema.NET, I'm only specifying System.Text.Json and System.Text.Encodings.Web directly which allows our builds to work on our CI but Visual Studio complains during the build.
I raised an issue with the Roslyn team about this extra weird behaviour, though it seems to come down to a difference between builds triggered by .NET Framework (Visual Studio and MSBuild) and .NET Core (dotnet build).

My biggest gripe here though is: Why doesn't the compiler just do this for us?

The compiler knows all our dependencies so, with some sort of flag to indicate that this is a source generator, the compiler should do all this work for us. Keeping track of every transitive dependency whenever any dependency gets an update is a burden I don't want to carry.

Potential Transitive Dependency Workaround

While not a perfect solution, if you are like me and really don't like specifying every package reference in the dependency tree like that, you can automate it somewhat with a custom MSBuild target.

<ItemGroup>
    <PackageReference Include="System.Text.Json" Version="5.0.1" PrivateAssets="all" />
</ItemGroup>

<PropertyGroup>
    <GetTargetPathDependsOn>$(GetTargetPathDependsOn);GetDependencyTargetPaths</GetTargetPathDependsOn>
</PropertyGroup>

<Target Name="GetDependencyTargetPaths" AfterTargets="ResolvePackageDependenciesForBuild">
    <ItemGroup>
        <TargetPathWithTargetPlatformMoniker Include="@(ResolvedCompileFileDefinitions)" IncludeRuntimeDependency="false" />
    </ItemGroup>
</Target>

This "works" in the sense that ResolvedCompileFileDefinitions does contain a list of our transitive dependencies, so everything that needs to be passed in is passed in. The problem with this solution is that ResolvedCompileFileDefinitions contains more than the specific dependencies we want, which could lead to undesired behaviour.

Ideally I'd like something like this to be an automatic target for source generator projects but perfected to target only private dependencies so they are bundled correctly.
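One way to narrow the list, sketched here purely as an untested idea, is to filter the items by the package they came from. This assumes each ResolvedCompileFileDefinitions item carries NuGetPackageId metadata, and the package names excluded below are only illustrative guesses at what you might not want passed along.

```xml
<Target Name="GetDependencyTargetPaths" AfterTargets="ResolvePackageDependenciesForBuild">
    <ItemGroup>
        <!-- Assumption: items expose NuGetPackageId metadata; the exclusion list is illustrative -->
        <TargetPathWithTargetPlatformMoniker
            Include="@(ResolvedCompileFileDefinitions)"
            Condition="'%(ResolvedCompileFileDefinitions.NuGetPackageId)' != 'Microsoft.CodeAnalysis.Common' And '%(ResolvedCompileFileDefinitions.NuGetPackageId)' != 'Microsoft.CodeAnalysis.CSharp'"
            IncludeRuntimeDependency="false" />
    </ItemGroup>
</Target>
```

An exclusion list like this still has to be maintained by hand, so it reduces the problem rather than removing it.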

Conclusion: Was migrating to source generators worth it?

Yes.

Switching to source generators, combined with my refactor, added 700 lines of code while removing 69,203 lines of code. My pull request affected 765 files, the vast majority being generated classes and interfaces that no longer need to sit in the repository.

The refactor of our generation code also sets us up nicely for the future where we can support pending Schema.org types (something that has been requested by a few people).

While these pain points are annoying, source generators are a great feature that I hope gets tooling updates to improve the developer experience.

Top comments (6)

Jairo Blanco

Wow! really amazing to read I did not know about Source Generators after I read this article and Microsoft one. I didn't see anything about performance, apart from the files and lines of code removals, do you see any performance improvements? Also why are you using Source generators in your case and when do you think are the useful? Last and not least how the hell did you get all that knowledge? I would love to get more into this kind of "forgotten" tools.

Regards!

James Turner

Thanks! I haven't done any major performance checks from using source generators, though I could speculate about some IDE performance improvements. IntelliSense wouldn't need to run on all these files as they no longer sit in the repo. Also, performance with Git would be a little better with fewer files (especially on GitHub where browsing commits that modified these files previously was a big pain).

We switched to source generators for Schema.NET as our previous tooling was a dedicated console application that manually wrote files and stored them in our Git repo. Loading the solution in Visual Studio would take a bit longer because of all these extra files it had to process. The code itself behind our tooling that generated the few hundred classes was also in need of a good refactor to set us up for future features. Ultimately we were generating sources that we didn't want in our repo anymore and source generators just fit the bill for what we wanted.

Source generators are probably most useful for when they can accelerate a previously tedious task during development - basically we are getting code to write code. One example that looks really interesting is generating serializers at compile time to improve performance. Normally a serialization system would need to use reflection to generate a serializer for a particular type. Instead with source generators, you could have the serializers generated as part of the compilation step.

As for how I learnt all of this - some is keeping up-to-date with new features coming to C# and .NET (normally from other devs on Twitter) but most is just experimenting and getting more experienced with different aspects that interest me. This was my first time using source generators and I did it for making Schema.NET work better. I work on Schema.NET because it can help my business work better. Basically I'm trying to solve problems down the stack I depend on.

Hope that helps!

Jairo Blanco

Hello,

Thanks for your answer it really helps me to understand better :D!

I will keep an eye on your future posts!

Regards!

zarnish khan

When diving into the world of C# source generators, developers often encounter certain pain points that can hinder their productivity and efficiency. One significant challenge revolves around the complexity of understanding and implementing the intricate logic required for generating code dynamically. This can lead to steep learning curves and time-consuming debugging processes, particularly for developers who are new to source generator technology. Additionally, ensuring compatibility and seamless integration with existing codebases can pose another obstacle, requiring careful planning and thorough testing. Despite these challenges, leveraging the power of source generators can significantly enhance code generation and optimization workflows, ultimately streamlining development processes and improving overall project efficiency. With the right tools and guidance, developers can overcome these pain points and unlock the full potential of C# source generators. And for reliable power to fuel your coding endeavors, trust in the Champion generator to keep your development environment running smoothly, empowering you to conquer any coding challenge with confidence.

Stig Schmidt Nielsson

Hi James - great post! It is from 2021 and I am about to write my first generator, so I am wondering if any of the pain points have been addressed in .NET 7 / C# 11?

James Turner

Hey - I did an update on my personal site back in February 2022 but haven't syndicated it to DEV: turnerj.com/blog/csharp-source-gen...

There have likely been some more improvements since then but I haven't done a lot of source generator work recently to track any further updates.