Karen Payne

Source-generated RegEx (C#)

Introduction

Learn about source-generated regular expressions (.NET 7 and higher), which improve both performance and documentation.

  • This article is deliberately short and has no benchmarks; the information and source code provided are enough to learn from.

  • Creating a generated regular expression is a manual process, although JetBrains ReSharper, if installed, assists with the creation and even recommends useful names.

Source code (.NET 8)

The source project contains two classes with several helpful examples to learn from.

  • Credit card masking.
  • Remove extra spaces from strings.
  • Conditionally extracting parts from hyperlinks.
  • Uncommon date extraction from a string with multiple dates.
  • Method to increment alphanumeric strings.

Note
There is a secondary project below with source code

Note
Some code in the last section did not format correctly, so a link is provided instead.

Performance

Rather than repeat a great resource, see Regular Expression Improvements in .NET 7 by Stephen Toub - MSFT.
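For orientation, here is a minimal sketch contrasting the conventional form with the source-generated form. The class name Patterns and both member names are hypothetical and not part of the article's project; the pattern is borrowed from a later example in this article.

```csharp
using System;
using System.Text.RegularExpressions;

Console.WriteLine(Patterns.Conventional.IsMatch("abc123")); // True
Console.WriteLine(Patterns.Generated().IsMatch("abc123"));  // True

// Hypothetical class used only for this comparison.
public static partial class Patterns
{
    // Conventional: the pattern is parsed, and with RegexOptions.Compiled
    // compiled, at run time.
    public static readonly Regex Conventional = new(@"[0-9]+$", RegexOptions.Compiled);

    // Source generated: the matching code is emitted as C# when the project is built.
    [GeneratedRegex(@"[0-9]+$")]
    public static partial Regex Generated();
}
```

Both members are used the same way by calling code, so moving an existing Regex field to a generated method is mostly a change in how it is declared.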

Implementation

To use source-generated regular expressions, create a static partial class and apply the GeneratedRegex attribute to a partial method that returns Regex.

A simple example that proper-cases a string:



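using System.Text.RegularExpressions;

namespace GeneratedRegexSamplesApp.Classes;

public static partial class Helpers
{
    // Lower-cases the input, then upper-cases the first letter of the string
    // and the first character after each period followed by whitespace.
    public static string ProperCased(this string source)
        => SentenceCaseRegex()
            .Replace(source.ToLower(), s => s.Value.ToUpper());

    // ExplicitCapture makes the unnamed groups non-capturing; Replace works on the whole match.
    [GeneratedRegex(@"(^[a-z])|\.\s+(.)", RegexOptions.ExplicitCapture)]
    private static partial Regex SentenceCaseRegex();
}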
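As a quick usage sketch, the extension method can be called like any other string extension. The class is repeated here (without the namespace) only so the snippet is self-contained as a single-file console program; the input string is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Prints: Hello world. This is a test.
Console.WriteLine("hello world. this is a test.".ProperCased());

public static partial class Helpers
{
    public static string ProperCased(this string source)
        => SentenceCaseRegex()
            .Replace(source.ToLower(), s => s.Value.ToUpper());

    [GeneratedRegex(@"(^[a-z])|\.\s+(.)", RegexOptions.ExplicitCapture)]
    private static partial Regex SentenceCaseRegex();
}
```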



After adding the class to a project, SentenceCaseRegex() is underlined with a red squiggle until the project is built.

Once the project is built, the generated source code can be viewed under Dependencies ➡️ Analyzers.

Shows RegexGenerator.g.cs

Documentation

Source generation has a bonus: documentation of the regular expression pattern, which helps in two ways. First, if a developer did not write the pattern, the generated XML documentation helps to explain it. Second, when the expression is in a library, the documentation helps developers decide whether a method using a specific pattern fits their needs.

To see the documentation, hover over the implementation or the method as shown below.

Shows the explanation for the regular expression

Important: the generated documentation does not remove the need to document the method that uses the regular expression.

A perfect example is a method that determines whether a social security number is valid, where the social security number is passed with dashes.

SSN validation

Hovering over the method shows the following, which is correct but does not explain the why.

Shows XML documentation

In this case the developer needs to explain the why, which here is to prevent fraud, as shown below.



/// <summary>
/// Is a valid SSN
/// </summary>
/// <returns>True if valid, false if invalid SSN</returns>
/// <remarks>
/// 
/// Guaranteed to never be an empty string or null, client code handles this. 
/// 
/// ^                                       #Start of expression
/// (?!\b(\d)\1+-(\d)\1+-(\d)\1+\b)         #Don't allow all matching digits for every field
/// (?!123-45-6789|219-09-9999|078-05-1120) #Don't allow "123-45-6789", "219-09-9999" or "078-05-1120"
/// (?!666|000|9\d{2})\d{3}                 #Don't allow the SSN to begin with 666, 000 or anything between 900-999
/// -                                       #A dash (separating Area and Group numbers)
/// (?!00)\d{2}                             #Don't allow the Group Number to be "00"
/// -                                       #Another dash (separating Group and Serial numbers)
/// (?!0{4})\d{4}                           #Don't allow last four digits to be "0000"
/// $                                       #End of expression
/// </remarks>
public static bool IsValidSocialSecurityNumber(string value) => SSNValidationRegex().IsMatch(value.Replace("-", ""));
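The article does not show the attribute behind SSNValidationRegex(). As a rough sketch only, the pieces listed in the remarks can be assembled into a single pattern. This assembled pattern expects the dashes, so the sketch matches the value as passed instead of stripping them first. The names SsnSketch, IsValidDashedSsn and DashedSsnRegex are hypothetical, and the sample numbers are made up.

```csharp
using System;
using System.Text.RegularExpressions;

Console.WriteLine(SsnSketch.IsValidDashedSsn("536-90-4399")); // True
Console.WriteLine(SsnSketch.IsValidDashedSsn("123-45-6789")); // False, explicitly excluded
Console.WriteLine(SsnSketch.IsValidDashedSsn("666-12-3456")); // False, area starts with 666
Console.WriteLine(SsnSketch.IsValidDashedSsn("536-00-4399")); // False, group is 00

// Hypothetical class; the pattern is assembled from the remarks in the article.
public static partial class SsnSketch
{
    // Assumes, as the remarks state, that the value is never null or empty.
    public static bool IsValidDashedSsn(string value) => DashedSsnRegex().IsMatch(value);

    [GeneratedRegex(@"^(?!\b(\d)\1+-(\d)\1+-(\d)\1+\b)(?!123-45-6789|219-09-9999|078-05-1120)(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$")]
    private static partial Regex DashedSsnRegex();
}
```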



Separation of GeneratedRegex

Consider placing the GeneratedRegex methods in a separate file if there are many. For instance, given a class named StringExtensions in StringExtensions.cs, create a new file named GeneratedRegularExpressions.cs and change the class name in that file to StringExtensions, so both files declare the same partial class.

Example for GeneratedRegularExpressions.cs

Project code



using System.Text.RegularExpressions;

public static partial class StringExtensions
{
    [GeneratedRegex(@"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,}$")]
    private static partial Regex PasswordRegEx();

    [GeneratedRegex(@"([A-Z][a-z]+)")]
    private static partial Regex CaseRegEx();

    [GeneratedRegex("[0-9]+$")]
    private static partial Regex NumericSuffixRegEx();

    [GeneratedRegex("[0-9][0-9 ]{13,}[0-9]")]
    private static partial Regex CreditCardMaskRegEx();
}



The main code is shown here.
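The linked main file is not reproduced in this article. As a rough sketch of how the two halves of the partial class fit together, the following places both declarations in one file so it runs as a single console program. The method names IsValidPassword, NextValue and MaskCreditCard are hypothetical, and only three of the generated patterns are repeated.

```csharp
using System;
using System.Text.RegularExpressions;

Console.WriteLine("Passw0rd!".IsValidPassword());                 // True
Console.WriteLine("Item0009".NextValue());                        // Item0010
Console.WriteLine("Card 4111 1111 1111 1111".MaskCreditCard());   // Card XXXXXXXXXXXXXXX1111

// Would live in StringExtensions.cs: the public methods (hypothetical names).
public static partial class StringExtensions
{
    public static bool IsValidPassword(this string value)
        => PasswordRegEx().IsMatch(value);

    // Increments the numeric suffix while keeping its zero padding.
    public static string NextValue(this string value)
        => NumericSuffixRegEx().Replace(value, m =>
            (long.Parse(m.Value) + 1).ToString(new string('0', m.Value.Length)));

    // Masks everything in the matched run except the last four characters.
    public static string MaskCreditCard(this string value, char mask = 'X')
        => CreditCardMaskRegEx().Replace(value, m =>
            new string(mask, m.Length - 4) + m.Value[^4..]);
}

// Would live in GeneratedRegularExpressions.cs: the generated patterns.
public static partial class StringExtensions
{
    [GeneratedRegex(@"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,}$")]
    private static partial Regex PasswordRegEx();

    [GeneratedRegex("[0-9]+$")]
    private static partial Regex NumericSuffixRegEx();

    [GeneratedRegex("[0-9][0-9 ]{13,}[0-9]")]
    private static partial Regex CreditCardMaskRegEx();
}
```

Because the generated methods are private, callers only see the public extension methods; the patterns stay in one place where they are easy to review.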

Summary

This article provided information and code samples showing how to implement source generation for regular expressions, which performs better than conventional regular expressions and has the bonus of generated XML documentation.
