In the previous post I described how to use the Roslyn API to find code patterns in the C# AST and how to change the AST to rewrite the original code into something else. The goal was to automate the conversion of NUnit tests to xUnit. The approach I used was quite tedious, as I had to write a very long chain of ifs and typecasts to get the job done. Let's try to do better this time, starting with just the search part of our search-and-replace tool.
What would be great is to be able to specify structural patterns like this:
Assert.That(_, Is.EqualTo(_))
Assert.That(_, Is.EqualTo(true))
Assert.That(_, Throws.TypeOf<_>())
And they would match actual code:
// Matched by 'Assert.That(_, Is.EqualTo(_))'
Assert.That(account.Id, Is.EqualTo(id))
Assert.That("".ToBytes(), Is.EqualTo(new byte[] {}))
// Matched by 'Assert.That(_, Is.EqualTo(true))'
Assert.That(info.IsMd5, Is.EqualTo(true));
Assert.That(token.BoolAt(path, true), Is.EqualTo(true));
// Matched by 'Assert.That(_, Throws.TypeOf<_>())'
Assert.That(() => Quad[-1], Throws.TypeOf<ArgumentOutOfRangeException>())
Assert.That(() => access(token, path), Throws.TypeOf<JTokenAccessException>())
At first this looks like quite a difficult task, but in its simple form it turns out not to be that hard. I first got the idea when I was generating code for AST replacement with Roslyn Quoter. Looking at its source code, I discovered a bunch of Parse* methods in the SyntaxFactory class.
So basically one function call will parse the snippet and return an AST for the given pattern:
var patternAst = SyntaxFactory.ParseExpression("Assert.That(_, Is.EqualTo(_))");
The one line above is equivalent to a wall of code like this:
var patternAst =
    InvocationExpression(
        MemberAccessExpression(
            SyntaxKind.SimpleMemberAccessExpression,
            IdentifierName("Assert"),
            IdentifierName("That")))
    .WithArgumentList(
        ArgumentList(
            SeparatedList<ArgumentSyntax>(
                new SyntaxNodeOrToken[]{
                    Argument(
                        IdentifierName("_")),
                    Token(SyntaxKind.CommaToken),
                    Argument(
                        InvocationExpression(
                            MemberAccessExpression(
                                SyntaxKind.SimpleMemberAccessExpression,
                                IdentifierName("Is"),
                                IdentifierName("EqualTo")))
                        .WithArgumentList(
                            ArgumentList(
                                SingletonSeparatedList<ArgumentSyntax>(
                                    Argument(
                                        IdentifierName("_"))))))})));
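To convince yourself that the one-liner really builds the same tree, it helps to print what the parser returns. The following is a minimal sketch, not part of the original tool: `PatternDump`, `Dump` and `Walk` are hypothetical names, and it assumes the Microsoft.CodeAnalysis.CSharp package is referenced. It also rejects patterns that fail to parse, since the Roslyn parser is error tolerant and hands back a tree even for broken input.

```csharp
using System;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helper, not part of the original tool.
public static class PatternDump
{
    // Parses a pattern and renders one line per node or token, indented by depth.
    public static string Dump(string pattern)
    {
        var root = SyntaxFactory.ParseExpression(pattern);

        // The parser is error tolerant, so a malformed pattern still yields a tree.
        // Reject it here instead of searching with a broken pattern.
        if (root.ContainsDiagnostics)
            throw new ArgumentException($"Cannot parse pattern: {pattern}");

        var output = new StringBuilder();
        Walk(root, 0, output);
        return output.ToString();
    }

    private static void Walk(SyntaxNodeOrToken item, int depth, StringBuilder output)
    {
        // Every line starts with the syntax kind; tokens also show their source text.
        output.Append(' ', depth * 2).Append(item.Kind().ToString());
        if (item.IsToken)
            output.Append(" '").Append(item.AsToken().Text).Append('\'');
        output.AppendLine();

        // Only nodes have children; tokens are the leaves of the tree.
        if (item.IsNode)
            foreach (var child in item.ChildNodesAndTokens())
                Walk(child, depth + 1, output);
    }
}
```

Calling `PatternDump.Dump("Assert.That(_, Is.EqualTo(_))")` gives a listing whose first line is `InvocationExpression`, with the two `_` placeholders showing up as ordinary identifier tokens further down. To the parser they are plain identifiers; giving them a special meaning is entirely up to the matcher.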
It feels like a total win already and we have not even done anything useful yet. But let's find this pattern in a source AST. First, we need to parse the file we're searching in:
var sourceAst = CSharpSyntaxTree.ParseText(File.ReadAllText(filename));
This gives us the list of all expression nodes in the AST:
var nodes = sourceAst.GetRoot().DescendantNodes().OfType<ExpressionSyntax>();
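It is worth seeing what that query actually returns, because it yields every nested expression, not only the outermost calls. Here is a small sketch that works on an in-memory string instead of a file; `ExpressionLister` and `List` are made-up names for illustration only.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper, not part of the original tool.
public static class ExpressionLister
{
    // Returns "Kind: text" for every expression node in the source, in document order.
    public static List<string> List(string source)
    {
        var tree = CSharpSyntaxTree.ParseText(source);
        return tree.GetRoot()
            .DescendantNodes()               // every node below the root
            .OfType<ExpressionSyntax>()      // keep only expressions (this includes type names)
            .Select(e => $"{e.Kind()}: {e}") // ToString() gives the text without surrounding trivia
            .ToList();
    }
}
```

For a method body containing `Assert.That(x, Is.EqualTo(y));` the list includes the whole call, the inner `Is.EqualTo(y)` call, both member accesses and each identifier on its own. That is exactly what we want here: the matcher gets a chance to test the pattern against every subexpression, at the cost of visiting many nodes that can never match.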
And now we find the nodes that match:
foreach (var e in nodes)
{
    if (Ast.Match(e, patternAst))
    {
        // Line numbers reported by Roslyn are zero-based, so add 1 for display
        var line = e.GetLocation().GetLineSpan().StartLinePosition.Line + 1;
        var code = e.NormalizeWhitespace();
        Console.WriteLine($" {line}: {code}");
    }
}
Obviously, the Ast.Match function is the tricky part, though not as tricky as it sounds. We recursively traverse both ASTs in parallel and check whether they match:
// Lives in the static Ast class. IsPlaceholder and the Match overloads for
// tokens and separated lists are helpers that are not shown in this post.
public static bool Match(SyntaxNode code, SyntaxNode pattern)
{
    // A placeholder matches anything
    if (IsPlaceholder(pattern))
        return true;

    // Node types don't match. Clearly not a match.
    if (code.GetType() != pattern.GetType())
        return false;

    switch (code)
    {
        case ArgumentSyntax c:
        {
            var p = (ArgumentSyntax)pattern;
            return Match(c.Expression, p.Expression);
        }
        case ArgumentListSyntax c:
        {
            var p = (ArgumentListSyntax)pattern;
            return Match(c.OpenParenToken, p.OpenParenToken)
                && Match(c.Arguments, p.Arguments)
                && Match(c.CloseParenToken, p.CloseParenToken);
        }
        case IdentifierNameSyntax c:
        {
            var p = (IdentifierNameSyntax)pattern;
            return Match(c.Identifier, p.Identifier);
        }
        case InvocationExpressionSyntax c:
        {
            var p = (InvocationExpressionSyntax)pattern;
            return Match(c.Expression, p.Expression)
                && Match(c.ArgumentList, p.ArgumentList);
        }
        case LiteralExpressionSyntax c:
        {
            var p = (LiteralExpressionSyntax)pattern;
            return Match(c.Token, p.Token);
        }
        case MemberAccessExpressionSyntax c:
        {
            var p = (MemberAccessExpressionSyntax)pattern;
            return Match(c.Expression, p.Expression)
                && Match(c.Name, p.Name);
        }
        case GenericNameSyntax c:
        {
            var p = (GenericNameSyntax)pattern;
            return Match(c.Identifier, p.Identifier)
                && Match(c.TypeArgumentList, p.TypeArgumentList);
        }
        case TypeArgumentListSyntax c:
        {
            var p = (TypeArgumentListSyntax)pattern;
            return Match(c.LessThanToken, p.LessThanToken)
                && Match(c.Arguments, p.Arguments)
                && Match(c.GreaterThanToken, p.GreaterThanToken);
        }
        // Node types that are not handled yet never match
        default:
            return false;
    }
}
```
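The listing above calls three helpers that are not shown: IsPlaceholder, a Match for tokens and a Match for argument lists. Below is one plausible way to write them; treat it as a sketch of my guess at their shape rather than the original source. The names `AstHelpers`, `MatchToken` and `MatchList` are made up, the "_" wildcard rule is an assumption taken from the patterns at the top of the post, and the node matcher is passed in as a delegate only so that the snippet compiles on its own. Inside the real Ast class these would simply be two more overloads named Match that call the node-level Match directly.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical stand-ins for the helpers the listing above relies on.
public static class AstHelpers
{
    // Assumption: a bare identifier named "_" is the wildcard.
    public static bool IsPlaceholder(SyntaxNode pattern) =>
        pattern is IdentifierNameSyntax name && name.Identifier.Text == "_";

    // Tokens match when they have the same kind and the same source text.
    // Trivia (whitespace, comments) is not part of Text, so it is ignored.
    public static bool MatchToken(SyntaxToken code, SyntaxToken pattern) =>
        code.RawKind == pattern.RawKind && code.Text == pattern.Text;

    // Lists match when they have the same length and match element by element.
    public static bool MatchList<T>(
        SeparatedSyntaxList<T> code,
        SeparatedSyntaxList<T> pattern,
        Func<SyntaxNode, SyntaxNode, bool> matchNode) where T : SyntaxNode
    {
        if (code.Count != pattern.Count)
            return false;

        for (var i = 0; i < code.Count; i++)
            if (!matchNode(code[i], pattern[i]))
                return false;

        return true;
    }
}
```

One consequence of comparing lists element by element is that each `_` stands for exactly one argument. A pattern with two placeholders will not match a call with three arguments, which is the behavior I want for the assertions above.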
So it's basically a giant switch with a case for every node type. Far from every type is covered here, just the ones I needed to get my examples to work. I imagine that covering most of C# syntax would mean tediously writing a couple of thousand lines of repetitive code. I'm not going to do all of that any time soon, just the parts I need for my use cases.
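One possible shortcut, which is not what the tool in this post does, is to stop naming node types altogether and compare the two trees generically: same syntax kind, same number of children, children match pairwise, tokens have the same text. The sketch below tries that idea under the same assumption that a bare `_` identifier is the wildcard. `GenericAst`, `Find` and `MatchItem` are hypothetical names, and I have only reasoned it through on the patterns from this post, so take it as a starting point rather than a finished matcher.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical alternative to the hand-written switch, not the code from this post.
public static class GenericAst
{
    public static bool Match(SyntaxNode code, SyntaxNode pattern) =>
        MatchItem(code, pattern);

    // Returns (1-based line, source text) for every expression matching the pattern.
    public static List<(int Line, string Code)> Find(string source, string pattern)
    {
        var patternAst = SyntaxFactory.ParseExpression(pattern);
        return CSharpSyntaxTree.ParseText(source)
            .GetRoot()
            .DescendantNodes()
            .OfType<ExpressionSyntax>()
            .Where(e => Match(e, patternAst))
            .Select(e => (
                Line: e.GetLocation().GetLineSpan().StartLinePosition.Line + 1,
                Code: e.ToString()))
            .ToList();
    }

    private static bool MatchItem(SyntaxNodeOrToken code, SyntaxNodeOrToken pattern)
    {
        // Assumption: a bare "_" identifier in the pattern matches any node.
        if (code.IsNode
            && pattern.AsNode() is IdentifierNameSyntax name
            && name.Identifier.Text == "_")
            return true;

        // Different syntax kinds can never match.
        if (code.RawKind != pattern.RawKind)
            return false;

        // Tokens are leaves: compare their source text, ignoring trivia.
        if (code.IsToken || pattern.IsToken)
            return code.IsToken && pattern.IsToken
                && code.AsToken().Text == pattern.AsToken().Text;

        // Nodes match when all their children match pairwise.
        var codeChildren = code.ChildNodesAndTokens();
        var patternChildren = pattern.ChildNodesAndTokens();
        if (codeChildren.Count != patternChildren.Count)
            return false;

        for (var i = 0; i < codeChildren.Count; i++)
            if (!MatchItem(codeChildren[i], patternChildren[i]))
                return false;

        return true;
    }
}
```

With this, `GenericAst.Find(source, "Assert.That(_, Is.EqualTo(_))")` returns the line number and text of every matching call. The trade-off is control: the explicit switch lets me decide per node type which parts matter, while the generic walk is strict about everything, so any optional piece of syntax that adds a child node or token (a `ref` modifier or a named argument, for example) makes an otherwise similar expression fail to match.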
With a few more lines of code added, this already becomes a useful tool for searching for code patterns in a codebase. Next time we'll see how to implement the replace part. The goal was to refactor, not just to search, wasn't it? I have some ideas on how it could be done. See you next time.
Conclusion
Thanks to Roslyn's awesome API, with just 172 lines of code we have a pretty advanced code grep. Sure, it's just a toy and a proof of concept at the moment, and it would take a serious effort to make it something more than that. But I'm happy with what is possible with so little effort. Amazing.
Originally published on detunized.net