edit: Unlike the conclusion of the post below, I based the analyzer on C# CaaS and not .NET IL. It's now available to use at https://devsnicket.com/eunice/#csharp. .NET IL doesn't preserve the order of the members of a class in either the dll or the pdb. That said, after looking at some open source C# projects, this capability wasn't used by default.
The software development tool I've been working on, called Eunice, started out with JavaScript analysis. It's designed to support multiple interchangeable analyzers, reusing the tool's measurement, graphical and interactive components. These other components were also written in JavaScript so they are cross-platform. In the earlier phases of development I thought it would be beneficial to follow the practice of dogfooding. To see how this looks you can view and interact with Eunice's analysis of itself here.
Eunice is now at a level of maturity where I've decided to write another analyzer. With personal experience in C# and .NET, I thought creating an analyzer for them next would be productive for me and useful to others.
C# or .NET?
The vast majority of .NET is written using C#, which includes a CaaS (Compiler as a Service). Using the syntactic/semantic models provided for source files would allow an analyzer to represent the software how it appears to the developer while they are working.
Software written with C# is also available in the form of compiled assembly files (.dll) containing CIL (Common Intermediate Language). This has a more restricted set of instructions, but as a result is more verbose. If this verbosity isn't going to add complexity to an analyzer, the restricted set of instructions might simplify it. Some information from the source code, not necessary to run it, isn't included when compiling; however, this information can be preserved (e.g. for debugging) in an accompanying file (.pdb).
I believe that using the C# CaaS would be more productive than working with the debug files that accompany the compiled CIL. If the information required to implement an analyzer is readily available from CIL alone (i.e. without the debug files) then that would be even more productive.
source code directories and namespaces
Structure in the form of nested groups can be specified both in the source code directories and in namespaces. Both of these parallel ways of structuring software are normally used, but don't have to match.
It's possible to use both, have a structure that matches everywhere else, but then use one to insert additional groups not present in the other. Even if discrepancies are constrained this way, the variation still adds potential for confusion when navigating a code base.
If the structures are different, representing both simultaneously would require 4 dimensions (2 x 2D) and would be confusing. To avoid this in the analyzer, one will be chosen over the other. The analyzer will use namespaces, as namespaces, not file paths, are used in C# to reference dependencies. Namespaces are included in CIL, so only .NET analysis is required and not C#. For reference, source code paths aren't included in CIL, but are available from the accompanying debug files.
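As a rough illustration of grouping by namespace, the sketch below nests fully qualified type names into groups, one level per namespace segment. It is written in Python purely for brevity and is not how Eunice does it; the function name `group_by_namespace` and the example type names are made up for illustration. It also assumes names are separated only by `.`, which ignores nested types and generic arity suffixes.

```python
def group_by_namespace(full_names):
    """Nest type names into groups keyed by each namespace segment."""
    root = {}
    for full_name in full_names:
        # Everything before the last "." is treated as namespace segments.
        *namespaces, type_name = full_name.split(".")
        group = root
        for namespace in namespaces:
            group = group.setdefault(namespace, {})
        # Leaf types map to None to distinguish them from namespace groups.
        group[type_name] = None
    return root


# Hypothetical type names, only for demonstration.
groups = group_by_namespace([
    "Shop.Orders.Order",
    "Shop.Orders.OrderLine",
    "Shop.Payments.Invoice",
])
```

Here `groups` ends up with a single `Shop` group containing `Orders` and `Payments` groups, regardless of which directories the source files were in.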
language features
Some C# language features are represented in CIL with additional structure not found in the original code. Some of this additional structure will contain only generated instructions; however, in other cases compiler output for pieces of the original C# will be placed within these generated structures. CIL has metadata to mark structure as compiler generated with an attribute. Regardless of the attribute's presence or methods being entirely generated, there may still be dependencies that need including in the analysis.
I've created the tables and lists below of C# features and how they are represented in CIL:
C# 1
feature | named | implementation | generated | marked |
---|---|---|---|---|
delegate | class | n/a | extern | X |
enum | class | n/a | field | X |
event | event | add/remove | | |
event (automatic) | event | add/remove | field get/set | ✓ |
indexer | property | get/set | | |
operator overload | method* | | | |
property | property | get/set | | |
Not represented in CIL:
- built-in type keywords
- line directive
- region directive
- using directive
C# 2
feature | named | implementation | generated | marked |
---|---|---|---|---|
iterator/yield | n/a | method of | class (nested) | ✓ |
methods (anonymous) | n/a | method of | class (nested) | ✓ |
C# 3
feature | named | implementation | generated | marked |
---|---|---|---|---|
lambda | n/a | method of | class (nested) | ✓ |
property (automatic) | property | | field get/set | ✓ |
types (anonymous) | n/a | | class (namespace) | ✓ |
C# 5
feature | named | implementation | generated | marked |
---|---|---|---|---|
async/await | method | method of | class (nested) | ✓ |
C# 6
Not represented in CIL: using directive (static)
C# 7
feature | named | generated | marked |
---|---|---|---|
local function | method* | class | ✓ |
C# 8
feature | named | implementation | generated | marked |
---|---|---|---|---|
iterator/yield (async/await) | method | method of | class (nested) | ✓ |
* name would need inferring / reformatting
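To give an idea of what inferring / reformatting a name might involve, here is a small sketch in Python (chosen only for brevity). It assumes operator overloads appear as methods named like `op_Addition`, and that local functions get compiler generated names shaped like `<Main>g__Helper|0_0`. The latter is an implementation detail of current C# compilers rather than something guaranteed. The mapping is deliberately partial and `infer_name` is a hypothetical helper.

```python
import re

# Assumed subset of the special method names used for operator overloads;
# the full list is longer.
OPERATOR_SYMBOLS = {
    "op_Addition": "+",
    "op_Subtraction": "-",
    "op_Equality": "==",
    "op_Inequality": "!=",
}

# Assumed shape of the names given to local functions, e.g.
# "<Main>g__Helper|0_0". This is a compiler implementation detail.
LOCAL_FUNCTION = re.compile(r"^<(?P<parent>[^>]+)>g__(?P<name>[^|]+)\|")


def infer_name(cil_name):
    """Return a name closer to the C# source for a CIL method name."""
    if cil_name in OPERATOR_SYMBOLS:
        return "operator " + OPERATOR_SYMBOLS[cil_name]
    match = LOCAL_FUNCTION.match(cil_name)
    if match:
        return match.group("name")
    # Anything unrecognised is returned unchanged.
    return cil_name
```

For example, `infer_name("op_Addition")` gives `"operator +"` and `infer_name("<Main>g__Helper|0_0")` gives `"Helper"`, while ordinary method names pass through untouched.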
conclusion
All the features above where a name from the C# is required have an item in CIL where it, or a derivative of it, is available. Anonymous methods and lambdas might have dependencies, but those would be added to the item representing their parent class.
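A minimal sketch of adding those dependencies to the parent might look like the following. It is in Python for brevity, and the `Item` shape (`compiler_generated`, `depends_upon`, `items`) is hypothetical rather than Eunice's actual format. The nested class name `<>c` is just an example of the kind of name a compiler might generate for lambdas.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    compiler_generated: bool = False
    depends_upon: set = field(default_factory=set)
    items: list = field(default_factory=list)


def fold_generated(item):
    """Move dependencies of compiler generated children onto their parent."""
    kept = []
    for child in item.items:
        fold_generated(child)
        if child.compiler_generated:
            # The generated child disappears, but what it depends upon
            # is attributed to the parent instead.
            item.depends_upon |= child.depends_upon
            # Any remaining grandchildren are promoted rather than lost.
            kept.extend(child.items)
        else:
            kept.append(child)
    item.items = kept
    return item


# Hypothetical class with one method and one generated nested class.
order_service = Item(
    name="OrderService",
    depends_upon={"Order"},
    items=[
        Item(name="Submit"),
        Item(name="<>c", compiler_generated=True, depends_upon={"Logger"}),
    ],
)
fold_generated(order_service)
```

After folding, `OrderService` depends upon both `Order` and `Logger`, and only `Submit` remains as a child item.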
Delegates and enums aren't marked as compiler generated and don't have any implementation that would need representing (e.g. sub-items or dependencies).
Several features move and mix the CIL representing the original C# inside generated methods of generated classes. However, these follow patterns where dependencies of what's generated could be ignored.
Region and using directives are two features listed above as not represented in CIL that would be very apparent in the C#. Regions could be represented in analysis as an extra level of item grouping and doing so would match how they appear in C# files. Although the use of regions is debatable, their inclusion in Eunice would be a useful demonstration of their characteristics. Using directives and their ability to specify aliases have a significant effect on the potential dependency scope and verbosity of C#.
Based on the findings above, speed of the analyzer's development will be prioritized over inclusion of region and using directive analysis. I think there is value in those two features, but it'll be more productive for them to be included in a C# CaaS based analyzer, if one were to be developed in the future.
Graham Dyson - creator of Eunice
Top comments (2)
Would using .NET make it easier to analyse F# code in the future? Although F# has FCS where you can analyse the source code.
There's potential to reuse the non-C#-specific parts of .NET analysis, and to share and build on them to create an F# analyzer. For the analysis to look like the source F# there would need to be specific behaviour for things like Records and the nested types generated for lambda expressions.