Hell is other people's DLLs

#csharp #dotnet #computerscience

Heads up! This blog post is actually a reading log. I'm spending 30 minutes every day (more or less) reading CLR via C# by Jeffrey Richter and taking notes as I read it. This is tracked as a series on DevTo so you can read all parts of the series in order.

C# via CLR Reading Log 4: Chapter 2

We've finally arrived at Chapter 2 of CLR via C#. The last chapter made quite a few references to material that would be covered in Chapter 2, so I'm excited to see if it will live up to the hype.

The first few paragraphs of the chapter mentioned that it would be covering how to build and deploy assemblies and I was concerned because, frankly, I don't really care about that stuff (but maybe I should). My interest recovered a little bit in the following paragraphs where the author covered some of the pitfalls of the previous library and app models in Windows and highlighted the chaos that ensued as a result ("DLL hell", we've all been there).

The discussion about the difficulty of dependency management and ensuring backward-compatibility across various components of a system really resonated with me. I'm a contributor on nteract, a project that ships a core SDK with various components and interdependencies and I've had to deal with my own sort of DLL hell. I guess some problems are just universal.

Anyways, back to the book. It summarizes a list of annoying things about installing software on Windows:

Too many gosh darn DLLs and too many gosh darn incompatibilities between them
Every installed application must touch every part of your OS, from the file system to the registry to options, to ensure that it is properly loved
Security does not exist! You wanna install software? Deal with the consequences, it's a scary world out there.

Hahaha! .NET Framework is presented as the solution to the problems listed above. The book provides a brief summary of some of the ways it addresses these problems (code access security for resolving security issues, for example) but doesn't dive deeply into them just yet. So I guess I'll circle back to this topic when the book covers it more fully.

The following portion of the book dove back into territory I do not enjoy: commands you type in a terminal to get a computer to do things. groan Gimme some nice juicy concepts to learn!

After I read (more like skimmed) this portion of the book, I got a section that I had been looking for: a deeper dive into the metadata associated with a managed module. This content was referenced in the previous chapter so I'm glad I got what I was promised.

Metadata is stored in a binary blob made up of three categories of tables: definition tables, reference tables, and manifest tables.

Definition tables, as the name might suggest, keep track of everything that is defined in the source code. This includes things like properties on a class, methods, and types.
Reference tables maintain a record of everything your code references like other assemblies, modules, and external types.

The book didn't explain what manifest tables were at this point, but some quick Googling reveals that manifest tables store assembly-level metadata, like the assembly's name and version, the files that make it up, or the public key behind a special security signature.
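To make the definition/reference split a bit more concrete for myself, here's a rough sketch (mine, not the book's) that uses reflection to peek at the running assembly. Reflection doesn't hand you the raw tables, so treat the mapping in the comments (TypeDef, MethodDef, AssemblyRef) as an approximation. The class and method names are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class MetadataPeek
{
    // Everything declared directly on a type, whatever its visibility.
    private const BindingFlags DeclaredMembers =
        BindingFlags.Public | BindingFlags.NonPublic |
        BindingFlags.Instance | BindingFlags.Static |
        BindingFlags.DeclaredOnly;

    // Definition side: types this assembly defines (roughly the TypeDef rows).
    public static string[] DefinedTypes(Assembly assembly) =>
        assembly.GetTypes().Select(t => t.FullName ?? t.Name).ToArray();

    // Definition side: methods declared on a type (roughly MethodDef rows;
    // constructors are not included by GetMethods).
    public static string[] DefinedMethods(Type type) =>
        type.GetMethods(DeclaredMembers).Select(m => m.Name).ToArray();

    // Reference side: other assemblies this one points at (roughly AssemblyRef rows).
    public static string[] ReferencedAssemblies(Assembly assembly) =>
        assembly.GetReferencedAssemblies().Select(a => a.Name ?? "(unnamed)").ToArray();

    public static void Main()
    {
        Assembly current = typeof(MetadataPeek).Assembly;

        Console.WriteLine("Defined types:");
        foreach (string name in DefinedTypes(current))
            Console.WriteLine("  " + name);

        Console.WriteLine("Methods defined on MetadataPeek:");
        foreach (string name in DefinedMethods(typeof(MetadataPeek)))
            Console.WriteLine("  " + name);

        Console.WriteLine("Referenced assemblies:");
        foreach (string name in ReferencedAssemblies(current))
            Console.WriteLine("  " + name);
    }
}
```

Expect a few compiler-generated types to show up in the first list alongside the ones you wrote yourself.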

The book also showcased how to use a command-line tool (ILDasm.exe) to view the contents of the metadata of a module. One thing I've appreciated about the book's references to tooling is that the formats are (with the right tool) relatively easy to disassemble or inspect. Obviously not as easy to explore as artifacts from other programming languages, but still easier than I expected.

In the exploration of metadata tables, the book highlighted an important takeaway. For smaller files, the metadata tables might take up more space than the compiled code in the resulting managed module: describing the key data types within the file takes more room than the actual code you wrote. Metadata tables really pay off for larger programs where types are re-used frequently. In this scenario, the code would be larger than the metadata tables.
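If you want to check that claim against one of your own builds, here's a minimal sketch that compares the size of the metadata blob to the total size of the IL method bodies in a module. It assumes the System.Reflection.Metadata library is available (it ships with modern .NET; on .NET Framework I believe it's a separate NuGet package) and that the module is pure IL. MetadataSize and Measure are names I made up.

```csharp
using System;
using System.IO;
using System.Reflection.Metadata;
using System.Reflection.PortableExecutable;

public static class MetadataSize
{
    // Returns the metadata size and the summed IL size (both in bytes)
    // for the managed module stored at `path`.
    public static (int MetadataBytes, int IlBytes) Measure(string path)
    {
        using FileStream stream = File.OpenRead(path);
        using var pe = new PEReader(stream);
        if (!pe.HasMetadata)
            throw new BadImageFormatException("Not a managed module: " + path);

        MetadataReader reader = pe.GetMetadataReader();

        int ilBytes = 0;
        foreach (MethodDefinitionHandle handle in reader.MethodDefinitions)
        {
            int rva = reader.GetMethodDefinition(handle).RelativeVirtualAddress;
            if (rva == 0)
                continue; // abstract and extern methods have no body in this file

            // Count only the IL instructions, not the method body header.
            ilBytes += pe.GetMethodBody(rva).GetILReader().Length;
        }

        return (pe.PEHeaders.MetadataSize, ilBytes);
    }

    public static void Main()
    {
        // Assumption: the program runs from a regular file on disk
        // (Location is empty for single-file deployments).
        string path = typeof(MetadataSize).Assembly.Location;

        var (metadataBytes, ilBytes) = Measure(path);
        Console.WriteLine($"Metadata: {metadataBytes} bytes");
        Console.WriteLine($"IL code:  {ilBytes} bytes");
    }
}
```

Running this against a tiny program and then against a larger library is a quick way to see how the ratio shifts as the code grows.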

The next section of the book is titled "Combining Modules Into Assemblies". I'll admit I had almost forgotten that individual modules can be combined into a single dependency. There's a decent amount of nesting in these structures but I'm starting to get a mental model of how each module is structured and how that structure falls under an assembly. Perhaps I'll draw up a picture and include it in my next blog post...

The first sentence of this section reminded me why I was so confused. An assembly is a collection of one or more managed modules. So a single managed module can also be an assembly. Multiple managed modules can also be an assembly. I've also heard that a grouping of school children is an assembly.

The importance of assemblies is clarified in a forthcoming discussion on how the CLR processes assemblies. It always starts off with a manifest table, which includes references to the files within an assembly. These files can be within the current assembly or in a different assembly altogether.

And as it turns out, not just any managed module can be an assembly. To be secured, versioned, and deployed, a managed module must be encapsulated into an assembly that contains the necessary metadata.
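Here's a small sketch of what that assembly-level metadata looks like from code: the identity (name and version) that comes from the manifest, plus the modules the assembly is made of. This is my own illustration using standard reflection calls, and ManifestPeek is a made-up name. On a typical modern build I'd expect just one module, since multi-module assemblies are mostly a .NET Framework thing.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class ManifestPeek
{
    // The assembly's identity (name and version), as recorded in its manifest.
    public static string Identity(Assembly assembly)
    {
        AssemblyName name = assembly.GetName();
        return $"{name.Name}, Version={name.Version}";
    }

    // The managed modules that belong to this assembly.
    public static string[] ModuleNames(Assembly assembly) =>
        assembly.GetModules().Select(m => m.Name).ToArray();

    public static void Main()
    {
        Assembly current = typeof(ManifestPeek).Assembly;

        Console.WriteLine("Identity: " + Identity(current));

        // The manifest module is the one that carries the manifest tables.
        Console.WriteLine("Manifest module: " + current.ManifestModule.Name);

        Console.WriteLine("Modules:");
        foreach (string module in ModuleNames(current))
            Console.WriteLine("  " + module);
    }
}
```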

I'm starting to get a handle on how all these components connect, but unfortunately, it's the end of today's 30 minutes of reading -- so I'll continue this exploration in the next reading log!

Top comments (2)

Patrick Charles-Lundaahl • Feb 16 '20

Thanks for sharing! I love this format. Reading your actual experience of the book, as opposed to just the review or notes at the end, is really nice.

Safia Abdalla • Feb 16 '20

Thanks! I'm glad you like it. These are actually pretty fun to write too!