JohnN6TSM

A Tale of Two Codebases (Part 4 of 4): Dependency Smell

As I discussed in Part 1, the premise of this series is a simple natural experiment: comparing two large codebases written by the same solo programmer before and after the introduction of SOLID design principles. PhotoDoc, the pre-intervention project, is an electronic medical record dedicated to medical forensics. Melville.PDF is a free, open-source PDF renderer for .NET. In this article I discuss dependencies.

Dependencies are a mixed bag at best. One might think that “any code I don’t have to write is good code.” On the other hand, NIH (not invented here) syndrome came from somewhere – somebody else’s code is never going to be exactly what you hoped it would be.

In Clean Architecture I read that adopting a framework is an “asymmetric marriage,” because the framework might impose significant constraints on the application, but the application has no influence on the framework. (Martin R, Clean Architecture pg 293) Unfortunately, I already knew. Early on, PhotoDoc married two frameworks.

Lesson #1: Your domain code will last longer than any of your dependencies.

I have already mentioned that my earliest thoughts about PhotoDoc were as a WPF app. (I might go so far as to admit that some of the early features in PhotoDoc were inspired by the WPF demos that were a dime a dozen in 2007.) I have already discussed, in Part 2, what a mess integrating WPF controls into my domain model made for testing. The choice to marry WPF has had other consequences as well.

My life is different now than it was in early 2007. At the time I was a physician living my dream in rural Alaska. I ran a two-person forensic examiner program at a rural hospital that saw about 100 patients a year. I saw patients one at a time. What I really thought I needed was a simple image manipulation program. At the time my hospital used paper records, so I could write my forensic notes in Word and print them out for the patient chart.

Now I run a relatively large academic child abuse practice with 10 practitioners that sees well over a thousand patients a year. In addition to digital photographs, I get x-rays, audio, video, and documents in multiple formats. I have multiple funders and research partners, each of whom requires slightly different data in a slightly different format. I have to handle data transfers (in both directions) between my system and the legal electronic medical record at the medical center that hosts my clinics.

WPF was the hot new technology in 2007, and while it is not dead in 2022, it is no longer the darling getting all the attention. Now some of my employees prefer Macs, so they have to run Windows in a VM, because I am tied to WPF and Windows. I have years of patient data stored in PhotoDoc files – but the only parser I have for those files is strongly tied to WPF and the specific windows I create in PhotoDoc, which kneecaps my ability to search through and manage the large mass of patient data I have accumulated through the years. I did not understand in 2007 that my interest in medical forensics was going to last longer than WPF’s heyday.

But today is not what I really worry about. I don’t turn 65, and become nominally eligible for retirement, until 2040. At that point WPF will be 33 years old. When my turn comes to move along to something else, WPF will be roughly as old as MS-DOS 6.0 is right now. Microsoft has a very good record with backward compatibility, so there is a reasonable chance that I might avoid a catastrophic and costly rewrite. If I were a Mac programmer, PhotoDoc would already be obsolete. In 2007 I never dreamed I would be running a university division, let alone using PhotoDoc, in 2040. Now that future looks entirely probable.

Melville.PDF is a brand-new codebase, so I do not yet have 15 years of regrets to complain about. But I hope it will be more durable than PhotoDoc. Melville.PDF does depend on WPF, but rather than having the data model depend on WPF, a single assembly plugs into the data model and provides WPF functionality. If WPF disappeared tomorrow, I would continue using Melville.PDF with the Skia binding. Furthermore, building Melville.Pdf to support two different frameworks, WPF and SkiaSharp, forced me to carefully define and segregate common PDF rendering code from framework-specific rendering code.

Lesson #2: Don’t Buy the Cow When You Can Get the Milk for Free

Taking a dependency on WPF is not the worst of my dependency sins in PhotoDoc. WPF shipped with, and still has, what I consider to be a significant flaw. WPF makes it trivially easy to bind to properties on POCO objects. WPF does not have a corresponding mechanism to bind UI events to an arbitrary method on a POCO object. This deficiency has resulted in an unending stream of MVVM frameworks for WPF.

I picked Caliburn Micro, and I have lived to regret it. At the time I took the author’s advice and copied the source code into my own source tree, so it is not as bad as it could be. I have fixed or deleted some of the most objectionable parts and enhanced some of the others – and I still hate it. The problem is that I now have literally hundreds of view classes that don’t work without Caliburn Micro’s “magic.” Worse still, Caliburn Micro uses conventions based on the WPF Name properties assigned to controls. Even if I were willing to modify and re-test the hundreds of dependent classes, there is no obvious way to search for all the locations that depend on the framework. I bought that cow and now she’s mine to keep.

Years later, I wrote my own MVVM binding for WPF. I think it’s better, of course, because I wrote it. Now I have two ways to bind to events, two ways to bind mouse moves, two ways to associate ViewModels with Views, and so on. It grates on me to see the “old” way of doing things littered throughout the codebase, but there is no way to fix it without a massive refactoring and manual testing effort.

Writing Melville.PDF, I have been very selective about the dependencies I take, especially dependencies in the core assemblies. Eventually I took three dependencies outside of the .NET framework: a JPEG parser, a JPEG 2000 parser, and a library that parses multiple font file formats. These dependencies are stable – they parse decades-old file formats. I hope I have not chosen poorly.

Should I develop regrets, however, I didn’t buy the cow this time; I just took the milk! This is evidenced by the fact that the JPEG library is the fourth library I have used to parse JPEGs. It turns out that PDF has some rather unusual requirements for JPEG parsing, so the WPF image parser, Six Labors’ ImageSharp, and even an educational but frustrating attempt at writing my own parser all had unacceptable liabilities. Eventually, I was able to use the insight I gained from writing my own parser to modify an open-source parser, JpegLibrary, to meet my needs.

Unlike the dependency hell I experienced with PhotoDoc, each of these replacements was a trivial operation. Melville.PDF has only one class that knows anything about JpegLibrary, named DctDecoder. (I enforce this constraint – see the next section.) The low-level PDF parser, which is the customer in this case, declared an interface, ICodecDefinition, defining how it would like to request JPEG decompression.

Writing small adapters to make any of four JPEG libraries implement this interface has been trivial. Each time I switch, the adapter class is the only thing that gets thrown out and rewritten. During the switchover from ImageSharp to JpegLibrary I had two adapters, and I switched back and forth several times by just commenting or uncommenting a few lines of code.
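A minimal sketch of this arrangement follows. It is not the Melville.PDF source: the article names ICodecDefinition and DctDecoder but does not show their members, so the interface IStreamDecoder, the adapter classes, and both "vendor" libraries below are hypothetical stand-ins chosen only to illustrate the shape of the design.

```csharp
using System;
using System.IO;

// Hypothetical interface owned by the parser. It stands in for
// ICodecDefinition, whose real members are not shown in the article.
public interface IStreamDecoder
{
    Stream Decode(Stream encoded);
}

// Two made-up third-party libraries with incompatible APIs.
public static class VendorA
{
    public static byte[] DecodeAll(byte[] data) => data; // pretend decoding
}

public sealed class VendorB
{
    public void Run(Stream input, Stream output) => input.CopyTo(output);
}

// One small adapter per library. Only these classes mention vendor types.
public sealed class VendorADecoder : IStreamDecoder
{
    public Stream Decode(Stream encoded)
    {
        using var buffer = new MemoryStream();
        encoded.CopyTo(buffer);
        return new MemoryStream(VendorA.DecodeAll(buffer.ToArray()));
    }
}

public sealed class VendorBDecoder : IStreamDecoder
{
    public Stream Decode(Stream encoded)
    {
        var output = new MemoryStream();
        new VendorB().Run(encoded, output);
        output.Position = 0; // rewind so the caller reads from the start
        return output;
    }
}

// The "customer": it knows only the interface it declared.
public static class ImageReader
{
    // Switching libraries means swapping which of these lines is commented.
    private static readonly IStreamDecoder Decoder = new VendorBDecoder();
    // private static readonly IStreamDecoder Decoder = new VendorADecoder();

    public static long DecodedLength(Stream encoded)
    {
        using var decoded = Decoder.Decode(encoded);
        return decoded.Length;
    }
}
```

Because ImageReader depends only on IStreamDecoder, replacing a library costs one adapter class and one changed line.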

I got two benefits from this design. 1) My PDF parsing code, which is the code that matters, treats all stream compression formats identically, using an interface that the PDF parser defines and that makes sense for the PDF parser. 2) Implementing this interface for a variety of formats in terms of a variety of dependencies has proven to be trivial. Very little code is thrown away when the dependencies change.

Right now, I have chosen a static dependency from my parser to the JPEG parser. JPEG is a very stable format, and I seriously doubt it is going to change significantly, even over the next five decades I might remain on the planet. The unlikely possibility that a user would want to supply their own JPEG parser was not worth the complexity of injecting the dependency. Because I confined this dependency behind an abstraction that I own, however, I retain the choice to inject it if this library becomes a problem in the future. I will never be at the mercy of JpegLibrary in Melville.Pdf the way I am at the mercy of Caliburn Micro in PhotoDoc.
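To make the "retain the choice" point concrete, here is a sketch of how small the later step to injection could be once the abstraction exists. All names (IImageDecoder, BuiltInDecoder, PageParser) are hypothetical and are not taken from Melville.PDF.

```csharp
using System;
using System.IO;

// Hypothetical abstraction owned by the parser.
public interface IImageDecoder
{
    Stream Decode(Stream encoded);
}

// Stand-in for the adapter around the currently chosen library.
public sealed class BuiltInDecoder : IImageDecoder
{
    public Stream Decode(Stream encoded) => encoded; // pretend decoding
}

public sealed class PageParser
{
    private readonly IImageDecoder decoder;

    // The static choice: callers get the built-in decoder without asking.
    public PageParser() : this(new BuiltInDecoder()) { }

    // The door left open: a caller could supply a different decoder.
    public PageParser(IImageDecoder decoder) =>
        this.decoder = decoder ?? throw new ArgumentNullException(nameof(decoder));

    public Stream ReadImage(Stream encoded) => decoder.Decode(encoded);
}
```

The second constructor is the only addition needed to move from a static dependency to an injected one; nothing else in the parser changes, because it already talks to the interface.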

Lesson #3: Give Architectural Rules Teeth

The previous lesson taught us that the risk of taking a dependency is that it insidiously weaves its way into the code. The more you use a dependency, the more its types infest the code, and when dependencies change, the removal can be painful. The reader might rightly argue that JpegLibrary has a very simple interface – it takes a stream and returns a stream – so it may not be the best illustration of the ability to contain dependencies within a codebase.

For the next demonstration I would ask you to look at the SharpFont dependency. SharpFont provides core services very near to the heart of PDF rendering. SharpFont implements an abstraction over the 5 or 6 different font file formats that PDF supports. The library is intimately involved in every character that is written. Furthermore, PDF defines character mappings as a complicated mix of tables from the font file and tables from the PDF file that are combined using a complicated mess of overlapping rules. SharpFont is an ideal candidate to embed itself in the codebase, never to be removed.

I would love to ditch SharpFont someday. It has native dependencies I would prefer to avoid. It also has a very C-centric API that does not play nicely with the C# garbage collector. Its glyph mapping scheme is not thread safe, so I must serialize all the font operations with a semaphore. As much as I hate this library, it parses several notoriously tricky font file formats quickly and correctly, and nothing else I found does. My relationship with SharpFont is not that of a cherished wife, but a necessary mother-in-law.
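Serializing access to a component that is not thread safe can be sketched as below. The article says only "a semaphore"; the use of SemaphoreSlim, the async style, and the class names are assumptions for illustration. UnsafeGlyphSource is a made-up stand-in and does not use any SharpFont API.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a library object that is not thread safe.
// It counts how often a second caller enters while another is inside.
public sealed class UnsafeGlyphSource
{
    private int active;
    private int overlaps;
    public int Overlaps => Volatile.Read(ref overlaps);

    public async Task<int> LoadGlyphAsync(int code)
    {
        if (Interlocked.Increment(ref active) > 1)
            Interlocked.Increment(ref overlaps);
        await Task.Delay(1); // simulate work inside the library
        Interlocked.Decrement(ref active);
        return code * 2;     // pretend glyph index
    }
}

// Wrapper that lets only one caller into the library at a time.
public sealed class SerializedGlyphSource
{
    private readonly UnsafeGlyphSource inner = new UnsafeGlyphSource();
    private readonly SemaphoreSlim gate = new SemaphoreSlim(1, 1);

    public int Overlaps => inner.Overlaps;

    public async Task<int> LoadGlyphAsync(int code)
    {
        await gate.WaitAsync();
        try
        {
            return await inner.LoadGlyphAsync(code);
        }
        finally
        {
            gate.Release(); // always release, even if the library throws
        }
    }
}
```

The cost of this approach is that font operations cannot run in parallel, which is the trade-off the paragraph above accepts.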

The clean solution, as I already discussed, is to wrap up all the ugliness I don’t like in a wrapper class that implements the interface I wish SharpFont presented me. That class is FreeTypeFont, which implements the IRealizedFont interface. Unlike the wrapper class in the previous section, FreeTypeFont is not a trivial class. It has numerous helper classes, some static data, and implements a significant portion of Melville.Pdf’s useful features.

One risk is that SharpFont defines a bunch of accessory types of its own. SharpFont defines enums for various character styles, classes to represent font families, fonts, characters, and various mapping tables. If my wrapper class takes these types as arguments or returns them from public operations, then the wrapper class will fail to insulate the rest of the code from this least favored of my dependencies. Even putting FreeTypeFont in its own assembly would be insufficient because in C# assembly dependencies are transitive.
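The leak described here can be avoided by translating vendor types at the wrapper boundary. The sketch below uses invented types throughout: VendorStyle and VendorFace play the role of a third-party library's accessory types, while FontTraits, IFontInfo, and WrappedFont are hypothetical application types. None of them come from SharpFont or Melville.PDF.

```csharp
using System;

// Imagine these two types live in a third-party assembly.
[Flags]
public enum VendorStyle { None = 0, Bold = 1, Italic = 2 }

public sealed class VendorFace
{
    public VendorStyle Style { get; }
    public VendorFace(VendorStyle style) => Style = style;
}

// Types the rest of the application is allowed to see.
public readonly struct FontTraits
{
    public bool IsBold { get; }
    public bool IsItalic { get; }

    public FontTraits(bool isBold, bool isItalic)
    {
        IsBold = isBold;
        IsItalic = isItalic;
    }
}

public interface IFontInfo
{
    FontTraits Traits { get; }
}

// The wrapper: vendor types appear only in private members, never in
// a public parameter or return type.
public sealed class WrappedFont : IFontInfo
{
    private readonly VendorFace face;

    public WrappedFont(byte[] fontFile)
    {
        // Pretend parse: the low two bits of the first byte carry the style.
        int bits = fontFile.Length > 0 ? fontFile[0] & 3 : 0;
        face = new VendorFace((VendorStyle)bits);
    }

    // Translate the vendor enum into the application's own type.
    public FontTraits Traits => new FontTraits(
        face.Style.HasFlag(VendorStyle.Bold),
        face.Style.HasFlag(VendorStyle.Italic));
}
```

Had Traits returned VendorStyle directly, every caller would need to name a vendor type, and replacing the library would touch all of them.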

The Roslyn C# compiler allows custom analyzers that run during compilation and can emit warnings, or even errors, effectively adding constraints to C# that are specific to a project. I implemented such an analyzer to contain dependencies. In an architecture definition file I have restricted references to SharpFont exclusively to the Melville.Pdf.Model.Renders.FontRendering.FreeType namespace and its descendants. The code I could possibly be required to rewrite if I eventually switch dependencies is carefully penned in one namespace because the compiler will not let me say the name of any of SharpFont’s types outside that namespace. (And yes, the analyzer uses the Roslyn semantic model, so it is smart enough to detect forbidden type usages that do not explicitly say the type name.)
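The core of such a rule can be sketched without Roslyn as a check on namespace prefixes. This is not the author's analyzer, and the article does not show the format of the architecture definition file; ContainmentRule and ArchitectureCheck are hypothetical names. A real analyzer would obtain the (using namespace, referenced namespace) pairs from the compiler's semantic model and report violations as diagnostics.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical rule: types under ProtectedPrefix may be used only by code
// whose namespace is AllowedPrefix or one of its descendants.
public sealed class ContainmentRule
{
    public string ProtectedPrefix { get; }
    public string AllowedPrefix { get; }

    public ContainmentRule(string protectedPrefix, string allowedPrefix)
    {
        ProtectedPrefix = protectedPrefix;
        AllowedPrefix = allowedPrefix;
    }

    // True when ns equals prefix or is nested beneath it.
    // "A.B" covers "A.B.C" but not "A.BC".
    private static bool IsWithin(string ns, string prefix) =>
        ns == prefix || ns.StartsWith(prefix + ".", StringComparison.Ordinal);

    public bool Permits(string usingNamespace, string referencedNamespace) =>
        !IsWithin(referencedNamespace, ProtectedPrefix) ||
        IsWithin(usingNamespace, AllowedPrefix);
}

public static class ArchitectureCheck
{
    // Returns one message per forbidden reference.
    public static List<string> Violations(
        ContainmentRule rule,
        IEnumerable<(string UsingNs, string UsedNs)> references)
    {
        var result = new List<string>();
        foreach (var (user, used) in references)
        {
            if (!rule.Permits(user, used))
                result.Add($"{user} may not reference {used}");
        }
        return result;
    }
}
```

For example, a rule built as `new ContainmentRule("SharpFont", "Melville.Pdf.Model.Renders.FontRendering.FreeType")` would permit references from that namespace and its descendants and reject a reference to SharpFont from any other namespace.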

The FreeTypeFont wrapper is a significant piece of code. If I ever find a replacement for SharpFont, rewriting FreeTypeFont will be expensive and difficult – that is the cost of rewriting a significant part of the library. Because I have the architecture analyzer, I have a solid upper bound on how expensive it will be: I might have to replace all nine classes that can see the library, but it won’t be worse than that.

Incidentally, as I went through the code writing this article, I noticed the opposite architectural problem. Code that parses PDF font structures had migrated into the “danger zone” where accessing SharpFont was allowed. It took me about 15 minutes to move these classes to more appropriate namespaces, and no unit or integration tests broke in the process. If I ever actually get to ditch SharpFont, there is more code I could move out of the danger zone and reuse. It was not worth creating those abstractions right now because I suspect I am stuck with SharpFont for the foreseeable future.

I use the architecture analyzer throughout Melville.PDF to keep the high-level dependency graph acyclic and to contain my dependencies. Having lived with the architecture analyzer for the last year, I am surprised by how often I inadvertently violate architectural rules even when trying to create a carefully layered design. I am convinced that architectural rules need teeth to be observed.
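Checking that a dependency graph is acyclic is a standard depth-first search. The sketch below is a generic illustration, not the author's analyzer; the DependencyGraph class and the idea of describing components as a string-keyed dictionary are assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class DependencyGraph
{
    // edges[a] lists the components that a depends on.
    public static bool HasCycle(IReadOnlyDictionary<string, string[]> edges)
    {
        // 1 = visit in progress, 2 = fully explored
        var state = new Dictionary<string, int>();

        bool Visit(string node)
        {
            // Reaching a node that is still in progress means a cycle.
            if (state.TryGetValue(node, out var s)) return s == 1;

            state[node] = 1;
            if (edges.TryGetValue(node, out var targets))
            {
                foreach (var target in targets)
                {
                    if (Visit(target)) return true;
                }
            }
            state[node] = 2;
            return false;
        }

        foreach (var node in edges.Keys)
        {
            if (Visit(node)) return true;
        }
        return false;
    }
}
```

A layered design passes this check as long as every dependency points from a higher layer to a lower one; a single upward reference is enough to make it fail.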

Conclusion

Dependencies are inevitable in any software project because no one writes on the bare metal anymore. Dependencies cause problems when your project ages more slowly than the code you depend upon. Since software has an insidious habit of lasting longer than anyone anticipated, Clean Architecture dictates that one contain dependencies specifically because they are likely to change.

This article also ends the four-part series reporting the results of a natural experiment comparing two codebases from before and after I adopted clean coding practices as promoted by Robert Martin. As discussed above, I have reaped significant benefits in terms of testability, code reuse, and flexibility to switch dependencies. I believe this will make it easier to adapt Melville.PDF to future platforms than has been possible with PhotoDoc, but the future remains to be seen. Most interesting to me, though, is that cyclomatic complexity, class cohesion, and Microsoft’s maintainability index did not differ appreciably between the projects. This suggests to me that SOLID design provides additional maintainability benefits beyond those considered by earlier co
