Software Quality Defense In Depth (5 Part Series)
One of the hallmarks that distinguishes a senior developer from a junior developer is the art of planning. I’m not talking resource allocation or scheduling (though those are important). I’m talking about getting a task, sitting down at the keyboard, and explicitly not coding.
It’s not that coding is bad. I run a team of software engineers and I can tell you from experience that it’s a very bad thing if none of them are coding. But what’s worse is solving problems without planning out an approach.
Let’s take a look at why that is and what this means for us as software engineers.
Note: This post is part of a series called Software Quality Defense in Depth.
Let’s say you’re sitting down to make a simple fix to address a known problem. Code is already in place, but for whatever reason it’s not working properly in all cases.
If I’m peer reviewing a fix, one of the key things I’m looking for is whether the fix definitively addresses the problem without causing additional problems.
Test Driven Development (TDD) is a key way of proving that you are fixing an instance of a defect since you start by writing a test that reproduces the bug. However, this doesn’t go all the way to ensuring a quality outcome.
When you’re making any change, I recommend stopping and thinking about the top five worst-case scenario things that can happen because of your change. You won’t usually find five, but the exercise forces you to think creatively and identify areas of risk.
Once you’ve identified your key areas of risk, you need to think about your strategy for catching defects you might introduce.
For example, if I’m fixing an error in validation code and my fix is going to make validation less strict, I’m going to think about the risks I’m introducing for unintended invalid input to get through. Because of this, I’m going to think about how to catch those types of issues, and hopefully write some tests around my attempts.
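As a rough sketch of what those tests might look like, suppose the fix loosens a phone number check so that it accepts spaces and dashes between digit groups. The validator, its name (`is_valid_phone`), and the sample inputs here are all hypothetical, chosen only to illustrate listing the inputs that must keep failing alongside the ones the fix is meant to allow.

```python
import re
from typing import List

# Hypothetical validator: assume the original rule only accepted digits and the
# fix loosens it to also allow single spaces or dashes between digit groups.
_PHONE_PATTERN = re.compile(r"\d+(?:[ -]\d+)*")


def is_valid_phone(value: str) -> bool:
    """Return True for digit groups separated by single spaces or dashes."""
    return _PHONE_PATTERN.fullmatch(value) is not None


# Inputs the fix is meant to allow.
SHOULD_PASS = ["5551234", "555-1234", "555 123 4567"]

# Inputs a looser rule might accidentally let through.
SHOULD_STILL_FAIL = ["", "-5551234", "555--1234", "555-1234-", "555-12a4", "555\n1234"]


def check_validation() -> List[str]:
    """Return every sample input whose result disagrees with expectations."""
    surprises = [v for v in SHOULD_PASS if not is_valid_phone(v)]
    surprises += [v for v in SHOULD_STILL_FAIL if is_valid_phone(v)]
    return surprises
```

An empty list from `check_validation()` means the looser rule accepts what it should and nothing from the "should still fail" list slipped through. The second list is where the worst-case thinking pays off, since each entry is a guess at how the fix could go wrong.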
I’ve argued elsewhere that confirmation bias is a force to be reckoned with, and I maintain one of my key takeaways from that article:
If you force yourself to spend a fixed amount of time trying to break your code, you’re going to be motivated to use that time to find defects so that the time isn’t “wasted”.
I’m just going to throw this out there: for me and many others I talk to, adding features is a whole lot more fun than fixing defects.
Because writing new code is fun and exciting, the temptation to jump right in and figure it out, then clean it up as you go is real. I’ve felt it, I’ve enjoyed it, and I’ve succeeded with it.
So what’s the problem? After all, if you’ve written something that meets requirements from business and is code you’re happy enough to commit, that’s a win for everyone, right?
Here’s my problem with that: If you went in without a plan, how much time did you spend thinking about how this feature can break now or in the future?
The choices we make when designing systems matter.
If you’ve done any sort of work on software performance, you know that the number one cause of slow performance is software design. However, design impacts so much more than performance.
Architectural decisions directly influence the types of obstacles, issues, development experience, and overall quality of a system over time.
You may not think of feature development as software architecture, but it is at least at a micro-scale. You’re either creating new patterns for code to follow or reinforcing existing patterns present in the code. Patterns we start or continue will naturally propagate to larger areas over time.
These patterns have strengths and weaknesses in their maintainability, extensibility, reliability, security, and many other important characteristics.
This is why whenever you are starting a new feature, it is important to think through how you’re going to implement it, the weaknesses of that approach, and the risks you are taking in adopting your approach.
Here’s a small example to consider. If you were developing an application that talked to a web service to perform an operation that doesn’t complete instantaneously, you have a few different options.
Specifically, you could design something that waits for the operation to complete on the server-side and displays the result to the user. This is known as a request / response model. Contrast this to the request / acknowledge model where the client application makes a request which then gets added to a queue and performed at some point in the near future.
Both models are viable, but each has very different characteristics:
If you go the request / response route, each call is going to take longer, but you’ll know the outcome by the time it returns. The trade-off is that applications relying on services like this typically show a lot of busy indicators in the user interface. These systems also have a habit of passing error codes down to the caller, which means the client needs to know a lot more about error handling than it otherwise would.
Contrast this with a request / acknowledge communication pattern. In this model, a request is received and validated for overall correctness in format / authorization. If it appears to be valid, it is queued up to be processed later by some other system. The method is quick and will return a status code indicating if the item could be added to the queue or if some error occurred. Optionally, it may return some identifier or URL for checking on the status of the request later.
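To make the request / acknowledge flow concrete, here is a minimal in-memory sketch. Everything in it is an assumption made for illustration: the `AckService` class, the `"operation"` field used as the validity check, and the choice of 202 and 400 as status codes. A real system would put the queue and the worker in separate processes or services.

```python
import queue
import uuid
from dataclasses import dataclass, field


@dataclass
class AckService:
    """In-memory stand-in for a request / acknowledge endpoint (hypothetical)."""

    pending: queue.Queue = field(default_factory=queue.Queue)
    statuses: dict = field(default_factory=dict)

    def submit(self, request: dict) -> tuple:
        """Check the request's shape, queue it, and return (status_code, request_id)."""
        if "operation" not in request:
            return 400, None  # rejected up front: malformed request
        request_id = str(uuid.uuid4())
        self.statuses[request_id] = "queued"
        self.pending.put((request_id, request))
        return 202, request_id  # accepted for later processing, not yet done

    def process_next(self) -> None:
        """Worker side: handle one queued request some time after submit returned."""
        request_id, _request = self.pending.get_nowait()
        # Real work would happen here; this sketch only records completion.
        self.statuses[request_id] = "done"

    def status(self, request_id: str) -> str:
        """Let the client check on a request using the identifier it was given."""
        return self.statuses.get(request_id, "unknown")
```

Note that `submit` returns before any work happens. The caller only learns whether the request was accepted, and has to come back with the identifier to find out how it went.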
Each approach leads to different strengths and weaknesses that you need to prioritize based on your needs. For example, if you need the results available to the user immediately after the operation succeeds, you might go with the request / response model. In that case, you’ll need to think through how to handle error responses from the server and shore up the reliability of your systems, since server-side faults are far more visible in this model.
Hopefully I’ve sold you by now on the fact that our plans (or lack thereof) matter and impact the software we create. So what does it look like to plan out feature or fix work before development?
There are a few different starting points for me in this exercise, and a lot of it depends on my degree of knowledge and comfort with the problem I’m trying to solve.
If I know nothing about the code, I’ll take a look around the methods and paths related to the code that will need to be modified or extended. I’ll likely do some whiteboarding or use a tool to diagram out the as-is system’s major characteristics and flow related to the area I’m modifying.
Once I understand enough of the territory I’m working with, I’ll take a step back and identify my strategy for verifying that I succeeded in what I’m about to do and that I caused no new defects.
Typically I like to identify at least two safety nets for catching issues prior to things getting to QA. This could be anything from relying on the compiler when making major structural changes to snapshot-based pinning tests written with tools like Jest or Snapper.
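For readers who haven’t used a pinning test before, the idea can be sketched in a few lines of Python. This is not how Jest or Snapper work internally. It is a hand-rolled approximation, and `format_invoice`, `matches_snapshot`, and the JSON file layout are hypothetical names and choices made for the example.

```python
import json
from pathlib import Path


def format_invoice(items):
    """Hypothetical legacy function whose current behavior we want to pin."""
    total = sum(price * qty for _, price, qty in items)
    lines = [f"{name} x{qty}" for name, _, qty in items]
    return {"lines": lines, "total": round(total, 2)}


def matches_snapshot(name: str, value, snapshot_dir: Path) -> bool:
    """Record value on the first run; on later runs, compare against the recording."""
    path = snapshot_dir / f"{name}.json"
    serialized = json.dumps(value, indent=2, sort_keys=True)
    if not path.exists():
        path.write_text(serialized, encoding="utf-8")
        return True
    return path.read_text(encoding="utf-8") == serialized
```

The first call for a given name records whatever the code does today, correct or not. Every later call fails if the output drifts, which is the point: the test pins existing behavior so that an unintended change shows up before QA sees it.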
Once I have that high-level quality plan in mind, I’ll start development by writing a number of empty or failing tests to represent the new test cases I’ve identified. It’s at this point I’ll try to replicate the defect in a unit test when doing defect resolution.
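One way this can look in practice, sketched with Python’s built-in `unittest` module: a passing test that guards existing behavior, a failing test that reproduces the defect, and an empty placeholder for a case that still needs a decision. The `parse_quantity` function and its whitespace bug are invented for the example.

```python
import unittest


def parse_quantity(text: str) -> int:
    """Hypothetical function with a known defect: it rejects padded input."""
    if not text.isdigit():
        raise ValueError(f"not a quantity: {text!r}")
    return int(text)


class ParseQuantityPlan(unittest.TestCase):
    def test_plain_digits_still_parse(self):
        # Guards existing behavior while the fix is being made.
        self.assertEqual(parse_quantity("42"), 42)

    @unittest.expectedFailure
    def test_surrounding_whitespace_is_ignored(self):
        # Reproduces the reported defect; remove the decorator once it is fixed.
        self.assertEqual(parse_quantity(" 42 "), 42)

    @unittest.skip("planned: decide how negative quantities should behave")
    def test_negative_quantity_is_rejected(self):
        pass
```

Marking the reproduction with `expectedFailure` keeps the suite green while the plan is being laid out. Once the fix lands, that test starts passing, `unittest` reports it as an unexpected success, and that is the cue to remove the decorator.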
Once my safety net is in place, I’ll start in on development.
So, to summarize, planning is the most important thing you can do to reduce defects when making a change.
Yes, unit tests, testing, code review, and many other things are important parts of building a strategy to prevent defects. However, I’d far rather take an intelligent path around a minefield than meticulously work through how to avoid every single mine while walking straight through it.
Yes, this takes time. Yes, it often takes more time than modifying the code. Yes, this exercise speeds up the activity of writing code later.
Ultimately what we’re trying to do isn’t write as much code as quickly as possible, however. We’re trying to write code that meets our business needs without breaking anything else. We also want to write code that works today and resists breaking in the future.
And for that, we need to plan.