DEV Community

In Defense of Defensive Programming

Adam Nathaniel Davis on October 02, 2020

[NOTE: In this article I reference a validation library that I wrote called allow. It's now in an NPM package that can be found here: https://www....


Valts Liepiņš • Oct 2 '20 • Edited

As for the safety of external input in the TypeScript case, I want to suggest a third option: parsing the data.

There are some great articles on validating vs parsing. Validation is like poking the data at certain points of the application to see if it seems alright, but that comes with the drawback of possibly not checking enough, or of checking already-validated data over again.

Parsing, by contrast, is like taking the data and transforming it into a concrete form that other functions can work with. If the data cannot conform to this form, it is erroneous data. Once the data has been transformed into this specific datatype, it's safe to assume that it is correct, since you have described it with a concrete type and the type checker can ensure that it is properly treated by the rest of the code.

Specifically in the TypeScript case, one would define the acceptable format of the data using an interface. A parser would then be any function that takes raw data (a string, or some other possibly invalid structure) and returns an object that fulfills that interface.
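To make that concrete, here is a minimal sketch of such a parser. The `User` interface, the `ParseResult` type, and the function names are all assumptions made up for illustration; nothing here comes from a particular library.

```typescript
// Hypothetical target shape; the thread does not define one.
interface User {
  name: string;
  age: number;
}

// A parse either yields a fully typed value or a description of the failure.
type ParseResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

// Takes raw, untrusted text and returns a User only if the data conforms.
function parseUser(rawText: string): ParseResult<User> {
  let data: unknown;
  try {
    data = JSON.parse(rawText);
  } catch {
    return { ok: false, error: "input is not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, error: "expected a JSON object" };
  }
  const candidate = data as Record<string, unknown>;
  const userName = candidate["name"];
  const age = candidate["age"];
  if (typeof userName !== "string" || userName.length === 0) {
    return { ok: false, error: "name must be a non-empty string" };
  }
  if (typeof age !== "number" || !Number.isInteger(age) || age < 0) {
    return { ok: false, error: "age must be a non-negative integer" };
  }
  return { ok: true, value: { name: userName, age } };
}

// Downstream code accepts only the parsed type and does not re-check it.
function greet(user: User): string {
  return `Hello, ${user.name} (${user.age})`;
}
```

The caller has to look at `ok` before it can reach `value`, so the failure case cannot be silently skipped, and `greet` never sees raw input.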

Sources of inspiration:

Adam Nathaniel Davis • Oct 2 '20

This is quite valid. And I've already noticed a few other comments referring to parsing. I myself have actually gone quite a ways down this road in some of my previous projects, and I think there's a definite time-and-place for this. But I don't think it's a use-everywhere kind of solution.

In TS, you can do this by creating classes that will, essentially, validate the data. And if the data doesn't conform, it throws an error or populates some kind of error state. Then you force the function to require an instance of that class. This is useful - but it still runs into a few problems:

- Like everything with TS, it's useless (nonexistent) at runtime.
- Dealing with the object that holds our value can sometimes be more annoying than simply having direct access to that value.

Granted, these aren't "deal breakers". But they're something to think about when considering the "parse in TS" approach.
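A rough sketch of the class-based approach described above, with made-up names (`PositiveInteger` and `repeatText` are hypothetical, not part of allow or any library). The check in the constructor is ordinary JavaScript and does run at runtime; the requirement that callers pass an instance is the part that only exists at compile time.

```typescript
// Hypothetical wrapper class: an instance can only exist if validation passed.
class PositiveInteger {
  // A private member makes TypeScript treat the class nominally, so a
  // look-alike object literal such as { value: -3 } is rejected at compile time.
  private readonly checked: number;

  constructor(input: unknown) {
    if (typeof input !== "number" || !Number.isInteger(input) || input <= 0) {
      throw new TypeError(`expected a positive integer, got ${String(input)}`);
    }
    this.checked = input;
  }

  get value(): number {
    return this.checked;
  }
}

// Requiring the class means callers cannot pass an unchecked number.
function repeatText(text: string, times: PositiveInteger): string {
  // The second drawback from the list above: the number has to be unwrapped.
  return text.repeat(times.value);
}
```

A call then looks like `repeatText("ab", new PositiveInteger(3))`, which shows both the safety and the extra ceremony.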

Valts Liepiņš • Oct 2 '20

I'm coming at this idea from Haskell, so I wanted to try expressing it in TypeScript.
This is what I ended up with:

As for runtime safety, it's really up to how well one uses the type system to enforce valid system state. The main constraint here is how expressive the type system is. I can't personally comment on the limits of TypeScript, but from my little codepen experiment, it seems to be pretty capable.

Kimmo Sääskilahti • Oct 2 '20 • Edited

Thanks for the great article! One thing that came to mind is that in the great Pragmatic Programmer book the authors say your code should always fail early. So if something unexpected happens, one shouldn't just accept it but raise an exception. So could your example of using allow be interpreted as following their "use asserts in production" advice, what do you think?

For run-time type checking in TS, have you taken a look at the io-ts library? It's nice when the user or server inputs are complex.

Adam Nathaniel Davis • Oct 2 '20

First - yes! "Using asserts in production" is absolutely another way of describing what I'm trying to get across!

And second, no - I haven't checked out io-ts. But I appreciate the heads-up, and I'll definitely give it a look!

Kimmo Sääskilahti • Oct 2 '20

Cool! I also can't resist linking the great Parse, don't validate blog post by Alexis King, though it's most directly concerned with strongly typed languages like Haskell.

Valts Liepiņš • Oct 2 '20

I referenced the exact same blog post in my response!

You might be interested in seeing my attempt at applying those ideas to TypeScript:
dev.to/cipharius/comment/15im8

Mike Bybee • Oct 2 '20 • Edited

Upon reading the title, I had a feeling TypeScript would come into this, since most of the derogatory mentions of "defensive programming" I've heard (and most utterances of "comments are a code smell") have come from those defending TS against my critiques of it.

Adam Nathaniel Davis • Oct 2 '20

I actually believe that "comments are a code smell". But that's a topic for another article...

Mike Bybee • Oct 2 '20 • Edited

I think comments can get out of hand, but it's good to comment functions, their params/types/etc., and their returns. Such use of JSDoc with linting (and, ironically, even ts-lint in VS Code) makes the most common argument for TS a moot point, and JSDoc can be just as handy for generating code docs as javadoc and the other tools that inspired it. It is especially rich, however, to hear "code smell" uttered by those who tack on mountains of extra nonstandard syntax just to hack JS into behaving like another language (and do so using a language with a typeof keyword that switches from its own context to JS context depending on where it's written).

Adam Nathaniel Davis • Oct 2 '20

I think we're basically in agreement. I don't know if you saw it, but I wrote a whole article about how JSDoc is basically... TS. (dev.to/bytebodger/a-jsdoc-in-types...)

So, if your comments are essentially a way of providing type hinting for your IDE (like JSDoc), then yeah, I get that. But most "traditional" comments - the ones where you think you're telling me what your code does - are a "code smell". If I have to read the comments to understand your code, it's crappy code.

Mike Bybee • Oct 3 '20 • Edited

Mostly. I think a landmark here and there can be helpful, especially for future refactors. Sure, one could always ask, "Why aren't you just abstracting it now?" but I think we've both seen enough premature abstractions and lost edge cases to know better.

Daniel Ziltener • Oct 3 '20

FYI, when a function checks its inputs before processing (and potentially its outputs before returning them), those checks are called contracts.
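A small sketch of what that looks like in practice, assuming a hand-rolled helper rather than any contracts library (`check` and `splitEvenly` are hypothetical names):

```typescript
// Hypothetical helper: throws with a labeled message when a contract is violated.
function check(condition: boolean, message: string): void {
  if (!condition) {
    throw new Error(`Contract violated: ${message}`);
  }
}

// Splits an amount in cents across a number of people.
function splitEvenly(totalCents: number, people: number): number[] {
  // Preconditions: what the caller must guarantee.
  check(Number.isSafeInteger(totalCents) && totalCents >= 0,
    "totalCents must be a non-negative integer");
  check(Number.isSafeInteger(people) && people > 0,
    "people must be a positive integer");

  const base = Math.floor(totalCents / people);
  const remainder = totalCents % people;
  // The first `remainder` people pay one extra cent.
  const shares = Array.from({ length: people }, (_, i) =>
    base + (i < remainder ? 1 : 0));

  // Postcondition: what this function guarantees in return.
  const sum = shares.reduce((acc, share) => acc + share, 0);
  check(sum === totalCents, "shares must add up to totalCents");
  return shares;
}
```

The preconditions catch a bad caller; the postcondition catches a bug in the function itself.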

And user inputs... yeah, that's not "defensive programming", it's just common sense to check them.

Adam Nathaniel Davis • Oct 4 '20

Totally agree!

huncyrus • Oct 2 '20

Two stories about defensive programming.

1.) Personal story
12 years ago, I wrote a small, check- and validation-heavy framework in PHP 5 that never trusted anything: it paired and hashed everything and always checked everything. The service is still running on it, and I have not touched the code at all in the last 5 years, so a bunch of things in it are deprecated as of PHP 7. The service gets approx. 8-10 million bot requests (DDoS, probes, typical injections and generic attacks) per year, and because of the heavy security measures, it is still up and running with zero downtime and no breaches (which I am proud of).

2.) Company story
Many years ago, when I started at the company where I still work, I was the only one who designed, planned, and built everything as defensively as possible in C++/JS/PHP. My colleagues did not like it, so they started to remove most of my modules and applications/services. After a few years, GDPR and a security breach hit the company because of the weakened security. A few quotes:
- "You have to trust the input what other internal service give you, because we did it!"
- "You only need SSL/HTTPS for web security!"
- "You don't need ACL in the portal!"
- "You cannot breach an HTTPS connection"
- "The perfect load for the server is around 90% always!"
- "We do not need scalability, we just click on the cloud dashboard for more cpu and memory!"
- "You do not have to double sanitize logged-in user input, we trust our partners after they log in!"

So now everything is on fire, and the company will pay a DevOps/sysop and security consultancy a huge amount of money to review everything and point out how to fix the infrastructure and code.
Except my parts. The parts they did not touch are still safe and uncompromised, and have been kept up to date against OWASP and other security lists (since there are a few others as well).

Adam Nathaniel Davis • Oct 2 '20

Awesome stories! And this hits upon a point that I didn't really address in my article. Mainly, that when you do those "acid tests" on inputs, it doesn't just make your code sturdier in the short term. That bulletproof code tends to stay bulletproof for a very long time.

I've seen some very old, very stable mainframe code - the kinda code that was written decades ago and is still running. And that code tends to use a lot of these voracious ("defensive") techniques.

Basti Ortiz • Oct 2 '20 • Edited

Glad to see this discussion again. I agree with your overall thesis, but I propose a slight adjustment to the assertion that all functions must be treated as their own separate programs.

When every single function is treated as a "hostile" entity, I'd be remiss if I didn't mention the performance implications of multiple redundant validations throughout the app.

Personally, this is the main reason why I've been reluctant to validate every single input to every single function. It's like an "itch" of sorts. You can argue that the costs are negligible, but they indeed exist.

Hence my slight adjustment to your assertion. I propose that all validations must be done within a central area of the app, preferably the user-facing side. That way, all defensive tactics may be employed at the site of external input. After this point, one can use TypeScript (for example) to uphold the "contracts" between the central input validator and the application logic.

Moreover, this allows us to focus our validation-based integration tests on that specific area of code. The rest of the codebase can sleep well on the assumption that the central input validator has done its job well. If not, then we only need to worry about changing the code for the central validator. Rinse and repeat until the application is "robust".

To cite an analogy, one can imagine a bouncer at a night club. The bouncer is responsible for "validating" all guests from outside. Once they've been "validated", the internal structure (i.e. the night club) can service the guest on the assumption that the bouncer has done their job well. No validation redundancy required.

In your example in the article, we can apply this technique by creating a class for player statistics. All input would be validated in the constructor. Once all of the assumptions have been validated, then the methods could sleep well on the assumption that nothing can go wrong with the initialized class properties.

Basically, what I'm trying to say is that I wouldn't advocate for an extremely defensive paradigm. We can have the best of both worlds simply by delegating and centralizing all validation logic in the application so that the validation overhead cost is only paid once, preferably at the exact site/s of external input.
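Here is one way the "bouncer" idea could be sketched. The names and the branding trick are my own assumptions for illustration, not something from the article: a single function does every runtime check and is the only place that hands out the `PlayerStats` type, so inner functions can require that type and skip re-validation.

```typescript
// Hypothetical raw shape of the external input.
interface RawStats {
  playerName: string;
  gamesPlayed: number;
  goals: number;
}

// The extra marker property exists only in the type system. Plain objects
// do not have it, so they cannot be passed where PlayerStats is required.
type PlayerStats = Readonly<RawStats> & { readonly __validated: true };

// The "bouncer": the only place that performs runtime checks.
function admitPlayerStats(input: unknown): PlayerStats {
  if (typeof input !== "object" || input === null) {
    throw new TypeError("stats must be an object");
  }
  const raw = input as Record<string, unknown>;
  const playerName = raw["playerName"];
  const gamesPlayed = raw["gamesPlayed"];
  const goals = raw["goals"];
  if (typeof playerName !== "string" || playerName.trim() === "") {
    throw new TypeError("playerName must be a non-empty string");
  }
  if (typeof gamesPlayed !== "number" || !Number.isInteger(gamesPlayed) || gamesPlayed < 0) {
    throw new TypeError("gamesPlayed must be a non-negative integer");
  }
  if (typeof goals !== "number" || !Number.isInteger(goals) || goals < 0) {
    throw new TypeError("goals must be a non-negative integer");
  }
  // This cast is the one place where the marker is granted.
  return { playerName, gamesPlayed, goals } as PlayerStats;
}

// Inside the "club": no re-validation, only the type requirement.
function goalsPerGame(stats: PlayerStats): number {
  return stats.gamesPlayed === 0 ? 0 : stats.goals / stats.gamesPlayed;
}
```

The trade-off is the one discussed in this thread: the guarantee is only as good as the discipline of never casting to `PlayerStats` anywhere else.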

Adam Nathaniel Davis • Oct 2 '20

Thanks for the thoughtful reply. And I'm about 99% in agreement with you. I do concur that, if external inputs are properly vetted (by the "bouncer"), then TS multiplies in usefulness.

The only part of your response I'd quibble over is the concern over performance.

Obviously, every single LoC is, on some level, a "hit" to performance. So I'm not going to tell you validating, in the way I've shown above, has absolutely no impact. But such concerns almost always fall under the category of micro-optimizations.

I see these same kind of discussions from those who want to bicker over how much faster a for () loop is compared to an Array.prototype function. (Not saying that you're that person - just saying that these performance "concerns" can lead down the same path.) The simple fact is that the vast majority of all applications - even large, heavily-used applications - will never have the slightest need to worry about for-vs-Array.prototype, or defensive-vs-bouncer validation. And if the app does need to focus on such minute optimizations, the programmers could probably achieve much greater gains by focusing on much larger pieces of the application's general flow.

Nevertheless, none of that is meant as any kind of "rebuttal" against what you've written. You make excellent points.

Basti Ortiz • Oct 2 '20

That is definitely true. I must concede that it does sound a bit like premature optimization on my part. It's really just that "itch" to squeeze out every single CPU cycle, you know? 😅

Adam Nathaniel Davis • Oct 2 '20

Oh, yeah. I'm with you. And I certainly don't speak about "micro-optimizations" in an accusatory manner. I've gone wayyyyy down that road - too many times. We've all been there. Once you start trying to count those CPU cycles, it quickly becomes a bit of an obsession!

theScottyJam • Jun 20 '21

It sort of sounds like if TypeScript automatically provided runtime assertions with each function, then half of your concerns would be gone (though you would still have to do manual checks for things such as distinguishing positive from negative numbers).

All of your arguments in this post seem to hang on the idea that "Every single function is a program", and I'm going to disagree with that root argument.

When you say every, do you really mean every?

What about functions inside functions?

export function doStuff() {
  // ...
  // I assume this function doesn't need to be validated
  const result = myArray.map(user => user.name)
  // ...
  // nor does this
  const extractName = user => user.name
  const result2 = myArray.map(extractName)
}

OK, that was a bit silly. But what if we moved them outside? Is our program that much more fragile now?

const extractName = user => user.name

export function doStuff() {
  // ...
  const result2 = myArray.map(extractName)
}

All we've done is moved the function to a place where anything in this module can call it, instead of whatever is just inside the doStuff() function. The amount of danger this poses depends on how big the module is - if this module only contained the doStuff() function and the helpers that we pulled out from it, then there's no more danger having them outside the function than inside. It's unclear to me whether or not you find non-param-validating private functions like this bad or not, so let's take it a step further.

Let's say our module name was _shares.js, and the developers on the team understood this naming convention to mean "don't touch this unless you're inside this package" (or maybe we're working in a language like Java which actually has package-private access-modifiers). And now we start exporting our extractName function for the package to use. How bad this is depends on the size of the package. Having this exported utility function in a really small package is less risky than keeping it private within a ginormous module, so a rule like "all exported functions should validate their params" is a bit of an arbitrary boundary.

We can take it to the next step and make it a public export of an internal company library, or another step to make it a public export for the general public.

In all of these steps, the only thing we're doing is changing how many places have access to this function - the more places, the riskier it is to not validate its inputs. So claiming that "all functions should be stand-alone programs" sounds nice in theory, but in practice, no one's going to add parameter validation to every single function (like the anonymous functions used in .map()), and there's no clear cut way to define the line of when it should and shouldn't happen.

And what's the disadvantage to not validating parameters? Bugs in your program are harder to track down.

I guess what I'm getting at is that there's a balance. If few things call your function (which is the story of most functions in a project), then it's better to keep it lean and focus on readability over helpful errors. As its usage grows, people can always retroactively make improvements to it. If your function gets used all over the place, then put in some more helpful errors. In some cases, you might have to add a fair amount of code to really generate good error messages for the number of ways people may mess up (even using a library like yours), and you might need to write longer, more explanatory error messages too - that kind of verbosity just isn't appropriate in every single function ever.

Other places that benefit from parameter validation include:

- Some forms of string interpolation (e.g. generating HTML, or an SQL query - these are attack surfaces, and should be heavily validated).
- Code that's doing persistent changes (e.g. database writes), where it's preferable to have errors happen in advance so that it doesn't leave things in a permanent bad state. In other words, a chunk of code that needs to perform a single, reliable atomic operation using multiple steps.
- I'm sure other specific scenarios exist too.
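For the HTML case, a minimal sketch might look like the following. The names are hypothetical, and note that I'm pairing the validation with output escaping, which goes a step beyond validation alone; the 50-character limit is an arbitrary assumption.

```typescript
// Characters that must not reach HTML text or attribute values unescaped.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replaces each special character with its HTML entity.
function escapeHtml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// Validates first, then escapes at the point of interpolation.
function renderGreeting(displayName: unknown): string {
  if (typeof displayName !== "string" || displayName.length === 0 || displayName.length > 50) {
    throw new TypeError("displayName must be a string of 1 to 50 characters");
  }
  return `<p>Hello, ${escapeHtml(displayName)}!</p>`;
}
```

For SQL the usual equivalent is parameterized queries rather than hand-built strings, but that depends on the database driver, so it isn't sketched here.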

Blaine Osepchuk • Oct 3 '20

The amount of defensiveness I employ depends on the context.

For example, little utility scripts I write for my own one-time use see very little defensive programming. It's usually unnecessary. If I encounter an error when I run the script, I'll fix it immediately and continue until it runs successfully and then delete the script.

Whereas extremely complicated, large, long running, multi-programmer, multi-million dollar efforts see extensive defensive programming (especially in the critically important code paths). My experience in these kinds of projects is that the cost of adding defensive programming to the code is easily recovered by the time and frustration I save in testing, responding to bug reports, debugging, patching, and so forth.

So the key, in my opinion, is to know how to program defensively and then constantly evaluate the context of the code I'm writing and apply the right amount of defensiveness for that context. This last point is where some of the disagreement may be coming from in the discussions of this issue. People are working on many different kinds of projects and what's appropriate for one project isn't necessarily appropriate for another.

Adam Nathaniel Davis • Oct 4 '20

Couldn't agree more.

One of the common threads in my articles is my hatred for universal (mindless) rules - for the sake of rules. So even though I generally advocate for defensive programming - as a default - I would HATE for anyone to adopt it as a mindless dictate. There are absolutely times when defensive programming is simply a waste of effort.

Even the most basic, common, universally-accepted rules of programming should still be secondary to common sense. And defensive programming is no exception.

I just get annoyed when people choose to paint "defensive programming" as a known bad, merely because they can't be bothered to do the work that's necessary to properly validate their programs' functions.

Julien Bouvet • Oct 2 '20

Very very nice article.

I think it's a matter of being pragmatic. Defensive programming saves time on debugging, on security fixes, etc...

It's only a matter of implementation details to provide a simple and clear way to do so.
Your allow is a very good example, and a lot of ways exist to do the same.
I used to implement validator classes in C#, and they were easy to use, crystal clear, and in the end had a very limited impact on performance (especially in web tech, where those checks, if properly implemented, represent a marginal cost compared to page generation, image loading, etc...)

Thanks a lot for sharing this with us all :) I will look into your other articles, you just got a new follower :p

Adam Nathaniel Davis • Oct 2 '20

Thank you for the feedback! And, indeed, I agree with pragmatism. I'm not going to "code shame" anyone because they haven't validated every single input on every single function/method/etc. But I do look askance at anyone who presumes that any such validation represents the "dreaded" defensive programming.

Adam Nathaniel Davis • Oct 2 '20

Thank you!