Rémy 🤖

Posted on Jul 20, 2019

10 rules to code like NASA (applied to interpreted languages)

#programming #beginners #codequality #productivity

Foreword — Dear beginner, dear not-so-beginner, dear reader. This article is a lot to take in, and you'll need some perspective for it to make sense. Once in a while, take a step back and rethink the concepts explained here. They helped me a lot over the years, and I hope they will help you too. This article is my interpretation of them for the work I do, which is mostly web development.

NASA's JPL, which is responsible for some of the most awesome science out there, is quite famous for its Power of 10 rules (see original paper). Indeed, if you are going to send a robot to Mars with a 40-minute ping and no physical access to it, then you pretty damn well should make sure that your code doesn't have bugs.

These rules were made with embedded software in mind, but why shouldn't everybody benefit from them? Could we apply them to other languages like JavaScript and Python, and thus make web applications more stable?

That's a question I have been considering for years, and here is my interpretation of the 10 rules applied to interpreted languages and web development.

1. Avoid complex flow constructs, such as goto and recursion.

Original rule — Restrict all code to very simple control flow constructs – do not use goto statements, setjmp or longjmp constructs, and direct or indirect recursion.

When you use weird constructs, your code becomes difficult to analyze and predict. The generations that came after goto was considered harmful did indeed avoid it. We're now at the stage where we debate whether continue is goto and should thus be banned.

My take on this is that continue in a loop is exactly the same as return in a forEach() callback (especially now that JS has block scoping), so if you're saying that continue is goto, then you're basically closing your eyes to the issue. But that's a JS-specific implementation detail.

As a general rule, you should avoid everything that is mind-bending or hard to spot. If your brain power is spent understanding the quirks of jumping around, you're not spending it on the actual logic, and you might be hiding bugs without knowing it.

I'll let you be the judge of what you put in that category but I would definitely put:

goto itself of course
PHP's continue and break used in conjunction with numbers, which is just pure insanity
switch constructs, because they usually require a break to close each block, and I guarantee you that there will be bugs. A series of if/else if will do the same job in a non-confusing manner.

Besides this, you should of course avoid recursion, for several reasons:

As recursion builds on the call stack, whose size is very limited, you can't really control how deep it goes. Even if your code is legit, it might fail because it recurses too much.
Do you know that feeling, when writing a recursive function, of not really knowing whether your code is ever going to stop? It's very hard to picture a recursion and to prove that it will terminate correctly.
It's also more compatible with the following rules to use an iterative algorithm instead of a recursive one, because you have more control (again) over the size of the problem you're dealing with.
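Here is a minimal Python sketch of that "more control" argument: a depth-first tree walk that uses an explicit stack instead of recursion. The node shape (a dict with an optional "children" list) and the `max_nodes` limit are assumptions for illustration. The bound is an ordinary loop check rather than the size of the call stack.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Assumed node shape: {"name": "root", "children": [...]}; "children" is optional.
Node = Dict[str, Any]


def walk(root: Node, max_nodes: int = 10_000) -> Iterator[Tuple[int, Node]]:
    """Yield (depth, node) pairs depth-first, without recursion.

    The explicit stack is a plain list on the heap, so deep trees cannot
    overflow the call stack, and max_nodes is a hard, easily checked bound.
    """
    stack: List[Tuple[int, Node]] = [(0, root)]
    visited = 0
    while stack:
        if visited >= max_nodes:
            raise ValueError(f"tree has more than {max_nodes} nodes")
        depth, node = stack.pop()
        visited += 1
        yield depth, node
        # Push children in reverse so they are visited in their original order.
        for child in reversed(node.get("children", [])):
            stack.append((depth + 1, child))
```

A chain thousands of levels deep would make a naive recursive walk hit CPython's default recursion limit. This version only grows a list, and it stops with a clear error once it passes the node budget.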

As a bonus, recursion often comes as an intuitive implementation of an algorithm, but it is usually also far from optimal. For example, job interviews often ask you to implement the factorial function recursively, but that's less efficient than an iterative implementation and, in most languages, limited by the depth of the call stack. Regular expressions can be disastrous too, since backtracking engines can take exponential time on some patterns and inputs.
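Here is a small illustration of the factorial case, assuming CPython with its default recursion limit (1000 frames). The recursive version fails on large inputs, while the iterative one has a loop bound you can read directly.

```python
def factorial_recursive(n: int) -> int:
    # One stack frame per step: with CPython's default recursion limit,
    # large n raises RecursionError.
    return 1 if n <= 1 else n * factorial_recursive(n - 1)


def factorial_iterative(n: int) -> int:
    if n < 0:
        raise ValueError("n must be non-negative")
    result = 1
    # The number of iterations is n - 1 at most: trivial to check.
    for k in range(2, n + 1):
        result *= k
    return result
```

Both return the same values for small inputs, but only the iterative one survives `n = 5000`.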

2. All loops must have fixed bounds. This prevents runaway code.

Original rule — All loops must have a fixed upper-bound. It must be trivially possible for a checking tool to prove statically that a preset upper-bound on the number of iterations of a loop cannot be exceeded. If the loop-bound cannot be proven statically, the rule is considered violated.

The idea behind this rule is the same as with the ban on recursion: you want to prevent runaway code. You implement it by making it trivial to prove statically that the loop won't exceed a given number of iterations.

Let's give an example in Python. You could do this:

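def iter_max(it, max_iter):
    """Yield items from `it`, failing loudly past `max_iter` items."""
    cnt = 0

    for x in it:
        # The bound is checked before each item is handed out.
        assert cnt < max_iter, "loop bound exceeded"
        yield x
        cnt += 1


def main():
    # Prints 0 to 9, then raises AssertionError: range(0, 100) holds
    # more items than the bound of 10 allows.
    for i in iter_max(range(0, 100), 10):
        print(i)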

In Python, however, most loops iterate over finite collections, so they are already bounded by the size of those collections. If you can prove that the input lists won't be too long, there are many cases where you don't need to do this.

A good application of this is pagination: make sure that you always work with pages of a reasonable size, and you won't need loops that could run forever. Always design your code so that it only works on a finite amount of data, and let tools that were made for that handle infinity (like your DB engine).
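Here is a sketch of bounded pagination in Python. `fetch_page(offset, limit)` is a hypothetical stand-in for your database or API query, and the page size and page cap are assumed values you would tune. The hard cap in `range()` is what makes the loop bound trivial to check.

```python
from typing import Callable, Iterator, List

Row = dict
FetchPage = Callable[[int, int], List[Row]]  # hypothetical (offset, limit) -> rows

PAGE_SIZE = 100  # assumed: a page size your DB and memory handle comfortably
MAX_PAGES = 50   # assumed: hard cap on how many pages one call may read


def iter_pages(
    fetch_page: FetchPage,
    page_size: int = PAGE_SIZE,
    max_pages: int = MAX_PAGES,
) -> Iterator[List[Row]]:
    """Yield pages from fetch_page(offset, limit), at most max_pages of them."""
    # The loop bound is the range() below, whatever the data source does.
    for page_number in range(max_pages):
        page = fetch_page(page_number * page_size, page_size)
        if len(page) > page_size:
            raise ValueError("data source returned more rows than requested")
        if page:
            yield page
        if len(page) < page_size:
            return  # short or empty page: nothing left to read
    # Every page was full, so there may be more data than we budgeted for.
    raise RuntimeError(f"more than {max_pages} pages of data")
```

Raising when the cap is reached is a deliberate choice: silently truncating would hide the fact that the data outgrew the assumptions.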

3. Avoid heap memory allocation.

Original rule — Do not use dynamic memory allocation after initialization.

This of course makes no sense in interpreted languages, where literally everything is allocated dynamically. But that doesn't mean the rule doesn't apply to them. The core idea of the rule is that, beyond the tedious memory management techniques you have to use in C, it's also very important to be able to set an upper bound on the memory consumption of your program.

So for interpreted languages it means that when you write your code, you should be able to know that given any accepted input the memory consumption won't go beyond a certain point.

While this can be hard to prove in an absolute manner, there are good clues and principles that you can follow. To be more specific, and to repeat the previous section, pagination is an essential technique. If you only work with pages and you know that the content of each page is limited (DB fields have limited length and so on), then it's quite easy to prove that at least the data coming from those pages fits within an upper bound.

So once again if you use pages you'll know that the memory allocated to each page is reasonable and that your system can handle it.
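Pagination bounds the data you fetch. For data that is pushed at you (a request body, an upload), the same idea is to refuse anything past a fixed size. Here is a minimal sketch. It assumes a buffered binary stream such as a file or `io.BytesIO`, whose `read(n)` returns up to `n` bytes and only returns fewer at end of input. `MAX_BODY_BYTES` is an arbitrary illustrative limit.

```python
import io
from typing import BinaryIO

MAX_BODY_BYTES = 1_000_000  # assumed upper bound for an accepted input


def read_bounded(stream: BinaryIO, max_bytes: int = MAX_BODY_BYTES) -> bytes:
    """Return the whole input, refusing anything longer than max_bytes.

    At most max_bytes + 1 bytes are ever held in memory: the one extra byte
    is how an oversized input is detected without reading all of it.
    """
    data = stream.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise ValueError(f"input is larger than {max_bytes} bytes")
    return data


# Usage, with an in-memory stream standing in for a request body:
body = read_bounded(io.BytesIO(b"hello"), max_bytes=10)
print(body)  # b'hello'
```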

4. Restrict functions to a single printed page.

Original rule — No function should be longer than what can be printed on a single sheet of paper in a standard reference format with one line per statement and one line per declaration. Typically, this means no more than about 60 lines of code per function.

This is about two different things.

First, the human brain can only fully understand so much logic, and the symbolic page looks about right. While this estimate is totally arbitrary, you'll find that you can easily organize your code into functions of about that size or smaller, and that you can easily understand those functions. Nobody likes to land on a 1000-line function that seems to do a gazillion things at the same time. We've all been there and we know it should not happen.

Second, when the function is small — or rather as small as possible — then you can worry about giving this function the least possible power. Make it work on the smallest unit of data and let it be a super simple algorithm. It will de-couple your code and make it more maintainable.

And let me emphasize the arbitrary aspect of this rule. It works precisely because it is arbitrary. Someone decided they didn't want to see a function longer than a page, because it's not nice to work with if it is any longer, and they also noticed that it is doable. At first I rejected this rule, but more than a decade later I must say that if you just follow either of the goals mentioned above, your code will always fit on a page. So yes, it's a good rule.

5. Use a minimum of two runtime assertions per function.

Original rule — The assertion density of the code should average to a minimum of two assertions per function. Assertions are used to check for anomalous conditions that should never happen in real-life executions. Assertions must always be side-effect free and should be defined as Boolean tests. When an assertion fails, an explicit recovery action must be taken, e.g., by returning an error condition to the caller of the function that executes the failing assertion. Any assertion for which a static checking tool can prove that it can never fail or never hold violates this rule. (I.e., it is not possible to satisfy the rule by adding unhelpful "assert(true)" statements.)

That one is tricky because you need to understand what would count as an assertion.

In the original rules, assertions are considered to be boolean tests that verify "pre- and post-conditions of functions, parameter values, return values of functions, and loop-invariants". If the test fails, then the function must do something about it, typically returning an error code.

In the context of C or Go it is mostly as simple as this. In the context of almost every other language it means raising an exception. And depending on the language, a lot of those assertions are made automatically.
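In Python, one way to write such checks is an explicit `raise` at the top of the function. The caller then gets an exception instead of an error code, which plays the role of the "explicit recovery action". Here is a sketch with a hypothetical `apply_discount()` function. It deliberately avoids the `assert` statement, because running Python with `-O` strips asserts out, while the original rule wants checks that always run.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Return price_cents reduced by percent, rounding the discount down."""
    # Two precondition checks, written as explicit raises: unlike the
    # assert statement, they are not removed when Python runs with -O.
    if price_cents < 0:
        raise ValueError(f"price_cents must be >= 0, got {price_cents}")
    if not 0 <= percent <= 100:
        raise ValueError(f"percent must be between 0 and 100, got {percent}")
    return price_cents - price_cents * percent // 100
```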

To give Python as an example, you could do this:

assert "foo" in bar
do_something(bar["foo"])

But why bother, when simply doing this will also raise an exception (a KeyError) if the key is missing?

do_something(bar["foo"])

For me, it's always very tempting to pretend that the input value is always right by falling back to defaults when the input is crap. But that's usually not helpful. Instead, you should let your code fail as much as possible and use an exception reporting tool (I personally love Sentry, but there are plenty out there). This way you'll know what goes wrong, and you'll be able to fix your code.

Of course, this means that your code will fail at runtime. But it's all right! Runtime is not production time. If you test your application extensively before sending it to production, this will allow you to see most of the bugs. Then your real users will also encounter some bugs, but you will also be informed of them, instead of things failing silently.

As a side note, if you don't control the input, for example if you're building an API, it's not always a good idea to just fail. Raise an exception on incorrect input and you'll get an error 500, which is not a good way to communicate bad input (that would rather be something in the 4xx range of status codes). In that case you need to properly validate the input beforehand. However, depending on who's using the code, you might or might not want to report the exceptions. A few examples:

An external tool calls your API. In that case you want to report exceptions because you want to know if the external tool is going sideways.
Another of your services calls your API. In that case you also want to report exceptions, as it means you're doing something wrong yourself.
The general public calls your API. In that case you probably don't want to receive an email every time that someone does something wrong.
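Here is a framework-free sketch of that validation step. `handle_signup()` and `validate_signup()` are hypothetical names, and the `(status, body)` tuple stands in for whatever response object your framework uses. Bad input produces a 400 with a readable message, while anything that fails after validation is left to raise and be reported.

```python
from typing import Any, Dict, List, Tuple

Response = Tuple[int, Dict[str, Any]]  # (HTTP status, JSON-like body)


def validate_signup(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems with the payload; empty means valid."""
    errors = []
    email = payload.get("email")
    if not isinstance(email, str) or "@" not in email:
        errors.append("email: a valid email address is required")
    age = payload.get("age")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(age, int) or isinstance(age, bool) or not 0 < age < 150:
        errors.append("age: an integer between 1 and 149 is required")
    return errors


def handle_signup(payload: Dict[str, Any]) -> Response:
    """Bad input gets a 400 with details instead of an unhandled 500."""
    errors = validate_signup(payload)
    if errors:
        return 400, {"errors": errors}
    # From here on the input is trusted: any exception is a bug on our side
    # and should propagate to the error reporter (and become a 500).
    return 201, {"email": payload["email"]}
```

Validation problems are answered, not reported. Only the exceptions past that point reach your error tracker, which keeps the public's typos out of your inbox.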

In short it's all about knowing about the failures that you will find interesting to improve your code stability.

6. Restrict the scope of data to the smallest possible.

Original rule — Data objects must be declared at the smallest possible level of scope.

In short, don't use global variables. Keep your data hidden within the app and make it so that different parts of the code can't interfere with each other.

You can hide your data in classes, modules, second-order functions, etc.

One thing, though: when you're doing unit testing, you'll notice that this sometimes backfires on you, because you want to set that data manually just for the test. This might mean that you need to hide your data away but keep a way to change it that, by convention, you won't use. That's the famous _name in Python, or private in other languages (which can still be accessed using reflection).
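Here is a sketch of two common ways to keep data in a small scope in Python. In a closure, the state is unreachable from outside. In a class, the state uses the `_name` convention, so tests can still reach it. `make_counter()` and `RateLimiter` are illustrative names.

```python
from typing import Callable


def make_counter() -> Callable[[], int]:
    """Return a function that counts its calls.

    `count` lives in the closure: no other code can read or change it.
    """
    count = 0

    def increment() -> int:
        nonlocal count
        count += 1
        return count

    return increment


class RateLimiter:
    """Allows `limit` calls; `_used` is private by convention only."""

    def __init__(self, limit: int) -> None:
        self._limit = limit
        self._used = 0

    def allow(self) -> bool:
        if self._used >= self._limit:
            return False
        self._used += 1
        return True
```

The closure offers the strongest hiding. The class trades a little of it for testability, since a test can reset `limiter._used = 0` where production code never would.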

7. Check the return value of all non-void functions, or cast to void to indicate the return value is useless.

Original rule — The return value of non-void functions must be checked by each calling function, and the validity of parameters must be checked inside each function.

In C, the most common way of indicating an error is the return value of the corresponding function (or an error variable passed by reference). With most interpreted languages, however, that's simply not the case, since errors are indicated by exceptions. Even PHP 7 improved on that by reporting most errors as Error exceptions (even if you still get warnings printed as HTML in the middle of your JSON when you do something non-fatal).

So in truth this rule is: let errors bubble up until you can handle them (by recovering and/or logging the error). In languages that have exceptions it's pretty simple to do, simply don't catch the exceptions until you can handle them properly.

To put it another way: don't catch exceptions too early, and don't silently discard them. Exceptions are meant to crash your code if need be, and the proper way to deal with them is to report them and fix the bug. This is especially true in web development, where an exception will just result in a 500 response code without dramatically crashing the whole front-end.
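Here is a minimal sketch of letting exceptions bubble up, with hypothetical function names. The input is assumed to come from your own code rather than from the public, so a failure here really is a bug. The inner functions catch nothing. Only the top-level `handle_request()` boundary reports the error and turns it into a 500. It reports through the standard `logging` module, which error trackers can typically hook into.

```python
import logging
from typing import Optional, Tuple

logger = logging.getLogger(__name__)


def parse_quantity(raw: str) -> int:
    # No try/except: int() raises ValueError on bad data, and this function
    # has no sensible way to recover, so the error is left to bubble up.
    return int(raw)


def order_total(unit_price_cents: int, raw_quantity: str) -> int:
    # Still no try/except: catching here would only hide the problem.
    return unit_price_cents * parse_quantity(raw_quantity)


def handle_request(unit_price_cents: int, raw_quantity: str) -> Tuple[int, Optional[int]]:
    """Top-level boundary: the one place that catches, reports and recovers."""
    try:
        return 200, order_total(unit_price_cents, raw_quantity)
    except Exception:
        # logger.exception logs at ERROR level with the traceback attached.
        logger.exception("unhandled error while handling request")
        return 500, None
```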

8. Use the preprocessor sparingly.

Original rule — The use of the preprocessor must be limited to the inclusion of header files and simple macro definitions. Token pasting, variable argument lists (ellipses), and recursive macro calls are not allowed. All macros must expand into complete syntactic units. The use of conditional compilation directives is often also dubious, but cannot always be avoided. This means that there should rarely be justification for more than one or two conditional compilation directives even in large software development efforts, beyond the standard boilerplate that avoids multiple inclusion of the same header file. Each such use should be flagged by a tool-based checker and justified in the code.

In C code, macros are a particularly efficient way to hide the mess. They allow you to generate C code, much like you would write an HTML template. It's easy to see how they end up being misused. Just look at the IOCCC entries, which usually make very heavy use of C macros to generate totally unreadable code.

However, C (and C++) are mostly the only mainstream languages with a preprocessor, so how would you translate this rule into other languages? Did we get rid of the problem? Does compiling code into other code that will then be executed sound familiar to anyone?

Yes, I'm talking about the huge pile of things we put in our Webpack configurations.

The initial rule recognizes the need for macros but asks that they are limited to "simple macro definitions". What is the "simple macro" of Webpack? What is the good transpiler and the bad transpiler?

My rationale is simple:

Keep the stack as small as possible. The fewer transpilers you have, the less complexity you need to handle.
Stay as mainstream as possible. For example, I always use Webpack to transpile my JS/CSS, even in Python or PHP projects. Then I use a simple wrapper around a manifest file to get the right file paths on the server side. This allows me to stay compatible with the rest of the JS world without having to write more than a simple wrapper. Another way to put it is: stay away from things like Django Pipeline.
Stay as close as possible to the real thing. Using ES6+ is nice because it's a superset of previous JS versions, so you can see transpiling as a simple compatibility layer. I wouldn't recommend, however, transpiling Dart or Python or anything like that into JS.
Only do it if it brings actual value to your daily work. For example, CoffeeScript is just an obfuscated version of JavaScript, so it's probably not worth the pain, while Stylus/LESS/Sass bring variables and mixins to CSS, which will help you a lot to maintain CSS code.
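The "simple wrapper around a manifest file" can be as small as the following sketch. It assumes the bundler writes a flat JSON object mapping source names to hashed file names (for example `{"main.js": "main.3f2a9c.js"}`). Check what your own manifest plugin actually emits. `AssetManifest` and the `/static/` prefix are illustrative.

```python
import json
from pathlib import Path
from typing import Dict


class AssetManifest:
    """Thin wrapper around a bundler manifest (assumed: a flat JSON object
    mapping source names to hashed output names)."""

    def __init__(self, entries: Dict[str, str], base_url: str = "/static/") -> None:
        self._entries = dict(entries)
        self._base_url = base_url

    @classmethod
    def from_file(cls, path: Path, base_url: str = "/static/") -> "AssetManifest":
        with path.open(encoding="utf-8") as f:
            return cls(json.load(f), base_url)

    def url(self, name: str) -> str:
        # A KeyError here means a template references an asset the build did
        # not produce: better to fail loudly than to serve a broken page.
        return self._base_url + self._entries[name]
```

Loading the manifest once at startup and passing the object around also keeps it out of global scope.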

You're the judge of good transpilers for your projects. Just don't clutter yourself with useless tools that are not worth your time.

9. Limit pointer use to a single dereference, and do not use function pointers.

Original rule — The use of pointers should be restricted. Specifically, no more than one level of dereferencing is allowed. Pointer dereference operations may not be hidden in macro definitions or inside typedef declarations. Function pointers are not permitted.

Anybody who's done C beyond the basic examples will know the headache of pointers. It's like Inception but with computer memory: you don't really know how deep you should follow the pointers.

A classic example of why you need them is the qsort() function. You want to be able to sort any type of data without knowing anything about it at compile time. Have a look at the signature:

void qsort( void *ptr, size_t count, size_t size,
            int (*comp)(const void *, const void *) );

It's one of the most frighteningly unsafe things you'll ever see in standard library documentation. Yet it allows the standard library to sort any kind of data, a problem for which other, more modern languages still have somewhat awkward solutions.

But of course when you open the gate for this kind of things, you open the gate to any kind of pointer madness. And as you know, when a gate is open then people will go through it. Hence this rule for C.

However what about our case of interpreted languages? We will first cover why references are bad and then we will explain how to accomplish the initial intent of writing generic code.

Don't use references

Pointers don't exist in interpreted languages, but some ancient and obscure languages like PHP still thought it would be a good idea to have references. Most other languages, however, only use a strategy named call-by-sharing. Put very quickly, the idea is that instead of passing a reference, you pass objects that can modify themselves.
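Python is one of those call-by-sharing languages. A few lines show the difference between rebinding a parameter and mutating the object it points to:

```python
from typing import List


def rebind(items: List[int]) -> None:
    # Rebinding only changes the local name: the caller's list is untouched.
    items = [0]


def mutate(items: List[int]) -> None:
    # Mutating the shared object is visible to the caller.
    items.append(4)


numbers = [1, 2, 3]
rebind(numbers)
print(numbers)  # [1, 2, 3]
mutate(numbers)
print(numbers)  # [1, 2, 3, 4]
```

There is no way for `rebind()` to replace the caller's variable. That is exactly the side effect PHP's `&$n` allows.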

The core point against references is that, beyond being memory-unsafe and crazy in C, they also produce side effects. For example, in PHP:

function read($source, &$n) {
    $content = file_get_contents($source); // e.g. read the whole file
    $n = strlen($content);                 // the read length

    return $content;
}

$n = 0;
$content = read("foo", $n);

print($n);

That's a common, C-inspired use case for references. However, what you really want to do in this case is:

function read($source) {
    $content = file_get_contents($source); // e.g. read the whole file
    $n = strlen($content);                 // the read length

    return [$content, $n];
}

list($content, $n) = read("foo");

print($n);

All you need is two return values instead of one. You can also return data objects which can fit any information you want them to fit and also evolve in the future without breaking existing code.
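Here is the same idea in Python, as a sketch: a hypothetical `read()` that returns a small `NamedTuple` instead of writing into an out-parameter. Reading from a text stream rather than a file is an assumption made so the example runs on its own. Callers can unpack the result like the PHP `list()` call, or use attribute access, which is the style that keeps working if fields are added later.

```python
import io
from typing import NamedTuple, TextIO


class ReadResult(NamedTuple):
    content: str
    length: int


def read(source: TextIO) -> ReadResult:
    """Return the content and its length together: no out-parameter."""
    content = source.read()
    return ReadResult(content=content, length=len(content))


# Unpacking, like PHP's list():
content, n = read(io.StringIO("hello"))
print(n)  # 5

# Attribute access also survives ReadResult gaining new fields later.
result = read(io.StringIO("hello"))
print(result.length)  # 5
```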

And all of this without affecting the scope of the calling function, which is rather nice.

Another safety concern is that when you modify an object, you potentially affect the other users of that object. That's, for example, a common pitfall of Moment.js, whose objects are mutable. Let's see:

function add(obj, attr, value) {
    obj[attr] = (obj[attr] || 0) + value;
    return obj;
}

const a = {foo: 1};
const b = add(a, "foo", 1);

console.log(a.foo); // 2
console.log(b.foo); // 2

On the other hand you can do:

function add(obj, attr, value) {
    const patch = {};
    patch[attr] = (obj[attr] || 0) + value;
    return Object.assign({}, obj, patch);
}

const a = {foo: 1};
const b = add(a, "foo", 1);

console.log(a.foo); // 1
console.log(b.foo); // 2

Both a and b stay distinct objects with distinct values, because the add() function made a copy of a before returning it.

Let's conclude this already-too-long section with the final form of the rule:

Don't mutate your arguments unless the explicit goal of your function is to mutate your arguments. If you do so, do it by sharing and not by reference.

Tools that help here include, for example, the no-param-reassign rule in ESLint (its props option also flags changes to the properties of parameters) and the Object.freeze() method. In Python, you can use a NamedTuple in many cases.
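In Python, a frozen dataclass gives the same copy-instead-of-mutate behavior as the JavaScript `add()` example above, and the runtime refuses in-place changes. Here is a minimal sketch with an illustrative `Counter` class. `dataclasses.replace()` builds the modified copy, much like `_replace()` does on a NamedTuple.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Counter:
    foo: int = 0


a = Counter(foo=1)
b = replace(a, foo=a.foo + 1)  # builds a new object; a is left unchanged

print(a.foo)  # 1
print(b.foo)  # 2

try:
    a.foo = 3  # frozen dataclasses refuse in-place mutation
except FrozenInstanceError:
    print("a is immutable")
```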

Note on performance: if you change the size of an object, the runtime may well have to allocate a new contiguous region of memory for it and copy the data over. For this reason, a mutation often involves a copy anyway, so don't worry too much about copying your objects.

Leverage the weak-ish dynamic typing

Now that we closed the crazy door of references, we still need to write generic code if we want to stay DRY.

The good news is that while compiled languages are bound by the rules of physics and the way computers work, interpreted languages can have the luxury of putting a lot of additional support logic on top of that.

Specifically, they mostly rely on duck typing. Of course, you can add some level of static type checking with TypeScript, Python's type hints or PHP's type declarations. Then, using the wisdom of other rules:

Rule 5 — Make many assertions. Expecting something from an object which doesn't actually have it will raise an exception, which you can catch and report.
Rule 10 — No warnings allowed (explained hereafter). Using the various type checking mechanisms you can rely on a static analyzer to help you spot errors that would arise at runtime.

Those two rules will protect you from writing dangerous generic code, which gives us the following rule:

You can write generic code as long as you use as many tools as possible to catch mistakes, and especially you need to follow rules 5 and 10.
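Here is a sketch of generic, duck-typed Python code that follows both rules. A `typing.Protocol` documents what the function needs, so a static checker such as mypy can verify callers (rule 10). A runtime check raises a clear error when something slips through anyway (rule 5). `Priced`, `Book` and `Coupon` are illustrative names. Note that `isinstance()` against a `runtime_checkable` protocol only checks that the method exists, not its signature.

```python
from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class Priced(Protocol):
    def price_cents(self) -> int: ...


def total_cents(items: Iterable[Priced]) -> int:
    total = 0
    for item in items:
        # Runtime check (rule 5): fail loudly if something without a
        # price_cents() method got past the static checker (rule 10).
        if not isinstance(item, Priced):
            raise TypeError(f"{item!r} has no price_cents() method")
        total += item.price_cents()
    return total


class Book:
    def __init__(self, cents: int) -> None:
        self._cents = cents

    def price_cents(self) -> int:
        return self._cents


class Coupon:  # unrelated to Book, but duck-typed to the same protocol
    def price_cents(self) -> int:
        return -500
```

Neither class inherits from `Priced`. They only need the right method, which is what keeps the function generic.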

10. Compile with all possible warnings active; all warnings should then be addressed before release of the software.

The initial full rule is:

All code must be compiled, from the first day of development, with all compiler warnings enabled at the compiler’s most pedantic setting. All code must compile with these settings without any warnings. All code must be checked daily with at least one, but preferably more than one, state-of-the-art static source code analyzer and should pass the analyses with zero warnings.

Of course, interpreted code is not necessarily compiled so it's not about the compiler warnings per se but rather about getting the warnings.

There is fortunately a great number of warning sources out there:

All the JetBrains IDEs are pretty awesome at finding issues in your code. Recently, those IDEs have taught me a lot of patterns in different languages. That's really the main reason why I prefer them to a simplistic code editor: the warnings are very smart and helpful.
Linters for all the languages
- JavaScript — ESLint, maybe with the Airbnb rule set?
- Python — There are plenty of tools that will help you with either type checking (such as mypy) or code smell detection (such as Pylint or flake8)
Automated code review tools like SonarQube
Spell checkers are also surprisingly important because they will allow you to sniff out typos regardless of type analysis or any complicated static code analysis. It's a really efficient way to not lose hours because you typed reuslts instead of results.

The main thing about warnings is that you must train your brain to see them. A single warning in the IDE will drive me mad while on the other hand I know people that just won't see them.

A final point on warnings: unlike with compiled languages, warnings here are not always 100% certain. They are more like 95% certain, and sometimes it's just an IDE bug. In that case, you should explicitly disable the warning and, if possible, give a short explanation of why you're sure it doesn't apply. However, think twice before doing so, because usually the IDE is right.

Key takeaways

The long discussion above tells us that these 10 rules were made for C. While you can use their philosophy in interpreted languages, you can't really translate them directly into 10 other rules. Let's make our new power of 10 + 2 rules for interpreted languages.

Rule 1 — Don't use goto, rationalize the use of continue and break, don't use switch.
Rule 2 — Prove that your problem can never create runaway code.
Rule 3 — To do so, limit the size of it. Usually using pagination, map/reduce, chunking, etc.
Rule 4 — Make code that fits in your head. If it fits in a page, it fits in your head.
Rule 5 — Check that things are right. Fail when wrong. Monitor failures. See rule 7.
Rule 6 — Don't use global-ish variables. Store data in the smallest possible scope.
Rule 7 — Let exceptions bubble up until you properly recover and/or report them.
Rule 8 — If you use transpilers, make sure that they solve more problems than they bring.
Rule 9.1 — Don't use references, even if your language supports them.
Rule 9.2 — Copy arguments instead of mutating them, unless it's the explicit purpose of the function.
Rule 9.3 — Use as many type-safety features as you can.
Rule 10 — Use several linters and tools to analyze your code. No warning shall be ignored.

And if you take a step back, all of those rules could be summed up in one rule to rule them all.

Your computer, your RAM, your hard drive, even your brain are bound by limits. You need to cut your problems, code and data into small boxes that will fit your computer, RAM, hard drive and brain, and that will fit together.

— ~~Morpheus~~ Me

I consider that to be the core rule of programming, and I apply it as a universal rationale to everything computer-related that I do.

Top comments (46)

Juan Carlos • Jul 20 '19

You got some things wrong about it.

switch gets compiled to if usually, being compiled to machine code or bytecode but still.

About the assertions it refers to assert whatever the language is, and is also referring to Design by Contract.

About the preprocessor is referring to Macros and Metaprogramming, not about Transpilers.

I prefer a 100x100 maximum, 100 line length per 100 lines per function.

Rémy 🤖 • Jul 21 '19

Those are interesting points, to reply in order:

The way switches are compiled is not-so-relevant to this point, because it is more about human perception of code flow than technical reasons. But I've stated it in other comments, the switch ban is really not my strongest take.
In the original paper, it refers very specifically to macros returning a boolean value, which can be used in an if that exits the flow and returns an error code, which is then taken care of by rule 7 (emit exceptions and let them bubble up).
That's also the understanding that I've had for a long time about the preprocessor rule and in fact I actually wrote a dual-axis section speaking about both transpiling and metaprogramming. However, I thought about metaprogramming in terms of harm. Yes you can do crazy stuff with it but it's so complicated that I never see anybody doing anything bad with it. The only use I see commonly is Django's models, forms and serializers which are actually awesome. I figured that C macros are easy and nasty while metaprogramming is usually hard but used sparingly and for good reasons.
Regarding the size, that looks reasonable as well. The JPL 60-lines feels right to me since most of my functions are much shorter anyways. Regarding the width, I guess there are some language-specific parameters to account for but the nice people behind black did some research and found that 88 is the best. Which is fine by me, it allows to sit two files side by side on a laptop screen at a readable font size. But honestly, as long as the numbers stay consistent project-wise, knock yourself out.

Morgen Peschke • Jul 21 '19

IMO, the ability to display two bits of code side by side is by far the most useful outcome of restricting line lengths.

Vlastimil Pospichal • Jul 21 '19 • Edited

60x80 is enough for a class.

Andrew Harpin • Jul 22 '19

I respect Remy's wish to avoid the switch compilation topic, but there are typically 2 outputs from the compiler, depending on the code and optimization settings:

If else as mentioned, this is typically used when there are few options in the switch statement or there is a large Delta between the switch enumerals.
Jump table if there are a large number of cases and they are mostly sequential the assembler can be a jump table where you have a start and the enumeral is an offset, this is more performance efficient, but not always space efficient and this is where your optimization settings come in.

There are edge cases which result in some quirky behaviour, but these are typically compiler specific

hidden_dude • Jul 22 '19 • Edited

These rules don't make sense for all types of software. NASA is working with embedded systems that are limited and don't use GCs. Also, I believe they aren't OO.

An anti-pattern in GC languages is to hold on to memory for a long time. If you aren't allocating and (implicitly) freeing objects rapidly in modern OO languages, then you're holding on to many objects for a long time. And that will cause the GC to work much harder in some cases. GCs today are optimized to get rid of short-lived objects, and perform poorly when too much data is escalated to become long lived.

Of course, the principle that you should limit your code to use O(1) RAM when possible is a good one. Because if you don't you'll get out of memory errors. Of course, the impact of that showing up in a web app is not the same as your robot crashing right before it lands on Mars (thus causing a physical crash and a loss of 1 billion dollars and 20 years of work).

My point is software is different. And rules need to reflect those differences. NASA for example is not known to be very good with staying within timelines and budgets. For commercial software that is probably much more important than avoiding an improbable crash now and then.

Andrew Harpin • Jul 22 '19

These rules are taken from MISRA, which is typically used for safety critical embedded software.

Yes they aren't always applicable for all software and languages, but they are good things to consider.

Rémy 🤖 • Jul 24 '19

All I'm saying, in essence, is that you need to be accountable for the resources that your program uses. Otherwise you risk blowing things. More than once I've seen a dev app blowing up when reaching production volumes and I'm certainly not talking about Google-scale.

So the advice is more an O(whatever(n)) complexity in RAM and time but with a bound on n and thus a bound on whatever(n).

If you work on data, work by pages. If you work on a stream, work on N items at once. If you do big data, do Map/Reduce.

Also NASA was notoriously on time to put a man on the Moon, so I'm guessing that their methodology to push the boundaries of science and Humankind won't be a hindrance for more conventional tasks. At least in my case, these rules help me on a daily basis to deliver quality code on time.

But yeah first time I read that I was like "bollocks nobody needs that". I guess denial is a first step :)

Mike Schinkel • Aug 6 '19

NASA also had a huge budget relative to the time period when they were putting a man on the moon. Something that most conventional tasks do not have the luxury of having.

Rémy 🤖 • Aug 6 '19

Funnily enough there was no budget allocated to software on the Apollo program.

But you need the budget to prove everything, not to apply things in best effort mode. In my case, applying those rules saves time and not the opposite.

Mike Schinkel • Aug 7 '19

"there was no budget allocated to software on the Apollo program."

In those days software was an afterthought for bean counters. Back then it was just rolled into "Development & Operations." And with $28.7 billion in inflation adjusted dollars for that line item, let's just say they had enough money to get it right. Which is rare for software projects today.

"applying those rules saves time and not the opposite"

Let me first say that I wrote a long comment that was subsequently eaten by my browser's interaction with this website and its lack of maintaining a cookie-based copy of comments in progress. And that comment started out by saying that your article was great and that it had a lot of really good advice. Unfortunately I was weary of typing it in again so sadly it was lost to the ether.

But your article also had a few points of opinion whose time-saving benefit is impossible to prove. To assert otherwise would just be hubristic and would illustrate nothing more than confirmation bias. #fwiw

Will Vincent • Jul 24 '19

Many of these strike me as only relevant if you're passing off code to someone who is either very junior, or not familiar with the language the code is written in.

The switch argument being the best example of this.

if (foo === 'bar') {
  // do something
} else if (foo === 'baz') {
  // do something else
} else {
  // do a default thing
}

Really isn't any more readable than

switch(foo) {
  case 'bar':
    // do something
    break;
  case 'baz':
    // do something else
    break;
  default:
    // do a default thing
}

To anybody with a modicum of experience, or half a brain :D

As for the recursion argument, one could easily limit recursion depth by keeping track of how deeply you've recursed, and stopping when you hit the depth limit. Sure, iteration might seem easier, until you run across a nested tree you need to parse that has varying levels of depth on each branch, and you want to write efficient code to parse it. Recursion definitely has its place, as do virtually all the other things you're arguing against.
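A minimal sketch of the depth-limited recursion described here, next to the iterative explicit-stack alternative the article favours. The node shape `{ value, children? }` and the `maxDepth` parameter are assumptions for illustration:

```javascript
// Hypothetical tree shape: { value: number, children?: Node[] }.
// Recursive version: refuses to descend deeper than maxDepth levels, so a
// malformed or hostile input cannot exhaust the call stack.
function sumTree(node, maxDepth, depth = 0) {
  if (depth > maxDepth) {
    throw new RangeError(`tree deeper than maxDepth (${maxDepth})`);
  }
  let total = node.value;
  for (const child of node.children ?? []) {
    total += sumTree(child, maxDepth, depth + 1);
  }
  return total;
}

// Iterative version: same bound, but the "stack" is an ordinary array on
// the heap instead of the call stack.
function sumTreeIterative(root, maxDepth) {
  let total = 0;
  const stack = [{ node: root, depth: 0 }];
  while (stack.length > 0) {
    const { node, depth } = stack.pop();
    if (depth > maxDepth) {
      throw new RangeError(`tree deeper than maxDepth (${maxDepth})`);
    }
    total += node.value;
    for (const child of node.children ?? []) {
      stack.push({ node: child, depth: depth + 1 });
    }
  }
  return total;
}
```

Both handle branches of varying depth; the difference is only where the bookkeeping lives.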

Rémy 🤖 • Jul 24 '19

If you need half a brain to understand and check a switch statement that's already half a brain you're not spending on other things.

It's like those boxes full of things that "might be useful later" that some people keep in their garage, in the hope that one day that handheld barcode scanner will turn out to be useful. But as you might know, that day rarely comes, and in the meantime the boxes take up a LOT of space.

So regarding the switch, two things:

Either you use it as an if/else if/else, in which case the switch syntax itself is useless because it's equivalent to the other one;
or you use it for its weird control-flow characteristics, in which case it's mind-bending and dangerously close to goto.

Basically, either it's bad or it's useless. So it's not something I want to bother with.

Will Vincent • Jul 25 '19 • Edited

That's a hell of a twist of my words. Never did I say it takes half your brain to understand, but that someone with half a brain CAN understand.

Let me put it another way since you clearly didn't understand. Switch statements are stupid simple to grasp, unless the person looking at it is a complete idiot -- or, as I also said, completely unfamiliar with the syntax of the language, in which case they oughtn't be poking around the code in the first place.

Obviously we're not going to agree, and that's fine, I don't have to work with you, so your desire to eliminate perfectly useful easily understood elements from code doesn't affect me :)

jjtriff • Aug 7 '19

I agree, except for the being-cruel part.

Switches are as readable as ifs, if not more so.

Recursion has its very needed place.

Jacek Złydach • Jul 23 '19

Wouldn't overdo it. As others pointed out, those guidelines are tactical-level rules for working with a GC-free, memory-unsafe language in context of embedded realtime systems. This environment requires you to have total control over what the code does, how much time it spends on it, and how much memory it uses. Many of the points are making a trade-off for that control, against increased code complexity.

Some notes:

Point 1b - avoiding recursion. I'd chill out with this a bit; recursion ain't scary once you familiarize yourself with it. It's rarely the best thing to do (unless you work with trees a lot), but for some problems a recursive solution is the cleanest, most readable one. If your language supports Tail Call Optimization, it may even (in some cases) be as fast as iteration and not pose a risk of stack overflow.

Point 3 - avoiding heap allocations - is very much applicable to dynamic languages for performance reasons. Dynamic allocation costs performance (not as much as in C/C++ if your language runtime optimizes memory management for allocation, but still).

You can learn to write so-called "non-consing" code, i.e. code that doesn't construct or allocate, but it's tricky: such code may become less readable than the "regular" alternative, and it definitely needs to be isolated, because it can corrupt your data if references leak. The trick is to learn your language and standard library (and the libraries you use), and pay attention to operations that modify their arguments instead of returning a new value. They're sometimes called "destructive operations".

Consider, e.g., the following REPL session in Common Lisp:

CL-USER> (remove-if #'evenp (list 1 2 3 4 5 6 7))
(1 3 5 7)
CL-USER> (delete-if #'evenp (list 1 2 3 4 5 6 7))
(1 3 5 7)

The results are seemingly the same; the difference is that remove-if returns a fresh list, while delete-if is allowed to modify its list argument in place. In non-performance-critical places you should prefer remove-if, because it's safer, and the GC can get rid of unused data just fine. delete-if is faster, because it only rebinds pointers in the list to drop some nodes, but if that (list 1 2 3 4 5 6 7) had been passed in by reference and someone else held that reference, they would discover the data had changed under them.
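The same contrast can be sketched in JavaScript, where `Array.prototype.filter` plays the role of remove-if. JavaScript has no built-in in-place filter, so `deleteEven` below is a hypothetical hand-written analogue of delete-if that compacts the array and truncates it without allocating a new one:

```javascript
// Non-destructive: returns a fresh array, like remove-if.
function removeEven(xs) {
  return xs.filter((x) => x % 2 !== 0);
}

// Destructive: compacts xs in place and truncates it, like delete-if.
// No new array is allocated, but anyone else holding `xs` sees the change.
function deleteEven(xs) {
  let write = 0;
  for (let read = 0; read < xs.length; read += 1) {
    if (xs[read] % 2 !== 0) {
      xs[write] = xs[read];
      write += 1;
    }
  }
  xs.length = write;
  return xs;
}

const shared = [1, 2, 3, 4, 5, 6, 7];
const alias = shared; // e.g. another module holding the same reference

removeEven(shared); // [1, 3, 5, 7]; shared is untouched
deleteEven(shared); // [1, 3, 5, 7]; shared AND alias are now [1, 3, 5, 7]
```

The aliasing on the last line is exactly the leak the comment warns about, which is why destructive helpers are best kept in isolated, well-documented spots.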

To write non-allocating code, you should also use arrays more - they can be allocated once to required size and reused without reallocation. It's especially handy if your language supports unboxed primitives - for things like numbers, you can then rewrite your critical code to perform operations "in place", using preallocated arrays as temporary buffers for calculations.
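In JavaScript the closest thing to arrays of unboxed primitives is the typed arrays. A minimal sketch, assuming a hot path that multiplies two vectors element-wise: the scratch buffer is allocated once and reused on every call. `SIZE`, `scratch` and `multiplyInto` are hypothetical names:

```javascript
// Preallocate once; Float64Array stores raw doubles (no per-element boxing).
const SIZE = 4;
const scratch = new Float64Array(SIZE);

// Hypothetical hot-path function: writes a[i] * b[i] into `out` in place
// and returns the dot product, without creating any new arrays.
// Assumes a, b and out all have the same length.
function multiplyInto(out, a, b) {
  let dot = 0;
  for (let i = 0; i < out.length; i += 1) {
    out[i] = a[i] * b[i];
    dot += out[i];
  }
  return dot;
}
```

Because `scratch` is overwritten on the next call, a caller that wants to keep a result must copy it first; that is the price of not allocating.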

Point 9b (avoid function pointers) is subtly left uncovered. In dynamic languages it would translate to "avoid using functions as values": avoid higher-order functions, avoid lambda expressions, etc. Applying it would be a huge loss to code clarity at no real win in the context of web development (or really any non-embedded development).

Paddy3118 • Jul 22 '19 • Edited

Avoid complex flow constructs,

Always good to avoid unnecessary complexity, but I don't agree with arbitrarily avoiding recursion.

There are recursions whose depths are easily calculated, and whose code is easily reasoned about. Remember, the speed of a solution is often better governed/adjusted at the higher level (millisecond shaving by using iteration instead of recursion vs saving seconds by changing the algorithm and/or data structure).

MATHIAS D • Jul 20 '19

Rule n°1 is very strange: if switch and continue exist, it's for a reason, and using them for what they were designed for is the best way of coding; they're a top layer for conditions. Yes, you can use if-else instead because of... security? 😅 But then maybe you should use only if and never else, for security too? 🥴
Rule 9.1, do not use references: ouch, why? Same as above, use them for what they are best for. If you only use copied variables, what about scalability for very large elements, like a parsed JSON result? Furthermore, JavaScript, for example, uses references only for objects. 😄

Benjamin • Jul 20 '19

The fact that a feature exists in a language doesn't mean it's a feature worth using. These languages are designed by mortals and often inherit patterns and ideas from previous (and flawed) languages.

JavaScript, while beloved, was originally designed in a hurry and is famous for having "bad parts" which experienced developers not only avoid using but purposefully don't teach to newbies.

You say we should use these features for what they are best for. I agree that we should certainly not abuse any language feature, but the point here is that there are almost always better ways to solve the problems these features address. By "better" I mean "easier for you and others to understand next year" and "offering fewer places for bugs to hide."

Rémy 🤖 • Jul 20 '19

Excellent questions; the article was already too long to dive into this.

Regarding the switch issue, it's because the syntax is like this:

$foo = "1";

switch ($foo) {
    case "1":
        print("1");
    case "2":
        print("2");
    case "3":
        print("3");
}

// Prints "123"

Were it like this (hypothetical syntax, not valid PHP):

$foo = "1";

switch ($foo) {
    case ("1") {
        print("1");
    } case ("2") {
        print("2");
    } case ("3") {
        print("3");
    }
}

// Prints "1"

I wouldn't mind, but unfortunately it's not, so the safest option is still a series of if/elseif/else.
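JavaScript's `switch` falls through in the same way. A minimal sketch that records which branches run, with and without `break`; the function names are illustrative only. One difference from the PHP example: JavaScript's `switch` compares with `===`, so the number `1` matches none of the string cases:

```javascript
// Collects the labels of every branch that executes.
function withoutBreak(foo) {
  const ran = [];
  switch (foo) {
    case '1':
      ran.push('1');
    // no break: execution falls through into the next case
    case '2':
      ran.push('2');
    case '3':
      ran.push('3');
  }
  return ran.join('');
}

// Same cases, each terminated by break: only the matching branch runs.
function withBreak(foo) {
  const ran = [];
  switch (foo) {
    case '1':
      ran.push('1');
      break;
    case '2':
      ran.push('2');
      break;
    case '3':
      ran.push('3');
      break;
  }
  return ran.join('');
}
```

Here `withoutBreak('1')` returns `'123'`, mirroring the PHP output, while `withBreak('1')` returns `'1'`.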

Regarding your point on references: what they do is not just pass an object (which is a shared anchor point and can be used to retrieve subsequent data) but also make sure that if you change the value of that variable in the called function, the change bubbles up to the calling function (see the example in the article). Fortunately this feature simply does not exist in JavaScript, so you don't have to worry about it :)
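To make the distinction concrete: JavaScript has no PHP-style `&` reference parameters. Objects are passed as a shared handle, but the handle itself is copied, so reassigning a parameter never affects the caller, while mutating the object does. `reassign` and `mutate` are illustrative names:

```javascript
function reassign(obj) {
  obj = { value: 'replaced' }; // rebinds the local parameter only
  return obj;
}

function mutate(obj) {
  obj.value = 'mutated'; // changes the object the caller also holds
}

const data = { value: 'original' };
reassign(data); // data.value is still 'original'
mutate(data); // data.value is now 'mutated'
```

With a PHP `&$param`, the equivalent of `reassign` would replace the caller's variable too; that is the behaviour the article's rule targets.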

Mike Schinkel • Aug 6 '19 • Edited

Based on your dislike of switches because they do not have intermediate ending braces, my guess is you really hate Python? #justasking

Rémy 🤖 • Aug 6 '19

Oh no I love Python, actually it makes these rules easier to apply and also it does not have switch.

But it's about the confusing control flow rather than the syntax (which I really don't care about). Forget a break and you're toast. Stack cases and you're confusing anybody reading your code.

Mike Schinkel • Aug 7 '19 • Edited

Ironically just a few weeks ago I read an article by Dave Cheney — the bard of Go — entitled "Clear is better than clever" that advocates for using switch instead of if / else.

His article does mention that the switch in Go does not fall through by default, and your objection to the switch, which falls through by default (a language decision I lament), seems to be based solely on the potential to omit a break. But his article does give other reasons for switch to be superior to if / else, none of which are mentioned in your post.

Personally, I think a belief that in-all-cases either if or switch is superior to the other is simply allowing oneself to overindulge in one's own unsubstantiated opinion.

(Notice I left off else; I concur with @DoMiNeLa10 and agree with the advice of Mat Ryer: Avoid else, when you can.)

Earlier this week I refactored some if / else code to be a switch statement in PHP. I did so thoughtfully and with the sole goal of adding clarity to the code someone else wrote who is no longer involved. The code was used to take action based on number of path segments in a URL. It had one if and three (3) else ifs where each condition tested count($parts) for equality with 1, 2, 3 or 4.

I needed to add logic for 5, so I changed from if / else if / else to using a switch, so that the code would not need to call count($parts) five (5) times (nor need to set a temp var) and so that the logic clearly spelled out that it was working on a use-case with 1 of 5 options.

Further, when formatted the code was much more readable, the breaks were obvious, and the nature of the code meant it was unlikely someone would ever add both a 6 and a 7 and forget to include a break on the 6.
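Mike's code was PHP and isn't shown, so this is only a JavaScript sketch of the same shape, with hypothetical route names: the segment count is computed once, and each case returns immediately, so there is no `break` to forget:

```javascript
// Hypothetical dispatcher: picks an action from the number of URL path
// segments. parts.length is computed once; every case returns.
function routeFor(path) {
  const parts = path.split('/').filter((segment) => segment !== '');
  switch (parts.length) {
    case 1:
      return 'section';
    case 2:
      return 'category';
    case 3:
      return 'item';
    case 4:
      return 'item-detail';
    case 5:
      return 'item-detail-tab';
    default:
      return 'not-found';
  }
}
```

For example, `routeFor('/a/b/c/d/e')` has five segments and returns `'item-detail-tab'`, while `routeFor('/')` has none and falls to the default.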

My point with this anecdote is that almost all coding "rules" really should be applied with an analysis of the use-case at hand and not be considered as a pedantic absolute. Clinging to dogma really does not benefit anyone.

Unless of course we are dealing with junior programmers, and then all bets are off. :-D

Michiel Hendriks • Jul 22 '19

The generations that came out after goto was considered harmful did indeed avoid using it. We're at the stage where we're debating if continue is goto and thus should be banned.

This is not what Dijkstra was talking about. Dijkstra wrote that essay before we had structured programming, back when you did not call a function or method but had to jump to places, make sure you had prepared the stack correctly, and hope the target location did not change.

None of the "modern" languages support these types of jumps. The continue or break to a specific label is not the same, as it is still strictly scoped.
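A minimal JavaScript sketch of that scoping: a labeled `continue` can only target a loop that encloses it, so it cannot jump to an arbitrary point in the program. The function below (a hypothetical example) returns the index of the first matrix row whose cells are all positive:

```javascript
// Skips to the next row as soon as one non-positive cell is found.
// `continue rows` may only name the enclosing labeled loop.
function firstAllPositiveRow(matrix) {
  rows: for (let r = 0; r < matrix.length; r += 1) {
    for (const cell of matrix[r]) {
      if (cell <= 0) {
        continue rows;
      }
    }
    return r;
  }
  return -1; // no row qualifies
}
```

Labeling a loop that doesn't enclose the `continue` is a syntax error, which is what keeps it far from `goto`.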

Rémy 🤖 • Jul 24 '19

I'm not sure if you've ever used goto, but the few times I thought that Dijkstra was wrong, it felt like having my brains smashed in by a slice of lemon wrapped round a large gold brick. Whatever the language.

Michiel Hendriks • Jul 24 '19

I have programmed in Basic. I have written code with line numbers. I have messed up that code.

Jumping to the wrong line was so easy, and so difficult to figure out.

Rémy 🤖 • Jul 24 '19

Those were simpler times 😢

Rémy 🤖 • Jul 20 '19

Definitely, control structures are a debate that is faaar from over and also highly specific to each language. I've put general observations here, but far be it from me to be definitive on that specific matter. They're more like guiding ideas.

Regarding variable reassignment, I do it as little as possible. But sometimes in JS, with block scoping, you need to use a let here and there to assign a value from inside a condition. There is also the case of counters for loop iterations. Making a rule out of it would require a lot of fiddling, I think.
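A minimal sketch of the `let` case described here, and one common way to avoid it: move the branching into a small function so the caller can bind the result with `const`. The shipping example and all names are hypothetical:

```javascript
// With `let`: declared first, assigned inside the branches.
function shippingLabelWithLet(weightKg) {
  let label;
  if (weightKg < 1) {
    label = 'letter';
  } else if (weightKg < 20) {
    label = 'parcel';
  } else {
    label = 'freight';
  }
  return `${label} (${weightKg} kg)`;
}

// Without reassignment: the branching lives in a helper that returns early.
function shippingCategory(weightKg) {
  if (weightKg < 1) return 'letter';
  if (weightKg < 20) return 'parcel';
  return 'freight';
}

function shippingLabelWithConst(weightKg) {
  const label = shippingCategory(weightKg);
  return `${label} (${weightKg} kg)`;
}
```

Loop counters such as `for (let i = 0; ...)` remain the usual exception, since the counter must change on each iteration.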

Morgen Peschke • Jul 21 '19

Early returns might not be language-specific, but they are paradigm-specific.

While early returns can make sense in a statement-oriented language, they aren't nearly as popular in expression-oriented languages.

In at least one (Scala), an explicit return of any kind, but especially an early return, is explicitly an anti-pattern.

Of course, most of these languages have pattern matching, which is a great alternative to both switch and complex if...else blocks.
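JavaScript is statement-oriented, so it can only approximate the expression style described here (it has no `match` like Scala's). A minimal sketch contrasting guard-clause early returns with a single-expression body; `parsePort` and its 1 to 65535 range are a hypothetical example:

```javascript
// Statement style: guard clauses with early returns.
function parsePortEarlyReturn(text) {
  const n = Number(text);
  if (!Number.isInteger(n)) return null;
  if (n < 1 || n > 65535) return null;
  return n;
}

// Expression style: the result is one conditional expression.
const parsePortExpression = (text) => {
  const n = Number(text);
  return Number.isInteger(n) && n >= 1 && n <= 65535 ? n : null;
};
```

Both return `80` for `'80'` and `null` for `'abc'` or `'70000'`; which reads better tends to depend on how many distinct failure cases there are.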

Riccardo Bernardini • Jul 25 '24

About the switch rule. This is very language dependent, not every language has the fall through behavior of C. For example, in Octave/Matlab, Ruby and Ada there is no need for a break. I guess that the choice of using the fall through was to avoid the need of a more complex syntax for multiple cases for the same branch (but this is just a wild guess of mine).

Actually, in Ada I prefer to use case (the equivalent of C's switch) whenever possible, since the language requires that you specify a branch for every possible choice, that is, you cannot leave cases unexpressed (but you can always use the default case, when others =>). This is especially useful if you are "switching" on enumeration types. For example, suppose you add a new value to an enumeration type: thanks to this rule, you cannot forget to update all the cases that need to be updated.
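JavaScript has neither enumeration types nor a compile-time exhaustiveness check, but a common runtime approximation is a `default` branch that throws. A minimal sketch, with a hypothetical frozen object standing in for the enumeration:

```javascript
// A frozen object standing in for an enumeration type (JS has no enums).
const Color = Object.freeze({ RED: 'red', GREEN: 'green', BLUE: 'blue' });

// Every known value gets its own case; the default branch throws, so a
// value added to Color later fails loudly here instead of being ignored.
function hexFor(color) {
  switch (color) {
    case Color.RED:
      return '#ff0000';
    case Color.GREEN:
      return '#00ff00';
    case Color.BLUE:
      return '#0000ff';
    default:
      throw new Error(`unhandled color: ${color}`);
  }
}
```

Unlike Ada, the missing case is only detected when that value actually reaches the switch at runtime; TypeScript can get closer to Ada's guarantee by checking exhaustiveness against the `never` type at compile time.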
