Don’t pay the for-loop tax

Dan Homola on June 03, 2017

Note: this post was originally published on my Medium profile. Once, when doing code review on a TypeScript project at my work, I came across sever...
 

Ok, I'm gonna be the old fart here.

  1. If anything has a "tax," it's using map/reduce/filter, because these require defining a function (frequently an anonymous one) to perform the unit work, then that function is called for each iteration. So for N loops, you get at least N+1 additional function calls, with all of the associated overhead, compared to just having some code in a loop.

  2. Calling map/reduce/filter doesn't eliminate the for loop, it merely moves it. The implementation of these methods has... you guessed it... a for loop.

  3. Loops can break or return early. While findIndex() helps with some situations you'd need that for, other use cases don't have an equivalent.

  4. Loops can modify their iterator variable, allowing them to skip over values (or even go back). Granted, this is an edge case.

I suppose the idea is that loops somehow require a mental tax? I've been using "for" and its siblings for over 30 years in just about every programming language, so they're pretty natural for me.

I do use these abstractions, and enjoy the syntactic sugar where it makes sense. But avoiding a loop structure should be a choice made with a full understanding of the pros and cons; it shouldn't be some sort of default because "indentation is a code smell" or similar harebrained rules.

 

Indeed, the performance penalties of functional programming methods must be known, but you're misunderstanding what's the "tax" here: it's indeed a "mental" tax, which led us to use old patterns because they were the only ones available in older languages, and they're still available anyway, while there could be other, more expressive ways to do the same things.

More expressivity means the code is more readable and maintainable, because it gives you more hints about the purpose of the code. If that implies writing a separate function for the iteration, so be it, because it's exactly the function's name that brings additional value.

Sure, you can always write a comment. But comments should explain why, not what: writing comments is tedious, so let the code speak for itself as much as possible.

 

Thanks for the reply. I'll try to comment on your points:

Ad 1. The tax was originally meant as a real monetary tax (or a fine more precisely) and it was meant as a joke. There were no performance connotations to the word in this context. As others have stated in this thread: if you need performance go for the loops.

Ad 2. I agree, but it hides it from you, and I believe it is a good thing. When reading someone else's code, I'd much rather see a map than some loop because with the loop I have to infer the what from the how whereas with map the what is already written for me.

Ad 3. So can find, includes and some others. Granted, these are not as widely supported, but they can be polyfilled.

Ad 4. I think that is actually a disadvantage: just because you can do something doesn't mean you should (excluding the performance optimisation scenarios).

I totally agree with you that any choice should be made with all the implications in mind. The approaches recommended in the article are just that – recommendations :)

 

You pretty much covered what I was going to comment here. I'm honestly a little skeptical of some of the mantras surrounding Javascript at the moment. There are some ideals being tossed around at the moment that might lean a bit too hard on perceived "readability advantages" and straightforwardness. In other words, the performance implications of some of the current proposed best practices make me a tad uncertain regarding their longevity.

edit: I'm not saying the given examples are bad either; I feel like this piece actually gave great examples. I'm speaking to the bigger trend I've been noticing.

 

So for N loops, you get at least N+1 additional function calls, with all of the associated overhead, compared to just having some code in a loop.

I may be wrong, but I would have assumed that modern JS interpreters can optimize this away. I would also expect it to be faster to call the loop they implement in the underlying language.

Calling map/reduce/filter doesn't eliminate the for loop, it merely moves it. The implementation of these methods has... you guessed it... a for loop.

If you're iterating over an array, but not necessarily for other iterable structures. A nice thing about using the iterator functions is that you don't have to know / care what kind of collection you're iterating over. Using the abstractions means you don't have to embed iteration knowledge at every location in your code that you want to iterate. Your code doesn't change just because the details of how to iterate change.
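As a rough illustration of the point about not needing to know the collection type, the sketch below runs one summing function over an array, a Set, and a generator. The `total` and `countTo` helpers and the sample values are made up for this example; note that `total` still uses for...of internally, so this only shows where the iteration knowledge lives, not that the loop disappears.

```javascript
// Hypothetical helper: sums any iterable of numbers.
function total(iterable) {
  let result = 0;
  for (const value of iterable) {
    result += value;
  }
  return result;
}

// A generator is iterable too, with no backing array at all.
function* countTo(n) {
  for (let i = 1; i <= n; i++) yield i;
}

const fromArray = total([1, 2, 3]);
const fromSet = total(new Set([1, 2, 3]));
const fromGenerator = total(countTo(3));
console.log(fromArray, fromSet, fromGenerator); // 6 6 6
```

The call sites stay the same even if the underlying collection changes from an array to something else.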

Loops can break or return early. While findIndex() helps with some situations you'd need that for, other use cases don't have an equivalent.

IIRC, there's an interface for defining iterators that allows them to return early.
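The interface referred to here is presumably the iterator protocol's optional return() method, which for...of calls when the consumer stops early. A minimal sketch, assuming a made-up `naturals` generator and a `cleanedUp` flag used only to make the early exit visible:

```javascript
let cleanedUp = false;

// Hypothetical infinite sequence 0, 1, 2, ...
function* naturals() {
  try {
    let n = 0;
    while (true) yield n++;
  } finally {
    // Runs when the consumer stops early, because for...of
    // calls the iterator's return() on break.
    cleanedUp = true;
  }
}

let firstSquareOverTen;
for (const n of naturals()) {
  if (n * n > 10) {
    firstSquareOverTen = n;
    break; // triggers the generator's finally block
  }
}
console.log(firstSquareOverTen, cleanedUp); // 4 true
```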

Loops can modify their iterator variable, allowing them to skip over values (or even go back). Granted, this is an edge case.

It's a good point, if you can't do that, then you have to retain state somehow (probably via a stack). Although, if the logic for how to prepare for the next iteration were that complex, I would probably choose a while loop over a for loop (or perform the update section of the for loop in its body).

 

This expands the array and may cause it to reallocate in memory being slow

That's surely correct, but if you're interested in performance you should definitely use a for loop, as any solution based on callbacks, while more expressive, is one or two orders of magnitude slower, for small and large arrays:

console.time("Array.push");
var array = [];
for (var i = 0; i < 1e7; i++) array[i] = i;
console.timeEnd("Array.push");
// Array.push: 220ms

console.time("new Array.push");
var array = new Array(1e7);
for (var i = 0; i < 1e7; i++) array[i] = i;
console.timeEnd("new Array.push");
// new Array.push: 47ms

console.time("Array.map");
var array = [ ...Array(1e7).keys() ].map((_, i) => i);
console.timeEnd("Array.map");
// Array.map: 1209ms

But let's not forget that we also have for...in and, above all, for...of in JavaScript, which bring much nicer semantics.

These are my main cases of why I use for loops in JavaScript:

  1. I'm releasing an open source package which could be used in performance-intensive tasks;
  2. my code needs to be transpiled and things like for...of are polyfilled with an unbearable amount of code (in the case Babel can't infer if the collection is actually an array or any other kind of iterable);
  3. because - ah, screw it - it's clear enough since it's a pattern everyone knows, while building an array with [ ...Array(n).keys() ].map(...) is quite obscure at the first glance.

The rest of the article is pretty much spot-on. Concepts that need to be reiterated.

 

A little late to the discussion, but generating the array declaratively would probably go a little faster if we make Array think it's converting an "array-like" structure and provide a map function.

console.time("Array.from");
var array = Array.from({length: 1e7}, (_, i) => i);
console.timeEnd("Array.from");
 

Thank you for your insightful response. I totally agree that in performance-critical applications loops are the way to go and that the for...of loop addresses some of the problems I have with the simple for loop.

I just believe it is easier to get it right with map/reduce. It fits in the "Make it run, make it right, make it fast, make it small" workflow nicely.

 

It's not just a little faster to use the for loop - it's orders of magnitude faster. Why adopt an abstraction that reduces performance of a code segment by upwards of 80% (tested 86% slower on my desktop, 84% slower on iPad gen 5)?

Like why adopt inefficient design patterns in the first place?

jsperf.com/old-man-yells-at-clouds...

 

My argument has always been that writing a for loop that takes each element in an array, transforms it, and adds the output to another array, is writing your own implementation of map.

map is a well understood function these days that everyone can read so don't keep re-implementing it. It's just more code for you to own and read, and there's a chance of bugs in your implementation. To put it another way, you should use the map function for the same reason you use any functions at all: to avoid repeating code.

The same goes for filter and reduce.
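To make the "you are re-implementing map" argument concrete, here is a small side-by-side sketch. The `prices` array and the `double` function are illustrative assumptions, not taken from the article.

```javascript
const prices = [5, 25, 8, 18];
const double = (x) => x * 2;

// Hand-written loop: take each element, transform it,
// and add the output to another array.
const viaLoop = [];
for (let i = 0; i < prices.length; i++) {
  viaLoop.push(double(prices[i]));
}

// The same transformation using the built-in map.
const viaMap = prices.map(double);

console.log(viaLoop); // [ 10, 50, 16, 36 ]
console.log(viaMap);  // [ 10, 50, 16, 36 ]
```

Both produce the same array; the difference is how much bookkeeping (index, bounds, target array) the reader has to check.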

RE performance: Measure it in the context of your application. I doubt there is a difference in the vast majority of cases given engine optimisations, but you can always fall back to a for loop where you can prove it's worth doing.

 

The funny thing is, writing your own map function with a for loop is faster than the native one (a plain for loop is still faster because it doesn't have to deal with the function invocations).

 

The for(;;) construct is error-prone but I feel "going functional" in JavaScript is hardly a good solution.

Take for example the sum() function from the post. In Haskell, it would be defined this way:

sum' l = foldl (+) 0 l

Or here it is in Scheme:

(use-modules (srfi srfi-1))
(define (sum l)
    (fold + 0 l))

The sum' function is quite concise because the (+) operator is already a function. Even Python (where such a function would be frowned upon) could make it clearer:

import operator
from functools import reduce

def sum(l):
    return reduce(operator.add, l, 0)

So, when we go to JavaScript, we got this very weird line (total, current) => total + current, where there is a total, and a current, and they are added up. Now I have a bit more to understand here. The for(;;) loop is brittle, but why not this?

const sum = (array) => {
    let result = 0;
    for (let i in array) {
        result += array[i];
    }
    return result;
}

Or, better, this?

const sum = (array) => {
    let result = 0;
    for (let v of array) {
        result += v;
    }
    return result;
}

There is a mutable variable but it is clearer what is going on. The JavaScript syntax is helping me here. Do you disagree?

I find trying to optimize for sum() functions problematic. sum() is a solved problem, whichever solution you choose. But take other code where one calls, let us say, map() inside the function given to reduce(). It is surprisingly confusing. Given the trend to avoid naming functions in JS, it will become a mess.

Also, many functional patterns are helpful in functional languages, or specific situations, but not in all places. Many JS programmers write apply(compose(fn1, fn2.bind(arg)), value) because they think it is as good as f1 . (f2 arg) $ value. But is it? I don't think so. JavaScript didn't make this construction clear, as Haskell did.

Functional patterns make sense a lot of times. If you have an algorithm which can be reused except for one part, it will be great to pass the function (instead of, let us say, a Strategy pattern house-of-cards). Functions as values are a cornerstone for asynchronous code. I even like the each()/forEach() methods, so I do not have to discover if it uses a[0], a.get(0), a.item(0).

However, in general, "more functional" JavaScript code will not be better than code that uses some loops. Trying to get better code by being "more functional" is very much like improving by being "more OO." And we know the result: how many Command pattern instances have we found that should have been a function?

Now I wonder the same about lambdas that could be a for(of) block.

 

Thanks for the reply, it is an interesting view. While I understand that the for..of cycles remedy some of the problems I have with cycles, I still prefer the reduce.

I admit that for something as simple as a sum there is little advantage, but it was chosen as an illustration precisely because of its simplicity.

What I wholeheartedly agree with you on is that all the operators (including the function call operator) in JavaScript should be functions as well.
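Since JavaScript cannot pass + around as a value, the closest approximation is probably to name a tiny wrapper once and reuse it. A minimal sketch, where `add` and `sum` are names chosen for this example:

```javascript
// Named stand-in for the + operator (an assumption for illustration;
// JavaScript has no built-in function form of +).
const add = (a, b) => a + b;

// Reads close to the Haskell "foldl (+) 0" from the comment above.
const sum = (array) => array.reduce(add, 0);

const result = sum([5, 25, 8, 18]);
console.log(result); // 56
```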

 

Let's say you have to build a nightly batch process to process millions of entities (orders, invoices, bank transactions... whatever). A typical approach would be to open a result set (or the equivalent structure in JS) to process the entities one by one and do partial commits. With a functional approach, will you read the millions of entities into an array just to use map/filter/reduce? I hope your server has some gigabytes of contiguous memory free, otherwise it will fail miserably. The optimal solution will use a loop.
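One possible middle ground, sketched under assumptions: generators can provide lazy versions of map/filter/reduce that pull one entity at a time, so nothing is collected into a big array. Everything here is hypothetical (`readEntities` stands in for a database cursor, and the helper names are invented), and the helpers still contain loops internally, which is consistent with the point being made above.

```javascript
// Hypothetical data source: yields entities one at a time.
function* readEntities(count) {
  for (let id = 1; id <= count; id++) {
    yield { id, amount: id * 10 };
  }
}

// Lazy filter and map that work on any iterable.
function* lazyFilter(iterable, predicate) {
  for (const item of iterable) {
    if (predicate(item)) yield item;
  }
}

function* lazyMap(iterable, fn) {
  for (const item of iterable) yield fn(item);
}

// Consumes the iterable one item at a time, keeping only the accumulator.
function reduceIterable(iterable, fn, initial) {
  let acc = initial;
  for (const item of iterable) acc = fn(acc, item);
  return acc;
}

const largeOrders = lazyFilter(readEntities(1000), (e) => e.amount > 9000);
const amounts = lazyMap(largeOrders, (e) => e.amount);
const totalAmount = reduceIterable(amounts, (acc, x) => acc + x, 0);
console.log(totalAmount); // 950500
```

Only one entity is alive in the pipeline at any moment; whether this is preferable to a plain loop with partial commits depends on the job.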

 

Well, I would use streams and their on method which IMO resembles the functional approach...

 

Do you mean in JS or Java? Because in Java doing it the functional way is extremely complex according to stackoverflow.com/questions/322092...

Do you have a JS example that reads rows from a relational DB without using loops?

I meant JavaScript. An example could be Google Lovefield (haven't tried it though).

By looking at the Readme, it seems like this API returns an array of rows. Which obviously is not acceptable when dealing with huge result sets.

}).then(function(results) {
// The SELECT query's Promise will return array of rows selected.
// If there were no rows, the array will be empty.

results.forEach(function(row) {
// Use column name to directly dereference the columns from a row.
console.log(row['description'], 'before', row['deadline']);
});
});

 

I think the only argument for using a for-loop is to break out of the loop. I think pretty much everything else can be done using the functional approach.

 

While I agree that breaking out of a loop is a point for the loops, breaking out of a for rubs me the wrong way. I believe you should use a while (or do-while) loop and specify the condition appropriately. Breaks hamper readability imho and make the code harder to reason about or even formally verify.

 

I've written plenty of loops that would look a lot worse if I had to write them without break statements. For example, this code to look for possible forward moves for a white rook on a chessboard:

while (row++ < BOARD_TOP) {
    if (is_white_piece(row, column))
        break;
    GENERATE_MOVE;
    if (is_black_piece(row, column))
        break;
}

The first check could obviously be moved into the loop condition, but I can't think of a good way to move the second check into the loop condition too.

Edit: Language is C.
Edit 2: GENERATE_MOVE call was in the wrong place, moved it outside of the second if.

If that's still JavaScript (could be even C or Java), it could be refactored in a more expressive way:

const piece = chessboardRows.find(pieceAtColumn(column));
if (piece && isWhite(piece)) {
    GENERATE_MOVE;
}

function pieceAtColumn(column) {
    return function(row) {
        // something like row[column]?
    };
}

Maybe it's more verbose, but also clearer.

This was C, I didn't have a find method on all the arrays.

Ah, crud, you're out of luck then XD

I think the aim of this article is mainly JavaScript or any other language that support methods for functional programming.

I guess you're stuck with classic loops unless some kind of abstraction helps you with that (in Java <8 it's still a mess anyway).

I discovered that I pasted the code wrong, so I've edited the post to correct that. In the original version I could have written a function to find the index in the array, but I can't do that for the correct version.

I agree the loop would look worse. However, in this case all you are doing is finding the first piece and if it's white calling a method, right?

So you could do something like

const pieceIndex = findFirstPiece(row);
if (pieceIndex >= 0 && is_white_piece(pieceIndex, column))
  GENERATE_MOVE;

where the findFirstPiece would find the first row value that is either a white or a black piece, returning -1 if no such value was found.

Oops, I pasted the code wrong. What I wanted to do was generate every move from the current position of the rook up to a piece, and including the piece if the piece is a black piece. See updated post.
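For comparison, here is a JavaScript sketch of the corrected behaviour (every square up to the first piece, including that square if the piece is black), once with break and once without. The board representation (`board[row][column]` holding 'w', 'b', or null) and both function names are assumptions made up for this example.

```javascript
// Version with break, mirroring the C loop.
function forwardRookMoves(board, row, column) {
  const moves = [];
  for (let r = row + 1; r < board.length; r++) {
    const piece = board[r][column];
    if (piece === 'w') break; // blocked by own piece
    moves.push([r, column]);  // empty square or capture
    if (piece === 'b') break; // a capture ends the line
  }
  return moves;
}

// Version without break: find the first blocker, then slice.
function forwardRookMovesNoBreak(board, row, column) {
  const ahead = board
    .slice(row + 1)
    .map((cells, i) => ({ r: row + 1 + i, piece: cells[column] }));
  const blocker = ahead.findIndex((sq) => sq.piece !== null);
  const reach = blocker === -1
    ? ahead.length
    : blocker + (ahead[blocker].piece === 'b' ? 1 : 0);
  return ahead.slice(0, reach).map((sq) => [sq.r, column]);
}

const board = [['w'], [null], [null], ['b'], [null]]; // rook on row 0
const withBreak = forwardRookMoves(board, 0, 0);
const withoutBreak = forwardRookMovesNoBreak(board, 0, 0);
console.log(withBreak);    // [ [ 1, 0 ], [ 2, 0 ], [ 3, 0 ] ]
console.log(withoutBreak); // [ [ 1, 0 ], [ 2, 0 ], [ 3, 0 ] ]
```

Which of the two reads better is exactly the judgment call this sub-thread is about.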

 

Indeed, but as a reminder, there are methods that allow you to "break out" early when looping over an array with FP. Namely, find and some are probably the most expressive. (Alas, find isn't supported in IE, not even 11, but OTOH it is easily polyfill-able.)
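A quick way to see the early exit is to count how often the callback runs. The array and the counters below are illustrative assumptions:

```javascript
const numbers = [5, 25, 8, 18];

// find stops at the first element for which the callback returns true.
let findCalls = 0;
const firstOverTwenty = numbers.find((n) => {
  findCalls++;
  return n > 20;
});
console.log(firstOverTwenty, findCalls); // 25 2

// some stops as soon as one element matches.
let someCalls = 0;
const hasEven = numbers.some((n) => {
  someCalls++;
  return n % 2 === 0;
});
console.log(hasEven, someCalls); // true 3
```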

 

for-loop-tax is accompanied with an array-method-tax.

I agree the first go-to should be array-methods, even with objects "for in" usually sucks compared to Object.keys(foo).map((propName,i)...) as you get i included.

The exceptions IMO are performance, break, continue and return.

It requires discretion, because in some circumstances you could end up writing spaghetti code when trying to be too functional, where break, continue and return sometimes result in far less code and better maintainability. But obviously vice versa, as you've demonstrated.
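The Object.keys point can be sketched like this; `foo` and the output format are made up for the example. With for...in the counter has to be maintained by hand, while map passes the index to the callback as its second argument.

```javascript
const foo = { alpha: 1, beta: 2, gamma: 3 };

// for...in: the index must be tracked manually.
const viaForIn = [];
let i = 0;
for (const propName in foo) {
  viaForIn.push(`${i}: ${propName}=${foo[propName]}`);
  i++;
}

// Object.keys + map: the index comes for free.
const viaKeys = Object.keys(foo).map(
  (propName, index) => `${index}: ${propName}=${foo[propName]}`
);

console.log(viaForIn); // [ '0: alpha=1', '1: beta=2', '2: gamma=3' ]
console.log(viaKeys);  // [ '0: alpha=1', '1: beta=2', '2: gamma=3' ]
```

One further difference worth knowing: for...in also visits inherited enumerable properties, while Object.keys returns own properties only.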

 

Never rely on the order of keys of objects in JavaScript, because by specification the interpreter can order them as it wishes.

 

Not for generic prop names but if you previously "created" incremental prop names:

const obj = {'thing-0': 'thing-a', 'another-thing-1': 'thing-b', 'something-2': 'thing-c'};
...
... obj[key.value.slice(0,-1) + i] // do something

There could also be situations where you need to accumulate based on the number of props rather than specifically the order of props.

 

I 100% agree with the mantra here, but I'd like to add to the examples. For me, the functional style/indirection in these examples IS the value, and once you're doing things in a functional way you've already captured most of the technique's purpose. For me, whether you continue to use a declarative iterator or a for loop inside your abstraction is less important.

Here's an admittedly naive example of the price discounting that I think very clearly shows the difference between a for loop and an Array.forEach.

let prices = [5, 25, 8, 18];
let discount = 1 - 0.2;

// more imperative
for (let i = 0; i < prices.length; i++) {
    console.log(prices[i] * discount);
}

// more declarative
prices.forEach((price) => {
  console.log(price * discount);
});

Just reading these aloud to yourself shows the difference in clarity:

"For when i equals zero and i is less than the length of prices while incrementing i, console log prices at position i times discount."

vs.

"Prices: for each price, console log price times discount".

When moving from an imperative to declarative style, the code turns from near gibberish to an honestly comprehensible English sentence. I think that's a win for the whole team at every skill level, and the value this style is attempting to capture.

 

The only real reason to use a loop over a functional approach is performance; unless you are working in C/C++ on an HPC program, functional should be safer. Anyway, if things keep going this way we'll all be doing Haskell in two years.

 

True. Haskell has been on my "to learn" list for a while but I can't find a project to learn it on.

 

I can't find a project to learn it on

That is precisely why we all won’t be doing Haskell in a century. It is a perfect academic example, but in real life it suffers from too many diseases. Idris maybe; it looks promising and it does not require 1GB of boilerplate for a ToDo application.

That is what a Java developer said 5 years ago about Scala and Clojure. There is a wave of functional languages. You even have lambdas now in C++, C# and Java.
I don't think we will be throwing away our Macs and start using Lisp machines, but with WebAssembly just around the corner, who knows what kind of language we will be programming in five years from now.
I never expected to be programming a backend in JavaScript, but it turns out to be very efficient and stable.
All I say is keep your mind open and learn the best practices for each language and paradigm. And what is old today might be new next year. Server-side JavaScript was the new thing in the '90s; it hit the mainstream with Node.

Node is a toy; there is not a single noticeable server anywhere running Node. Try Node on, say, 10K+ incoming requests per second to see what I mean.

Haskell is not a toy, though. It’s a perfect tool to learn how to code. But business has its own rules, and while a Haskell developer is still creating the types for the future application, we’ll be releasing version 6 already. The path from Java to Scala/Clojure/Kotlin is increasing productivity. The step from whatever to Haskell is burying us 3 feet down.

That is why I mentioned Idris.

 

A for loop has one great advantage over every other approach: it's common. I know how a for loop works and I can rely on it working in different languages and projects. It also has a common meaning with a single concept.

While I'm strongly in favour of higher level functions, it's difficult to remember which language has which, and precisely what they do. Is there a map, fold or join? What types does it support? What if I need a small variation, like fold_right instead?

I know how to do all of this in a for loop, so it doesn't bother me to use it if I'm unaware of another solution. If somebody points out an alternate syntax, I'll gladly switch the loop.

 

While I can see where you are coming from with your argument, I must ask: Isn't having to know if a particular language has 0- or 1- or arbitrarily based arrays a bit difficult as well?

I am obviously not saying it is the same amount of knowledge, I just think you have to know the language you are using either way, so why not learn its functions?

 

I could not agree more with this article. A map, filter, or reduce tells me what is going on without looking too deeply at the code. A for loop requires a careful eye to make sure you don't miss a side effect when you refactor something. The performance difference for most cases hardly compares to the maintenance benefits.

 

To think that manual for-loops are always faster than the functional approach is a misconception. For instance, in Rust, using iterators and functions like map and filter is in fact as fast as or even faster than manually implemented loops: doc.rust-lang.org/book/second-edit...

It's called zero-cost abstraction: you can write high level functional code and still get similar performance as manually implemented low level code.

I wouldn't be surprised if this is the case with other languages as well since modern compilers and interpreters are really good at optimizing.

 

It's funny that JS devs should talk about a "tax" on something as basic and primitive as a for loop.

Almost anyone working in the JS ecosystem is transpiling code. It's one of the only systems in which the resulting code has a larger footprint than the original. If you picked up a compiled language, the resulting byte code is leaner.

So of all things you pay a "tax" on, the loops are the last thing you should worry about.

Secondly, reduce, map and the like only give you the impression that you're writing less code. Internally these constructs also implement themselves as loops.

Most of JS is now about programmers' perceived convenience. A lot of the things you end up doing are actually more complex when you look under the hood.

 

While I generally agree with this article's advice, far too often I've seen code like this:

const numbers = [5, 25, 8, 18];
console.log(sum(numbers));
console.log(avg(numbers));
console.log(min(numbers));
console.log(max(numbers));

With loops the fact the code is parsing the array four times would be more obvious. And to my mind easier to fix.
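One way to fold the four passes into one is sketched below. It assumes sum, avg, min and max have the obvious meanings; the `stats` accumulator shape is invented for this example, and a plain loop would do the same job.

```javascript
const numbers = [5, 25, 8, 18];

// Single pass over the array, tracking sum, min and max together.
const stats = numbers.reduce(
  (acc, n) => ({
    sum: acc.sum + n,
    min: Math.min(acc.min, n),
    max: Math.max(acc.max, n),
  }),
  { sum: 0, min: Infinity, max: -Infinity }
);

// The average needs no extra pass once the sum is known.
const avg = stats.sum / numbers.length;

console.log(stats.sum, avg, stats.min, stats.max); // 56 14 5 25
```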

PS: Interesting fact, it is quicker to loop from a large number to zero than zero to the large number. This is because CPUs have a jump if not zero (JNZ) command that eliminates the comparison each iteration of the loop.

 

Three things, probably motivated by the fact that I am old:
1) Readability is important. Since I am old school, I can't quite read this code, even though I understand the functional approach (but not so much the declarative one).
2) The benchmark shows the problem seems to be the for implementation in JavaScript. But somebody else said the for loop is just pushed somewhere else in the libs. The point is that this is language dependent: JavaScript is what it is, and there's no optimization if you use old-school coding. I mean, it's not for loops per se, it's the way these are implemented by the interpreter.
3) Relying on an interpreter feature like tail call "optimisation" (why would you call it that?) is as bad as relying on the C++ preprocessor, which makes code undebuggable. But OK, we're talking optimization here. Practical optimization is ugly and for a restricted group of people, let's face it.
In other words, I don't consider this nice coding.

Thanks for the article anyway, it might come in handy :) If you can give pointers to why the benchmarks show up like that, it would be cool!

 

Thank you for your comment. To your points:

Ad 1)
I think it is only a question of getting used to it, as you say :)

Ad 2)
I'm no JavaScript engine expert, but people seem to argue that calling a function is expensive (I'm not so sure about this argument, but I can't disprove it) and that for cycles can be optimised better (for example, unrolled).

Ad 3)
It does not rely on Tail Calls, my argument was merely that map & co. do not necessarily need to be implemented using cycles but for example using recursion (that would indeed need proper tail calls to avoid stack overflow).

The reason I call it optimisation is my college professor in the Runtime systems course called it that. We implemented a garbage-collected byte code language there, in my case it was a subset of Scheme, and one of the requirements (if you chose functional language) was "infinite" recursion. You can handle tail calls naïvely and overflow the stack or properly recycle the stack frames and be able to run "forever". That's why even the compat-table calls it "proper tail calls (tail call optimisation)".

The important point here is that this is an optimisation of the runtime itself, not an optimisation of the program being run.
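To illustrate the idea that map does not have to be built on a loop, here is a hypothetical recursive version. The name `mapRecursive` and its extra parameters are assumptions for this sketch, not how any engine actually implements Array.prototype.map.

```javascript
// Hypothetical recursive map: no for/while loop anywhere.
function mapRecursive(array, fn, index = 0, result = []) {
  if (index >= array.length) return result;
  result.push(fn(array[index], index));
  // The recursive call is in tail position, so a runtime with
  // proper tail calls could reuse the current stack frame.
  return mapRecursive(array, fn, index + 1, result);
}

const doubled = mapRecursive([5, 25, 8, 18], (x) => x * 2);
console.log(doubled); // [ 10, 50, 16, 36 ]
```

On a runtime without proper tail calls, each element costs a stack frame, so a sufficiently large array would overflow the stack; that is the "naive" handling described above.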

I hope this helps to clarify things :)

 

Anyone using a for loop instead of a functional programming construct is not being daft. The issue is that they haven't gone through the process of fully learning FP. I dare say that you will find other issues related to this in ALL of their code.

 

Have to say I disagree. I think this is a matter of style.

  • The whole purpose of a function (typically) is to 'abstract out' a collection of logic into a more general, broad operation. Like your first example, if you want a function to sum elements in an array but the language doesn't have that function, you write it. Whether you use a loop at that point or use a function that abstracts out the logic of a loop is your choice; at the end of the day you're abstracting the entire job of computing the sum of elements into a single function anyway.
  • I like working with C and thinking about a program closer to the way that it is actually being run on a lower level. When I see the iterative sum function in your first example, I tend to actually conceptualize a block of memory and think of the need to iterate over that memory in order to make use of each piece of memory. Seeing something like "array.reduce" means I need to go off to consult an api and figure out what this function is and how to use it, and there could be nuances of the function that I want to know about - like if I had wanted something that iterates over an array but can also increase the size of the array in the process of iteration or something like that. With a loop, the possibilities and limitations are more obvious.
  • that said there can be something nice about not using iterators. I remember this from working with matlab, where you can write computations in terms of matrices, which naively looked like they would be really inefficient computations, but which matlab could somehow work with to compute quickly.
 

So true. Absolutely agree. The "for loop tax" is part of the C tax we are still paying, along with console I/O and global variables.

 

This is all very well, but it's not possible if you need to support Internet Explorer. I can't see that problem going away any time soon.

 

Actually, map is supported since IE 9 and so is reduce, so there is no need to worry (the support tables are at the end of the linked pages).

 

Why not? First, IE11 will support most if not all of these functions. Second, you can use Babel to convert your ES6 to ES5.
My last point is that this is a good-practice post that's applicable to anything, not just the web; you should try to use this pattern no matter what you are doing.

 

This! And yes some of us still have to support IE8! 😕

 

A wise man once said "never do the talking when jsperf exists and can do the talking for you"

The abstraction is pretty, but the for loop is orders-of-magnitude faster.

jsperf.com/old-man-yells-at-clouds...

 

Basically the advice is learn some Functional Programming 😎.

 

Many people are afraid of the term, but you are absolutely right 😁

 

For something as simple as a sum of an array, 7 lines of code seem quite a lot.

In your example it's 7 lines vs 4 lines (a gain of 3 lines, impressive). In a real-world example the function could easily be 50 lines of code vs 47 lines with the functional approach, the gain still being 3.

You must handle the bounds of the iteration yourself

In JS yes, not in Python:

for number in array :
...

 

Yeah but for instance in Java, for-loop is considerably faster than any other approach.

 

Great article. I hadn't thought about it, but I can't remember the last time I wrote a for loop. I do, though, remember the last time I had to explain one to someone...

 

Thank you! And I know what you mean. On the other hand some loop is always better than for example copy/pasting the same thing five times (I saw production code like that...).
