Please don't "overchain" array methods

Some Dood on February 03, 2019

Before modern JavaScript was a thing, the only conceivable way of iterating over arrays was to use the classic C-style for loop. It was cumbersom... [Read Full]
markdown guide
 

While I agree with the content of this article, I think any good article with advice on optimization needs to have some basic profiling to prove their point. Something as nebulous as "loop iterations" isn't really a convincing metric.

Here are the timings of each of the array operations performed in this article as performed on a randomly generated array of 50000 numbers on my machine. This gives an idea of the relative performance of each optimization.

sum:       6.189208984375ms
level1Sum: 4.746337890625ms
level2Sum: 1.995849609375ms
 
 

Percentage-wise, sure. In practice, is a human being going to notice a 4.2ms improvement?

I can't upvote this enough. It is such a simple, yet effective way of demonstrating where inefficiencies of code have a real-world impact. I'm going to use this in future discussions. People often see such metrics in isolation, without understanding the larger impact of why it's important to write more efficient code.

I work with server-side code mostly, and the same principle applies. If I can save the server even 1 second of processing on a given request, then it means that every instance of this request has time saved. Which means that many concurrent requests will not use as much of the server's resources.

^ applies to client side code as well, which means if you replace "server" with "client" (desktop / mobile), then you still have the same gains.

I always thought the whole "humans won't notice the performance improvements" thing bogus. Sure, they probably won't see a difference.

However their machines do. Sure, we are building applications for humans, but that doesn't mean we should overlook performance that only machines will notice.

There is a downside. Typically most performance tuned code has greatly reduced human readability, like the solution provided in this article.

Sometimes, it's a price we have to pay if completely necessary. Indeed, it is frustrating how we can't have both readability and performance. Perhaps transpilers can be used to somehow convert less performant code to their more performant counterparts, but then that's just adding a whole new layer of complexity to the project. It isn't really a "solution" per se.

Transpilation seems like the cheapest of the three possible options to me.
Though, I don't see a lot of transpilation optimizing for performance.
I feel like we have the community for such an undertaking now, but those that care about raw performance of their web scripts have now focused on WASM instead. Which is a smart move, and has a lot more potential.
Yay WASM!

 

Except that, if you are using JS for web development, as most of us, 99% of the times you will be working with arrays containing tens of objects, and theese optimizations will be irrelevant.

I've often heard "there will never be more than 20 items on this page" followed by "let's add this complex metadata to each item" followed by "shit, due to unforeseen circumstances, a big client needs 20000 items on that page, and now it feels sluggish".

I'd say don't spend time on difficult optimizations with negligible real world improvements... But when the optimization is extremely easy, why would you NOT do it?

I wish JS runtimes did better inlining and monomorphization.
Then you could have your own utilities like

export const map => (arr, fn) {
  const len = arr.length
  const out = new Array(len)
  for (let i = 0; i < len; i++) {
    out[i] = fn(arr[i])
  }
  return out
}

Which beats out what the spec method does:

export function map(fn, thisArg) {
  const O = Object(this)
  const len = O.length
  if (!(fn instanceof Function)) throw new TypeError(`${fn} is not a function`)
  const A = new Array(len)
  for (let k = 0; k < len; k++) {
    const Pk = k.toString()
    const kPresent = O.hasOwnProperty(Pk)
    if (kPresent) {
      const kValue = O[i]
      const mappedValue = fn.call(thisArg, kValue, i, O)
      Object.defineProperty(A, Pk, { value: mappedValue, writable: true, enumerable: true, configurable: true })
    }
  }
  return A
}

But most of that is lost unless you actually write the upper loop inline, because unlike the built in methods it gets deoptimized if you call the utility with many different fn functions. Especially when some of them throw.

 

Thank you so much for your advice and for taking the time to investigate further! I will definitely add metrics in the next time I write performance-related articles.

 

I disagree with this article.
We shouldn't sacrifice readability and composability for performance gains, and we don't have to.

Our goal here is to iterate over things in a way that is both readable and performant. Let's have a look at different methods of achieving this.

We want to multiply stuff by 2, and then add 4. Let's define these goals as functions:

const mulByTwo = (n => n * 2);
const addFour = (n => n + 4);

This, as mentioned in the article, is readable, but may become a performance issue:

[1, 2, 3].map(mulByTwo)
         .map(addFour);

The proposed solution, however, is not very readable, and doesn't scale/compose well:

[1, 2, 3].map(n => n * 2 + 4);

This uses composition, but is far from being readable because of the
reversed, "inside-out" order of execution

[1, 2, 3].map(n => addFour(mulByTwo(n)));

Solution? Best of both worlds. Let's create a higher-order function which would allow us to pipe
other function calls in a nice, clean, left-to-right order:

const pipe = (fn, ...fns) => 
  (...args) => fns.reduce((acc, f) => f(acc), fn(...args));

We can now do stuff!

[1,2,3].map(
  pipe(mulByTwo, addFour)
);

In fact, multiple functional/semi-functional libraries have pipe function in their toolbelt. it's also a part of my very minimal fp library.
We can as well define all, some or none functions and use them to compose our filter logic.

 

Funny, I usually define my pipe as following:

const pipe = (...fns) => arg => {
  let val = arg
  for (const fn of fns) {
    val = fn(arg)
  }
  return val
}

I also make args unary, so that I don't have to think about methods like map injecting spurious parameters.
Next comes giving it a TS type signature.
Which involves overloads.
It's horrible, but the results are great.

 

What is the performance of this solution?
Because this will be guaranteed slower than the solution in the artice because the JavaScript engine almost certainly doesn't do functional optimization. (+ You create a capture witch is slow, needs more optimization which adds to potential jit time)

It is also less simple as you have to understand more concepts to understand the code, especially because JS is an imperative language the functional style is a bad fit. In imparative languages a for-loop with the proper if-statements and calculations will always be the better option for both performance and simplicity.

Please don't bully my phone's battery more than JS already has to. :)

 

Now this is an elegant solution. The only problem with it is that it isn't exactly the most beginner-friendly code. You must have a significant amount of experience with functional programming to be able to fathom the notion of functions inside functions inside functions for composition.

Besides that, it really is an elegant solution that combines the best of both worlds. Pipes truly are a marvel of (software) engineering.

 

isn't exactly the most beginner-friendly code. (...) Pipes truly are a marvel of (software) engineering.

Maybe it's because I started out learning programming from the FP side, but composition feels like pretty intuitive beginner stuff, and I've always had trouble reading ugly loops which store all kinds of stuff in variables.

I think "beginner friendly" highly depends on what you've been exposed to so far.

Yuppp, loops are always ugly. There is no question about that. You have no idea how many times I've gotten myself inside infinite loops. It's just a horrible experience when I was a beginner.

And I think you're right about that. It does depend on what you've been exposed to as a beginner.

This is a bit like saying that nails are always ugly, you should use screws. Beginners should learn when to use which, in addition to how not to hit their fingers.

FP is cool and all but FP in JS has become an unbearable infomercial lately: "Have you suffered from infinite loops? Ever feel like your code doesn't have enough dots or arrows? Tired of using plus signs for addition? Try FP"

 

This is what I call unnecessary optimization. If you follow time complexity optimization, O(3n) is still just O(n). Now, I'm not sure what the space complexity is on this, but unless your arrays are massive, I'm sure it's negligible for integers.

.map and .filter do create new arrays so that definitely is something to be aware of, but doing 3 operations in 1 loop is effectively the same thing as 3 loops with 1 operation.

I think you're misleading beginner JS developers with this article.

 

Yes, that is right. But even though they ultimately boil down to an O(n) time complexity, it can still have a significant performance impact if your arrays are massive, as you said. In a way, I'm not really misleading beginners here because it's the truth of the matter.

But for most cases, one does not need to worry about these performance impacts. It really is just an unnecessary optimization.

 

I don't believe that you make any comments about .map using more space in your article do you? You also don't comment and say this is only worth doing in one step if your arrays are massive. That's why I suggest that you are misleading beginners. Your example is contrived. For this article to be really useful, you should use a really big array as an example and demonstrate the performance benefits of moving all your operations into the reducer. At the same time, you should speak about the code using significantly less space and that's why it's faster.

Ah, I see now. There's a reason why I didn't add the #beginners tag for this article. It is exactly for the reason you stated. I wasn't targeting beginners for this article. I was assuming that those who read it will most likely know how to optimize their algorithms themselves with what they learned from this article as a guide on how to do so. I suppose that was wrong of me to assume.

Also, I was actually thinking about discussing the space inefficiency of long chains of array methods, but I felt that doing so would have caused me to stray away from the actual message I was trying to convey throughout the whole article: longer chains mean more iterations.

 

Waiting 1m is still better than waiting 3m. Disregarding the less significant factors in O(3n) is not always a good idea in practice.

I agree all of this is a pretty useless exercise when you have 3 consecutive maps over 10 or even 1000 items, but this optimization is the reason many FP libraries offer super easy to use pipe/compose functions.

I'd say, if the optimization is that easy, do it — even if the gains small. Without pipe/compose available, I'd be in favor of readability over performance though.

 

What are you talking about. The time complexity doesn't change by moving everything into the reducer, it's the space complexity that changes because .map creates a new array. It's not 1ms vs 3ms...

You are right that it doesn't take three times as long, my mistake.

But it's never just "1+1+1=3" either.

Especially if the arrays are large, and the operations are simple, then the overhead of copying the array doesn't just affect memory use, it also takes longer.

 

The only comment based on the O notation in both time and space complexity. Looks like not all folks know about that.

 
 

Do I like functional programming? Yes.
Is this particularly readable and self-explanatory, after all the optimizations? Meh.
Are there still pointless function calls and variable allocations? Basically.

const level2Sum = nums
  .reduce((prev, curr) => prev + 3 * curr ** 2, 0);

So if I will at all concern myself with optimizing this case I'll just go to the following, which doesn't at all lack readability:

let level3Sum = 0
for (const num of nums) {
  level3Sum += 3 * curr ** 2
}

But since I am in fact obsessed with using const so I know I'm not mutating this number later in the use site, I will extract this loop and assign to const:

/* IN THE ROOT SCOPE, NOT IN THE FUNCTION!!! */
const l3s = (nums: number[]) => {
  let sum = 0
  for (const num of nums) {
    sum += 3 * curr ** 2
  }
  return sum
}
/* elsewhere... */
const sum = l3s(nums)
 

While using map filter and reduce is great for readability. Using a for loop is the fastest of all. So if your looking to optimize as much as possible i would go with a for loop.

 

In Chrome 71 level2Sum is still 88% slower than a C-style for loop.

jsperf.com/chained-maps-vs-for-loop

In Edge 42 the level3Sum is just as slow as the original chained maps.

 

Oh, man. Indeed, that does look pretty messy. 😬

 

I don't think chaining array methods is a big deal since for the vast majority of the time performance doesn't matter. If its hot code then I can see it being a concern, but the advice given in this article is not generally applicable in my honest opinion.

 

Yes, that's right. You definitely shouldn't be bothered too much by these nuances because they're too negligible to even notice.

I could say that the point of this article is pretty much just to serve as a heads-up.

 

We wouldn't have to worry about it at all, ever, if they introduced them properly, as iterator/stream adapters, and not one-off map(this:Array, fn): Array functions.

 

Exactly, D lazily operates over the elements and only needs to be eager for certain algorithms like sort.

sort isn't a streaming/iterating operation, it's a collection operation. So if you have a stream going, it would make sense to sort on insertion into collection. Something like iter.sort(): Array or iter.sortInto(Array): void

D does an in place sort, so the range needs swap and random access. But yes, it isn't a stream.

Which means it needs to allocate the whole range, as an internal implementation.
I feel like whatever optimizations you can get out of hiding that implementation (Are there any, even?) are outweighed by the better decisions the developer will make when you shove this information in their face.

It isn't hidden, it won't compile if the range doesn't provide the needed operations. The simpilist solution is to call .array() before the call to sort.

Aw yiss, get that D!
Homegirl on a mission.

Also: Oh no, this competes with Rust in a number of usecases, better say bad things about it all around just in case.

Also: Vector operations look sweet.

 

Great article, the advice you provided can truly help with the optimization of operations on arrays.
And as for the first advice, in our company, we've just created our custom iterator to optimize chaining without making it difficult to read.
It utilizes 'lazy' computing, meaning that it doesn't compute anything until you ask for it, and thus it only does one iteration over the array no matter how many calls there are in the chain.
Due to the additional overhead though, it still won't beat your last example in performance, where you combined everything into one call, but whenever there is more than one operation in the chain, it works somewhat faster.
It also helps when performing some operations on the other containers like Map and Set, since they also support the iterator protocol, but unfortunately do not yet support all of the operations available for arrays, like map and reduce and others (it's coming soon, though).

 

Good article, but this shows JS is not functional. By the isomorphism of FP with CT we know that (fmap g) ∘ (fmap f) equals fmap (g ∘ f), so a compiler would be able to optimize this away easily. This is not the case in JS because g and f might have order-dependent side-effects, i.e. it is not functional.

 

Precisely! That is a good point. I believe this is exactly why JavaScript is considered to be a "multi-paradigm" programming language, at least according to Wikipedia. It may not exactly have the purest functional programming practices, but it sure does blend them pretty well with object-oriented programming, considering that the two are kind of like polar opposites of each other.

 

I know this is a popular stance, but I disagree with it.
Just being able to pass functions around as arguments doesn't make something functional. Claiming otherwise seems disingenuous to me, as you'd be claiming the benefits of FP without delivering (barring some syntactic sugar). You might call JS "functional style" or "closure-centric procedural". I just made the later up, but I think it's rather descriptive.

Now that you mention it, I think I agree with you in that regard. It does seem more accurate to consider JavaScript a "functional-style" programming language rather than a "functional programming language mixed in with some object-oriented features".

 

This is something that should be the same for the developer and optimized in the compiler or interpreter. I disagree with the idea of pushing developers away of writing nice code.

 

Yeah, but they already implemented it the way they did.
Now we must sit and suffer.
Or use a transducer library which sometimes beats this horrible decision in terms of performance in userspace without calling out to native code... and profile when we should use that. Once again putting more strain on the developer.

 

Yeah, I definitely agree with you there. At least this "feature" forces us to think more carefully about our code. In the end, we become better programmers because of it.

 

I mean, sure, but my arrays are generally pages of blog posts or user information. If I have more than 10 items in an array at at time, sure, I'll start optimising. Until then, readability wins. Optimising has a cost.

A rule that I like about optimisation is this:
You are only allowed to fix a performance problem without profiling if you have already fixed that same performance problem with production data. If you haven't, leave the code alone, in a readable form.
Once production shows signs of slowness and you have profiled the code and confirmed that it is indeed the chained calls on an array, you can fix it. Until then, leave it alone, it's probably fine as it is.

 

Very useful to remind people of that.
This is basic algorithm time complexity : with an array of N elements
one loop = O(N)
2 loop = O(N+N)
X loop = O(N*X)
So for an array of 10 elements and 4 ".map" the time to execution is 4 times what is needed.
And you need to add to that the time to push new values in the new arrayS.
I think a general rule can be "NEVER USE SAME ARRAY METHOD TWICE".
Also maybe always use "filter" before "map" ?
And i would also like to point that if for some people a 3 term operation is a problem of readability : god are people becoming this lazy ?

I too like readable code, but our primary goal is "efficiency".
Efficiency = performance and should ALWAYS take over what any idea of "beauty" someone may have.
I'm not talking about micro-optimizations but to avoid unnecessary operations.
First do the job in an efficient manner, then refactor to make it human readable as possible.
Sometimes complex things are complex to read.
We are not making art.

 

I mean, one can't really say that those who disagree with this article are completely wrong. Readability really is an important aspect of writing code. Performant code means nothing if it is unreadable.

Also, yup, I definitely agree with using filter before map. That should honestly be a default for all JavaScript programmers.

 

I like the perspective that we have to decide what we're optimizing: the writing, the execution, or the reading. Each has its place and tradeoffs.

 

JSPerf is down atm, but i'm pretty sure this is faster and easier to read:

const nums = [ -2, -1, 0, 1, 2 ]
var sum = 0; for(var i = 0; i < nums.length; ++i) sum += nums[i]**2 * 3
 

I mean... yes, it definitely is, but the point of my article is not to promote the most performant code; rather, the point of this article is to remind the reader to be wary of "overchaining" array methods because of its implications on the number of array iterations. 😅

 

It is exactly this problem, the trade-off between readability and efficiency, pertaining to chaining array methods that has inspired me to create a solution. I've been working on a small library that transforms iterables into a form that can have chained array methods with a single iteration. Something similar to Java 8's Stream API.

 

That's why ghc does "loop fusion". JavaScript has the problem of allowing impure functions, but it should be able to notice pure functions and do loop fusion when it can, so the programmer needn't do the compiler's job for it.

 

This sounds like a great idea. Really, the only obstacle that stands in the way is implementation. I don't think it will be easy to make a JavaScript engine intelligent enough to detect pure functions and "fuse loops". Given the workload of JavaScript engines nowadays (execution, optimization, event loops, garbage collection, etc.), it seems to be a stretch to see this feature come in the near future.

 

There will be times when a loop will be much better than chained array methods in terms of elegance and readability.

code of conduct - report abuse