loading...

Performance measurement of JavaScript solutions to common algorithmic questions (part 1)

detunized profile image Dmitry Yakimenko Originally published at detunized.net ・6 min read

Originally posted on detunized.net

I stumbled upon a tweet and then a post by Emma Wedekind. She goes over some solutions to most common interview questions for JavaScript positions. There are three different problems. And for each she offers a brute force and an optimized solution, where the latter is a shorter and more elegant version, that normally uses functional programming style over C-style loops. I wouldn't call them brute force and optimized though, rather naive and elegant.

The naive solution anyone can write as soon as they grasp the very basic concepts of the language and understand the problem. The elegant solution requires a better knowledge of the standard library. The code becomes clearer and more succinct. But that sometimes comes with a performance penalty that is not very obvious at the first glance. In this post I wanted to go over some of those less obvious points.

Reverse a string

Take the first problem: reverse a string. To make it simpler, let's first assume we have only basic ASCII or single code point UTF-16 characters in our string we want to reverse. I'll touch on what to do if it's not the case later.

function reverseString(str) {
    let reversedString = '';

    for (let i = str.length - 1; i >= 0; i--) {
        reversedString += str.charAt(i);
    }

    return reversedString;
}

Here we append the characters from the back of the original string to the begging of an empty string one by one. This way we get the reversed string. A non obvious thing that happens here is the string reallocation. Strings in JavaScript are immutable which means every time we modify a string a copy is created. I'm sure there are some clever optimizations to not to reallocate too much and not to create too many useless copies. But the fact is, some array has to grow and some reallocations and copying must be happening. Every time a reallocation happens some new memory gets reserved and all the bytes of the old string get copied to the new place. This takes extra time which is invisible if you look at the code. It would be best to preallocate the memory, since we know how many characters we are going to need ahead of time. As far as I know there's no way to do that in JavaScript.

function reverseString(str) {
    return str.split('').reverse().join('');
}

This solution is for sure much clearer and more elegant. It's much easier to reason about the code. It's instantly obvious what the function does.

The hidden danger here is the additional memory the VM has to allocate to keep the temporaries. This code is roughly equivalent to the following:

function reverseString(str) {
    let chars = str.split('');
    let reversed = chars.reverse();
    let result = reversed.join('');
    return result;
}

When it's written like this it becomes clear that during the execution we'd need to keep at least two copies of the string in memory. First chars. It has to coexist along with the original str. Next reversed. For a short time it has to coexist along with chars. And then result has to coexist along with reversed. So worst case 4x the memory. Imagine now the string is 1GB long. And of the garbage collector kicks in in between some of these calls, the total run time is gonna a lot longer than it looks.

Here's some quick and dirty profiling (x-axis is string length, y-axis is milliseconds):

reverse string

The elegant solution here is indeed optimized. I'm guessing that is due to all the work being done in the native functions that are dealing with the string as opposed to JavaScript code that runs on the virtual machine. And as you can see the first two solutions are not quite linear because of the intricacies of the underlying VM and CPU architecture. This would take a separate article (or a few) to explain that.

I briefly mentioned above the character encoding problem. Really it's not safe to just reverse the utf-16 code points. There are some really complicated rules on how Unicode text is formed and some characters or graphemes could be up to 6 code points together. Look at this answer for some details. Long story short: you need to use a special library to deal with characters. Don't reinvent the wheel.

The longest word

The second problem from the original post is about finding the length of the longest word in a string where words are separated by spaces. The original solution is to split the original string and then loop over the resulting array to find the longest string:

function findLongestWordLength(str) {
    let maxVal = 0;

    const wordArr = str.split(' ');

    for(let i = 0; i < wordArr.length; i++) {
        let word = wordArr[i];
        if (word.length > maxVal) {
            maxVal = word.length;
        }
    }
    return maxVal;
}

This version creates a temporary array to store all the words extracted from the original string. The words themselves also need to be stored somewhere. Since the strings are immutable it's very likely the actual bytes are not really duplicated but rather referenced in the original array. But that's hard to guess without looking at the VM source code. The fact is a bunch extra memory is needed.

The optimized version uses more functional approach:

function findLongestWordLength(str) {
    const arrOfWords = str.split(' ');
    const arrOfLengths = arrOfWords.map(item => item.length);

    return Math.max(...arrOfLengths);
}

In this version there's even more memory used. arrOfLengths also has to be kept around. And in a situation when we have 1GB of input with only 1 letter words (I know, it's kinda extreme), we would have 3GB total roughly wasted on this.

Math.max thing is kinda broken actually. The spread operator ... is substituting the array elements for the function arguments. Which makes a function call with a boatload of parameters. Which in turn causes stack overflow exception with already 200k elements. Not a very big number. In node REPL:

> Math.max(...Array(200000))
Thrown:
RangeError: Maximum call stack size exceeded

For the benchmarking I fixed this version like this:

function findLongestWordLengthFixed(str) {
    const arrOfWords = str.split(' ');
    const arrOfLengths = arrOfWords.map(item => item.length);

    let maxLength = 0;
    for (let i = 0; i < arrOfLengths.length; ++i) {
        if (arrOfLengths[i] > maxLength) {
            maxLength = arrOfLengths[i];
        }
    }

    return maxLength;
}

The really optimized solution would be not to create any temporary arrays, but iterate over the original string in-place and find the longest word:

function findLongestWordLengthFast(str) {
    let maxLength = 0;
    let currentLength = 0;

    for (let i = 0, l = str.length; i < l; ++i) {
        if (str.charCodeAt(i) === 32) {
            if (currentLength > maxLength) {
                maxLength = currentLength;
            }
            currentLength = 0;
        } else {
            ++currentLength;
        }
    }

    // Account for the last word
    return currentLength > maxLength ? currentLength : maxLength;
}

This function does zero heap allocations and is very cache friendly. Here a quick profile (x-axis is string length, y-axis is milliseconds):

longest word

As you can see the brute force/naive versions are performing much better than the optimized/elegant one.

Conclusion, sort of

It's always tempting to rewrite the code in a more elegant, cleaner and often shorter way. Sometimes it comes with a cost. It's important to understand that cost. Sure, when you're reversing a string of 3 characters any solution would do and none of this matters. But often enough we don't know ourselves in the beginning how big the date will be when the system hits production. Big-O analysis is important, but it's not everything. Those pesky constants in O(n) are important as well. Very important for real systems. They could make a big difference, like double you AWS bill or make response time 5 times larger. Keep that in mind next time you reverse a string in production =)

I used this code to profile the functions above. I used node on macOS. The number may vary across hardware, browsers, OSes. Keep that in mind as well.

Posted on by:

detunized profile

Dmitry Yakimenko

@detunized

Grew up in Russia, lived in the States, moved to Germany, sometimes live in Spain. I program since I was 13. I used to program games, maps and now I reverse engineer password managers and other stuff

Discussion

pic
Editor guide
 

Hey thanks for doing this, and giving me proper credit :)

 

You can preallocate memory using the Buffer class if you're in a node environment.

 

I measured it, the conversion to string afterwards makes it quite slow. Slower than the alternatives.

 

Did you try latin1 or ascii conversion (both ways)?

Not sure. I didn't enforce any encoding. The content of the string was "a a a a...".

The default is 'utf8', which is variable-length characters. So naturally it's much slower.
'latin1' is 1-1 char for byte (Obviously high Unicode will get corrupted, but if you start with a Buffer, you will always be able to get it back.
'ascii' decodes the same as 'latin1', but the encoder drops the first bit.

> Buffer.from(Buffer.from([0xff]).toString('latin1'),'latin1')
<Buffer ff>
> Buffer.from(Buffer.from([0xff]).toString('latin1'),'ascii')
<Buffer ff>
> Buffer.from(Buffer.from([0xff]).toString('ascii'),'latin1')
<Buffer 7f>

UTF8, lol:

Buffer.from(Buffer.from([0xff]).toString())
<Buffer ef bf bd>

Notably, V8 doesn't use UTF8 to represent strings/string chunks in memory. It uses something like UTF16 (2 or 4 bytes per character, never 1 or 3) and sometimes opportunistically optimizes to using something like 'latin1' when high characters aren't used.
So, for weird strings 'utf16le' is fastest, and for basic strings, 'latin1'/'ascii'. The default of 'utf8' is the slowest in all cases (but the most compatible and compact without compression)

const { log, time, timeEnd, error } = console
const announce = msg => {
  const bar = "=".repeat(msg.length)
  log(bar); log(msg); log(bar)
}

const [a, z, _] = Buffer.from("az ", "latin1")
const len = 1 + z - a
const strings = new Array(len)
for (let i = 0; i < len; i++)
  strings[i] = Buffer.alloc(0x1000000, Buffer.from([a + i, _]))
    .toString("latin1")

const badStuff = ["💩 💩 💩", "ਸਮਾਜਵਾਦ", "สังคมนิยม"]
  .map(x => x.repeat(0x200000))

const tainted = strings.map(x => "💩" + x)

const benchmarkEncoding = strings => enc => {
  let failed
  time(enc)
  for (const x of strings) {
    const out = Buffer.from(x, enc).toString(enc)
    if (out !== x && !failed) failed = { enc, x, out }
  }
  timeEnd(enc)
  if (failed)
    error(failed.enc, "failed:", failed.x.slice(0, 6), failed.out.slice(0, 6))
}
const encodings = ["utf16le", "utf8", "latin1", "ascii"]

announce("Normal")
encodings.forEach(benchmarkEncoding(strings))
announce("And now, the bad stuff!")
encodings.forEach(benchmarkEncoding(badStuff))
announce("What about tainted ASCII?")
encodings.forEach(benchmarkEncoding(tainted))

Edit: So, I was wrong. 'utf16le' is slower than 'utf8' on pure ASCII (But 3+ times faster otherwise)

Some thorough research. Well done! One small problem with this, is all of this doesn't really help with the original problem =) We wanted to count letters or reverse them or whatever. It's not correct to assume any trivial encoding, like latin1 or ascii. Nor it helps to put everything in the byte buffer, even correctly encoded sequence of bytes. But I'm sure there are use cases where it's ok to assume 7 or 8 bit encoding, like massive amounts of base64, for example.

There's Buffer.from('ffff','base64') for that :D
I was just responding to you saying that in your quest to to count letters or reverse them or whatever you rejected Buffer because of the conversion time.
So I was wondering if it would be more competitive, on your admittedly ASCII 'a a a a a ... data, without the overhead of the default encoding.

You are right, it's much faster on Node using Buffer to reverse a string if it's known it's ASCII. I assume the same goes for any fixed character size encoding. Code:

function reverseStringBuffer(str) {
    let b = Buffer.from(str, "ascii");
    for (let i = 0, n = str.length; i < n / 2; i++) {
        let t = b[i];
        b[i] = b[n - 1 - i];
        b[n - 1 - i] = t;
    }
    return b.toString("ascii");
}

Look at the green line:

Heeeey, that's pretty fast :v

I don't know what you're using to benchmark and draw these nice graphs, so I'm forced to bother you again:
Can you try it with Buffer.reverse()? :D
It's in place, like the Array one, and inherited from Uint8Array.

The code is mentioned in the bottom of the post: gist.github.com/detunized/17f524d7...
The graphs I draw using Google Sheets. I can add this code later to my set. I actually looked for reverse on Buffet but didn't see it. Weird.

Yeah, it's not listed on the Node docs, because it's inherited.
It's on MDN though.
This is over twice as fast :D

function reverseStringBuffer2(str) {
  return Buffer.from(str, "ascii").reverse().toString("ascii");
}

This was 💩 though:

function findLongestWordLengthBuffer(str) {
  const buf = Buffer.from(str, 'ascii')
  const length = buf.length
  let maxLength = 0;
  let start = 1;
  let offset = 0;
  let last = false;
  while (!last) {
    const found = buf.indexOf(32, offset)
    offset = found+1
    last = offset === 0
    const len = found - start
    start = found
    if (len > maxLength) maxLength = len
  }
  return maxLength
}

Just plugging a buffer into your Fast is twice faster on long strings, same and maybe a little worse on short:

function findLongestWordLengthBufferFast(str) {
  const buf = Buffer.from(str,'ascii')
  let l = buf.length;

  let maxLength = 0;
  let currentLength = 0;

  for (let i = 0; i < l; ++i) {
      if (buf[i] === 32) {
          if (currentLength > maxLength) {
              maxLength = currentLength;
          }
          currentLength = 0;
      } else {
          ++currentLength;
     }
  }

  // Account for the last word
  return currentLength > maxLength ? currentLength : maxLength;
}
 

I'm guessing that is due to all the work being done in the native functions that are dealing with the string as opposed to JavaScript code that runs on the virtual machine.

Maybe I'm misunderstanding you, but I would assume that creating one temp array is generally going to be faster than creating N strings, for large N. And less memory, since the strings will need O(n2).

It's always tempting to rewrite the code in a more elegant, cleaner and often shorter way.

I'll contend that unless the performance issue is staring you in the face, that cleaner version is the right one. When there's a bottleneck, you can profile and find it. The code becoming unmanageable is the more immediate risk IME.

Your IG / photography is very cool, btw.

 

Maybe I'm misunderstanding you, but I would assume that creating one temp array is generally going to be faster than creating N strings, for large N. And less memory, since the strings will need O(n2).

Since the strings are immutable, it's possible to optimize the substring storage. It's possible to make it very efficient, actually. The original string is already occupying the memory, so the parts of it would have to just refer to character ranges in the original buffer. I'm quite certain split and join are implemented in the core in C or C++ and are much more efficient.

And it shouldn't be O(n^2), since the sum of all substrings is the original string, which is n.

I'll contend that unless the performance issue is staring you in the face, that cleaner version is the right one. When there's a bottleneck, you can profile and find it.

That is true to an extent. Sometimes the performance degradation is like a death by thousand cuts, when it's impossible to find a bottleneck. Just things become slower and slower. I agree that premature optimization is not a good thing, but understanding the impact of your code and potential scaling problems is very important. It's quite easy to introduce a naive O(n^2) algorithm and not noice the problems because n is very small. And then the system hits production and things start to time out in weird places.

Your IG / photography is very cool, btw.

Thanks a lot =)

 

the sum of all substrings is the original string, which is n.

Ah, I meant the intermediate strings. The sum of "a", "ab", "abc", ... is triangular, so the memory usage depends on whether they're stored. (I'd originally written something like "language optimization aside" but took it out for some reason.)

But your comment led me to read about "cons strings" in V8; thanks for that. That's pretty cool.

And then the system hits production and things start to time out in weird places.

Regarding UniformlySlowCode: FWIW, my experience is like that of the folks who said there that they hadn't seen it firsthand, or only saw it when the architecture was a bad fit. It does happen. Twitter rewrote their RoR back end in Scala. But they do real-time messaging with 250m users. They're an exceptional case. Who knows if they'd have gotten to that size if they hadn't started with RoR?

I worried about writing tight loops when I did contest problems (in Java, so buffered string building was necessary). I also might have a different take if I wrote DOS games. But what I see in my web application work is stuff like this:

  • transaction wraparound ID / vacuuming made an important db table unusable
  • big parallel image processing jobs done by a microservice had to be batched because having them each call back with results DDOSed us
  • CSS transitions + scaling of large images on mobile retina devices quickly hit the per-tab memory limit
  • not being able to pass continuous mouse input, or even fast keystrokes, through a redux store

The causes vary, but they're never amenable to a quick fix involving array indices. I guess something that is similar is when you have to replace ORM-generated queries with hand-tuned ones. You can (and people do) advocate ditching the ORM. But there are always trade-offs, and I've seen some real dumpster-fire data access layers on projects that lack them.

Anyway, if I somehow had to regularly reverse 1 GB things, I'd put them on a job queue, and might also have the app shell out to a C program that first wrote the reversed thing to disk. Probably not the right answer for an interview question, though 😂😂

Ah, I meant the intermediate strings. The sum of "a", "ab", "abc", ... is triangular

That's triangular, but In don't really see why would that be needed in split.

Anyway, if I somehow had to regularly reverse 1 GB things, I'd put them on a job queue, and might also have the app shell out to a C program that first wrote the reversed thing to disk. Probably not the right answer for an interview question, though

I understand that it doesn't make sense to reverse 1 gb stings and this will never happen in real life. As never will happen a task to reverse a string or find the longest word on your day job after you passed that interview. It's just a made up example to talk about the code. And to talk about performance of that code. Which might be very well a follow up question to this answer. When you write any code you should be able to reason about it's performance to some extent, both memory and CPU impact. That's why talking about very big numbers becomes important, for n = 3 any algorithm works fine ;-)

 

Thanks for the interesting article! I am just wondering, for the second example, which code does the orange "fast" graph correspond to? Also, did you happen to graph the memory usage? I think that would be useful along with showing the speed.

 

The orange is my own "fast" version (the last code sample). I tried to measure memory, but I couldn't get anything reasonable from node. The numbers didn't seem to change. Maybe, I'll get to it again when I have time.

 

Ah - thanks! That makes sense.

 

It would be best to preallocate the memory, since we know how many characters we are going to need ahead of time. As far as I know there's no way to do that in JavaScript.

Actually, you could convert the string into a typed array and swap back to front and then convert it back. It would probably be more efficient in web assembly.

 

Check out this thread dev.to/alephnaught2tog/comment/925a. The Buffer won everything in the end =)