Performance measurement of JavaScript solutions to common algorithmic questions (part 1)

Dmitry Yakimenko on February 21, 2019

Originally posted on detunized.net. I stumbled upon a tweet and then a post by Emma Wedekind. She goes over some solutions to the most common interview...
 

You can preallocate memory using the Buffer class if you're in a node environment.
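For example, a minimal sketch (assuming a Node environment, where Buffer.alloc reserves a fixed number of bytes up front):

```javascript
// Preallocate 5 bytes, fill them byte-by-byte, then decode once at the end.
const buf = Buffer.alloc(5);           // zero-filled, fixed size
for (let i = 0; i < buf.length; i++) {
  buf[i] = 0x61 + i;                   // byte values for 'a'..'e'
}
console.log(buf.toString('latin1'));   // "abcde"
```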

 

I measured it; the conversion to a string afterwards makes it quite slow. Slower than the alternatives.

 

Not sure. I didn't enforce any encoding. The content of the string was "a a a a...".

The default is 'utf8', which uses variable-length characters, so naturally it's much slower.
'latin1' is a 1-to-1 char-for-byte mapping (obviously high Unicode will get corrupted, but if you start with a Buffer, you will always be able to get it back).
'ascii' decodes the same as 'latin1', but the encoder drops the high bit.

> Buffer.from(Buffer.from([0xff]).toString('latin1'),'latin1')
<Buffer ff>
> Buffer.from(Buffer.from([0xff]).toString('latin1'),'ascii')
<Buffer ff>
> Buffer.from(Buffer.from([0xff]).toString('ascii'),'latin1')
<Buffer 7f>

UTF8, lol:

Buffer.from(Buffer.from([0xff]).toString())
<Buffer ef bf bd>

Notably, V8 doesn't use UTF8 to represent strings/string chunks in memory. It uses something like UTF16 (2 or 4 bytes per character, never 1 or 3) and sometimes opportunistically optimizes to using something like 'latin1' when high characters aren't used.
So, for weird strings 'utf16le' is fastest, and for basic strings, 'latin1'/'ascii'. The default of 'utf8' is the slowest in all cases (but the most compatible and compact without compression)

const { log, time, timeEnd, error } = console
const announce = msg => {
  const bar = "=".repeat(msg.length)
  log(bar); log(msg); log(bar)
}

const [a, z, _] = Buffer.from("az ", "latin1")
const len = 1 + z - a
const strings = new Array(len)
for (let i = 0; i < len; i++)
  strings[i] = Buffer.alloc(0x1000000, Buffer.from([a + i, _]))
    .toString("latin1")

const badStuff = ["💩 💩 💩", "ਸਮਾਜਵਾਦ", "สังคมนิยม"]
  .map(x => x.repeat(0x200000))

const tainted = strings.map(x => "💩" + x)

const benchmarkEncoding = strings => enc => {
  let failed
  time(enc)
  for (const x of strings) {
    const out = Buffer.from(x, enc).toString(enc)
    if (out !== x && !failed) failed = { enc, x, out }
  }
  timeEnd(enc)
  if (failed)
    error(failed.enc, "failed:", failed.x.slice(0, 6), failed.out.slice(0, 6))
}
const encodings = ["utf16le", "utf8", "latin1", "ascii"]

announce("Normal")
encodings.forEach(benchmarkEncoding(strings))
announce("And now, the bad stuff!")
encodings.forEach(benchmarkEncoding(badStuff))
announce("What about tainted ASCII?")
encodings.forEach(benchmarkEncoding(tainted))

Edit: So, I was wrong. 'utf16le' is slower than 'utf8' on pure ASCII (but 3+ times faster otherwise).

Some thorough research. Well done! One small problem with this is that none of it really helps with the original problem =) We wanted to count letters or reverse them or whatever. It's not correct to assume any trivial encoding, like latin1 or ascii. Nor does it help to put everything in a byte buffer, even a correctly encoded sequence of bytes. But I'm sure there are use cases where it's ok to assume a 7- or 8-bit encoding, like massive amounts of base64, for example.

There's Buffer.from('ffff', 'base64') for that :D
I was just responding to you saying that in your quest to count letters or reverse them or whatever you rejected Buffer because of the conversion time.
So I was wondering if it would be more competitive, on your admittedly ASCII 'a a a a a ...' data, without the overhead of the default encoding.
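For example, a base64 round trip through Buffer is lossless, since no lossy text decoding is involved (a small sketch, assuming Node):

```javascript
// Decode base64 to raw bytes and back; every byte value survives.
const original = Buffer.from([0xde, 0xad, 0xbe, 0xef]);
const b64 = original.toString('base64');   // "3q2+7w=="
const back = Buffer.from(b64, 'base64');
console.log(back.equals(original));        // true
```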

You are right, it's much faster on Node to reverse a string using Buffer if it's known to be ASCII. I assume the same goes for any fixed-character-size encoding. Code:

function reverseStringBuffer(str) {
    let b = Buffer.from(str, "ascii");
    for (let i = 0, n = str.length; i < n / 2; i++) {
        let t = b[i];
        b[i] = b[n - 1 - i];
        b[n - 1 - i] = t;
    }
    return b.toString("ascii");
}

Look at the green line:

Heeeey, that's pretty fast :v

I don't know what you're using to benchmark and draw these nice graphs, so I'm forced to bother you again:
Can you try it with Buffer.reverse()? :D
It's in place, like the Array one, and inherited from Uint8Array.

The code is mentioned in the bottom of the post: gist.github.com/detunized/17f524d7...
The graphs I draw using Google Sheets. I can add this code to my set later. I actually looked for reverse on Buffer but didn't see it. Weird.

Yeah, it's not listed on the Node docs, because it's inherited.
It's on MDN though.
This is over twice as fast :D

function reverseStringBuffer2(str) {
  return Buffer.from(str, "ascii").reverse().toString("ascii");
}

This was 💩 though:

function findLongestWordLengthBuffer(str) {
  const buf = Buffer.from(str, 'ascii')
  let maxLength = 0
  let start = 0
  let last = false
  while (!last) {
    let found = buf.indexOf(32, start)
    last = found === -1
    if (last) found = buf.length   // treat end of buffer as a final delimiter
    const len = found - start
    start = found + 1
    if (len > maxLength) maxLength = len
  }
  return maxLength
}

Just plugging a buffer into your Fast version is twice as fast on long strings, the same or maybe a little worse on short ones:

function findLongestWordLengthBufferFast(str) {
  const buf = Buffer.from(str,'ascii')
  let l = buf.length;

  let maxLength = 0;
  let currentLength = 0;

  for (let i = 0; i < l; ++i) {
      if (buf[i] === 32) {
          if (currentLength > maxLength) {
              maxLength = currentLength;
          }
          currentLength = 0;
      } else {
          ++currentLength;
      }
  }

  // Account for the last word
  return currentLength > maxLength ? currentLength : maxLength;
}
 

Hey thanks for doing this, and giving me proper credit :)

 

I'm guessing that is due to all the work being done in the native functions that are dealing with the string as opposed to JavaScript code that runs on the virtual machine.

Maybe I'm misunderstanding you, but I would assume that creating one temp array is generally going to be faster than creating N strings, for large N. And less memory, since the strings will need O(n^2).

It's always tempting to rewrite the code in a more elegant, cleaner and often shorter way.

I'll contend that unless the performance issue is staring you in the face, that cleaner version is the right one. When there's a bottleneck, you can profile and find it. The code becoming unmanageable is the more immediate risk IME.

Your IG / photography is very cool, btw.

 

Maybe I'm misunderstanding you, but I would assume that creating one temp array is generally going to be faster than creating N strings, for large N. And less memory, since the strings will need O(n^2).

Since the strings are immutable, it's possible to optimize the substring storage. It's possible to make it very efficient, actually. The original string is already occupying the memory, so the parts of it would have to just refer to character ranges in the original buffer. I'm quite certain split and join are implemented in the core in C or C++ and are much more efficient.

And it shouldn't be O(n^2), since the sum of all substrings is the original string, which is n.

I'll contend that unless the performance issue is staring you in the face, that cleaner version is the right one. When there's a bottleneck, you can profile and find it.

That is true to an extent. Sometimes the performance degradation is like death by a thousand cuts, where it's impossible to find a single bottleneck; things just become slower and slower. I agree that premature optimization is not a good thing, but understanding the impact of your code and potential scaling problems is very important. It's quite easy to introduce a naive O(n^2) algorithm and not notice the problem because n is very small. And then the system hits production and things start to time out in weird places.
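For instance, here's a sketch (my own illustration, not from the benchmarks above) of the kind of naive quadratic check that looks fine at small n but falls over as the input grows:

```javascript
// O(n^2): compares every pair. Harmless for n = 3, painful for n = 100000.
function hasDuplicateNaive(arr) {
  for (let i = 0; i < arr.length; i++)
    for (let j = i + 1; j < arr.length; j++)
      if (arr[i] === arr[j]) return true;
  return false;
}

// O(n): a Set collapses duplicates, so the sizes differ iff one exists.
function hasDuplicateSet(arr) {
  return new Set(arr).size !== arr.length;
}

console.log(hasDuplicateNaive([1, 2, 3, 2])); // true
console.log(hasDuplicateSet([1, 2, 3]));      // false
```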

Your IG / photography is very cool, btw.

Thanks a lot =)

 

the sum of all substrings is the original string, which is n.

Ah, I meant the intermediate strings. The sum of "a", "ab", "abc", ... is triangular, so the memory usage depends on whether they're stored. (I'd originally written something like "language optimization aside" but took it out for some reason.)
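Concretely, if every intermediate prefix of an n-character string were materialized with no sharing, the total storage is triangular:

```javascript
// "a" + "ab" + "abc" + ... up to n chars = 1 + 2 + ... + n = n(n+1)/2,
// i.e. O(n^2) characters absent any sharing optimizations by the engine.
const n = 1000;
let total = 0;
for (let len = 1; len <= n; len++) total += len;
console.log(total);            // 500500
console.log(n * (n + 1) / 2);  // 500500
```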

But your comment led me to read about "cons strings" in V8; thanks for that. That's pretty cool.

And then the system hits production and things start to time out in weird places.

Regarding UniformlySlowCode: FWIW, my experience is like that of the folks who said there that they hadn't seen it firsthand, or only saw it when the architecture was a bad fit. It does happen. Twitter rewrote their RoR back end in Scala. But they do real-time messaging with 250m users. They're an exceptional case. Who knows if they'd have gotten to that size if they hadn't started with RoR?

I worried about writing tight loops when I did contest problems (in Java, so buffered string building was necessary). I also might have a different take if I wrote DOS games. But what I see in my web application work is stuff like this:

  • transaction wraparound ID / vacuuming made an important db table unusable
  • big parallel image processing jobs done by a microservice had to be batched because having them each call back with results DDOSed us
  • CSS transitions + scaling of large images on mobile retina devices quickly hit the per-tab memory limit
  • not being able to pass continuous mouse input, or even fast keystrokes, through a redux store

The causes vary, but they're never amenable to a quick fix involving array indices. I guess something that is similar is when you have to replace ORM-generated queries with hand-tuned ones. You can (and people do) advocate ditching the ORM. But there are always trade-offs, and I've seen some real dumpster-fire data access layers on projects that lack them.

Anyway, if I somehow had to regularly reverse 1 GB things, I'd put them on a job queue, and might also have the app shell out to a C program that first wrote the reversed thing to disk. Probably not the right answer for an interview question, though 😂😂

Ah, I meant the intermediate strings. The sum of "a", "ab", "abc", ... is triangular

That's triangular, but I don't really see why that would be needed in split.

Anyway, if I somehow had to regularly reverse 1 GB things, I'd put them on a job queue, and might also have the app shell out to a C program that first wrote the reversed thing to disk. Probably not the right answer for an interview question, though

I understand that it doesn't make sense to reverse 1 GB strings and this will never happen in real life. Just as a task to reverse a string or find the longest word will never come up in your day job after you've passed that interview. It's just a made-up example to talk about the code, and to talk about the performance of that code, which might very well be a follow-up question to this answer. When you write any code you should be able to reason about its performance to some extent, both memory and CPU impact. That's why talking about very big numbers becomes important; for n = 3 any algorithm works fine ;-)

 

Thanks for the interesting article! I am just wondering, for the second example, which code does the orange "fast" graph correspond to? Also, did you happen to graph the memory usage? I think that would be useful along with showing the speed.

 

The orange is my own "fast" version (the last code sample). I tried to measure memory, but I couldn't get anything reasonable from node. The numbers didn't seem to change. Maybe I'll get to it again when I have time.
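One possible starting point (a sketch, assuming Node's process.memoryUsage(); the numbers are noisy, and running with node --expose-gc and calling global.gc() before each sample helps stabilize them):

```javascript
// Sample heap usage before and after a large allocation.
const before = process.memoryUsage().heapUsed;
const big = 'a'.repeat(0x1000000);            // ~16M one-byte characters
const after = process.memoryUsage().heapUsed;
console.log(((after - before) / 1024 / 1024).toFixed(1), 'MiB');
```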

 
 

It would be best to preallocate the memory, since we know how many characters we are going to need ahead of time. As far as I know there's no way to do that in JavaScript.

Actually, you could convert the string into a typed array and swap back to front and then convert it back. It would probably be more efficient in web assembly.
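A minimal sketch of that typed-array approach (assuming the input stays within the BMP; surrogate pairs would come back with their halves swapped):

```javascript
// Copy UTF-16 code units into a typed array, reverse in place, decode back.
function reverseViaTypedArray(str) {
  const codes = new Uint16Array(str.length);
  for (let i = 0; i < str.length; i++) codes[i] = str.charCodeAt(i);
  codes.reverse();                       // in place, from %TypedArray%.prototype
  return String.fromCharCode(...codes);  // fine for short strings; very long
                                         // ones would exceed the argument limit
}
console.log(reverseViaTypedArray("hello")); // "olleh"
```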

 