During a code review, I noticed a piece of code that was calling includes
inside filter
.
The code looked something like this:
const valuesToKeep = [2, 4];
const valuesToTest = [1, 2, 3, 4];
const values = valuesToTest.filter((n) => valuesToKeep.includes(n)); // [2, 4]
The comment I left was along the lines:
"Hey, be careful, filter is O(n) and includes is O(n) so this is O(n2). This would perform as O(n) if you used a Set, BUT since the arrays are so small it probably does not make a difference".
To illustrate the comment with code, this is what I had in mind:
const valuesToKeep = [2, 4];
const valuesToTest = [1, 2, 3, 4];
const setToKeep = new Set(valuesToKeep);
const values = valuesToTest.filter((n) => setToKeep.has(n)); // [2, 4]
My comment didn't sit well with me because saying "hey this will perform OK because the data is a certain way" is not a good idea: the data may change, or maybe I'm just wrong.
So I decided to put this to test. I'm going to generate two arrays containing random integers: an array of values to keep, and an array of values to test. The premise is that the array of values to keep is much smaller than the array of values to test, so we're going to make the array of values to test 10 times bigger than the array of values to keep.
// array of values to keep
const valuesToKeep = Array.from({ length: LENGTH }, () => getRandomInt());
// array of values to check
const valuesToTest = Array.from({ length: LENGTH * 10 }, () =>
getRandomInt()
);
Then we're going to run two tests: one using includes
, and one using has
, and we're going to start with LENGTH at 10, and increase it every time, as my premise is that for small array it won't matter much, but we want to see WHEN it starts mattering:
// filter using includes
console.time("includes");
valuesToTest.filter((v) => valuesToKeep.includes(v)); // n2
console.timeEnd("includes");
// filter using has
console.time("has");
const valuesToKeepSet = new Set(valuesToKeep);
valuesToTest.filter((v) => valuesToKeepSet.has(v)); // n
console.timeEnd("has");
And here are the results:
Length of values to keep: 1
Length of values to test: 10
includes: 0.207ms
has: 0.190ms
Length of values to keep: 10
Length of values to test: 100
includes: 0.020ms
has: 0.017ms
Length of values to keep: 100
Length of values to test: 1000
includes: 0.204ms
has: 0.071ms
Length of values to keep: 1000
Length of values to test: 10000
includes: 9.942ms
has: 1.307ms
Length of values to keep: 10000
Length of values to test: 100000
includes: 131.686ms
has: 8.016ms
Length of values to keep: 100000
Length of values to test: 1000000
includes: 1324.318ms
has: 71.495ms
So yes, I am right that with a small quantity of data, Array.includes
and Set.has
perform roughly the same, but we can see how quickly performance degrades, and the change is so small that it's hard to justify not making it, even for small data samples. Should the size of the data increase, especially the size of the valuesToKeep
array, the code is future proof.
TLDR: when matching a value against a list of values, convert the Array to a Set first.
Top comments (9)
Hi, I'm not able to reproduce this in the latest Chrome (94.0.4606.61), running on OS X, even with a large value for n. I also tried in Node, including some older versions.
Was there a particular environment where this is an issue? Maybe
Array.prototype.includes
has been internally optimized since this was written?Looks like my image didn't upload, I'm getting:
For n = 100000
includes: 45.72607421875 ms
has: 61.42919921875 ms
For n = 1000000
includes: 367.927978515625 ms
has: 707.369140625 ms
test online: typescriptlang.org/play?#code/MYew...
only for of (not .filter): typescriptlang.org/play?#code/MYew...
Nicely done with the examples, also one of my pet peeves n^2 gets pretty nasty, pretty fast!
Thanks Mike!
Hey just wondering, do you think there is there a time associated with converting an array to a set, and if so is it less than the time we are saving by running has()?
did you get to test this? thanks!
Just what I was looking for - looks like I'll be switching over to using Sets a bit more often!