I'm trying to keep a project I'm working on as lean as possible, which means I haven't reached for libraries like Lodash. Instead, I've challenged myself to hand-roll everything I need.
I needed to get an array of items that are unique by a given key, just like Lodash's uniqBy. I had a quick Google around to see how other folks were approaching it.
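To set the stage, here's the behaviour I was after; the sample data below is made up purely for illustration:
const people = [
  { id: 1, name: 'Ann' },
  { id: 2, name: 'Bob' },
  { id: 1, name: 'Ann again' }, // duplicate id
]
uniqueBy(people, 'id')
// -> [{ id: 1, name: 'Ann' }, { id: 2, name: 'Bob' }]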
I came across the following approach:
function uniqueBy(myArr, prop) {
  // Iterate over the array and filter out duplicates
  return myArr.filter((obj, pos, arr) => {
    // Map over the array and get the values from the key
    return arr.map(mapObj => mapObj[prop]).indexOf(obj[prop]) === pos
  })
}
While this works, I was not too fond of mapping inside the filter. So I set up some tests around my function and started creating benchmarks on jsPerf.
With an array of 10,000 items, this had a whopping 0.63 ops/sec. Zoinks.
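If you'd rather reproduce this locally than on jsPerf, a rough harness looks something like this; the data shape and run count are my own assumptions, not the exact jsPerf setup:
// Build a 10,000-item array with lots of duplicate keys
const testData = Array.from({ length: 10000 }, (_, i) => ({
  id: i % 100, // only 100 unique ids, so plenty of duplicates
  value: i,
}))

// Crude ops/sec estimate: run the function repeatedly and divide by elapsed time
function benchmark(fn, runs = 50) {
  const start = performance.now()
  for (let i = 0; i < runs; i++) fn(testData, 'id')
  const seconds = (performance.now() - start) / 1000
  return `${(runs / seconds).toFixed(2)} ops/sec`
}

console.log(benchmark(uniqueBy))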
Note
I just want to clarify: I'm in no way trying to criticise the author of the above function. Most people can use it and never notice any negative performance impact.
Iteration 1
So I thought, what if I moved the map outside of the filter?
function uniqueBy(myArr, prop) {
  // Get all values for the prop up front
  const vals = myArr.map(obj => obj[prop])
  return myArr.filter((obj, pos) => {
    return vals.indexOf(obj[prop]) === pos
  })
}
Result: 3,067 ops/sec
Extracting the map outside of the filter had much better results, relatively speaking.
Iteration 2
Keeping the same vibe, I moved on to Array.prototype.findIndex.
function uniqueBy(arr, prop) {
  return arr.filter((record, index, self) => {
    // Instead of getting the values, just get the index where the predicate is true
    return index === self.findIndex(t => t[prop] === record[prop])
  })
}
Result: 6,962 ops/sec
But this is much of a muchness; it still makes multiple passes over the array. Time to whip out the trusty old loops without predicates.
Iteration 3
function uniqueBy(arr, prop) {
  const len = arr.length // Get the length up front so it's only accessed once
  const data = [] // This will be our return data
  const seen = [] // This is a collection of values we've already seen
  for (let i = 0; i < len; i++) {
    // Grab the things I care about here to only access the properties once
    const item = arr[i] // The current array item
    const val = item[prop] // The current item's value that we want to unique by
    // If there's no record of this in "seen", push it to seen and add it to our return array.
    // What's with the tilde? indexOf returns -1 when nothing is found, and the bitwise NOT
    // operator (~) turns -1 into 0 (falsy) and any real index into a truthy number. Link at the bottom.
    if (!~seen.indexOf(val)) {
      // Mark this value as seen
      seen.push(val)
      // Add the item to the return array
      data.push(item)
    }
  }
  return data
}
Result: 15,196 ops/sec
swoon
So we managed to get rid of the predicate callbacks, our tests still pass, and it's faster. Now we're getting somewhere.
It's somewhat less readable than previous iterations, but that's not my goal. We could stop here, but I think we can squeeze some more out of this.
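As an aside, here's that tilde trick in isolation, since it trips people up (a scratch snippet, not part of the benchmark):
const seen = ['a', 'b']
// ~x is -(x + 1), so only indexOf's "not found" result of -1 becomes falsy
~seen.indexOf('a') // ~0 === -1, truthy: found
~seen.indexOf('z') // ~(-1) === 0, falsy: not found
!~seen.indexOf('z') // true, so we'd push 'z'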
Iteration 4
What if we use a Set? They're pretty nifty, right?
function uniqueBy(arr, prop) {
  const len = arr.length
  const data = []
  const seen = new Set() // Create a Set
  for (let i = 0; i < len; i++) {
    const item = arr[i]
    const val = item[prop]
    // Check if the Set already has the value
    if (!seen.has(val)) {
      seen.add(val)
      data.push(item)
    }
  }
  return data
}
Result: 11,133 ops/sec
Wait a minute! That's slower than the previous one. Wha-, ugh-, but it's nifty! Ah well, on we go.
Iteration 5
After perusing some benchmarks on loops, I saw that a while loop vastly outperformed a for loop.
function uniqueBy(arr, prop) {
  const len = arr.length
  const record = []
  const seen = []
  let cursor = 0
  while (cursor < len) {
    const item = arr[cursor]
    const val = item[prop]
    if (!~seen.indexOf(val)) {
      seen.push(val)
      record.push(item)
    }
    cursor++
  }
  return record
}
Result: 15,164 ops/sec
Boom! The while loop made this our fastest one yet, but it's even less readable.
Iteration 6
Hmm, the loop benchmarks also suggest that decrementing is faster than incrementing. How does that look?
function uniqueBy(arr, prop) {
  let len = arr.length
  const record = []
  const seen = []
  while (len--) {
    const item = arr[len]
    const val = item[prop]
    if (!~seen.indexOf(val)) {
      seen.push(val)
      record.push(item)
    }
  }
  return record
}
Result: 15,535 ops/sec
CAVEAT: We've lost the original order of the array.
These are marginal gains compared to the previous iteration.
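To make that caveat concrete, here's what the decrementing version does with some made-up data:
const input = [
  { id: 1, tag: 'first' },
  { id: 2, tag: 'second' },
  { id: 1, tag: 'third' }, // duplicate id
]
uniqueBy(input, 'id')
// Scanning backwards means the last occurrence of each id wins,
// and the results come out reversed:
// -> [{ id: 1, tag: 'third' }, { id: 2, tag: 'second' }]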
Iteration 7
If there's one thing I know about JavaScript, it's that property access is fast. seen doesn't need to be an array; what if we just kept a dictionary of seen keys?
function uniqueBy(arr, prop) {
  const len = arr.length
  let cursor = 0
  const record = []
  // A null-prototype object, so inherited keys like "toString" can't cause false positives
  const seen = Object.create(null)
  while (cursor < len) {
    const item = arr[cursor]
    const val = item[prop]
    if (!seen[val]) {
      seen[val] = 1
      record.push(item)
    }
    cursor++
  }
  return record
}
Result: 24,970 ops/sec
The best one yet!
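One caveat with the dictionary approach that the benchmarks don't show: object keys are always strings, so distinct values that coerce to the same string will collide. The Set from iteration 4 doesn't have this problem:
const seen = Object.create(null)
seen[1] = 1
seen['1'] // 1: the number 1 and the string '1' share a key
new Set([1]).has('1') // false: a Set compares actual values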
Iteration 8
Okay, after doing some more research on loops, I came across this little number:
function uniqueBy(arr, prop) {
  const record = []
  const seen = Object.create(null) // Null prototype, as in the previous iteration
  for (let i = 0, len = arr.length; i < len; ++i) { // Notice the len = arr.length
    const item = arr[i]
    const val = item[prop]
    if (!seen[val]) {
      seen[val] = 1
      record.push(item)
    }
  }
  return record
}
Result: 26,390 ops/sec
Hmm, this is the winner (so far). But why? Didn't we find the while loop faster? All that's happening is that len = arr.length caches the length. We were already doing that?
All I can think is that it's something to do with Locality of Reference. I have no formal Computer Science background, and I'm not a particularly smart man. If someone can explain why this is faster, please comment.
I recreated these tests on ESBench here: ESBench Results, if that's more your cup of tea.
Bonus
Here are some other variations I tested with negligible performance gains/losses:
++cursor vs cursor++
function uniqueBy(arr, prop) {
  const len = arr.length
  let cursor = -1
  const record = []
  const seen = []
  while (++cursor < len) {
    const item = arr[cursor]
    const val = item[prop]
    if (!~seen.indexOf(val)) {
      seen.push(val)
      record.push(item)
    }
  }
  return record
}
Reducing variables
function uniqueBy(arr, prop) {
  const len = arr.length
  let cursor = -1
  const record = []
  const seen = []
  while (++cursor < len) {
    if (!~seen.indexOf(arr[cursor][prop])) {
      seen.push(arr[cursor][prop])
      record.push(arr[cursor])
    }
  }
  return record
}
Summary
This whole process was mostly a fruitless endeavour. We could have stopped at iteration 3 and put our feet up; however, I just wanted to see how fast we could make it. I'm glad I carried on, since I found the seen object approach.
You do not need to do this in your applications. You should only go this deep down the rabbit hole (and arguably even deeper) if you're experiencing performance issues.
If you have a faster way, please ping me on Twitter @moistmakerr or comment. I'd love to know how fast we can push this.