
JavaScript: How to Remove Duplicate Values from Arrays

Will Harris ・ 3 min read

Originally posted on Will's blog


In a previous post we saw how to determine whether a JavaScript array contains duplicate values. Today, I want to show a few different methods I've found for removing duplicate values from an array.

Using the Array.prototype.filter() & Array.prototype.indexOf() methods

let originalArray = [1, 2, 3, 4, 1, 2, 3, 4]

let uniqueArray = originalArray.filter((item, index, array) => {
  return array.indexOf(item) === index
})

// uniqueArray === [1, 2, 3, 4]

The basic strategy here is to iterate through originalArray and check whether the index of the item we are currently examining matches the index of that item's first occurrence in originalArray.

Because indexOf returns the first index that it finds for a given value, the two indices only match when the item is not a duplicate!
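A quick illustration, using the values from originalArray above:

originalArray.indexOf(1) // 0 -- indexOf always returns the first occurrence

// at index 0 the item is 1 and indexOf(1) === 0, so it is kept
// at index 4 the item is again 1, but indexOf(1) is still 0, so it is filtered out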

Note that this method is not the most efficient: it runs in quadratic time, since each filter callback performs its own linear indexOf scan. So if the arrays you're checking are very large in size, you may want to use a different method.

Another thing worth noting is that we can use the same method to return only the duplicate values by inverting the comparison:

let duplicateArray = originalArray.filter((item, index, array) => {
  return array.indexOf(item) !== index
})

Using Array.prototype.reduce() & Array.prototype.includes()

let originalArray = [1, 2, 3, 4, 1, 2, 3, 4]

let uniqueArray = originalArray.reduce((unique, item) => {
  return unique.includes(item) ? unique : [...unique, item]
}, [])

// uniqueArray === [1, 2, 3, 4]

Here the strategy is to keep a running list of the unique items in our reducer function's 'accumulator' (unique). For each item in originalArray we check to see if the accumulator includes the item under consideration.

  • If it does contain the item, return the accumulator without making any changes, in effect 'skipping over' that item.
  • If it does not contain the item, spread the values in the accumulator into a new array, and add the item under consideration.

Array.prototype.includes returns a boolean value -- true if the value is found in the array, false if not. This boolean value drives our conditional, determining what to do with each value.

I find this approach less intuitive and harder to read, but it works.

Also note that the empty array passed in after the reducer function is the starting value for the accumulator, so on the first pass through the reduce, unique is an empty array.
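To make the flow concrete, here is how the accumulator evolves for a short input (a hand trace, not program output):

// reducing [1, 2, 1] step by step:
// pass 1: unique = []      item = 1  ->  [...[], 1]   === [1]
// pass 2: unique = [1]     item = 2  ->  [...[1], 2]  === [1, 2]
// pass 3: unique = [1, 2]  item = 1  ->  unique       === [1, 2] (skipped)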

⚡ Using the ES6 Set object ⚡

let originalArray = [1, 2, 3, 4, 1, 2, 3, 4]

let uniqueArray = [...new Set(originalArray)]

// or

let uniqueArray = Array.from(new Set(originalArray))

// uniqueArray === [1, 2, 3, 4]

This approach harnesses the power of the Set object, introduced in ES6.

In JavaScript, Sets are guaranteed to preserve the insertion order of their items and to only ever contain unique values. Therefore it is by definition impossible for a Set to contain duplicates!

Here we call the Set object's constructor, passing it the array we'd like to construct a Set from. Then, once we've trimmed out all the duplicates and stored the remaining values in our Set, we convert back to an array and return the result.
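For instance, a Set built from a small array keeps only the first occurrence of each value, in insertion order:

const letters = new Set(['b', 'a', 'b', 'c']) // Set { 'b', 'a', 'c' }

const unique = [...letters] // ['b', 'a', 'c']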

I've seen some discussion of this approach being a bit less performant if the array under consideration is very large and contains many duplicate values. However, the same discussion found that this approach is very efficient in a scenario where the data has very few duplicates.

Personally I think the conciseness of this last approach is enough of a benefit to warrant using the Set object approach, unless there's a compelling performance reason not to.


Will Harris

@will_devs

Developer currently focused on React, TypeScript, and GatsbyJS.

Discussion

 

Surely Set must be faster - the whole thing is an O(N) operation with Set, but it's getting towards O(N**2) with anything that has to scan like includes or filter... (The reduce version is better than O(N**2) as it is only having to test the currently created unique list)

I decided to see just how different it is. In this example on jsperf, you can see that the "reduce" version is around 180x slower, and the indexOf version even worse at almost 380x slower. Also, the version of "reduce" I've used is slightly faster than the one shown above, because it doesn't make a new array on each loop and just modifies the one being created.
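For anyone curious, a mutating reducer along those lines might look like this (a sketch; the exact jsperf code isn't reproduced here):

let uniqueArray = originalArray.reduce((unique, item) => {
  // push onto the same array instead of spreading into a new one each pass
  if (!unique.includes(item)) unique.push(item)
  return unique
}, [])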

 

Thanks for throwing together that jsperf test Mike!

Set does appear to be much faster, which is fantastic because the syntax is so much easier to grok (for me, at least).

 

Yes, it's nice that the good syntax also has great performance, haha! Not something I normally find! (Immutability, for example, gives nicer syntax but comes with a performance penalty.)

 

Hi, your benchmark is wrong for indexOf: it takes all the duplicate values instead of the unique ones (the comparison is inverted). These approaches will perform differently depending on duplicate density - reduce is not necessarily faster.

Both versions are O(n^2).

 

The approach below is O(n) and should work without ES6. The issue with some of the approaches above is that they use Array.indexOf or Array.includes, which is basically another loop through the array. You take a performance hit when the searched values are located near the end of the array. The only downside to the method below is that it's memory intensive for large arrays: you're essentially trading memory for performance.

const originalArray = [1, 2, 3, 4, 1, 2, 3, 4];

let key = {};

const uniqueArray = originalArray.filter((item) => {
  if (key[item]) return false; // already seen -- skip it

  key[item] = true; // first sighting -- record it and keep the item
  return true;
});
// note: object keys are strings, so 1 and "1" would collide
 

A similar function using Set is

function withoutDuplicates(inputArray) {
  const trackerSet = new Set();
  return inputArray.filter(
    (v) => !trackerSet.has(v) && trackerSet.add(v)
  ); // add returns the Set instance, which is truthy. You may wish to add a more explicit truthy return
}

These have the main advantage over Array.from(new Set(inputArray)) in that there's no need to iterate the Set after filling it with values, but I've done a little profiling and my snippet has pretty much the same characteristics and timing as Array.from(new Set(inputArray)) (just a little bit slower), even for very large arrays.

tbh, I think the big win is that they aren't leaning on JS Set's latent insertion-order stability, making the code more WYSIWYG, and more portable since in many other C-style languages Sets don't have that guarantee.

 

Thanks Evan! That's also a great way to approach the problem.

 

Awesome article. This is the first time I'm seeing sets. I'm also glad to read in the comments that its runtime is better than expected, because to me it is more readable. I also think a lot of the array prototype functions can make the code harder to read in the long run. Assuming you know a bit of background on sets, that last example is so concise!

That being said, could you explain how the ... spread operator works with it? Trying to get my head around that at the moment.

 

Thanks James!

Because the Set object implements the iterable protocol, we can use the ES6 spread syntax to cast the values from the Set into an array.

It really is doing the same thing as the second part of that example, where I called Array.from().
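A quick sketch of the equivalence:

const set = new Set([1, 2, 3, 4])

const viaSpread = [...set]      // [1, 2, 3, 4]
const viaFrom = Array.from(set) // [1, 2, 3, 4]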

 

Thank you, that makes sense. The spread operator has been a tough one for me to learn. I always feel like I kind of have an idea of what it is doing in one situation, but struggle to use it myself.

 

Sets don't come with the same prototype functions as arrays. They have a forEach method, but no map or filter, for example. The spread operator and square brackets there are simply extracting the values from the Set and turning them into an array.
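To illustrate:

const set = new Set([1, 2, 3])

// set.map(x => x * 2)  // TypeError: set.map is not a function

const doubled = [...set].map(x => x * 2) // [2, 4, 6]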

 

Set works great when you are dealing with primitive values. But if you are working with some list of items (e.g. list of comment authors in this post), you're better off using a key mapping approach:

const mapping = {}
const unique = array.filter((item) => {
  if (item.id in mapping) return false // this id has been seen before -- skip it
  mapping[item.id] = true // record the id and keep the item
  return true
})
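For example, with a made-up list of authors (these id values are purely illustrative):

const array = [
  { id: 1, name: 'Ada' },
  { id: 2, name: 'Grace' },
  { id: 1, name: 'Ada' }, // duplicate id, filtered out
]

// unique === [{ id: 1, name: 'Ada' }, { id: 2, name: 'Grace' }]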
 


too much coolness in one function

 
 
 

let originalArray = [1, 2, 3, 4, 1, 2, 3, 4];

let obj = {};
for (let i = 0; i < originalArray.length; i++) {
  obj[originalArray[i]] = null; // the value doesn't matter, only the key
}

// note: object keys are strings, so this yields ['1', '2', '3', '4']
let noDuplicates = Object.keys(obj);

 

Same using .reduce()...

let myArray = [1, 2, 2, "hello world", 4, 5, 5, { moo: "cow" }];
let myObject = myArray.reduce((acc, x) => { acc[x] = 0; return acc; }, {});

// caveat: keys are stringified, so { moo: "cow" } becomes the key "[object Object]"
let noDuplicates = Object.keys(myObject);