Originally posted on my-blog. Check it out for more up-to-date content.
Let's begin by defining a simple array:
const cars = [
  'Mazda',
  'Ford',
  'Renault',
  'Opel',
  'Mazda'
];
As you can see, the first and the last item are the same. Finding this duplicate is straightforward, since we have an array of primitive values. To achieve this, we can simply use the filter method along with the indexOf method inside the provided callback.
const unique = cars.filter((car, idx) => cars.indexOf(car) === idx);
console.log(unique); // outputs ['Mazda', 'Ford', 'Renault', 'Opel']
Note that the indexOf method returns the index of the first occurrence of an item within the array. This is why we can compare the index returned by indexOf with the current index in each iteration to see whether the current item is a duplicate.
Finding Object Duplicates
This is the tricky part. Objects are compared by reference rather than by value or structure. This means that two objects that are structurally identical, but are separate instances, won't match. We cannot simply write something like obj1 === obj2 because of the way objects are compared.
const obj1 = {
  name: 'John',
  surname: 'Doe'
};

const obj2 = {
  name: 'John',
  surname: 'Doe'
};

const match = obj1 === obj2;
console.log(match); // outputs false
Now, what if we have an array of object duplicates? How are we going to filter those out? Considering what we've just read, it's not possible to use something as simple as indexOf.
Example array:
const names = [{
  name: 'John',
  surname: 'Doe'
}, {
  name: 'Muhamed',
  surname: 'Ali'
}, {
  name: 'Mike',
  surname: 'Tyson'
}, {
  name: 'John',
  surname: 'Doe'
}, {
  name: 'John',
  surname: 'Doe'
}, {
  name: 'Mike',
  surname: 'Tyson'
}, {
  name: 'Mike',
  surname: 'Tyson'
}];
As you can see, we have a couple of duplicate items. Let's implement the function that will find the duplicates.
The Longer Version
In this approach, we will manually loop through the source array with the forEach method and check whether each item already exists in the resulting array by using the find method.
Considering that we have an array of objects, we must compare every property of the current object in order to make sure that the items are the same. Broken down into steps, the process looks like this:
- Get the object properties
- Define the resulting arrays (unique and duplicates)
- Loop through the source array
- Try to locate the current item within the unique array
- If the item is found, push it into the duplicates array, otherwise push it into the unique array
const findDuplicates = (source) => {
  const keys = Object.keys(source[0]);
  let unique = [], duplicates = [];

  source.forEach((item, idx) => {
    // the very first item is always unique
    if (idx === 0) {
      unique.push(item);
      return;
    }

    // a match requires every compared property to be equal
    const resultItem = unique.find(resultItem => {
      let notFound = false;
      keys.forEach(key => {
        notFound = notFound || item[key] != resultItem[key];
      });
      return !notFound;
    });

    (!resultItem ? unique : duplicates).push(item);
  });

  return { unique: unique, duplicates: duplicates };
};
const result = findDuplicates(names);
console.log(result.unique, result.duplicates);
// expected output
// unique items
// 0: {name: "John", surname: "Doe"}
// 1: {name: "Muhamed", surname: "Ali"}
// 2: {name: "Mike", surname: "Tyson"}
// duplicate items
// 0: {name: "John", surname: "Doe"}
// 1: {name: "John", surname: "Doe"}
// 2: {name: "Mike", surname: "Tyson"}
// 3: {name: "Mike", surname: "Tyson"}
A Bit Shorter Version
We could use the reduce method to achieve the same thing. It is a very powerful method that can transform an array into almost any desired result. It accepts a callback which is executed for each item in the array, and whatever the callback returns becomes the accumulator passed to the next iteration. Considering that this is not an article about the reduce method, check out the official MDN documentation for more details.
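As a quick illustration of how the accumulator flows between iterations, here is a minimal sketch (the names and data are made up for the example) that groups numbers into evens and odds:

const grouped = [1, 2, 3, 4, 5].reduce((acc, n) => {
  // push the current number into the matching bucket,
  // then hand the accumulator to the next iteration
  (n % 2 === 0 ? acc.even : acc.odd).push(n);
  return acc;
}, { even: [], odd: [] });

console.log(grouped); // { even: [2, 4], odd: [1, 3, 5] }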
OK, back to our code. The modified version of the findDuplicates function looks like this:
const findDuplicates = (source) => {
  const keys = Object.keys(source[0]);

  return source.reduce((acc, item) => {
    // a match requires every compared property to be equal
    const resultItem = acc.unique.find(x => {
      let notFound = false;
      keys.forEach(key => {
        notFound = notFound || item[key] != x[key];
      });
      return !notFound;
    });

    (!resultItem ? acc.unique : acc.duplicates).push(item);
    return acc;
  }, {
    unique: [],
    duplicates: []
  });
};
The modified version should return the same resulting arrays as it did before.
// unique items
// 0: {name: "John", surname: "Doe"}
// 1: {name: "Muhamed", surname: "Ali"}
// 2: {name: "Mike", surname: "Tyson"}
// duplicate items
// 0: {name: "John", surname: "Doe"}
// 1: {name: "John", surname: "Doe"}
// 2: {name: "Mike", surname: "Tyson"}
// 3: {name: "Mike", surname: "Tyson"}
That's it. Thank you for reading and see you in the next article.
Further Reading
See this cheat sheet, which will guide you through the most common use cases when it comes to array manipulation.
Top comments (20)
there is also this:
const arrayWithoutDuplicates = Array.from(new Set(arrayWithDuplicates))
Same question @Eshun Sharma :) This is the reason why the title says "tricky"
Have tried Set to find uniques in a simple array and it works great, but would this work if it's a nested array, an array of objects, or multi-dimensional arrays?
Since arrays and objects are reference types it may work in some cases, for example when the duplicates are literally the same object reference. But if you want to dedupe an array of objects that merely share the same structure, it won't work, as the sketch below shows.
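Here is a minimal sketch of both cases (the code is an illustration, not the snippet from the original comment):

const john = { name: 'John', surname: 'Doe' };

// same reference twice: Set removes the duplicate
console.log(Array.from(new Set([john, john])).length); // 1

// structurally equal but separate instances: Set keeps both
const copies = [{ name: 'John', surname: 'Doe' }, { name: 'John', surname: 'Doe' }];
console.log(Array.from(new Set(copies)).length); // 2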
If you want to compare structures it will mostly depend on the kind of array you have; for example, an array of database documents may be filtered based on _id or name:
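A hedged sketch of that idea, assuming each document carries an _id property (the property name and data are assumptions for illustration):

const docs = [
  { _id: 'a1', name: 'John' },
  { _id: 'a2', name: 'Mike' },
  { _id: 'a1', name: 'John' }
];

// keep only the first document seen for each _id
const uniqueDocs = docs.filter(
  (doc, idx) => docs.findIndex(d => d._id === doc._id) === idx
);

console.log(uniqueDocs.length); // 2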
In most cases the array items will have a predetermined format, and it's very rare to need a multi-purpose duplicate remover; at least I didn't need one till now. It may also do some checks that aren't really necessary for what you want to do, which may be bad for performance, but may be helpful to speed up development.
This is interesting, passing objects with same reference vs passing same objects with different references.
Thanks for the explanation :)
By visiting all the keys for each element in the source you are making an algorithm of O(n·m). Also, if the arrays are long enough, it will miss the CPU cache for each source item. It's not an efficient way, basically.
An alternative is to consume more memory and keep the unique elements in a map, going through the source only once. This way the checks are O(1).
If the elements are objects the memory overhead will be small, as you only store references.
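A rough sketch of that map-based idea, assuming a lookup key built from the compared properties (the key format here is purely an illustration):

const findDuplicatesWithMap = (source, keys) => {
  const seen = new Map();
  const unique = [], duplicates = [];

  source.forEach(item => {
    // build a lookup key from the compared properties
    const key = keys.map(k => item[k]).join('|');
    if (seen.has(key)) {
      duplicates.push(item);
    } else {
      seen.set(key, item);
      unique.push(item);
    }
  });

  return { unique, duplicates };
};

console.log(findDuplicatesWithMap(names, ['name', 'surname']).duplicates.length); // 4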
There's another article discussing the equality of two JS objects: one language uses the concept of object equivalence to compare data structures, while the other checks explicitly for object identity. dev.to/annarankin/equality-of-data...
I find this quite difficult to read with the reuse of names like resultItem and the double negation of things like !notFound. I think the first example (the one without the reduce()) is better, because it's more readable.
Aren't you deferring the problem, though? You've moved the comparison to checking the properties of an object for equality, but if those properties are also objects... you're back to square one. So you'd need to recurse and do a deep comparison, which is expensive and has compromises of its own (like picking a max depth, or deciding what to do if there's recursion in the object itself).
Yes, I see your point about the double negation; I admit it should be done the other way around to improve readability, especially since this post was written to show what is going on under the hood while finding duplicates.
About the recursion, the assumption for the given example was that we have only one level. As you said, a deep comparison is a thing of its own which wasn't the focus here.
If you are interested, take a look at the same post on devinduct.com and see Scott Sauyet comment. It's a quite interesting way to do the same thing.
I'd just use JSON.stringify(obj) when comparing objects. Of course it all depends on the things you're trying to compare, but that is really the simplest way if you have a structured data set. Perhaps a performance expert can tell us whether this method is expensive, though!
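For example, a minimal sketch of that idea (it assumes the objects being compared serialize their keys in the same order):

const isSame = (a, b) => JSON.stringify(a) === JSON.stringify(b);

console.log(isSame({ name: 'John' }, { name: 'John' })); // true
console.log(isSame({ name: 'John' }, { name: 'Mike' })); // false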
I'm no expert, but you will lose some items in your array (anything non-serializable, like functions), it will fail on non-UTF-8 encoded strings, it throws on circular references, and key order will matter.
Overall, JSON is not a great way to compare complex objects or arrays.
Agreed, it's not great in most cases. However, sometimes it is just practical from a dev-productivity point of view:
If you use it, wrap it in your own compareXYZ() util function, so you can
a) adjust it later
b) see that function pop up in your profiler
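A tiny sketch of that wrapper idea; compareXYZ is just the placeholder name from the comment above:

// central place to swap out the comparison strategy later
const compareXYZ = (a, b) => JSON.stringify(a) === JSON.stringify(b);

const dupes = names.filter(
  (item, idx) => names.findIndex(x => compareXYZ(x, item)) !== idx
);

console.log(dupes.length); // 4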
Wiltel49#2019 Updated!!! Great article about Arrays Both Unique and Duplicate copy. Well done!!!
Thank you!
U are welcome my Brother!!! Gitter done. Wiltel49#2019 AAS ITI MICHIGAN!!!
My solution was to use _.isEqual from lodash: lodash.com/docs/4.17.11#isEqual
Or just use a Set and Array.from?
The "tricky" part is related to object duplicates. I'm not sure this approach would work with an array populated with non-primitive values.
Damn, if I had this article when I was struggling with arrays my motivation would have no end 😁 really great!
Thanks mate 👍