Let's be honest: sooner or later you'll need to remove duplicate elements from an array. Maybe you have to print a shopping list from the supermarket, remove a student who submitted a form twice, or any of a thousand other things. So let's look at a few ways to do it:
1) Use Set
Set() creates a collection of unique values, so simply building a Set from the array implicitly drops the duplicates.
All that's left is to spread that Set back into a new array, and that's it:
let chars = ['A', 'B', 'A', 'C', 'B'];
let uniqueChars = [...new Set(chars)];
console.log(uniqueChars);
Output:
['A', 'B', 'C']
2) Using the indexOf() and filter() methods
The indexOf() method returns the index of the first occurrence of the element in the array:
let chars = ['A', 'B', 'A', 'C', 'B'];
chars.indexOf('B');
Output:
1
A duplicate element is one whose index differs from its indexOf() value:
let chars = ['A', 'B', 'A', 'C', 'B'];
chars.forEach((element, index) => {
  console.log(`${element} - ${index} - ${chars.indexOf(element)}`);
});
Output:
A - 0 - 0
B - 1 - 1
A - 2 - 0
C - 3 - 3
B - 4 - 1
To eliminate the duplicates, filter() is used to keep only the elements whose index matches their indexOf() value; remember that filter() returns a new array with the elements that pass the test:
let chars = ['A', 'B', 'A', 'C', 'B'];
let uniqueChars = chars.filter((element, index) => {
  return chars.indexOf(element) === index;
});
console.log(uniqueChars);
Output:
['A', 'B', 'C']
And if by chance we need the duplicates, we can modify our function a little, just by changing our rule:
let chars = ['A', 'B', 'A', 'C', 'B'];
let dupChars = chars.filter((element, index) => {
  return chars.indexOf(element) !== index;
});
console.log(dupChars);
Output:
['A', 'B']
3) Using the includes() and forEach() methods
The includes() method returns true if an element is in an array and false if it is not.
The following example iterates over the elements of an array and adds to a new array only the elements that are not already there:
let chars = ['A', 'B', 'A', 'C', 'B'];
let uniqueChars = [];
chars.forEach((element) => {
  if (!uniqueChars.includes(element)) {
    uniqueChars.push(element);
  }
});
console.log(uniqueChars);
Output:
['A', 'B', 'C']
Basically, we have several options for solving this kind of problem, so don't get stuck: use whichever one appeals to you.
Top comments
The first one is sexier
The last two are problematic because you are essentially running a for loop inside a for loop, which heavily increases how long the algorithm takes.
Using a set to remove duplicates is a great way to solve this problem.
How does Set() do it?
The internal implementation of a Set is usually based on a hash table, a data structure that converts keys into indexes so each value can be quickly located in a bucket, enabling fast lookup operations.
If the elements are complex types like Object, Set is probably not a good fit.
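A quick illustration of that point (a sketch, assuming plain object literals): two objects with identical contents are still different references, so Set keeps both of them:
let items = [{id: 1}, {id: 1}];
let uniqueItems = [...new Set(items)];
console.log(uniqueItems.length); // 2 -- Set compares objects by reference, not by content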
Don't forget reduce!
chars.reduce((acc, char) => acc.includes(char) ? acc : [...acc, char], []);
This is my preferred way. I don't like using Sets.
Or
let chars = ['A', 'B', 'A', 'C', 'B'];
let uniqueChars = [];
chars.forEach((e) => {
  if (!(e in chars)) {
    uniqueChars.push(e);
  }
});
console.log(uniqueChars);
Shouldn't that be if (!(e in uniqueChars))?
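Worth noting either way (a quick check, not specific to this snippet): the in operator tests keys/indexes rather than values, so includes() is the safer test here:
let uniqueChars = ['A', 'B'];
console.log('A' in uniqueChars);        // false -- 'A' is a value, not an index
console.log(0 in uniqueChars);          // true  -- 0 is an index of the array
console.log(uniqueChars.includes('A')); // true  -- includes() checks values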
When deduplicating array elements in Vue, you need to consider whether the array itself is reactive.
Handling duplicate elements in a large dataset with an array involves various strategies, such as chunk processing and stream processing, depending on whether the entire dataset can be loaded into memory at once. Here's a structured approach:
Chunk Processing:
1. Chunk Loading: Load the massive dataset in manageable chunks, such as processing 1000 records at a time, especially useful for file-based or network data retrieval.
2. Local Deduplication with Hashing: Use a hash table (like a Map or a plain object) to locally deduplicate each chunk (see the sketch after the considerations below).
3. Merge Deduplicated Results: Combine the deduplicated results from each chunk.
4. Return Final Deduplicated Array: Return the overall deduplicated array after processing all chunks.
Considerations:
1. Performance: Chunk processing reduces memory usage and maintains reasonable time complexity for deduplication operations within each chunk.
2. Hash Collisions: In scenarios with extremely large datasets, hash collisions may occur. Consider using more sophisticated hashing techniques or combining with other methods to address this.
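A minimal sketch of the chunked approach, assuming primitive values and one shared Set so the merged result stays deduplicated across chunks (loadChunks is a hypothetical helper that yields one batch at a time):
function deduplicateInChunks(loadChunks) {
  let seen = new Set();               // hash-based lookup of values already kept
  let result = [];
  for (let chunk of loadChunks()) {   // e.g. 1000 records per chunk
    for (let value of chunk) {
      if (!seen.has(value)) {
        seen.add(value);
        result.push(value);
      }
    }
  }
  return result;
}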
Stream Processing:
Stream processing is suitable for real-time data generation or situations where data is accessed via iterators. It avoids loading the entire dataset into memory at once.
Example Pseudocode:
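A minimal sketch of what that generator might look like, assuming primitive values and a Set to track what has already been seen:
function* deduplicateStream(source) {
  let seen = new Set();          // values already emitted
  for (let value of source) {    // source can be any iterable or generator
    if (!seen.has(value)) {
      seen.add(value);
      yield value;               // emit each value only the first time it appears
    }
  }
}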
This generator function (deduplicateStream) yields deduplicated elements as they are processed, ensuring efficient handling of large-scale data without risking memory overflow.
In summary, chunk processing and stream processing are two effective methods for deduplicating complex arrays with massive datasets. The choice between these methods depends on the data source and processing requirements, requiring adjustment and optimization based on practical scenarios to ensure desired memory usage and performance outcomes.
For a big array of objects, use reduce and Map:
[{a: 1, b: 2}, {a: 2, b: 3}, {a: 1, b: 2}].reduce((p, c) => p.set(c.a, c), new Map()).values()
While the code is effective, it can be difficult for new developers to understand. Combining reduce and Map can be confusing, and it is often clearer to achieve the same result with a simpler solution.
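For instance, a plain loop over a Map (keyed on the same a property as the snippet above, so this is just an assumption about what makes a record unique) may read more clearly:
let items = [{a: 1, b: 2}, {a: 2, b: 3}, {a: 1, b: 2}];
let byKey = new Map();
for (let item of items) {
  byKey.set(item.a, item);       // later duplicates overwrite earlier ones, like the reduce version
}
let uniqueItems = [...byKey.values()];
console.log(uniqueItems);        // [{a: 1, b: 2}, {a: 2, b: 3}]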
Complexities are:
1 is O(N), while 2 and 3 are O(N^2).
So 1 should always be used imho.
This is great for most situations, using a Map -- my personal testing has found that arrays up to a certain length still perform better with reduce, etc. than Maps, but beyond N values (I can't recall the exact amount, and I'm sure it varies with the types), Maps absolutely crush them because the op is O(n), as noted by DevMan -- just thought it was worth noting.
Thanks for the article. This is useful.
Imagine having an array of objects with n props and needing to remove duplicates that match on only m of those properties (m < n), keeping the first of the two or more records that hit the same unique constraint rule.
That's where the science begins. I would be much more interested in hearing different solutions to this topic.
Then write your own article and stop trying to sound smart on someone else's post
Simply use reduce and Map:
[{a: 1, b: 2, c: 3}, {a: 2, b: 3, c: 4}, {a: 1, b: 2, c: 5}].reduce((p, c) => p.set([c.a, c.b].join('|'), c), new Map()).values()
Edit: For selecting the first value, use Map.has before setting the value.
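A sketch of that keep-first variant, assuming the same data shape and the same compound a|b key as above:
let items = [{a: 1, b: 2, c: 3}, {a: 2, b: 3, c: 4}, {a: 1, b: 2, c: 5}];
let firstByKey = items.reduce((map, item) => {
  let key = [item.a, item.b].join('|');
  if (!map.has(key)) {           // only keep the first item seen for each key
    map.set(key, item);
  }
  return map;
}, new Map());
console.log([...firstByKey.values()]); // [{a: 1, b: 2, c: 3}, {a: 2, b: 3, c: 4}]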
I love the first one, but I wish it let you provide an "equality function", as you can't use it to deduplicate an array of objects (usually you would check for the IDs being the same and get rid of the duplicates based on that).
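In the meantime, a small filter-based sketch covers the common case; dedupeById is a hypothetical helper, and it assumes every object has an id property:
function dedupeById(items) {
  let seenIds = new Set();
  return items.filter((item) => {
    if (seenIds.has(item.id)) {
      return false;              // duplicate id, drop it
    }
    seenIds.add(item.id);
    return true;                 // first time this id appears, keep it
  });
}
console.log(dedupeById([{id: 1}, {id: 2}, {id: 1}])); // [{id: 1}, {id: 2}]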