DEV Community

Dilip Kumar Singh
Dilip Kumar Singh

Posted on • Originally published at codefari.com

MongoDB Query: Remove duplicate records from the collection except one

Codefari.com
var duplicatesIds = [];
db.Employee.aggregate([
{
$group: {
_id: {
EmpId: "$EmpId"
},
dups: {
"$addToSet": "$_id"
},
count: {
"$sum": 1
}
}
}, {
$match: {
count: {
"$gt": 1
}
}
}
], {
allowDiskUse: true
}).forEach(function (doc) {
doc.dups.shift();
doc.dups.forEach(function (dupId) {
duplicatesIds.push(dupId);
})
});
printjson(duplicatesIds);

db.Employee.remove({_id:{$in:duplicatesIds}})
db.Employee.find();

Now we will do an analysis of the above-written query.
1- var duplicatesIds = []: This is an array declaration where this query will push the duplicate IDs.

2-{$group:{_id:{EmpId:"$EmpId"},dups:{"$addToSet":"$_id"} ,count:{"$sum":1}}}: Here we are grouping the records on behalf of EmpId, and using $addToSet command, we can create an array "dups", and count:{"$sum":1} is counting the duplicate records.

3- {$match:{count:{"$gt":1}}}: Here we are filtering the records that have a count greater than 1. As the above group pipeline, we are counting the duplicate records on behalf of EmpId.

4- ForEach: we are iterating records one by one here which are grouped EmpId, here we will find the array of duplicate records, for example
"dups" : [
ObjectId("5e5f5d20cad2677f9f839327"),
ObjectId("5e5f5d27cad2677f9f839328"),
ObjectId("5e5f5cf8cad2677f9f839323")
].

5- doc.dups.shift():Here we are removing one record which will not be deleted, and It means we will delete the duplicates except one document.

6- doc.dups.forEach(function (dupId): here again, we are iterating the array to push (duplicatesIds.push(dupId)) it records (duplicatesIds)on the above-declared array.

7- db.Employee.find(): to fetch the records.
Now finally execute the above MongoDB query, and you will find the following records.

For more details follow the codefari.com

Top comments (0)