Tib

Posted on Mar 16, 2021

De-duplicate arrays in Python, Perl and Ruby

#perl #python #ruby #programming

Today, a few short snippets for an essential task: de-duplication! 😄

Python

Starting with Python, we can rely on the properties of its data containers. Here, using an intermediate dictionary, which also keeps the elements in their original order (Python 3.7+):

array = [1, 2, 1, 2, 1, 2]
array = list(dict.fromkeys(array))

Or a similar approach using a set, if the order of the elements does not matter:

array = [1, 2, 1, 2, 1, 2]
array = list(set(array))
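
One difference between the two approaches is worth a quick illustration: a set makes no promise about order, while dict.fromkeys keeps the first occurrence of each element in place. A minimal sketch, with sample data made up for this example:

```python
# Sample data chosen for illustration (not from the original post).
words = ["pear", "apple", "pear", "fig", "apple"]

# dict.fromkeys keeps the first occurrence of each element, in order
# (dictionaries preserve insertion order since Python 3.7).
ordered = list(dict.fromkeys(words))

# set() removes duplicates too, but the resulting order is not guaranteed,
# so we sort it before printing to get a stable result.
unordered = list(set(words))

print(ordered)            # ['pear', 'apple', 'fig']
print(sorted(unordered))  # ['apple', 'fig', 'pear']
```

If the order matters, the dictionary version is the safer default; if it does not, the set version says more directly what is going on.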

Perl

The elegant uniq method:

use List::MoreUtils qw(uniq);
my @dups = (1, 2, 1, 2, 1, 2);
my @nodup = uniq @dups;

This requires installing and loading List::MoreUtils, which is a VERY famous Perl CPAN module.

But if you would rather not use a module, here is my "go-to" (no-module) trick:

my @dups = (1, 2, 1, 2, 1, 2);
my @nodup = do { my %seen; grep { !$seen{$_}++ } @dups };

The idea is the same as in the first Python example: an intermediate hash records which elements have already been seen.
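
For readers more at home in Python, here is a rough sketch of what the grep/%seen idiom does step by step. The function name dedupe is hypothetical, chosen only for this illustration:

```python
def dedupe(items):
    """Return items without duplicates, keeping first occurrences in order.

    Hypothetical helper, written to mirror the Perl grep/%seen idiom.
    """
    seen = set()
    result = []
    for item in items:
        if item not in seen:     # plays the role of !$seen{$_}
            seen.add(item)       # plays the role of the ++ in $seen{$_}++
            result.append(item)  # grep keeps the element
    return result


print(dedupe([1, 2, 1, 2, 1, 2]))  # [1, 2]
```

Like the Perl version, this keeps the first occurrence of each element and preserves the original order.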

Ruby

Ruby takes advantage of the fact that "almost everything in Ruby is an object", so it's as simple as calling the uniq method on the array:

array = [1, 2, 1, 2, 1, 2]
nodup = array.uniq

Please share yours in the comments 😄 in Python, Perl, Ruby or any other language!

You can also use a hash slice (this form works on any Perl 5; only the newer key/value slice syntax %h{...} requires perl 5.20):

my @dups=(1,2,1,2,1,2);
my @nodups = do { my %h; @h{ @dups } = (1) x @dups; keys %h };

Note that keys returns the elements in no particular order. Add a sort before keys if you need them sorted.

I use the hash approach in Perl because it is O(1) instead of O(n).

How is this O(1)? You are going to go through every element in @dups at least once.

Sorry. I misunderstood.
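
To make the exchange above concrete: a single hash lookup is O(1) on average, but de-duplicating still visits every element once, so the whole pass is O(n). Testing membership against a plain list instead costs a linear scan per element, which is O(n²) in the worst case. A minimal Python sketch of the two variants (both function names are hypothetical, chosen for illustration):

```python
def dedupe_with_set(items):
    """O(n) overall on average: n iterations, each with an O(1) lookup."""
    seen = set()
    out = []
    for item in items:           # n iterations
        if item not in seen:     # average O(1) hash lookup
            seen.add(item)
            out.append(item)
    return out


def dedupe_with_list(items):
    """O(n^2) in the worst case: each lookup scans the list built so far."""
    out = []
    for item in items:           # n iterations
        if item not in out:      # linear scan of the unique items found so far
            out.append(item)
    return out


data = [1, 2, 1, 2, 1, 2]
print(dedupe_with_set(data))   # [1, 2]
print(dedupe_with_list(data))  # [1, 2]
```

Both return the same result; the difference only becomes noticeable on large inputs with many distinct values.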

uniq is now available from List::Util 1.45 or newer, which came with Perl 5.26 or can be updated from CPAN.