Eddie Prislac for Vets Who Code

A simple script to extract email addresses

I'm a big fan of automating processes... if I need a long command or series of commands to accomplish a task more than once, my instinct is to take that command and turn it into an easy-to-remember, reusable script. My task today was to take an exported set of applications I receive over the apply-forms channel in the VWC Slack group and grab all the email addresses out of it, so I could send one single mass mailing.

The problem:

VWC has a channel in our Slack group, called 'apply-forms', to which every application gets piped. We also have a Slack app called 'export' that will export every message from a given Slack channel between two given dates to JSON format... pretty handy when you get behind on answering applications. However, the JSON it spits out is structured for generic messages; it doesn't break each application down into a further JSON object... in other words, the key-value pairs from each application that render nicely in the 'apply-forms' channel all come out as one long string. This means, were I a more patient man, I'd go through each of those by hand and cut-and-paste each email address into a separate form. However, anyone who's ever read one of my posts or dealt with me in person knows that I am most definitely not a patient man, nor am I a man who has an abundance of free time. Screw that noise... how do we write a script to do this instead?

The solution:
Grep.
Yep, grep.
The following snippet proved more than able to accomplish exactly the task I needed done, in a little under a second:


grep -o '[[:alnum:]+\.\_\-]*@[[:alnum:]+\.\_\-]*' slack-export-privategroup-2020-04-21T1603Z.json | sort | uniq -i >> vwc-emails.txt

In this example, slack-export-privategroup-2020-04-21T1603Z.json is the file containing my exported JSON, and vwc-emails.txt is the file into which I'm exporting my email addresses.

This bit:


grep -o '[[:alnum:]+\.\_\-]*@[[:alnum:]+\.\_\-]*' slack-export-privategroup-2020-04-21T1603Z.json   

is my grep command, in which I tell grep to print only the matches (-o) of the included regular expression against the aforementioned input file. We then pipe (|) that output into sort and uniq to get the addresses alphabetized and remove duplicates (the -i makes uniq ignore case)... the >> just before the output file tells zsh to append that output to vwc-emails.txt, creating the file if it doesn't exist yet.
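To see what each stage contributes, here is a minimal sketch that runs the same pipeline against a few made-up lines. The sample text, the addresses, and the extract_emails helper name are all hypothetical and only stand in for the real export.

```shell
# extract_emails FILE: the grep | sort | uniq pipeline from above, wrapped for reuse.
extract_emails() {
  # -o prints only the text that matched, one match per line,
  # instead of the whole (very long) JSON line.
  grep -o '[[:alnum:]+\.\_\-]*@[[:alnum:]+\.\_\-]*' "$1" | sort | uniq -i
}

# Made-up sample standing in for the Slack export (not real applicants).
sample=$(mktemp)
printf '%s\n' \
  '{"text": "Name: Jane Doe Email: jane.doe@example.com Branch: Army"}' \
  '{"text": "Name: John Roe Email: john_roe+vwc@example.org Branch: Navy"}' \
  '{"text": "Name: Jane Doe Email: jane.doe@example.com Branch: Army"}' \
  > "$sample"

extract_emails "$sample"   # prints each distinct address once
rm -f "$sample"
```

The repeated Jane Doe entry collapses to a single line because sort puts identical addresses next to each other, which is what uniq needs: it only removes adjacent repeats.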

Pretty neat, huh? It is, yeah, and my end result was a nicely ordered one-to-a-line file of every unique email address, ready to be copy/pasted into the 'to' line of a new email.
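One caveat: because the pattern uses *, it will also match an @ with nothing useful in front of it (a Slack-style mention such as <@U123ABC> is one assumed example of what an export might contain), and plain sort may not put differently-cased duplicates next to each other in every locale. A possible tightened variant follows, as a sketch rather than a replacement for the command above; the extract_emails_strict name and the sample input are made up.

```shell
# extract_emails_strict FILE: require at least one character before the @
# and a dotted domain after it; sort -f folds case so that uniq -i
# sees differently-cased duplicates as adjacent lines.
extract_emails_strict() {
  grep -Eo '[[:alnum:]+._-]+@[[:alnum:]-]+(\.[[:alnum:]-]+)+' "$1" |
    sort -f | uniq -i
}

# Made-up input: a mention, plus the same address in two different casings.
sample=$(mktemp)
printf '%s\n' \
  'ping <@U123ABC> about Jane.Doe@example.com' \
  'jane.doe@example.com applied again.' \
  > "$sample"

extract_emails_strict "$sample"   # the mention is skipped, the address appears once
rm -f "$sample"
```

This is stricter about what surrounds the @, but it is still a pattern match and not a full email validator.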

But it's waaaay too damned long to remember.

How to fix that? Good question... the answer is "put that shit in an executable".

Whenever I have a long command or chain of commands such as this one, I find it easiest to create a file for the command and place it in the bin directory of my home folder, which I have already conveniently added to my $PATH. I give the file the name snatchmail, open it up in vim, and paste my command.
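If your own ~/bin isn't on your $PATH yet, something along these lines in ~/.zshrc is one way to set it up. This is only a sketch: add_to_path is a made-up helper name, not a zsh builtin, and it assumes the directory already exists (mkdir -p ~/bin creates it).

```shell
# add_to_path DIR: prepend DIR to PATH unless it is already listed.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                  # already present, leave PATH alone
    *) PATH="$1:$PATH" ;;         # otherwise put DIR at the front
  esac
}

add_to_path "$HOME/bin"
export PATH
```

The case check keeps the entry from piling up every time the shell config is reloaded. With ~/bin on the path, the new snatchmail file starts out holding just the pasted command: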


grep -o '[[:alnum:]+\.\_\-]*@[[:alnum:]+\.\_\-]*' slack-export-privategroup-2020-04-21T1603Z.json | sort | uniq -i >> vwc-emails.txt   

Not quite good enough. For one, the shell will not know which interpreter I'm using, so we need a shebang line (okay, so zsh would just execute the grep command anyways without it, but a shebang can also help to remind you which language you're using, and which interpreter will be used, in case you need to come back to it):


#!/bin/zsh

grep -o '[[:alnum:]+\.\_\-]*@[[:alnum:]+\.\_\-]*' slack-export-privategroup-2020-04-21T1603Z.json | sort | uniq -i >> vwc-emails.txt   

Still not quite there, though.

See, my filenames will not always be the same... export will always name the new file with the current date and time, and I don't want to just keep adding email addresses to the same file.

Luckily, zsh makes this stupid-simple:


#!/bin/zsh

grep -o '[[:alnum:]+\.\_\-]*@[[:alnum:]+\.\_\-]*' $1 | sort | uniq -i >> $2

I've here replaced my filenames with two positional parameters, $1 and $2, with $1 being my input file and $2 being my output file. After saving and exiting vim, I make the file executable with chmod +x ~/bin/snatchmail, reload my shell, and execute the following command:


snatchmail slack-export-privategroup-2020-04-21T1603Z.json vwc-emails2.txt

And voila: my command executes, and I now have the same result as that long grep chain, by entering one command with an input and output filename.
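If you want the script to complain when it's called wrong instead of silently doing nothing useful, a slightly more defensive take might look like the sketch below. It's written as a shell function so it can be tried in the current shell; inside the script file the same body would use exit in place of return. The usage text and error messages are made up for illustration.

```shell
# snatchmail INPUT OUTPUT: same pipeline, with basic argument checks.
snatchmail() {
  if [ "$#" -ne 2 ]; then
    echo "usage: snatchmail <export.json> <output.txt>" >&2
    return 2
  fi
  if [ ! -r "$1" ]; then
    echo "snatchmail: cannot read $1" >&2
    return 1
  fi
  # Quoting keeps filenames with spaces intact in shells that word-split.
  # >> appends, as in the original command; use > to start the output file fresh.
  grep -o '[[:alnum:]+\.\_\-]*@[[:alnum:]+\.\_\-]*' "$1" | sort | uniq -i >> "$2"
}
```

Because >> appends, running it twice against the same output file will leave duplicates in that file; swapping in > overwrites the file on each run instead.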

I enjoy writing scripts like this. It appeals to my natural affinity for problem solving, and allows me to save loads of time. I hope you will take inspiration from this post and experiment with your own time-saving scripts, and I encourage you to post any you come up with in the comments below. Thanks for reading!

Cover image "Software Development Community" by Michael Kappel is licensed under CC BY-NC 2.0

Top comments (1)

victorkr

There is a free online application, Aspose Email Extractor: products.aspose.app/email/extractor. The application can extract emails from many popular formats, e.g., PDF, DOC, XLS, etc. It analyzes documents' content and metadata too. It also allows uploading multiple documents, zipped sites, etc.