DEV Community

Sam


Phone Number & Email Extractor With Python


Imagine you have the boring task of finding every phone number and email address in a long web page or document.

You could manually scroll through the page(s), but that might take a long time. If instead you had a program that searched the text on your clipboard for phone numbers and email addresses, you could simply press Ctrl-A to select all the text, press Ctrl-C to copy it to the clipboard, and then run your program. The program would replace the text on the clipboard with just the phone numbers and email addresses it found. Let's plan this out:

  • Get the text off the clipboard.

  • Find all the phone numbers and email addresses in the text.

  • Paste them onto the clipboard.

Now we can start thinking about how this might work in code. The code will need to do the following:

  • Use the pyperclip module to copy and paste strings.

  • Create two regexes, one for matching phone numbers and the other for matching email addresses.

  • Find all matches, not just the first match, of both regexes.

  • Neatly format the matched strings into a single string to paste.

  • Display some kind of message if no matches were found in the text.
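The third point matters because re.search() stops at the first match, while re.findall() returns every non-overlapping match. A minimal sketch of the difference, using a deliberately simple pattern (not the final phone regex) and a made-up sample string:

```python
import re

# A deliberately simple ddd-dddd pattern, for illustration only.
simple_phone = re.compile(r'\d{3}-\d{4}')
text = 'Call 555-1234 or 555-9876 after 5 pm.'

first = simple_phone.search(text)    # a Match object for the first hit only
every = simple_phone.findall(text)   # a list of every match, as strings

print(first.group())  # 555-1234
print(every)          # ['555-1234', '555-9876']
```

findall() returns plain strings here because the pattern has no groups; once a pattern contains groups, as the real regexes below do, it returns a tuple per match instead.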

Table 1: Shorthand Codes for Common Character Classes

Shorthand class   Represents
\d                Any numeric digit from 0 to 9.
\D                Any character that is not a numeric digit from 0 to 9.
\w                Any letter, numeric digit, or the underscore character. (Think of this as matching “word” characters.)
\W                Any character that is not a letter, numeric digit, or the underscore character.
\s                Any whitespace character, such as a space, tab, or newline. (Think of this as matching “space” characters.)
\S                Any character that is not whitespace.
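To see the shorthand classes in action, you can run them through re.findall() on a short string. A small sketch with a made-up sample; the + after each class means "one or more", so each result is a run of matching characters rather than single characters:

```python
import re

sample = 'Room 42, ext_7 on floor 3'

digit_runs = re.findall(r'\d+', sample)      # runs of digits
word_runs = re.findall(r'\w+', sample)       # runs of letters, digits, underscores
non_space_runs = re.findall(r'\S+', sample)  # runs of anything but whitespace

print(digit_runs)      # ['42', '7', '3']
print(word_runs)       # ['Room', '42', 'ext_7', 'on', 'floor', '3']
print(non_space_runs)  # ['Room', '42,', 'ext_7', 'on', 'floor', '3']
```

Note how \w keeps the underscore in ext_7 but drops the comma, while \S keeps the comma attached to 42.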

Step 1: Create a Regex for Phone Numbers

Create a new Python 3 file, save it (I named mine PhoneAndEmailFinder.py), and write the following:

import re
import pyperclip  # third-party clipboard module: pip install pyperclip

# Create phone regex.
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?              # area code
    (\s|-|\.)?                      # separator
    (\d{3})                         # first 3 digits
    (\s|-|\.)                       # separator
    (\d{4})                         # last 4 digits
    (\s*(ext\.?|x)\s*(\d{2,5}))?    # extension: ext, ext., or x
    )''', re.VERBOSE)

# TODO: Create email regex.
# TODO: Find matches in clipboard text.
# TODO: Copy results to the clipboard.

The TODO comments will be replaced as you write the actual code.

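Now let's walk through this code. The phone number begins with an optional area code, which is either exactly three digits (that is, \d{3}) or three digits within parentheses (that is, \(\d{3}\)). These two alternatives are joined by a pipe (|), and the ? after the group makes the whole area code optional.

The phone number separator can be a space (\s), a hyphen (-), or a period (\.), so these alternatives are also joined by pipes. The period is escaped because an unescaped . matches any character. The next few parts of the regular expression are straightforward: three digits, followed by another (this time required) separator, followed by four digits. The last part is an optional extension made up of optional whitespace, then ext, x, or ext., then optional whitespace and two to five digits.

Because the pattern is compiled with re.VERBOSE, Python ignores the whitespace and # comments inside it, which is what lets you spread the regex over several commented lines.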
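You can check the walkthrough by running the pattern on a sample string and printing the tuples that findall() returns. This is a small sketch with a made-up sample text. The pattern is a copy of the Step 1 phone regex, with the extension's period escaped (ext\.?) so it matches only a literal dot:

```python
import re

# Copy of the Step 1 phone pattern.
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?              # area code (optional)
    (\s|-|\.)?                      # separator (optional)
    (\d{3})                         # first 3 digits
    (\s|-|\.)                       # separator
    (\d{4})                         # last 4 digits
    (\s*(ext\.?|x)\s*(\d{2,5}))?    # extension (optional)
    )''', re.VERBOSE)

text = 'Office: (415) 555-1234 ext. 88, mobile 415.555.9876'
matches = phoneRegex.findall(text)
for groups in matches:
    print(groups)
# ('(415) 555-1234 ext. 88', '(415)', ' ', '555', '-', '1234', ' ext. 88', 'ext.', '88')
# ('415.555.9876', '415', '.', '555', '.', '9876', '', '', '')
```

Each tuple has nine strings, one per group, counted by opening parenthesis. Index 0 is the outer group (the whole match). Optional groups that did not participate come back as empty strings.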

Step 2: Create a Regex for Email Addresses

You will also need a regular expression that can match email addresses. Make your program look like the following:

# Create email regex.
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,})       # dot-something (top-level domain)
    )''', re.VERBOSE)
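The email pattern reads left to right. The username is one or more letters, digits, or any of . _ % + -. Then comes an @. The domain is one or more letters, digits, dots, or hyphens. It ends with a dot followed by letters. Here is a quick check on a made-up sample. This copy allows the top-level domain to be two or more letters ({2,}); with {2,4}, an address such as curator@example.museum would be cut short to curator@example.muse:

```python
import re

# Copy of the Step 2 email pattern, with an open-ended top-level domain length.
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,})       # dot-something (top-level domain)
    )''', re.VERBOSE)

text = ('Write to info@example.com, curator@example.museum '
        'or jane.doe+news@mail.example.org today.')
matches = emailRegex.findall(text)
for groups in matches:
    print(groups[0])  # groups[0] is the outer group: the whole address
# info@example.com
# curator@example.museum
# jane.doe+news@mail.example.org
```

This pattern is deliberately loose. It finds typical addresses in ordinary text, but it does not validate them.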

Step 3: Find All Matches in the Clipboard Text

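# Find matches in the clipboard text.
text = str(pyperclip.paste())
matches = []
for groups in phoneRegex.findall(text):
    # groups[1] = area code, groups[3] = first 3 digits, groups[5] = last 4 digits
    parts = [groups[1], groups[3], groups[5]]
    phoneNum = '-'.join(part for part in parts if part)  # skip a missing area code
    if groups[8] != '':
        phoneNum += ' x' + groups[8]  # groups[8] = extension digits
    matches.append(phoneNum)
for groups in emailRegex.findall(text):
    matches.append(groups[0])  # groups[0] = the whole email address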

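phoneRegex.findall() and emailRegex.findall() return one tuple¹ for each match. Each tuple contains one string per group in the regular expression, numbered in the order the groups' opening parentheses appear. For the phone regex, groups[1], groups[3], and groups[5] hold the area code, the first three digits, and the last four digits, and groups[8] holds the extension digits, so the loop rebuilds every number in a single standard format. For the email regex, groups[0] is the outer group, which wraps the whole address.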
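Only the first line of Step 3 touches the clipboard, so you can test the matching and formatting logic on an ordinary string. This is a sketch assuming a made-up sample text and a hypothetical find_matches() helper. The two patterns are copies of the Step 1 and Step 2 regexes, with the extension's period escaped and the top-level domain allowed to run past four letters. A missing area code is skipped so no number starts with a stray hyphen:

```python
import re

phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?              # area code
    (\s|-|\.)?                      # separator
    (\d{3})                         # first 3 digits
    (\s|-|\.)                       # separator
    (\d{4})                         # last 4 digits
    (\s*(ext\.?|x)\s*(\d{2,5}))?    # extension
    )''', re.VERBOSE)

emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,})       # dot-something
    )''', re.VERBOSE)


def find_matches(text):
    """Return normalized phone numbers, then email addresses, found in text."""
    matches = []
    for groups in phoneRegex.findall(text):
        parts = [groups[1], groups[3], groups[5]]  # area code, first 3, last 4
        phoneNum = '-'.join(part for part in parts if part)
        if groups[8] != '':
            phoneNum += ' x' + groups[8]  # extension digits
        matches.append(phoneNum)
    for groups in emailRegex.findall(text):
        matches.append(groups[0])  # the whole address
    return matches


sample = 'Reach us at 415-555-1234 ext 12, 555.9876, or help@example.com.'
print(find_matches(sample))
# ['415-555-1234 x12', '555-9876', 'help@example.com']
```

Inside the script, matches = find_matches(str(pyperclip.paste())) could then take the place of the two loops.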

Step 4: Join the Matches into a String for the Clipboard
Now that you have the email addresses and phone numbers as a list of strings in matches, you want to put them on the clipboard. The pyperclip.copy() function takes only a single string value, not a list of strings, so you call '\n'.join(matches) to combine the matches into one string with one match per line.
To make it easier to see that the program is working, let’s print any matches you find to the terminal. And if no phone numbers or email addresses were found, the program should tell the user this. Make your program look like the following:

if matches:
    result = '\n'.join(matches)
    pyperclip.copy(result)
    print('Copied to clipboard:')
    print(result)
    # Bonus: also save the results to a text file.
    # The with block closes the file automatically, so no close() call is needed.
    with open('phone&emailfinder.txt', 'w', encoding='utf-8') as out_file:
        out_file.write(result)
else:
    print('No phone numbers or email addresses found.')

As a bonus, the program also saves the results to phone&emailfinder.txt in the current working directory, overwriting that file if it already exists.
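The file-saving bonus does not depend on the clipboard, so it can be pulled into its own function and tried anywhere. This is a minimal sketch in which save_matches is a hypothetical helper name and the sample matches are made up. It writes with pathlib's Path.write_text() instead of open(); that is a stylistic choice, not something the program requires:

```python
import tempfile
from pathlib import Path


def save_matches(matches, path):
    """Write matches to path, one per line, and return the joined text.

    Prints a message and returns None (writing nothing) if matches is empty.
    """
    if not matches:
        print('No phone numbers or email addresses found.')
        return None
    result = '\n'.join(matches)
    Path(path).write_text(result, encoding='utf-8')  # overwrites an existing file
    print(f'Saved to {path}:')
    print(result)
    return result


# Example run inside a throwaway directory so no file is left behind.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / 'phone&emailfinder.txt'
    saved = save_matches(['415-555-1234', 'info@example.com'], out_path)
    written = out_path.read_text(encoding='utf-8')

nothing = save_matches([], 'unused.txt')  # prints the "not found" message
```

In the finished script, you could call save_matches(matches, 'phone&emailfinder.txt') right after pyperclip.copy().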


  ¹ A tuple is an ordered, immutable sequence of values, for example ('415', '555', '1234').
