Find common typos between words

#automation #python

Project 1: Ideas for Similar Programs

Welcome to the second project in this tutorial series. In this project, you are about to tackle four exciting problems. As always, you will be provided with examples and exercises to work on.

Your goal in this project is to write more complex regular expressions to search, substitute, and manipulate text characters in strings.

For access to the project README, click here.

Prerequisites

Part 1 in the series

Before you start this project, you should roll up your sleeves and get some hands-on coding experience. Project A serves as a great warm-up, and it's quite similar to what we tackled in the previous project. Once you have completed it, click here to compare your solution. You will find all the necessary text files here.

Now that the prerequisites are covered, let's work on Project D.

Project D: Find common typos such as multiple spaces between words, accidentally accidentally repeated words, or multiple exclamation marks at the end of sentences. Those are annoying!!

Step 1: Write Comments
First, define a function called remove_typos_in_word().
Second, write comments to explain each step.
Third, call the function.

def remove_typos_in_word():
    # get the text off the clipboard

    # Define a regex pattern for multiple spaces between words
    # substitute multiple spaces with just a space

    # Define a regex pattern for accidentally repeated words
    # substitute it with just one word

    # Define a regex pattern for multiple exclamation marks at the end of sentences
    # substitute it with one exclamation mark
    # return the replaced text
    pass


if __name__ == '__main__':
    remove_typos_in_word()

Step 2: Import Necessary Modules
You already used these modules in the previous project. If you need a quick reminder, click here.

import re

import pyperclip

Step 3: Paste Text from Clipboard

def remove_typos_in_word():
    copied_text = pyperclip.paste()

Step 4: Create a Regex Pattern for Multiple Spaces Between Words

    # Define a regex pattern for multiple spaces between words
    multiple_spaces_pattern = re.compile(r'\s{2,}')

Step 5: Substitute Multiple Spaces with a Single Space and Store It in a replaced_text Variable
Note: This regex can be improved. Because \s matches any whitespace character, including tabs and newlines, it also collapses line breaks into a single space.

To understand how re.sub() works, hover your mouse over the function in your editor to see the parameters it takes. It requires a pattern, a replacement, and a string. The remaining parameters, count and flags, are optional.
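If your editor does not show the signature, the short sketch below illustrates the three required arguments and the two optional ones. The sample strings and variable names are made up for illustration.

```python
import re

text = "a  b   c    d"

# Required arguments, in order: pattern, replacement, string.
all_fixed = re.sub(r" {2,}", " ", text)
print(all_fixed)   # a b c d

# count limits how many matches are replaced, working left to right.
first_only = re.sub(r" {2,}", " ", text, count=1)
print(first_only)  # a b   c    d

# flags changes how the pattern is interpreted, e.g. ignoring case.
greeting = re.sub(r"HELLO", "hi", "Hello hello", flags=re.IGNORECASE)
print(greeting)    # hi hi
```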

Run your code to ensure that it works properly.

    replaced_text = re.sub(multiple_spaces_pattern, ' ', copied_text)
    print(replaced_text)
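The note about new lines can be checked with a small experiment. The sketch below compares the \s{2,} pattern with one possible alternative, [ \t]{2,}, which only matches spaces and tabs. The alternative pattern, the variable names, and the sample text are assumptions for illustration, not part of the original project.

```python
import re

sample = "Too   many    spaces.\n\nNew  paragraph."

# \s matches spaces, tabs, and newlines alike.
collapse_any_whitespace = re.compile(r"\s{2,}")
# A character class restricted to spaces and tabs leaves newlines alone.
collapse_spaces_and_tabs = re.compile(r"[ \t]{2,}")

flattened = collapse_any_whitespace.sub(" ", sample)
preserved = collapse_spaces_and_tabs.sub(" ", sample)

print(repr(flattened))  # 'Too many spaces. New paragraph.'
print(repr(preserved))  # 'Too many spaces.\n\nNew paragraph.'
```

Which behavior you want depends on the text: collapsing everything suits a single sentence, while the restricted class is safer for text with paragraphs.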

Step 6: Create Regex Patterns for the Remaining Two
You might notice that we are not using the copied_text variable anymore but rather accessing the replaced_text. Can you guess why?

The reason is that each substitution should build on the result of the previous one. If you passed copied_text to re.sub() again, the new result would no longer include the corrections you already made.
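Here is a minimal sketch of the difference, using a made-up sample string and variable names. The second substitution is run once on the already-corrected text and once on the original text.

```python
import re

original = "This  is is   great!!!"

# First correction: collapse runs of spaces.
step_one = re.sub(r" {2,}", " ", original)

# Chained: the second correction starts from the first result.
chained = re.sub(r"!{2,}", "!", step_one)

# Restarted: the second correction starts from the original text,
# so the spacing fix is lost.
restarted = re.sub(r"!{2,}", "!", original)

print(repr(chained))    # 'This is is great!'
print(repr(restarted))  # 'This  is is   great!'
```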

    # Define a regex pattern for accidentally repeated words;
    # the \b after \1 keeps "the then" from being treated as a repeat
    repeated_words_pattern = re.compile(r'\b(\w+)(\s+\1\b)+')
    replaced_text = re.sub(repeated_words_pattern, r'\1', replaced_text)
    print(replaced_text)

    # Define a regex pattern for multiple exclamation marks at the end of sentences
    multiple_exclamations_pattern = re.compile(r'!{2,}')
    replaced_text = re.sub(multiple_exclamations_pattern, '!', replaced_text)
    print(replaced_text)
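To see why a word boundary after the backreference matters, here is a small sketch comparing two variants of the repeated-word pattern. In both, (\w+) captures a word and \1 must match the same characters again. The variable names and the sample sentence are made up for illustration.

```python
import re

# Without \b after \1, the backreference may match the start of a longer word.
without_boundary = re.compile(r"\b(\w+)(\s+\1)+")
# With \b after \1, the repeat must be a whole word.
with_boundary = re.compile(r"\b(\w+)(\s+\1\b)+")

sample = "Fix the the theme"

damaged = without_boundary.sub(r"\1", sample)
corrected = with_boundary.sub(r"\1", sample)

print(damaged)    # Fix theme
print(corrected)  # Fix the theme
```

In the first variant, the "the" at the start of "theme" counts as another repeat, so a real word is swallowed along with the typo.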

Step 7: The End Result
This is the final form of the code. You return the replaced_text variable, which now contains all the corrections. To access the result, you call the function and assign the returned value to the replaced_text variable. Then, you copy it to the clipboard and print it in the terminal.

import re

import pyperclip

def remove_typos_in_word():
    copied_text = pyperclip.paste()

    # Define a regex pattern for multiple spaces between words
    multiple_spaces_pattern = re.compile(r'\s{2,}')
    # multiple_spaces_pattern = re.compile(r'[ \t]{2,}')  # alternative: leaves newlines intact
    replaced_text = re.sub(multiple_spaces_pattern, ' ', copied_text)
    # print(replaced_text)

    # Define a regex pattern for accidentally repeated words;
    # the \b after \1 keeps "the then" from being treated as a repeat
    repeated_words_pattern = re.compile(r'\b(\w+)(\s+\1\b)+')
    replaced_text = re.sub(repeated_words_pattern, r'\1', replaced_text)
    # print(replaced_text)

    # Define a regex pattern for multiple exclamation marks at the end of sentences
    multiple_exclamations_pattern = re.compile(r'!{2,}')
    replaced_text = re.sub(multiple_exclamations_pattern, '!', replaced_text)
    # print(replaced_text)
    return replaced_text

if __name__ == '__main__':
    replaced_text = remove_typos_in_word()
    pyperclip.copy(replaced_text)
    print('Copied to clipboard.')
    print(replaced_text)
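If you want to try the corrections without touching the clipboard, one option is to move them into a function that takes the text as an argument. The sketch below is a variation under that assumption, not the project's official solution. The name fix_typos, the module-level constants, and the sample sentence are hypothetical, and the repeated-word pattern here uses a word boundary after the backreference.

```python
import re

# Compile each pattern once at import time.
MULTIPLE_SPACES = re.compile(r"\s{2,}")
REPEATED_WORDS = re.compile(r"\b(\w+)(\s+\1\b)+")
MULTIPLE_EXCLAMATIONS = re.compile(r"!{2,}")


def fix_typos(text):
    """Return text with the three typo corrections applied in order."""
    text = MULTIPLE_SPACES.sub(" ", text)
    text = REPEATED_WORDS.sub(r"\1", text)
    text = MULTIPLE_EXCLAMATIONS.sub("!", text)
    return text


result = fix_typos("Those  are are   annoying!!")
print(result)  # Those are annoying!
```

With this split, the clipboard calls stay in the __main__ block, and the text processing can be checked with plain strings.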

Exercise

Now, it's your turn to tackle parts B and C in the "Ideas for Similar Programs." Once you have completed them, you can compare your solutions by clicking here.

Note: It's often more helpful to attempt the exercise before seeking a solution. Jumping straight to a solution won't help you learn. Best of luck.

Conclusion

In this project, you created more complex regex patterns and implemented search-and-replace functionality. Think about all the times you have used search and replace in an editor; it's often powered by regex. Great work, and keep on learning.

If you have any questions, want to connect, or just fancy a chat, feel free to reach out to me on LinkedIn and Twitter.