DEV Community

Vicki Langer

Charming the Python: RegEx

If coding tutorials with math examples are the bane of your existence, keep reading. This series uses relatable examples like dogs and cats.


Regular Expression (RegEx)

Regular expression is often shortened to RegEx or regex. A regular expression is a sequence of characters that defines a search pattern, and regex is used to search text for matches to that pattern.

If you've ever used Ctrl+F to find text, or find and replace to swap it out, you've used the same idea behind regex: searching text for a pattern.

Take a look at this screenshot. I searched for every instance of ' dog' in the file. Including the space at the beginning makes sure we only get words that start with 'dog'. The search doesn't care what comes after the sequence, so 'dogs' is highlighted too. In this case, I could use replace to swap in 'canine' or 'snuggle bug' instead of 'dog'.

[Screenshot: a short paragraph about dogs, with find and replace highlighting every instance of the sequence ' dog']
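The same ' dog' search can be sketched in Python. This is a minimal example using a made-up sentence (not the text from the screenshot), with re.findall() to find the matches and re.sub() to replace them.

```python
import re

# Made-up sample text standing in for the screenshot's paragraph
text = "My dog loves other dogs. The hotdog stand is his favorite place."

# The leading space means 'hotdog' is skipped, but 'dogs' still matches
print(re.findall(" dog", text))  # [' dog', ' dog']

# Replace each ' dog' with ' canine' (keeping the space so words stay separated)
print(re.sub(" dog", " canine", text))
# My canine loves other canines. The hotdog stand is his favorite place.
```

Notice that 'dogs' became 'canines': just like find and replace in an editor, anything after the matched sequence is left alone.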


To use regex in Python, start by importing the re module:

import re

Common Functions

There are quite a few functions in the re module. I'll dig into a few of the more common ones.

re.match()

Checks for a match only at the beginning of the string. It returns a Match object if the pattern matches there, or None if it doesn't.

# syntax
re.match(pattern, string, flags=0)

# example
>>> import re

>>> dating_site_hobbies = "long walks on the beach and dogs"
>>> match = re.match("long walks", dating_site_hobbies, re.I)  # re.I makes the search case insensitive
>>> print(match)
<re.Match object; span=(0, 10), match='long walks'>

# fails because it's not the beginning of the string
>>> match_fail = re.match("dogs", dating_site_hobbies, re.I)
>>> print(match_fail)
None

In this case, if I were searching for shared interests on a dating site, I wouldn't care whether a hobby came first in the string, so re.match() isn't the best fit here.
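Because re.match() gives back either a Match object or None, it's worth checking the result before using it. Here is a minimal sketch reusing the dating-site string from above; group() and span() are standard methods on Python's Match object.

```python
import re

dating_site_hobbies = "long walks on the beach and dogs"
match = re.match("long walks", dating_site_hobbies, re.I)

# re.match() returns None on failure, so check before using the result
if match:
    print(match.group())  # long walks  (the matched text)
    print(match.span())   # (0, 10)     (start and end index)
else:
    print("no match at the start of the string")
```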

re.search()

Scans the whole string and returns a Match object for the first place the pattern matches, no matter where it is. It returns None if there is no match.

# syntax
re.search(pattern, string, flags=0)

# example
>>> import re

>>> dating_site_hobbies = "long walks on the beach and dogs"
>>> search = re.search("dogs", dating_site_hobbies, re.I)  # re.I makes the search case insensitive
>>> print(search)
<re.Match object; span=(28, 32), match='dogs'> # span is the start and end index of the match (the end index is exclusive)

# fails because the search term isn't in the hobbies
>>> search_fail = re.search("cats", dating_site_hobbies, re.I)
>>> print(search_fail)
None
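re.search() stops at the first match, even if the pattern shows up again later. A quick sketch with a made-up hobbies string where 'dogs' appears twice:

```python
import re

# Hypothetical hobbies string where 'dogs' shows up more than once
hobbies = "dogs, hiking, and more dogs"

first = re.search("dogs", hobbies)
print(first.span())  # (0, 4): only the first 'dogs' is reported

# re.match() would also succeed here, because 'dogs' is at the very start
print(re.match("dogs", hobbies) is not None)  # True
```

To collect every occurrence instead of just the first, reach for re.findall().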

re.findall()

Returns a list of all non-overlapping matches in the string (an empty list if there are none)

# syntax
re.findall(pattern, string, flags=0)

# example
>>> import re

>>> research_paper_submission = "all the things that were written down to show that I know what I was learning in school"
>>> matches = re.findall("that", research_paper_submission, re.I)  # re.I makes the search case insensitive

>>> print(matches)
['that', 'that']

# fails because the search term isn't in the paper
>>> matches_fail = re.findall("dog", research_paper_submission, re.I)
>>> print(matches_fail)
[] # an empty list means no matches, because the paper wasn't about dogs
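The examples above all pass re.I. Here is a small sketch, using a made-up sentence, of what changes without it, plus a handy way to count matches with len():

```python
import re

# Made-up sentence with one capitalized 'That'
sentence = "That dog is the dog that I love"

print(re.findall("that", sentence))        # ['that']  (case-sensitive by default)
print(re.findall("that", sentence, re.I))  # ['That', 'that']
print(len(re.findall("dog", sentence)))    # 2  (count the matches with len())
```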

re.sub()

Replaces one or more matches with a replacement string
It's like Ctrl+F followed by replace. By default every match is replaced; the count argument limits how many.

In school, a professor suggested I cut the word 'that' from my writing. To do this after writing my paper, I could use the following code.

# syntax
re.sub(pattern, repl, string, count=0, flags=0)

# example
>>> import re

>>> research_paper_submission = "the things that were written down to show that I know what I was learning at school"
>>> edited_paper = re.sub('that', '', research_paper_submission)  # returns a new string; the original is unchanged

>>> print(edited_paper)
the things  were written down to show  I know what I was learning at school
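Removing just 'that' leaves double spaces behind. A minimal sketch of two tweaks: putting the leading space into the pattern, and using count (passed as a keyword) to replace only the first match.

```python
import re

research_paper_submission = "the things that were written down to show that I know what I was learning at school"

# Include the leading space in the pattern so no double spaces are left behind
print(re.sub(" that", "", research_paper_submission))
# the things were written down to show I know what I was learning at school

# count limits how many matches are replaced (0, the default, means all of them)
print(re.sub("that", "which", research_paper_submission, count=1))
# the things which were written down to show that I know what I was learning at school
```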

MetaCharacters

https://www.programiz.com/python-programming/regex

| MetaCharacter | What it does | Example |
| --- | --- | --- |
| [] | matches any one of the characters inside the brackets | [act] matches 'cat' and 'tack', but not 'dog' |
| . | wildcard: matches any single character except a newline | ... matches 'dog' and 'mouse', but not 'on' |
| ^ | starts with | ^cat matches 'catastrophic' but not 'locate' |
| $ | ends with | cat$ matches 'bobcat' but not 'catch' |
| \| | or | |
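To see the metacharacters from the table in action, here is a quick sketch that runs re.search() against the example words. The cat|dog pattern and its sentence at the end are my own illustration of | (or).

```python
import re

# [] matches one character at a time from the set
print(re.findall("[act]", "tack"))       # ['t', 'a', 'c']
print(bool(re.search("[act]", "dog")))   # False: no a, c, or t

# . matches any single character, so ... needs at least three
print(bool(re.search("...", "mouse")))   # True
print(bool(re.search("...", "on")))      # False: only two characters

# ^ anchors the pattern to the start of the string
print(bool(re.search("^cat", "catastrophic")))  # True
print(bool(re.search("^cat", "locate")))        # False: 'cat' is not at the start

# $ anchors the pattern to the end of the string
print(bool(re.search("cat$", "bobcat")))  # True
print(bool(re.search("cat$", "catch")))   # False: 'cat' is not at the end

# | matches either the left side or the right side
print(re.findall("cat|dog", "my cat chased my dog"))  # ['cat', 'dog']
```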

I highly suggest Regex One for practicing your understanding of regex.


Some reference stuff

As always, refer to the Python docs for more detailed information.

Here's a cheat sheet worth bookmarking and another thing worth looking at.


Series loosely based on
