If coding tutorials with math examples are the bane of your existence, keep reading. This series uses relatable examples like dogs and cats.
Regular Expression (RegEx)
A regular expression, often shortened to RegEx or regex, is a sequence of characters that defines a search pattern. Regex is used to find text that matches that pattern.
If you've ever used ctrl+f to find or find + replace, you've already done a simple version of what regex does.
Take a look at this screenshot. I have searched for every instance of ' dog' in the file. Including the space at the beginning made sure we only got words that start with 'dog'. The search ignores everything after the search sequence, since I didn't care about that. In this case I could use replace to swap in 'canine' or 'snuggle bug' for 'dog'.
To use RegEx in Python, start by importing the re module:
import re
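As a rough sketch of the find and replace idea above, here is the ' dog' search done in Python. The sample sentence is made up for illustration (it is not the file from the screenshot), and the two functions used here are covered in the sections that follow.

```python
import re

# Made-up sample text; not the file from the screenshot.
notes = "My dog loves the dogpark. The hotdog stand has a dog bowl."

# ' dog' (with the leading space) only matches where 'dog' starts a word,
# so 'hotdog' is skipped but 'dogpark' still counts.
hits = re.findall(" dog", notes)
print(len(hits))  # 3

# Find + replace: swap ' dog' for ' canine'.
replaced = re.sub(" dog", " canine", notes)
print(replaced)
# My canine loves the caninepark. The hotdog stand has a canine bowl.
```

Notice that 'dogpark' turned into 'caninepark'. The pattern only looks at the space and the three letters, so whatever comes after them is left alone.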
Common Functions
There are quite a few functions in the re module. I'll dig into a few of the more common ones.
re.match()
Returns a match only if the pattern matches at the very beginning of the string; otherwise it returns None
# syntax
re.match(pattern, string, flags=0)
# example
>>> import re
>>> dating_site_hobbies = "long walks on the beach and dogs"
>>> match = re.match("long walks", dating_site_hobbies, re.I) # re.I makes the search case insensitive
>>> print(match)
<re.Match object; span=(0, 10), match='long walks'>
# fails because it's not the beginning of the string
>>> match_fail = re.match("dogs", dating_site_hobbies, re.I)
>>> print(match_fail)
None
In this case, if I were searching for similar likes on a dating site, I wouldn't care whether the word was the first thing in the string of hobbies, so re.match() isn't the best fit.
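Since re.match() hands back either a match object or None, it helps to check which one you got before using it. Here is a minimal sketch; the helper name starts_with is made up for this example and is not part of the re module.

```python
import re

dating_site_hobbies = "long walks on the beach and dogs"

def starts_with(pattern, text):
    """Hypothetical helper: return the matched text if `pattern`
    matches at the start of `text`, otherwise None."""
    match = re.match(pattern, text, re.I)
    # re.match returns None when nothing matches, so check before using it.
    if match is None:
        return None
    return match.group()  # the piece of text that actually matched

print(starts_with("LONG WALKS", dating_site_hobbies))  # long walks
print(starts_with("dogs", dating_site_hobbies))        # None
```

The first call still works with the shouty capitals because of re.I, and group() returns the text as it appears in the original string.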
re.search()
Returns the first match, no matter where it is in the string
# syntax
re.search(pattern, string, flags=0)
# example
>>> import re
>>> dating_site_hobbies = "long walks on the beach and dogs"
>>> search = re.search("dogs", dating_site_hobbies, re.I) # re.I makes the search case insensitive
>>> print(search)
<re.Match object; span=(28, 32), match='dogs'> # span is the start and end index of the match in the string
# fails because the search term isn't in the hobbies
>>> search_fail = re.search("cats", dating_site_hobbies, re.I)
>>> print(search_fail)
None
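The span is more useful than it looks. As a small sketch, the start and end numbers can be pulled out of the match object and used to slice the original string:

```python
import re

dating_site_hobbies = "long walks on the beach and dogs"

found = re.search("dogs", dating_site_hobbies, re.I)
if found:  # a match object is truthy, None is not
    start, end = found.span()
    print(start, end)                      # 28 32
    print(dating_site_hobbies[start:end])  # dogs
```

The end index is one past the last matched character, which is the same convention Python slices use.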
re.findall()
Returns a list of all matches
# syntax
re.findall(pattern, string, flags=0)
# example
>>> import re
>>> research_paper_submission = "all the things that were written down to show that I know what I was learning in school"
>>> matches = re.findall("that", research_paper_submission, re.I) # re.I makes the search case insensitive
>>> print(matches)
['that', 'that']
# returns an empty list because the search term isn't in the paper
>>> matches_fail = re.findall("dog", research_paper_submission, re.I)
>>> print(matches_fail)
[] # an empty list means no matches, because the paper wasn't about dogs
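Because re.findall() always gives back a list, the usual list tricks apply. A quick sketch of two of them, counting matches and checking for "nothing found":

```python
import re

research_paper_submission = "all the things that were written down to show that I know what I was learning in school"

# Count how many times a word shows up.
that_count = len(re.findall("that", research_paper_submission, re.I))
print(that_count)  # 2

# An empty list is falsy, so it can go straight into an if statement.
if not re.findall("dog", research_paper_submission, re.I):
    print("no dogs in this paper")
```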
re.sub()
Replaces one or many matches with a string, like ctrl+f then replace
In school, I had a professor suggest I eliminate the use of the word 'that'. To do this after I've written my paper, I could use the following code.
# syntax
re.sub(pattern, repl, string, count=0, flags=0)
# example
>>> import re
>>> research_paper_submission = "the things that were written down to show that I know what I was learning at school"
>>> matches = re.sub('that ', '', research_paper_submission) # the trailing space in the pattern avoids leaving double spaces behind
>>> print(matches)
the things were written down to show I know what I was learning at school
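The count and flags arguments from the syntax line are worth a quick look too. In this sketch the pattern includes the space after 'that' so no double spaces are left behind, and the second sentence is made up for the example.

```python
import re

research_paper_submission = "the things that were written down to show that I know what I was learning at school"

# count=1 replaces only the first match and leaves the rest alone.
first_only = re.sub("that ", "", research_paper_submission, count=1)
print(first_only)
# the things were written down to show that I know what I was learning at school

# flags=re.I also catches a capitalized 'That' (made-up sentence).
no_thats = re.sub("that ", "", "That dog that barks", flags=re.I)
print(no_thats)
# dog barks
```

Passing count and flags by keyword keeps the two from getting mixed up, since they sit next to each other in the signature.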
MetaCharacters
https://www.programiz.com/python-programming/regex
MetaCharacter | What it does | Example |
---|---|---|
[] | matches any one of the characters inside the brackets | [act] matches 'cat' and 'tack', but not 'dog' |
. | wildcard (any single character except a newline) | ... matches 'dog' and 'mouse', but not 'on' |
^ | starts with | ^cat matches 'catastrophic' but not 'locate' |
$ | ends with | cat$ matches 'bobcat' but not 'catch' |
\| | or | |
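To see the metacharacters in action, here is a short sketch that runs each table example through re.search(). The 'cat|dog' row is my own added example for the or metacharacter; it does not come from the table.

```python
import re

# Each entry: (pattern, word that should match, word that should not).
# The 'cat|dog' entry is an added example, not one from the table.
checks = [
    ("[act]", "tack", "dog"),
    ("...", "mouse", "on"),
    ("^cat", "catastrophic", "locate"),
    ("cat$", "bobcat", "catch"),
    ("cat|dog", "hotdog", "mouse"),
]

for pattern, good, bad in checks:
    # bool() turns a match object into True and None into False.
    print(pattern, bool(re.search(pattern, good)), bool(re.search(pattern, bad)))
```

Every line should print True followed by False, meaning the first word matched the pattern and the second one did not.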
I highly suggest Regex One for practicing your understanding of regex.
Some reference stuff
As always, refer to the Python docs for more detailed information.
Here's a cheat sheet worth bookmarking and another thing worth looking at.
Series loosely based on