Introduction
Let's go over two functions, re.sub() and re.match(), from Python's re module with examples.
1. re.sub():
The re.sub() function is used for substituting occurrences of a pattern in a string. It takes three main arguments:
- The pattern you want to replace (a regular expression).
- The replacement string (what you want to replace it with).
- The original string in which you want to replace the occurrences of the pattern.
Syntax:
re.sub(pattern, replacement, string, count=0, flags=0)
- pattern: The regex pattern to search for.
- replacement: The string to replace the matched pattern.
- string: The input string where the replacement will occur.
- count: (Optional) Limits the number of replacements. By default, all occurrences are replaced.
- flags: (Optional) Allows modification of matching behavior (like case-insensitivity).
Example:
Let's replace every run of digits in a string with the word NUM.
import re
text = "The price is 123 dollars and 45 cents."
new_text = re.sub(r'\d+', 'NUM', text)
print(new_text)
Output:
The price is NUM dollars and NUM cents.
Here, \d+ is the regex pattern that matches one or more digits. The re.sub() function replaces all occurrences of this pattern with the string 'NUM'.
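The optional count and flags parameters from the syntax above can be sketched as follows. This is a minimal illustration only: the sample sentence and the variable names are made up for the example, and re.IGNORECASE is one of the standard flags in the re module.

```python
import re

# Hypothetical sample text, chosen only to show count and flags.
text = "Cat, cat, CAT, dog"

# flags=re.IGNORECASE lets "cat" match any casing;
# count=2 stops after the first two replacements.
limited = re.sub(r"cat", "bird", text, count=2, flags=re.IGNORECASE)
print(limited)  # bird, bird, CAT, dog
```

Passing count and flags by keyword, as here, avoids mixing the two up, since both are plain integers positionally.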
2. re.match():
The re.match() function checks for a match only at the beginning of the string. If the match is found at the start of the string, it returns a match object. Otherwise, it returns None.
Syntax:
re.match(pattern, string, flags=0)
- pattern: The regex pattern to match.
- string: The input string to be searched.
- flags: (Optional) Allows modification of matching behavior.
Example:
Let's check if a string starts with a word followed by numbers.
import re
text = "Price123 is the total cost."
match = re.match(r'\w+\d+', text)
if match:
    print(f"Matched: {match.group()}")
else:
    print("No match found")
Output:
Matched: Price123
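Here, \w+ matches one or more word characters (letters, digits, and underscores), and \d+ matches one or more digits. Because \w also matches digits, the greedy \w+ first consumes all of "Price123" and then gives back one character so that \d+ can match. Since the string starts with "Price123", the match succeeds and the matched text is printed.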
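To see the other branch, here is a small sketch of what happens when the same kind of token is not at the start of the string, plus the optional flags argument. The sample strings and variable names are assumptions made for illustration.

```python
import re

# Hypothetical sample: the token appears, but not at the start.
text = "The total cost is Price123."

# re.match() only tries the pattern at position 0, so this fails.
not_at_start = re.match(r"\w+\d+", text)
print(not_at_start)  # None

# flags=re.IGNORECASE lets a lowercase pattern match "Price123".
ci = re.match(r"price\d+", "Price123 is the total cost.", flags=re.IGNORECASE)
print(ci.group())  # Price123
```

Because re.match() can return None, it is worth checking the result before calling .group() on it, as the example above does with `if match:`.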
Key Differences:
- re.sub() is used for substitution and applies to the whole string.
- re.match() checks whether the string starts with a match; it only tries the pattern at the beginning and does not scan the rest of the string.
Let’s dive deeper into re.sub() and re.match() with more advanced examples and explanations of regular expression (regex) patterns.
re.sub() Advanced Example:
Suppose we want to reformat phone numbers. We have phone numbers like 123-456-7890 and we want to rewrite them as (123) 456-7890.
Example:
import re
text = "Contact me at 123-456-7890 or 987-654-3210."
formatted_text = re.sub(r'(\d{3})-(\d{3})-(\d{4})', r'(\1) \2-\3', text)
print(formatted_text)
Explanation:
- \d{3}: This matches exactly 3 digits.
- (\d{3}): Parentheses () are used for capturing groups. In this case, we're capturing the first three digits as one group.
- r'(\1) \2-\3': This is the replacement string. It uses \1, \2, and \3 to refer to the captured groups (the area code, the next three digits, and the last four digits, respectively).
So, this example finds phone numbers in the 123-456-7890 format and converts them to (123) 456-7890.
Output:
Contact me at (123) 456-7890 or (987) 654-3210.
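As a variation on the numbered groups above, the same substitution can be sketched with named groups, which some readers find easier to follow. The group names (area, prefix, line) are illustrative choices and not part of the original example; the (?P<name>...) and \g<name> syntax is standard in Python's re module.

```python
import re

text = "Contact me at 123-456-7890 or 987-654-3210."

# Group names are hypothetical labels chosen for readability.
pattern = r"(?P<area>\d{3})-(?P<prefix>\d{3})-(?P<line>\d{4})"

# \g<name> refers back to a named group in the replacement string.
formatted = re.sub(pattern, r"(\g<area>) \g<prefix>-\g<line>", text)
print(formatted)  # Contact me at (123) 456-7890 or (987) 654-3210.
```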
re.match() Advanced Example:
Let's now look at how we can use re.match() with more complex patterns. Suppose we want to check whether a string starts with something that looks like an email address, without validating the rest of the string.
Example:
import re
email = "someone@example.com sent you a message."
# Basic email pattern matching the start of a string
pattern = r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+'
match = re.match(pattern, email)
if match:
    print(f"Valid email found: {match.group()}")
else:
    print("No valid email at the start")
Explanation:
- ^[a-zA-Z0-9_.+-]+: This part matches one or more alphanumeric characters, dots (.), underscores (_), plus signs (+), or hyphens (-). The ^ anchors the match to the start of the string; with re.match() it is redundant, since re.match() already matches only at the start.
- @[a-zA-Z0-9-]+: This matches the @ symbol followed by one or more alphanumeric characters or hyphens (the domain name).
- \.[a-zA-Z0-9-.]+: Matches a dot (.) followed by alphanumeric characters, hyphens, or additional dots (the top-level domain, possibly with further dotted parts).
This basic pattern matches email-like text at the beginning of the string; it is a simple check rather than a complete email validator.
Output:
Valid email found: someone@example.com
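If the goal is to check that the entire string is email-like, rather than just its beginning, one option is re.fullmatch(), which is not covered in the example above. The sketch below reuses the same basic pattern (without the redundant ^) and is an illustration under that assumption, not a complete email validator.

```python
import re

# Same basic pattern as above; still not a full email validator.
pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"
sentence = "someone@example.com sent you a message."

# re.match(): succeeds because the string starts with an email.
prefix = re.match(pattern, sentence)
print(prefix.group(), prefix.span())  # someone@example.com (0, 19)

# re.fullmatch(): fails because extra text follows the email.
whole = re.fullmatch(pattern, sentence)
print(whole)  # None

# re.fullmatch(): succeeds when the string is only the email.
exact = re.fullmatch(pattern, "someone@example.com")
print(exact is not None)  # True
```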
Explaining Common Regex Patterns:
- \d: Matches any digit. Equivalent to [0-9] when re.ASCII is set; by default, str patterns also match other Unicode digits.
- \w: Matches any word character (alphanumeric plus underscore). Equivalent to [a-zA-Z0-9_] when re.ASCII is set; by default, str patterns also match Unicode letters and digits.
- +: Matches 1 or more occurrences of the preceding character or group.
- *: Matches 0 or more occurrences of the preceding character or group.
- .: Matches any character except newline (unless re.DOTALL is set).
- ^: Anchors the pattern to the start of the string (or of each line with re.MULTILINE).
- $: Anchors the pattern to the end of the string (or of each line with re.MULTILINE).
- {m,n}: Matches between m and n occurrences of the preceding character or group.
- [ ]: Used to define a character set. For example, [a-z] matches any lowercase ASCII letter.
- (): Used for capturing groups, allowing us to extract parts of the match and reference them later (like in re.sub()).
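The tokens in the list above can be tried out quickly with re.findall() and re.search(). The sample strings and variable names below are made up for illustration; re.findall() returns all non-overlapping matches as a list.

```python
import re

# Each line exercises one token from the list above.
digits = re.findall(r"\d", "a1b22")               # ['1', '2', '2']
words = re.findall(r"\w+", "hi there_you!")       # ['hi', 'there_you']
star = re.findall(r"ab*", "a ab abbb")            # ['a', 'ab', 'abbb']
ranged = re.findall(r"\d{2,3}", "1 22 333 4444")  # ['22', '333', '444']
lower = re.findall(r"[a-z]+", "Hello World")      # ['ello', 'orld']
dot = re.findall(r"c.t", "cat cot c\nt")          # ['cat', 'cot']

# ^ and $ together require the whole string to be digits.
all_digits = re.search(r"^\d+$", "12345") is not None  # True
not_digits = re.search(r"^\d+$", "123a") is not None   # False

print(digits, words, star, ranged, lower, dot, all_digits, not_digits)
```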
Combining re.sub() with Functions:
You can also use a function as the replacement in re.sub() if you want more dynamic behavior. Let’s see how.
Example: Capitalize every word in a sentence.
import re
text = "this is a test sentence."
def capitalize(match):
    return match.group(0).capitalize()
new_text = re.sub(r'\b\w+\b', capitalize, text)
print(new_text)
Explanation:
- \b: Word boundary.
- \w+: Matches one or more word characters.
- The capitalize() function is called for each match and returns the word with its first letter uppercased (str.capitalize() also lowercases the remaining letters).
Output:
This Is A Test Sentence.
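A replacement function is most useful when the new text has to be computed from the match. As a further sketch, the hypothetical helper below doubles every number in a sentence; the function name and sample text are assumptions made for this illustration.

```python
import re

def double(match):
    # match.group(0) is the matched digits as a string;
    # convert, double, and return a string for re.sub().
    return str(int(match.group(0)) * 2)

text = "3 apples and 12 oranges"
doubled = re.sub(r"\d+", double, text)
print(doubled)  # 6 apples and 24 oranges
```

The function must return a string, which is why the result is wrapped in str().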
re.match() vs re.search():
If you want to search for a pattern anywhere in the string (not just at the beginning), you should use re.search() instead of re.match().
Example using re.search():
import re
text = "This is my email someone@example.com"
# Search for an email pattern anywhere in the string
pattern = r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+'
search = re.search(pattern, text)
if search:
    print(f"Email found: {search.group()}")
else:
    print("No email found")
Output:
Email found: someone@example.com
Here, re.search() looks for the pattern anywhere in the string, unlike re.match(), which only checks the start.
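The difference is easiest to see by running both functions on the same input. The sketch below reuses the text and pattern from the example above; the variable names are chosen for illustration.

```python
import re

text = "This is my email someone@example.com"
pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"

# re.match() only tries position 0, where "This is..." is not an email.
at_start = re.match(pattern, text)
print(at_start)  # None

# re.search() scans forward and finds the email later in the string.
anywhere = re.search(pattern, text)
print(anywhere.group(), anywhere.span())  # someone@example.com (17, 36)
```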
Summary:
- re.sub(): Replaces matches of a pattern within a string. Can use captured groups for dynamic replacements or even a function.
- re.match(): Checks for a match at the beginning of a string. Useful for validation or checking the start of a string.
- re.search(): Searches for a pattern anywhere in the string, not limited to the start.
These examples should give you a more comprehensive understanding of how regex works in Python!