DEV Community

IanMcbull

A Comprehensive Guide to Regular Expressions

When I first saw regular expressions, I thought they looked like something nonsensical a toddler had scribbled on a piece of paper. They looked like gibberish, gobbledygook, a bunch of symbols and characters that someone had slapped together. Take a look at this:


/^\w+@\w+\.\w+(\.\w+)?$/

This is a (simple) regular expression for validating email addresses. Told you: gibberish. Don't worry if you don't understand what it means right now; we'll go through a series of examples and discuss how to write regular expressions. Before we dive into how to write them, let's define them first.

What are Regular Expressions?

I think of regular expressions (also known as regex) as a test for matching string patterns: a test, because we are checking whether a given text input contains a particular pattern. Regex comes in several dialects, often called flavors. Most programming languages implement one of the flavors below, or a close variant of one:

| Flavor | Description |
| --- | --- |
| BRE | Basic Regular Expressions (POSIX) |
| ERE | Extended Regular Expressions (POSIX) |
| Emacs | Emacs's implementation |
| Vim | Vim's implementation |
| Perl | Perl's built-in regular expressions |
| PCRE | Perl Compatible Regular Expressions |

💡 The examples I'll be giving are in JavaScript, whose regex syntax is defined by the ECMAScript standard and is a Perl-style syntax closely related to PCRE.

The regex engine is the part responsible for performing the matching every time you run a regular expression operation. You don't need to know the nitty-gritty of how it works under the hood, but it's worth mentioning. If you're interested in reading more about it, head on over to Devopedia.

So now that we know that a regex is a text pattern, how do we actually write one?

Writing Regular Expressions

You can create regular expressions in one of two ways in JavaScript:

Using the built-in RegExp Constructor


let regex = new RegExp("Bruce");

or

Using the literal notation


let regex = /Wayne/;

💡 The literal syntax is preferred when the regex is known at development time, and the constructor approach is used when the regex is constructed at runtime by building it up dynamically in a string.
-Secrets of the JavaScript Ninja, Second Edition
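
Here's a minimal sketch of both approaches side by side. The firstName variable is a made-up stand-in for a value only known at runtime, and the last lines show a common stumbling block with the constructor: backslashes inside a string have to be doubled.

```javascript
// The literal and the constructor build equivalent RegExp objects.
const literal = /Wayne/;
const constructed = new RegExp("Wayne");
console.log(literal.source === constructed.source); // true

// The constructor is handy when the pattern is only known at runtime.
// `firstName` is a hypothetical value, e.g. something a user typed in.
const firstName = "Bruce";
const dynamic = new RegExp(firstName + " Wayne");
console.log(dynamic.test("Bruce Wayne")); // true

// Inside a string literal, "\\d" produces the two characters \d.
const digits = new RegExp("\\d+");
console.log(digits.source); // \d+
console.log(digits.source === /\d+/.source); // true
```

If a runtime value can contain characters like . or *, keep in mind the constructor treats them as regex syntax, not as literal text.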

We'll be using the literal notation from here on out.

We defined regular expressions as patterns, so where is the pattern? In our example above, "Wayne" is the pattern: a "W", followed by an "a", followed by a "y", and so forth. It's a pattern because only that exact combination of letters, in that order, gives you the word "Wayne".

So now that we have a pattern, how do we actually test it against something else, like another word or a sentence?

Testing, Matching, and Extracting Patterns


let regexPattern = /Wayne/;
let textInput = "It's not who I am underneath, but what I do that defines me. - Bruce Wayne";

We want to check whether our regex pattern exists in the text input. JavaScript provides a number of methods that we can use.

| Method | Called on | Description |
| --- | --- | --- |
| test | RegExp | Returns true if there was a match, or false if no match was found |
| match | String | Returns an array if there was a match, or null if no match was found |
| exec | RegExp | Returns an array if there was a match, or null if no match was found |
| search | String | Returns the index of the first match, or -1 if no match was found |
| matchAll | String | Returns an iterator over all matches (it yields nothing if there are none); the regex must have the g flag |
| replace | String | Returns a new string with some or all matches replaced by a replacement string |
| replaceAll | String | Returns a new string with all matches replaced by a replacement string; a regex argument must have the g flag |

We'll have a look at the first three methods and see how they work. If you're interested in learning more about how the methods work, you can head on over to MDN and do a deep dive into them.

Let's first try to match our pattern using the test method.


let regexPattern = /Wayne/;
let textInput = "It's not who I am underneath, but what I do that defines me. - Bruce Wayne";
console.log(regexPattern.test(textInput)) // Returns true

The test method is called on the regex itself and it takes in a single argument, the text input. It's particularly useful when you simply want to see whether the pattern exists or not and are not concerned with extracting anything after matching.

The match method gives you a bit more information regarding the match.


let regexPattern = /Wayne/;
let textInput = "It's not who I am underneath, but what I do that defines me. - Bruce Wayne";
console.log(textInput.match(regexPattern))

[
'Wayne',
index: 69,
input: "It's not who I am underneath, but what I do that defines me. - Bruce Wayne",
groups: undefined
]

As you can see, we get an array with information about the match: the text that matched, the index position of the match, and the input text that was passed in. We'll talk about groups in a later section. Note that match is called on the text input, not on the regex itself.

The exec method also returns an array:


let regexPattern = /Wayne/;
let textInput = "It's not who I am underneath, but what I do that defines me. - Bruce Wayne";
console.log(regexPattern.exec(textInput)) // Returns an array

[
'Wayne',
index: 69,
input: "It's not who I am underneath, but what I do that defines me. - Bruce Wayne",
groups: undefined
]

As you can see, the output is identical to that of the match method (as long as the regex has no g flag); however, the exec method is called on the regex itself, not on the text input as with match.
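
The post focuses on test, match and exec, but the other methods in the table work along the same lines. A quick sketch using the same quote (the "na na na" strings are just made-up inputs):

```javascript
const quote = "It's not who I am underneath, but what I do that defines me. - Bruce Wayne";

// search: index of the first match, or -1 when there is none.
console.log(quote.search(/Wayne/));  // 69
console.log(quote.search(/Batman/)); // -1

// matchAll: requires the g flag and returns an iterator of match arrays.
const positions = [...quote.matchAll(/I/g)].map((m) => m.index);
console.log(positions); // [ 0, 13, 39 ]

// replace without the g flag changes only the first match...
console.log("na na na".replace(/na/, "ba")); // ba na na
// ...while replaceAll (which needs g when given a regex) changes every match.
console.log("na na na".replaceAll(/na/g, "ba")); // ba ba ba
```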

Now that we've seen how to test and extract matches, we can dissect the matching itself and look at how the regex engine works at a lower level. This will hopefully have you thinking about your regex patterns differently as you write them.

How does the matching occur?

We talked about the regex engine a bit at the beginning of the post; now let's see what it does every time we try to match a pattern.
Conceptually, you can picture the regex engine building a graph of all the characters that need to match. In our case, the pattern is "Wayne", and the input text we are matching against is "It's not who I am underneath, but what I do that defines me. - Bruce Wayne".

Regex engine graph

As you can see, our pattern has been split up into individual letters that the regex engine will use to determine whether there's a match. The engine works through the pattern from left to right, so in this case it tries to match "W" before moving along the graph to the next letter.

The engine starts at the first character of the input and tries to match the "W" there. If that fails (or a later letter of the pattern fails), it moves its starting point one character to the right and tries again. Only when every starting position has failed does the match as a whole fail.

In our case, the only "W" in the input sits near the very end of the string (index 69, as match reported earlier). Once the engine matches it, it moves on to the next letter in the pattern, and keeps going until it has matched the entire pattern.
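
To make the left-to-right search more concrete, here's a toy matcher for a plain literal pattern like "Wayne". It's a simplified illustration of the idea described above, not how JavaScript's real engine is implemented; findLiteral is a hypothetical helper that tries each starting position from the left and stops at the first full match.

```javascript
// Toy literal matcher: illustrates "try each start position from the left,
// stop at the first full match". Not how a real regex engine is built.
function findLiteral(pattern, text) {
  for (let start = 0; start + pattern.length <= text.length; start++) {
    let i = 0;
    // Walk the pattern one character at a time from this start position.
    while (i < pattern.length && text[start + i] === pattern[i]) {
      i++;
    }
    if (i === pattern.length) {
      return start; // leftmost full match found, so stop searching
    }
    // Otherwise fall through and retry one character further right.
  }
  return -1; // every start position failed
}

const quote = "It's not who I am underneath, but what I do that defines me. - Bruce Wayne";
console.log(findLiteral("Wayne", quote)); // 69, the same index match() reported
console.log(findLiteral("Joker", quote)); // -1
```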

Here are a few key takeaways regarding the regex engine:

  • At each starting position, the engine tries every possible path through the pattern before moving one character further along the string.
  • The regex engine will find the earliest possible match and not the best (or longest) possible match.
  • Among alternatives in the regex, the engine always tries the leftmost one first.
  • The regex engine returns as soon as it finds a match.
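
A quick way to see "earliest, not best" in action is alternation with | (covered in more detail further down). In this sketch the strings are made up; what matters is which alternative wins.

```javascript
// Alternatives are tried left to right at each position; the first one that
// succeeds wins, even if a later alternative would match more text.
console.log(/Bat|Batman/.exec("Batman")[0]); // Bat
console.log(/Batman|Bat/.exec("Batman")[0]); // Batman

// Position beats alternative order: "catalog" is listed first, but the
// engine finds "cat" at index 0 before it ever reaches index 4.
const m = "cat catalog".match(/catalog|cat/);
console.log(m[0], m.index); // cat 0
```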

💡 Where you can, write regular expressions that are likely to match. A successful match returns as soon as it's found, while a failed one forces the engine to try every path at every starting position before giving up.

Up to this point, we've been matching basic patterns without features like alternatives, matching at specific positions, and so forth. That's what we are going to look at next.

Flags

Flags add flexibility to what counts as a match. For example, the i flag makes the regex engine match regardless of case, so "Wayne" and "WAYNE" would both match.

The table below describes the flags that are available to us:

| Flag | Description |
| --- | --- |
| i | Ignore case |
| g | Global search: find all matches instead of stopping after the first one |
| m | Multi-line mode: the anchors ^ and $ match at the start and end of each line, instead of only the start and end of the whole string |
| y | Sticky mode: match only starting at the position given by the regex's lastIndex property |
| u | Unicode mode: treat the pattern as a sequence of Unicode code points and allow code point escapes such as \u{1F600} |
| s | "dotAll" mode: the dot (.) also matches line terminators such as newline |
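
The m and s rows are easiest to understand with a string that spans two lines. A minimal sketch, using a made-up two-line string:

```javascript
const text = "Bruce\nWayne"; // two lines, separated by a newline

// Without m, ^ and $ only anchor to the start and end of the whole string.
console.log(/^Wayne$/.test(text));  // false
// With m, they also anchor at the start and end of each line.
console.log(/^Wayne$/m.test(text)); // true

// Without s, the dot does not match the newline between the two words.
console.log(/Bruce.Wayne/.test(text));  // false
// With s, it does.
console.log(/Bruce.Wayne/s.test(text)); // true
```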

We'll have a look at the two most common flags, i and g, but if you're interested in learning more about flags, you can visit the MDN documentation.

The i flag is used to match a pattern regardless of the case.


let regexPattern = /Wayne/i; // You add the flag after the closing forward slash.
let textInput = "It's not who I am underneath, but what I do that defines me. - Bruce WAYNE";
console.log(regexPattern.test(textInput)) // Returns true

The g flag tells the regex engine to keep looking for more matches even after it has found the first one.


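let regexPattern = /Wayne/ig; // You can combine flags to add more flexibility to your regex.
let textInput = "Bruce Wayne is the son of Thomas Wayne"; // sample text with two matches
console.log(textInput.match(regexPattern)); // ['Wayne', 'Wayne']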

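Because a g-flagged regex keeps track of where it stopped (in its lastIndex property), you can also call exec in a loop to walk through every match along with its position. The same bookkeeping is behind a classic surprise: reusing a g-flagged regex with test can give different answers for the same input. A sketch, using a made-up sample sentence:

```javascript
const textInput = "Bruce Wayne is the son of Thomas Wayne";

// exec on a g-flagged regex resumes from lastIndex each time it is called.
const pattern = /Wayne/g;
let match;
while ((match = pattern.exec(textInput)) !== null) {
  console.log(match[0], "at", match.index, "next search from", pattern.lastIndex);
}
// Wayne at 6 next search from 11
// Wayne at 33 next search from 38
// When exec finally returns null, lastIndex is reset to 0.

// The same state makes test() alternate when a g-flagged regex is reused.
const reused = /Wayne/g;
console.log(reused.test("Bruce Wayne")); // true  (lastIndex is now 11)
console.log(reused.test("Bruce Wayne")); // false (the search started at 11)
```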

Match any character

Sometimes we want to match any character in our pattern, from letters to digits to symbols to whitespace.

Writing all of these out explicitly would be verbose and clunky, so instead we use the period (.), which matches any single character except line terminators such as newline (unless the s flag is set).

let matchAnything = /./
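
Here's a small sketch of what the dot does and doesn't match, with made-up test strings. Since the dot is special, a literal period has to be escaped as \.

```javascript
const pattern = /b.t/;
console.log(pattern.test("bat"));   // true
console.log(pattern.test("b9t"));   // true
console.log(pattern.test("b t"));   // true, a space counts too
console.log(pattern.test("bt"));    // false, the dot needs exactly one character
console.log(pattern.test("b\nt"));  // false, no newline without the s flag

// To match a literal period, escape it with a backslash.
console.log(/me\./.test("defines me.")); // true
console.log(/me\./.test("defines me"));  // false
```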

Grouping

Grouping lets us treat several characters as a single unit. This makes our patterns easier to read, lets a quantifier (covered below) apply to a whole group, and captures the text each group matched so we can extract it later. We use parentheses to group our patterns.

let groupedPattern = /(nickel)(back)/;
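
The parentheses do more than tidy up a pattern: each group captures the text it matched, and the captures show up in the array returned by match. A sketch with made-up strings, which also shows a quantifier applied to a whole group and a named group filling in the groups property that match returned as undefined earlier:

```javascript
const groupedPattern = /(nickel)(back)/;
const result = "nickelback".match(groupedPattern);
console.log(result[0]); // nickelback  (the whole match)
console.log(result[1]); // nickel      (first group)
console.log(result[2]); // back        (second group)

// A quantifier after a group repeats the whole group, not just one letter.
console.log(/(ha)+/.exec("hahaha!")[0]); // hahaha

// Named groups, written (?<name>...), fill in the `groups` property.
const named = "Bruce Wayne".match(/(?<first>\w+) (?<last>\w+)/);
console.log(named.groups.first, named.groups.last); // Bruce Wayne
```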

Match patterns at specific positions

So far we've been matching patterns at any position, but what if we want to match at the start or at the end of a string?

| Symbol | Description |
| --- | --- |
| ^ | Match a pattern at the start of the string (or of each line, with the m flag) |
| $ | Match a pattern at the end of the string (or of each line, with the m flag) |

let matchStart = /^a/ // Match an "a" at the beginning.
let matchEnd = /a$/ // Match an "a" at the end.

The ^ has a second job inside square brackets: placed first in a character set, it negates the set. The [^characters] syntax matches any single character that is not listed. For example, to match one character that is not an "a", we would write:

let notIncludingA = /[^a]/
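
A sketch of both uses of ^ with made-up words. The last lines use * (covered in the next section) to show how to reject any string that contains an "a", which [^a] on its own doesn't do.

```javascript
const matchStart = /^a/;
const matchEnd = /a$/;
console.log(matchStart.test("apple"));  // true
console.log(matchStart.test("banana")); // false, "banana" starts with "b"
console.log(matchEnd.test("banana"));   // true

// [^a] matches ONE character that isn't "a", anywhere in the string...
const notIncludingA = /[^a]/;
console.log(notIncludingA.test("aaa")); // false, every character is an "a"
console.log(notIncludingA.test("bad")); // true, "b" is not an "a"

// ...so to reject every string containing an "a", anchor a repeated set.
const noAAtAll = /^[^a]*$/;
console.log(noAAtAll.test("bcd")); // true
console.log(noAAtAll.test("bad")); // false
```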

Quantifiers

Quantifiers tell the regex engine how many times to match a character or pattern.

| Quantifier | Description |
| --- | --- |
| * | Match the preceding item zero or more times |
| + | Match the preceding item one or more times |
| ? | Match the preceding item zero times or once |
| {x} | Match the preceding item exactly x times |

let matchZeroOrMore = /a*/
let matchOneOrMore = /a+/
let matchZeroOrOnce = /a?/
let matchExactlyThree = /a{3}/ // Match exactly three times
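
A few made-up test strings make the differences between the quantifiers visible. Note that {3} without anchors is happy to find three "a"s inside a longer run:

```javascript
const zeroOrMore = /ba*d/;
console.log(zeroOrMore.test("bd"));    // true, zero "a"s
console.log(zeroOrMore.test("baaad")); // true

const oneOrMore = /ba+d/;
console.log(oneOrMore.test("bd"));     // false, needs at least one "a"
console.log(oneOrMore.test("bad"));    // true

const zeroOrOnce = /ba?d/;
console.log(zeroOrOnce.test("baad"));  // false, at most one "a"

// {3} on its own finds three "a"s anywhere, even inside a longer run...
console.log(/a{3}/.test("aaaa"));      // true
// ...so add anchors when "exactly three" should mean the whole string.
console.log(/^a{3}$/.test("aaaa"));    // false
console.log(/^a{3}$/.test("aaa"));     // true
```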

The quantifier is placed after the character you are trying to quantify. The ? is handy for matching an optional character, as in the example below:

let pattern = /favou?rite/
// This will match favourite or favorite

You can also use the | symbol to match one of several alternatives. So we can rewrite the above pattern as /favourite|favorite/, and this would still work.
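
Here's a sketch comparing the two spellings of the pattern on a few words, plus one detail worth knowing: | has the lowest precedence of the regex operators, so when alternatives sit next to other text, wrap them in a group. The test words are made up.

```javascript
const optionalU = /favou?rite/;
const alternation = /favourite|favorite/;

for (const word of ["favourite", "favorite", "favrite"]) {
  // Both patterns accept and reject the same words.
  console.log(word, optionalU.test(word), alternation.test(word));
}
// favourite true true
// favorite true true
// favrite false false

// With a group, the alternatives stay inside the sentence.
const sentence = /my (favourite|favorite) colour/;
console.log(sentence.test("my favorite colour")); // true

// Without it, this means "my favourite" OR "favorite colour".
console.log(/my favourite|favorite colour/.test("favorite colour")); // true, no "my" needed
```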

Sometimes you may want to match any one of a set of characters in a single position. To do this, we use a character set, written with square brackets [].

let pattern = /gr[ae]y/

This will now match either an "a" or an "e" in that position. We can include a range of numbers or letters to add more flexibility to the pattern.

let pattern = /gr[a-f]y/
console.log(pattern.test("His grey hair is starting to show")); // true
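
A sketch with made-up inputs showing that a character set always stands for exactly one character, and that several ranges can share one set (here, to match a single hexadecimal digit):

```javascript
const grayOrGrey = /gr[ae]y/;
console.log(grayOrGrey.test("gray"));  // true
console.log(grayOrGrey.test("grey"));  // true
console.log(grayOrGrey.test("gruy"));  // false, "u" is not in the set
// The set matches one character only, so "ae" together does not fit.
console.log(grayOrGrey.test("graey")); // false

// Ranges can be combined inside one set.
const hexDigit = /^[0-9a-fA-F]$/;
console.log(hexDigit.test("c")); // true
console.log(hexDigit.test("F")); // true
console.log(hexDigit.test("g")); // false
```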

Shorthand syntax

The shorthand syntax is used to match a variety of characters without having to explicitly define the characters themselves.

For example, let's assume we want to match any letter, digit, or underscore in a single position. If you were to write this explicitly, you'd write [A-Za-z0-9_].

This feels and looks a bit verbose, so instead, we can use the shorthand symbol \w.

| Symbol | Description |
| --- | --- |
| \w | Match a word character: a letter, digit, or underscore, [A-Za-z0-9_] |
| \d | Match a digit, [0-9] |
| \s | Match a whitespace character, such as a space, tab (\t), carriage return (\r), newline (\n), vertical tab (\v), or form feed (\f) |
| \W | Match anything that is not a word character, [^A-Za-z0-9_] |
| \D | Match anything that is not a digit, [^0-9] |
| \S | Match anything that is not a whitespace character |

let alphaNumericsMatch = /\w/
let digitsMatch = /\d/
let whiteSpaceMatch = /\s/
let nonAlphaNumerics = /\W/
let nonDigits = /\D/
let nonWhiteSpace = /\S/
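
With the shorthand classes in hand, we can finally read the email pattern from the start of the post. Here's a sketch that exercises a few shorthands and then breaks that pattern down piece by piece, using made-up addresses. Keep in mind it's a teaching example: real email addresses can contain characters such as . and + before the @, which this pattern rejects.

```javascript
// Shorthand classes in action.
console.log("Room 42".match(/\d+/)[0]);     // 42
console.log("Room 42".match(/\w+/)[0]);     // Room
console.log("Room 42".replace(/\s/g, "_")); // Room_42

// The email pattern from the start of the post, piece by piece:
//   ^          start of the string
//   \w+        one or more word characters (the user name)
//   @          a literal "@"
//   \w+        the domain name
//   \.\w+      a literal dot, then the top-level domain
//   (\.\w+)?   an optional second ".something" (e.g. ".co.uk")
//   $          end of the string
const email = /^\w+@\w+\.\w+(\.\w+)?$/;
console.log(email.test("bruce@wayne.com"));       // true
console.log(email.test("bruce@wayne.co.uk"));     // true
console.log(email.test("bruce.wayne@wayne.com")); // false, \w does not match "."
console.log(email.test("bruce@wayne"));           // false, no top-level domain
```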

We've covered a lot of material: defining regular expressions, writing them, and using them to match and extract patterns.

This will hopefully serve as a guide as you navigate your way through regular expressions.

Resources

Regex101
RegexOne

Thanks for reading and happy coding!! 💥
