DEV Community

Rahul


Everything about RegEx and How to use it in JavaScript?

Regular expressions play a vital role in almost every high-level programming language, and JavaScript is no exception. Let's get to know them in detail...


A Regular Expression (RegEx) is a sequence of characters that defines a search pattern. It helps you "match" parts of a text (string) according to a given rule.

// Let's get our hands dirty with an example:

const regex = /[A-Z]\w+/g; // note: a regex literal needs no quotes around it
// regex holds a regular expression that matches words starting with a capital letter.

const str = `Rahul, Taylor and Susanne are coders who live in India`;

// When we apply the regex to str, it returns all matches in a simple array!
console.log(str.match(regex));
// ["Rahul", "Taylor", "Susanne", "India"]

You could do the same thing with plain JavaScript, but a regex can save you many lines of code, and you can use regexes in almost any language (and even in CLI tools).
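To make that trade-off concrete, here is a rough sketch of the capital-letter example written both ways. The helper capitalizedWords is hypothetical, and it only approximates \w with ASCII letters, digits and the underscore.

```javascript
// Hypothetical helper: collect words that start with a capital letter, without a regex.
function capitalizedWords(text) {
  const isUpper = (ch) => ch >= "A" && ch <= "Z";
  // Rough stand-in for \w: ASCII letters, digits and underscore.
  const isWordChar = (ch) =>
    (ch >= "a" && ch <= "z") || isUpper(ch) || (ch >= "0" && ch <= "9") || ch === "_";

  const words = [];
  let i = 0;
  while (i < text.length) {
    // Like /[A-Z]\w+/: a capital letter followed by at least one word character.
    if (isUpper(text[i]) && i + 1 < text.length && isWordChar(text[i + 1])) {
      let j = i + 1;
      while (j < text.length && isWordChar(text[j])) j++;
      words.push(text.slice(i, j));
      i = j; // continue after the word, as a global regex would
    } else {
      i++;
    }
  }
  return words;
}

const sentence = "Rahul, Taylor and Susanne are coders who live in India";
const manual = capitalizedWords(sentence);
const withRegex = sentence.match(/[A-Z]\w+/g);
console.log(manual);    // ["Rahul", "Taylor", "Susanne", "India"]
console.log(withRegex); // ["Rahul", "Taylor", "Susanne", "India"]
```

Both give the same result here, but the regex version fits on one line.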

The Core and Some Basics

When you write a RegEx literal, it starts with / and ends with / (optionally followed by flags). You write the pattern between the two slashes. The simplest example: to match the word 'apple', use the /apple/ RegEx. This, however, won't match 'APPLE' or 'aPpLe', because RegEx is case sensitive.

To disable case sensitivity in RegEx, use what is called the i flag: /apple/i now matches 'apple', 'APPLE' and 'aPpLe'. To match either 'apple' or 'nut', use the /apple|nut/ RegEx. Simple, huh?
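A quick sketch of the three patterns from this section; the sample strings are made up, and the test and match methods are explained in the next section.

```javascript
// Regexes are case sensitive by default
const plain = /apple/.test("APPLE");
// The i flag makes the match case-insensitive
const ignoreCase = /apple/i.test("aPpLe");
// | means "match the left side or the right side"
const either = "apple pie and nut cake".match(/apple|nut/g);

console.log(plain);      // false
console.log(ignoreCase); // true
console.log(either);     // ["apple", "nut"]
```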

How to use it in JavaScript

Let's learn the most basic JS methods for working with regexes:

  • str.match(regex): Returns an array with the matches it has found. Actually, there's a little catch here😉. If you try "apple apple".match(/apple/), you might expect to get ['apple', 'apple'], but that's not the case. In reality it returns only the first match, ['apple'] (plus extra properties such as index). To get an array with all the matches, add the g flag.

  • regex.test(str): regex is a variable assigned to your RegEx, and str is the string you test against it. The method returns true if it finds a match, otherwise false.

  // Let's play with them
  let regex = /code|easy/i;
  const str = 'this code is EaSy super easy';
  regex.test(str); // true: we have a match😍

  str.match(regex); // ["code", index: 5, input: ...]

  // Oops! We forgot to add the g flag
  regex = /code|easy/ig;

  str.match(regex); // ["code", "EaSy", "easy"]
  // ALRIGHT!!

Concept of Wildcard Period

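We learned how to statically match a word, let's say 'hug' (/hug/). But what if we want to match 'huh', 'hug' and 'hum' at the same time? The wildcard period is the answer: /hu./ matches 'hu' followed by any single character (except a line break), so it matches all three. Note that it also matches the first three letters of longer words, such as 'hum' in 'human'.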
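A small sketch of the wildcard on a made-up string. It also shows the two caveats: the dot happily matches inside a longer word, and by default it does not match a line break.

```javascript
const found = "huh hug hum human hut".match(/hu./g);
console.log(found); // ["huh", "hug", "hum", "hum", "hut"] (the 2nd "hum" comes from "human")

// Without the s flag, the dot does not match a newline character
const acrossNewline = /hu./.test("hu\n");
console.log(acrossNewline); // false
```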

Match single character with multiple possibilities

A lot of the time you want something in between. Instead of matching any character with ., you might want to match only the characters a, b, c, d or e. That's when the next 'tricks' come in handy.

// CHARACTER CLASSES allow you to define a group of characters you wish to match. You put the characters in [ ].
"car cat cam cal car".match(/ca[rt]/g);
// returns: ['car', 'cat', 'car']

// match "bag", "big", "bug", but not "bog"
"big bag has a bug bog".match(/b[aiu]g/g);
// ["big", "bag", "bug"]

// MAKE CHARACTER CLASSES SHORTER by using [x-y], which matches any character from x to y. Example: [a-zA-Z] matches all uppercase and lowercase letters from a to z

"abcdefghijklmnopqr".match(/[d-j]/g);
// ["d", "e", "f", "g", "h", "i", "j"]

// same as:
"abcdefghijklmnopqr".match(/[defghij]/g);
// ["d", "e", "f", "g", "h", "i", "j"]

// Use it with numbers too:
"1234567890".match(/[4-9]/g);
// ["4", "5", "6", "7", "8", "9"]

Reverse the character classes

[a-z] matches any lowercase letter from a to z. To match any character EXCEPT the letters from a to z, use [^a-z]. The ^ negates the class when it is the first character inside [ ].
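A short sketch of a negated class on an invented string, plus a reminder that ^ only negates when it comes first inside the brackets.

```javascript
// [^a-z] matches any single character that is NOT a lowercase letter
const notLowercase = "abc-DEF 123".match(/[^a-z]/g);
console.log(notLowercase); // ["-", "D", "E", "F", " ", "1", "2", "3"]

// ^ only negates when it comes first; elsewhere in [ ] it is a literal "^"
const caret = "2^3".match(/[a^]/g);
console.log(caret); // ["^"]
```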

Matching characters that occur one or more times

// With +
let regex = /a+/g;
"abc".match(regex) //["a"]
"aabc".match(regex) //["aa"]
"aabac".match(regex) //["aa", "a"]
"bbc".match(regex) //null

// without +
regex = /a/g;
"abc".match(regex) //["a"]
"aabc".match(regex) //["a", "a"]
"aabac".match(regex) //["a", "a", "a"]
"bbc".match(regex) //null

Search for patterns at the beginning or the end of the string

To search for a character exactly at the beginning of a string, use ^:

let regex = /^K/; 

regex.test("__K_K_") // false - K is not exactly at the beginning!
regex.test("K___K___") // true 

//To search for a character at the end of string use $ like so

regex = /K$/; 

regex.test("__K__K_") // false - K has to be at the end

regex.test("__K") // true
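Putting ^ and $ together forces the pattern to cover the whole string, which is a common way to validate input. A minimal sketch; the name digitsOnly and the sample inputs are just for illustration.

```javascript
// ^ and $ together: the entire string must be one or more digits
const digitsOnly = /^[0-9]+$/;

const checks = [
  digitsOnly.test("12345"),  // true
  digitsOnly.test("123a5"),  // false: "a" breaks the run of digits
  digitsOnly.test("a12345"), // false: does not start with a digit
  /[0-9]+/.test("a12345"),   // true: without anchors, any digit anywhere is enough
];
console.log(checks); // [true, false, false, true]
```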

Optional character

let regex = /colou?r/; // makes 'u' optional

let american = "color"; 
let british = "colour"; 

regex.test(american); // true
regex.test(british); // true
regex.test("cologr"); // false

Let's take this to an advanced level

Common shorthands

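  • Instead of [A-Za-z0-9_]

Use -> \w (note that \w also matches the underscore _)

  • Instead of [^A-Za-z0-9_]

Use -> \W

  • Instead of [0-9]

Use -> \d

  • Instead of [^0-9]

Use -> \D

  • To match any whitespace character (space, tab, line break...)

Use -> \s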
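A quick sketch of the four main shorthands on an invented log line; note how the underscore stays inside the \w match.

```javascript
const sample = "user_42 logged in at 10:30";

const words = sample.match(/\w+/g);       // runs of word characters
const nonWords = sample.match(/\W+/g);    // runs of everything else
const digits = sample.match(/\d+/g);      // runs of digits
const nonDigits = sample.match(/\D+/g);   // runs of non-digits

console.log(words);     // ["user_42", "logged", "in", "at", "10", "30"]
console.log(nonWords);  // [" ", " ", " ", " ", ":"]
console.log(digits);    // ["42", "10", "30"]
console.log(nonDigits); // ["user_", " logged in at ", ":"]
```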

Specify the upper and lower limit of matches

What if you want to match a character that repeats X times, for example exactly five 'a' letters in a row? Here we go: a{5}. This matches 'aaaaa' but not 'aa'. Keep in mind that in a longer run such as 'aaaaaaa' it still matches the first five 'a's.

Let's see...

let str = "ama baalo maaaaamal aaaaaa";
console.log( str.match(/a{5}/g) );
// prints ["aaaaa", "aaaaa"]

// to match the letter 'm' followed by 5 x 'a'
console.log( str.match(/ma{5}/) );
// prints ["maaaaa", index: 10, ...]
// which means we have a match at index 10

// to match a whitespace character followed by 4 x 'a'
console.log( str.match(/\sa{4}/) );
// prints [" aaaa", index: 19, ...]
// match at index 19

You saw how to match an exact number of repeating characters: a{5} matches "aaaaa". But what if you want to be more flexible, for example matching from 1 to 3 repeating characters? Here we go: a{1,3} matches "a", "aa" and "aaa", but never more than three in a row (with the g flag, "aaaa" is matched as "aaa" and then "a").

We can go even further by omitting the second parameter: a{3,} will not match "a" or "aa", but will match "aaa", "aaaa" or any longer run.
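A small sketch of both forms on a made-up string of growing runs:

```javascript
const runs = "a aa aaa aaaa";

// {1,3}: between one and three 'a's per match
const oneToThree = runs.match(/a{1,3}/g);
console.log(oneToThree); // ["a", "aa", "aaa", "aaa", "a"]

// {3,}: at least three 'a's, no upper limit
const threeOrMore = runs.match(/a{3,}/g);
console.log(threeOrMore); // ["aaa", "aaaa"]
```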

Match characters that occur multiple times

We briefly covered this topic above; now it's time to go deeper.

  • To match one or more characters, use + after the target character.
let str = "ama balo maaaaamal";
console.log( str.match( /a+/g ) );
// ["a", "a", "a", "aaaaa", "a"]

console.log( str.match( /a/g ) );
// ["a", "a", "a", "a", "a", "a", "a", "a", "a"]

  • To match zero or more characters, use * after the target character.
let str = "aaa";
console.log( str.match( /a*/g ) );
// ["aaa", ""] (the "" is an empty match at the end of the string)

console.log( str.match( /a/g ) );
// ["a", "a", "a"]

  • To match zero or one character, use ? after the target character.
let str = "aaa";
console.log( str.match( /a?/g ) );
// ["a", "a", "a", ""]

Positive and Negative lookahead

This is considered one of the more abstract topics in regex, but I will try to cover 80% of what you need to know.

  • a(?=g) - Positive lookahead: matches every "a" that is followed by "g", without making the "g" part of the match.
  • a(?!g) - Negative lookahead: matches every "a" that is NOT followed by "g", without making "g" part of the match.

But it can be even more flexible. The general forms are (?=regex) and (?!regex).

In place of regex, you can put any valid regular expression. Let's play with this...

let str = "IsFunBaloonIsLearningRegExIsLean";

console.log( str.match(/Is(?=Learning)/) );
//["Is", index: 11, ...]
//Matches the 2nd "Is", right before "Learning"

console.log( str.match(/Is(?=Lean)/) );
//["Is", index: 26, ...]
//Matches the 3rd "Is", right before "Lean"

console.log( str.match(/Is(?=L)/g) );
// ["Is", "Is"]
//Matches all "Is" which are followed by "L"

console.log( str.match(/Is(?!L)/) );
//["Is", index: 0, ...]
// Matches the first "Is" which isn't followed by "L" (here it is also the only one)

What if you want the opposite: check the characters before the target instead of after it? You use a lookbehind: (?<=regex) for a positive lookbehind and (?<!regex) for a negative one ;P
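Lookbehind has been part of JavaScript since ES2018, so very old engines may not support it. A small sketch on an invented price list:

```javascript
const prices = "USD 10, $20, $35, EUR 40";

// Positive lookbehind: digits preceded by "$" (the "$" itself is not part of the match)
const dollars = prices.match(/(?<=\$)\d+/g);
console.log(dollars); // ["20", "35"]

// Negative lookbehind: digits NOT preceded by "$" or by another digit
// (without the digit check, the "0" of "$20" would match on its own)
const others = prices.match(/(?<![$\d])\d+/g);
console.log(others); // ["10", "40"]
```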

Reusing patterns with capture groups

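We all know the DRY programming principle: Don't Repeat Yourself. Capture groups help us do exactly this. A pair of parentheses captures the text it matches, and a backreference such as \1 (or \2 for the second group) matches that exact same text again; it does not re-run the pattern.

/(bam+)\w\1/g matches "bam", "bamm", ... followed by any one word character, followed by exactly the text the group captured: it matches "bammXbamm" but not "bammXbam". So it is NOT the same as /(bam+)\w(bam+)/g, which lets the two parts differ.

/(\w+)\s(\1\1\1)\2/g matches the same text as
/(\w+)\s\1\1\1\1\1\1/g

/(\w+)\s\1\1\1/g is NOT the same as /\w+\s\w+\w+\w+/g: the first matches "ha hahaha" but not "ha hohoho", while the second matches both.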
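Because a backreference repeats the captured text rather than the pattern, the difference is easy to check with test. A minimal sketch with made-up inputs:

```javascript
// \1 must repeat the exact text captured by (\w+)
const sameWord = /(\w+)\s\1\1\1/;
// Three independent \w+ parts can be anything
const anyWords = /\w+\s\w+\w+\w+/;

console.log(sameWord.test("ha hahaha")); // true
console.log(sameWord.test("ha hohoho")); // false
console.log(anyWords.test("ha hohoho")); // true

// The captured group itself is available on the match result
const bam = "bammXbamm".match(/(bam+)\w\1/);
console.log(bam[1]);                        // "bamm"
console.log(/(bam+)\w\1/.test("bammXbam")); // false
```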

Now let's see how to put all this regex power to work in your JavaScript code!

Creating RegEx in JavaScript

  • Using a regular expression literal
  let regex = /a[0-9]b+/;

  // if you want to pass flags (like i and g)
  let regexWithFlags = /a[0-9]b+/ig;

-> Compiled when the script is loaded

  • Using the RegExp constructor function
  let regex = new RegExp('a[0-9]b+');

  // if you want to pass flags (like i and g)
  let regexWithFlags = new RegExp('a[0-9]b+', 'ig');

-> Compiled at runtime
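The constructor is handy when the pattern comes from a variable. This sketch shows two things to watch: backslashes must be doubled inside a string, and user input should be escaped so characters like . are treated literally. The escapeRegExp helper is a hypothetical name; its character list follows a common escaping recipe.

```javascript
// Hypothetical helper: put a backslash in front of every regex special character
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Inside a string, the regex \d must be written as "\\d"
const digits = new RegExp("\\d+", "g");
const numbers = "room 101, floor 3".match(digits);
console.log(numbers); // ["101", "3"]

const userInput = "a.b"; // imagine this came from a search box
const unescaped = new RegExp(userInput, "g");             // "." is a wildcard here
const escaped = new RegExp(escapeRegExp(userInput), "g"); // "." is a literal dot here
const loose = "a.b axb".match(unescaped);
const strict = "a.b axb".match(escaped);
console.log(loose);  // ["a.b", "axb"]
console.log(strict); // ["a.b"]
```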


FLAGS

In JavaScript, these 6 flags affect the match (newer engines also support the d and v flags, which are not covered here):

  • i - Makes the match case-insensitive. No difference between 'C' and 'c'
  • g - Global search: finds all matches. Without this flag, only the first match is returned
  • m - Multiline mode; only affects the behavior of ^ and $, which then also match at the start and end of each line
  • s - Dotall mode; allows the wildcard period . to match the newline character \n
  • u - Enables full Unicode support
  • y - Sticky mode; matches only at the exact position stored in the regex's lastIndex property
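A sketch of the less obvious flags on an invented two-line string (the emoji line assumes the file is saved as UTF-8):

```javascript
const text = "first line\nsecond line";

// m: ^ and $ also work at line boundaries
console.log(/^second/.test(text));  // false
console.log(/^second/m.test(text)); // true

// s: the dot can match the newline character
console.log(/line.second/.test(text));  // false
console.log(/line.second/s.test(text)); // true

// u: the dot matches a whole emoji (one code point, two UTF-16 code units)
console.log(/^.$/.test("😀"));  // false
console.log(/^.$/u.test("😀")); // true

// y: the match must start exactly at lastIndex
const sticky = /line/y;
sticky.lastIndex = 6;
console.log(sticky.test(text)); // true ("line" starts at index 6)
sticky.lastIndex = 0;
console.log(sticky.test(text)); // false (index 0 holds "first", not "line")
```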

LET'S SEE JS METHODS THAT USE RegEx IN SOME FORM OR ANOTHER

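  • str.match(regexp) - With the g flag, finds all matches of regexp in the string str and returns an array of those matches (without g, it returns only the first match plus details such as index)
  • regexp.exec(str) - Similar to match, but returns one match at a time, with details such as index. With the g (or y) flag, the regex remembers where the last match ended in its lastIndex property, so calling exec repeatedly, typically in a loop, walks through all the matches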
// Difference between the two methods

let re = /bla/g;
let str = "bla and yeah bla yeh";

re.exec(str)
// -> ["bla", index: 0, ...]
re.exec(str)
// -> ["bla", index: 13, ...]
re.exec(str)
// -> null (and re.lastIndex is reset to 0)
re.exec(str)
// -> ["bla", index: 0, ...]
// STARTS AGAIN

// USAGE WITH A LOOP
re.lastIndex = 0; // start from the beginning (the last call above left it at 3)
let match;
while ((match = re.exec(str)) !== null) {
    console.log(match);
}
// ["bla", index: 0, input: ...]
// ["bla", index: 13, input: ...]

// on the other hand, match is much simpler
str.match(re)
// ["bla", "bla"]
  • str.matchAll(regexp) - A newer JS feature (ES2020) and an improvement on the match method. 3 differences:
    • It returns an iterable object with the matches instead of an array.
    • Each match has the same format as str.match without the g flag (including index and capture groups).
    • If there are no matches, it returns an empty iterable object instead of null like match does.

Always add the g flag when using this one! matchAll throws a TypeError if the regex is not global.

let regexp = /bla/g; 
let str = 'bla and yeah bla yeh'; 
const matches = str.matchAll(regexp); 
for (let match of matches) {
    console.log(match)
}
// ["bla", index: 0, ...]
// ["bla", index: 13, ...]
  • regexp.test(str) - Looks for at least one match of regexp in str. If found, returns true. Otherwise false.

  • str.search(regexp) - Returns the index of the first match. If no match is found, it returns -1.

  • str.split(separator) - Instead of passing a simple string like ' ' as the separator, we can also pass a regex for a more precise split.

  • str.replace(from, to) - from is what to match. It can be a string or regex. The first match will be replaced with the string you have passed to the to argument. Instead of a string, you can pass a function too, but this is outside of the scope of this tutorial.

  • str.replaceAll(from, to) - Same as replace, except that instead of replacing only the first match it replaces all matches with the provided to. If from is a regex, it must have the g flag, otherwise replaceAll throws a TypeError. Example:

  let str = "stuffed str living fforever pff";
  let regex = /f+/; // match one or more 'f'

  let repl = str.replace(regex, '*');
  // repl is "stu*ed str living fforever pff"

  let replAll = str.replaceAll(/f+/g, '*'); // replaceAll needs the g flag
  // replAll is "stu*ed str living *orever p*"
  // NOTE: If you add the g flag to the regex, replace works like replaceAll
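A short sketch of search and split with a regex on made-up strings, comparing the regex split with a plain ' ' separator:

```javascript
// search returns the index of the first match, or -1
const firstO = "hello world".search(/o/);
const noZ = "hello world".search(/z/);
console.log(firstO, noZ); // 4 -1

const messy = "one, two;three  four";
// A plain string separator only splits on exactly one space
const bySpace = messy.split(" ");
console.log(bySpace); // ["one,", "two;three", "", "four"]
// A regex separator splits on any run of commas, semicolons and whitespace
const byRegex = messy.split(/[,;\s]+/);
console.log(byRegex); // ["one", "two", "three", "four"]
```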

A bit tough and lengthy, but I hope you liked it! Use the comments to share your views and questions.

🔐Thanks For Reading | Happy Coding 📘


Top comments (1)

Andrew Bridge

Really thorough run through of regex, such a useful feature once you know it!

Just as a heads up, your examples in your character groups section are a bit confusing. I think you've maybe missed the square brackets in the final two examples:

// Without character group
"abcdefghijklmnopqr".match(/defghij/g); 
// ["defghij"]

// With character group
"abcdefghijklmnopqr".match(/[defghij]/g); 
// ["d", "e", "f", "g", "h", "i", "j"]
// Without character group
"1234567890".match(/4-9/g); 
// null

// With character group
"1234567890".match(/[4-9]/g); 
// ["4", "5", "6", "7", "8", "9"]

With that said, the error shows how seemingly subtle changes can make a huge difference with regex, which is often where it becomes a bit tricky!

When I'm dealing with regexes that need to go into production code, I'll often set them up in a tool like Regex 101 and then use various different input strings (with multilines, odd characters etc) to check it's battle tested. I'll often then be able to put these into a unit test so the regex continues to work as expected in the long term.

Great tutorial once again!