Atta

Posted on Aug 9, 2019 • Edited on Feb 16, 2020 • Originally published at attacomsian.com

Introduction to JavaScript Regular Expressions

#webdev #javascript #codenewbie #beginners

This post was originally published on attacomsian.com/blog.

A regular expression (also called regex or regexp) is a sequence of characters that defines a search pattern. These search patterns are usually used for text search, text search and replace, data extraction, and input validation operations.

Like many other programming languages, JavaScript supports regular expressions for pattern matching and search-and-replace operations on strings. JavaScript treats a regular expression as an object with predefined properties and methods.

Syntax

A regular expression can consist of one or more metacharacters and literal characters.

/pattern/modifiers;

For example, /javascript/i is a regular expression where javascript is a search pattern and i is a modifier.

Creating a Regular Expression

In JavaScript, you can create a regular expression in two ways: using a regular expression literal or calling the constructor of RegExp to initialize a new object.

var re1 = /java+script/g;
var re2 = new RegExp('java+script', 'g');

In literal form, regular expressions are compiled when the script is loaded. If the pattern remains constant, regular expression literals are better in terms of performance.

On the other hand, regular expressions created with the constructor are compiled at runtime, so they are best reserved for patterns that are dynamic, for example patterns built from a value that is not known until the script runs.
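As a minimal sketch of the dynamic case, the snippet below builds a pattern from a word that is only known at runtime. The helper name makeWordMatcher and the sample strings are assumptions made for illustration. Note that a backslash has to be doubled inside a constructor string.

```javascript
// Hypothetical helper: builds a case-insensitive whole-word matcher
// from a word that is only known at runtime.
// Assumes `word` contains only letters, so no escaping is needed.
function makeWordMatcher(word) {
  // '\\b' in a string literal becomes \b (word boundary) in the pattern
  return new RegExp('\\b' + word + '\\b', 'i');
}

const matcher = makeWordMatcher('script');
console.log(matcher.source);                  // \bscript\b
console.log(matcher.test('A short script'));  // true
console.log(matcher.test('JavaScript'));      // false ("script" is not a separate word here)
```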

Testing a Regular Expression

You can use the test() method of the RegExp object to quickly test a regular expression. This method searches for a match between a regular expression and a specified string. It returns true if a match is found, otherwise false.

// no g flag here: a global regex keeps state between test() calls
var re = /java+script/i;
re.test('javascript'); // true
re.test('JavaScript is a client side programming language'); // true
re.test('java'); // false
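One caveat is worth illustrating: when a regular expression has the g flag, test() records where the last match ended in the lastIndex property and resumes from there on the next call. The sketch below, with made-up variable names, shows how reusing one global regex can return a surprising false.

```javascript
const globalRe = /java+script/gi;

console.log(globalRe.test('javascript')); // true
console.log(globalRe.lastIndex);          // 10 (the next search resumes here)
console.log(globalRe.test('javascript')); // false (nothing left after index 10)
console.log(globalRe.lastIndex);          // 0 (reset after a failed match)

// Without the g flag, every call starts from the beginning
const plainRe = /java+script/i;
console.log(plainRe.test('javascript'));  // true
console.log(plainRe.test('javascript'));  // true
```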

Another way to test a regular expression is the exec() method of the RegExp object. It searches the target string for a match and returns an array describing the match if one is found, otherwise null.

/^The/.exec('The Apple')
// ["The", index: 0, input: "The Apple", groups: undefined]
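To show what the returned array holds, here is a small sketch that adds two capturing groups to the pattern. The pattern and variable names are illustrative assumptions, not part of the original example.

```javascript
const result = /(\w+) (\w+)/.exec('The Apple');

// exec() returns null when nothing matches, so check before reading
if (result !== null) {
  console.log(result[0]);    // "The Apple" (the full match)
  console.log(result[1]);    // "The" (first capturing group)
  console.log(result[2]);    // "Apple" (second capturing group)
  console.log(result.index); // 0 (where the match starts)
}

console.log(/^An/.exec('The Apple')); // null
```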

Regular Expression Modifiers

Modifiers (also called flags) are special characters that change how a pattern is applied, for example to perform case-insensitive, global, or multiline searches.

i performs case-insensitive matching
g performs a global match (does not stop after finding the first match and finds all possible matches)
m performs multiline matching
u enables full Unicode matching (introduced in ES6)
s (also called "dotAll") allows . to match newlines (introduced in ES9)

Flags can be combined to perform sophisticated matching operations:

var re1 = /java+script/gi;
var re2 = new RegExp('java+script', 'gi');
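To make the individual flags more concrete, the following sketch runs the same two-line string (an assumed sample) with and without the m, s, and g flags.

```javascript
const text = 'first line\nsecond line';

// Without m, ^ only matches at the very start of the input
console.log(/^second/.test(text));      // false
// With m, ^ also matches right after each line break
console.log(/^second/m.test(text));     // true

// Without s, the dot does not match a newline
console.log(/line.second/.test(text));  // false
console.log(/line.second/s.test(text)); // true

// With g, String.prototype.match returns every match
console.log(text.match(/line/g));       // ["line", "line"]
```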

Regular Expression Patterns

A regular expression pattern consists of either simple characters such as /javascript/ or a combination of simple and special characters such as /java*script/. Simple characters are matched directly. For example, the simple pattern /bcd/ matches a string only when the characters 'bcd' appear together and in exactly that order.

/bcd/g.test('Who is this bcd?') // exact match substring bcd

Special characters are used to match a broader range of values than literal strings. For example, to match a single 'a' followed by one or more 'b's followed by 'd', we can use the pattern /ab+d/. The + after 'b' means "1 or more occurrences of the previous item."

/ab+d/g.test('aabbbdccbbbd') // match substring abbbd

The following tables list the most commonly used special characters, along with examples:

Assertions

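Assertions check whether a match is possible at the current position without adding anything to the match itself. The assertions covered here are look-ahead and look-behind expressions.

Note that the ? character used in these constructs also serves as a quantifier (see the Quantifiers section).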

Characters	Example	Description
`x(?=y)`	`/Atta(?=shah)/`	Look-ahead assertion. Matches `x` only if it is followed by `y`.
`x(?!y)`	`/\d+(?!\.)/`	Negative look-ahead assertion. Matches `x` only if it is NOT followed by `y`.
`(?<=y)x`	`/(?<=shah)Atta/`	Look-behind assertion. Matches `x` only if it is preceded by `y`.
`(?<!y)x`	`/(?<!-)\d+/`	Negative look-behind assertion. Matches `x` only if it is NOT preceded by `y`.

In assertions, only the x is a part of the match. For example, /Europe(?=Germany|France)/ matches "Europe" only if it is followed by "Germany" or "France". However, neither "Germany" nor "France" is part of the match results.

/Europe(?=Germany|France)/.test('EuropeGermany') // matches "Europe" (not "EuropeGermany")
/(?<!-)\d+/.test('45.99') // matches "45"
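Since test() only returns true or false, a quick way to see what an assertion actually captures is to read the matched text with match(). The sketch below uses assumed sample strings; the look-behind part needs an engine that supports look-behind.

```javascript
const ahead = 'EuropeGermany'.match(/Europe(?=Germany|France)/);
console.log(ahead[0]);        // "Europe"
console.log(ahead[0].length); // 6 ("Germany" is checked but not consumed)

console.log('EuropeSpain'.match(/Europe(?=Germany|France)/)); // null

// Look-behind: digits only when preceded by a dollar sign
const behind = 'Total: $45'.match(/(?<=\$)\d+/);
console.log(behind[0]);       // "45" (the "$" is not part of the match)
```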

Boundaries

Boundaries indicate the starts and ends of lines and words.

Characters	Example	Description
`^`	`/^An/`	Matches the start of input
`$`	`/App$/`	Matches the end of input
`\b`	`/un\b/`	Matches a word boundary
`\B`	`/\Bon/`	Matches a non-word boundary

/^An/.test('An Apple') // matches "An"
/App$/.test('Mobile App') // matches "App" 
/un\b/.test('Sun') // matches "un"
/\Bon/.test('Moon') // matches "on"
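A common practical use of boundaries is restricting a match to whole words or to the whole input. The following sketch, with an invented sample sentence, compares patterns with and without them.

```javascript
const sentence = 'The cat scattered the catalog';

// \bcat\b matches "cat" only as a whole word
console.log(sentence.replace(/\bcat\b/g, 'dog')); // "The dog scattered the catalog"
// Without boundaries, every "cat" substring is replaced
console.log(sentence.replace(/cat/g, 'dog'));     // "The dog sdogtered the dogalog"

// ^ and $ together require the whole input to match
console.log(/^\d+$/.test('12345')); // true
console.log(/^\d+$/.test('123a'));  // false
```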

Groups and Ranges

Groups and ranges are useful to find a range of characters.

(x|y) matches either x or y. For example, /(green|red)/ matches "red" in "red bull".
[abcd] matches any one of the enclosed characters. Equivalent to [a-d].
[^abcd] matches any single character that is NOT one of the enclosed characters. Equivalent to [^a-d].
[0-9] matches any one character in the given range (any digit).
[^0-9] matches any single character outside the given range (any non-digit).

/[a-z]/.exec('a2') // matches "a"
/[0-9]/.exec('a2') // matches "2"
/[a-z0-9]/.exec('$a2') // matches "a"
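As a rough illustration of how groups and ranges combine in practice, the sketch below reads a captured alternative, strips everything except digits, and checks a hex color. All sample values are assumptions.

```javascript
// A capturing group exposes which alternative matched
const color = 'red bull'.match(/(green|red)/);
console.log(color[1]); // "red"

// A negated range removes everything that is not a digit
console.log('+1 (555) 010-9999'.replace(/[^0-9]/g, '')); // "15550109999"

// Ranges can be combined inside one pair of brackets
console.log(/^#[0-9a-f]{6}$/i.test('#1A2b3C')); // true
console.log(/^#[0-9a-f]{6}$/i.test('#12345G')); // false
```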

Character Classes

Character Classes (also called Metacharacters) are characters with special meaning to distinguish kinds of characters. For example, differentiating between letters and digits.

Characters	Example	Description
`.`	`/.s/`	Matches any single character, except newline or line terminators
`\w`	`/\w/`	Matches any alphanumeric character including underscore. Equivalent to `[A-Za-z0-9_]`.
`\W`	`/\W/`	Matches any non-alphanumeric character. Equivalent to `[^A-Za-z0-9_]`.
`\d`	`/\d/`	Matches any digit. Equivalent to `[0-9]`.
`\D`	`/\D/`	Matches any character that is not a digit. Equivalent to `[^0-9]`.
`\s`	`/\s/`	Matches a single whitespace character
`\S`	`/\S/`	Matches a single character other than whitespace
`\t`	`/\t/`	Matches a tab character
`\n`	`/\n/`	Matches a newline character
`\0`	`/\0/`	Matches a NUL character
`\uxxxx`	`/\u0041/`	Matches the Unicode character with the four-digit hexadecimal code `xxxx` (for example, `\u0041` matches "A")

/.s/.test('yes') // matches "es"
/\d/.test('3D') // matches "3"
/\w/.test('$9.99') // matches "9"
/\W/.test('45%') // matches "%"
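The character classes become more useful once they are combined with the g flag or a quantifier. Here is a short sketch on an assumed sample string.

```javascript
const messy = 'id_7  costs\t42 dollars';

console.log(messy.split(/\s+/));       // ["id_7", "costs", "42", "dollars"]
console.log(messy.match(/\d+/g));      // ["7", "42"]
console.log(messy.match(/\w+/g));      // ["id_7", "costs", "42", "dollars"]
console.log(messy.replace(/\D/g, '')); // "742"
```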

Quantifiers

Quantifiers define quantities and indicate numbers of characters or expressions to match.

Characters	Example	Description
`n+`	`/ab+/`	Matches any string that contains at least one `n`
`n*`	`/ab*/`	Matches any string that contains zero or more occurrences of `n`
`n?`	`/ab?/`	Matches any string that contains zero or one `n`
`n{x}`	`/a{2}/`	Matches exactly `x` occurrences of the preceding item `n`
`n{x,}`	`/a{2,}/`	Matches at least `x` occurrences of the preceding item `n` (no space is allowed inside the braces)
`n{x,y}`	`/a{1,3}/`	Matches at least `x` and at most `y` occurrences of the preceding item `n`

/ab+/.test('abbcdab') // matches "abb"
/bb*/.test('abbcdab') // matches "bb"
/b{2,}/.test('abbcdab') // matches "bb"
/a{1,3}/.test('bbcdaaab') // matches "aaa"
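To compare the quantifiers side by side, the sketch below collects every match with the g flag. The sample strings are assumptions chosen so that each quantifier gives a different result.

```javascript
const sample = 'abbcda';

console.log(sample.match(/ab+/g));     // ["abb"] (the last "a" has no "b" after it)
console.log(sample.match(/ab*/g));     // ["abb", "a"]
console.log(sample.match(/ab?/g));     // ["ab", "a"]

console.log('aaaa'.match(/a{2}/g));    // ["aa", "aa"]
console.log('aaaa'.match(/a{1,3}/g));  // ["aaa", "a"] (quantifiers take as much as they can)
```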

Regular Expression Escaping

If you want to use any of the special characters as literals (like searching for a '+'), you need to escape them by putting a backslash in front of them. For instance, to search for 'a' followed by '+' followed by 'c', you'd use /a\+c/. The backslash "escapes" the '+', making it literal instead of special.

/\d\+\d/.test('2+2') // true
/\$\d/.test('$2.49') // true
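Escaping matters most when a pattern is built from arbitrary text with the RegExp constructor. The helper below is a commonly used sketch, not a built-in; the name escapeRegExp and the sample strings are assumptions.

```javascript
// Hypothetical helper: puts a backslash before every character
// that has a special meaning in a regular expression.
function escapeRegExp(text) {
  // $& in the replacement stands for the matched character
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

console.log(escapeRegExp('2+2'));   // 2\+2
console.log(escapeRegExp('$2.49')); // \$2\.49

// Unescaped, "$" means end of input, so the search fails
console.log(new RegExp('$2.49').test('Price: $2.49'));               // false
console.log(new RegExp(escapeRegExp('$2.49')).test('Price: $2.49')); // true
```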

Regular Expression Usage

In JavaScript, regular expressions are used with the exec() and test() methods of the RegExp object, and with the match(), replace(), search(), and split() methods of String.

var str = "JavaScript is a client-side programming language";
str.search(/client/i)
//16 (the starting position of "client")
str.replace(/client/i, 'server')
//JavaScript is a server-side programming language
str.match(/client/i)
//["client", index: 16]
str.split(/\s/i)
//["JavaScript", "is", "a", "client-side", "programming", "language"]
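Two details of replace() are easy to miss: without the g flag only the first match is replaced, and capturing groups can be reused in the replacement string as $1, $2, and so on. Here is a brief sketch with assumed sample values.

```javascript
// Only the first match is replaced unless the g flag is set
console.log('a-b-c'.replace(/-/, '+'));  // "a+b-c"
console.log('a-b-c'.replace(/-/g, '+')); // "a+b+c"

// $1, $2, $3 refer to the capturing groups, in order
const isoDate = '2019-08-09';
const reordered = isoDate.replace(/(\d{4})-(\d{2})-(\d{2})/, '$3-$2-$1');
console.log(reordered);                  // "09-08-2019"

// split() accepts a pattern as the separator
console.log('a1b22c'.split(/\d+/));      // ["a", "b", "c"]
```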

The examples above show only a few ways of using regular expressions for search and replace operations. They are also used for input validation and data extraction in JavaScript:

// extract number from a string
'I paid $45.99'.match(/\d+\.*\d*/) //["45.99", index: 8]

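// validate the shape of a date in dd-mm-yy or dd-mm-yyyy format
// (^ and $ make sure the whole string is checked, not just a part of it)
/^(\d{2})-(\d{2})-(\d{2}|\d{4})$/.test('23-12-89') // true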
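A pattern can only check the shape of a date, so an impossible value such as 31-02-2020 would still pass a format check. One possible approach, sketched below, reads the captured groups and lets the Date object confirm that the day exists. The function name isValidDate and the strict dd-mm-yyyy format are assumptions for this example.

```javascript
// Hypothetical helper: accepts only real calendar dates in dd-mm-yyyy form
function isValidDate(input) {
  const match = /^(\d{2})-(\d{2})-(\d{4})$/.exec(input);
  if (match === null) return false; // wrong shape

  const day = Number(match[1]);
  const month = Number(match[2]);
  const year = Number(match[3]);

  // Date rolls invalid days over (31 Feb becomes a day in March),
  // so the parts only round-trip for dates that really exist
  const date = new Date(year, month - 1, day);
  return date.getFullYear() === year &&
         date.getMonth() === month - 1 &&
         date.getDate() === day;
}

console.log(isValidDate('23-12-1989')); // true
console.log(isValidDate('31-02-2020')); // false (no such day)
console.log(isValidDate('23-12-89'));   // false (year must have 4 digits here)
```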


Top comments (5)

Stephen Cooper • Aug 10 '19

Thanks for all the examples! Only just realised you can use regexes in query selectors too! Very helpful when working with a tool like Puppeteer.

Atta • Aug 10 '19

Thanks for the tip ✌️ I didn't know about it.

loup20 • Aug 9 '19

love it.

Joe Attardi • Aug 9 '19

Unfortunately lookbehind is not fully supported in all browsers/JS engines yet. This bit me a while back when I was working on a problem that would have been a simple fix with lookbehind 😕

Joe Attardi • Aug 9 '19

They are coming, though! Lookbehind is currently a stage 4 proposal.
