DEV Community

Katherine Kelly

Rejects with Regex

When I first learned regular expressions and saw “regex” or “regexp” as its shortened name, I pronounced it like the noun “rejects”, something rejected as not wanted or not fulfilling requirements. I was saying it incorrectly for more time than I'd like to admit before I was politely corrected on the pronunciation, or at least a closer variation of it (I pronounce the ‘reg’ part like ‘redge’, don’t @ me).

But I like the sound of “rejects” because regular expressions do just that. Regular expressions are patterns used to match character combinations (or, looked at the other way, to reject the characters that don't fulfill the requirements).


Getting Started

In JavaScript, regular expressions are objects used to match text against a pattern. The RegExp class represents regular expressions, which are used with the methods of String and RegExp.

There are two ways to create a RegExp object:

// calling the constructor function
const regexFromConstructor = new RegExp('abc');

// using literal notation, with the character pattern between slashes
const regexFromLiteral = /abc/;
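Both forms produce equivalent RegExp objects. One practical difference worth a quick sketch: with the constructor the pattern is a string, so every backslash in the pattern has to be doubled. The variable names here are illustrative.

```javascript
// Both notations build an equivalent RegExp object.
const fromLiteral = /\d+/g;
// In a string literal, '\\d' produces the two characters \d, so backslashes are doubled.
const fromConstructor = new RegExp('\\d+', 'g');

console.log(fromLiteral.source === fromConstructor.source); // true (both are \d+)
console.log(fromLiteral.flags === fromConstructor.flags);   // true (both are g)
console.log('Room 101, floor 3'.match(fromConstructor));    // ['101', '3']
```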

If your regular expression will remain constant, literal notation is preferred and can improve performance, since a regular expression literal is compiled when the script is loaded.

The constructor function compiles the regular expression at runtime, so it is best used when the pattern will change or is built dynamically, for example from user input.
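As a minimal sketch of the user-input case, the snippet below builds a pattern from a search term at runtime. Both helper names, `escapeRegExp` and `findAll`, are hypothetical. The escaping step is a common idiom (not a built-in) that backslash-escapes special characters so a term like '3.5' is matched literally instead of treating '.' as "any character".

```javascript
// Hypothetical helper: backslash-escape characters that have special
// meaning in a pattern, so user input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: find every case-insensitive occurrence of a search term.
function findAll(haystack, searchTerm) {
  const pattern = new RegExp(escapeRegExp(searchTerm), 'gi');
  return haystack.match(pattern) || []; // match() returns null when nothing matches
}

console.log(findAll('Is 3.5 > 3x5? Yes, 3.5 wins.', '3.5')); // ['3.5', '3.5']
```

Without the escaping step, the pattern /3.5/ would also match '3x5'.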

How to Write a Regex Pattern

A regex pattern consists of simple characters, like /abc/, or a combination of simple and special characters, such as /ab*c/.

If the pattern consists of just simple characters like /abc/, it would match character combinations in strings only when there is an exact sequence of ‘abc’.
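A quick sketch using test() (covered further down), which reports whether a pattern matches anywhere in a string; the variable name is illustrative:

```javascript
// A pattern made only of simple characters matches an exact sequence.
const exact = /abc/;

console.log(exact.test('xxabcxx')); // true: 'abc' appears in order
console.log(exact.test('ab c'));    // false: the space breaks the sequence
console.log(exact.test('ABC'));     // false: matching is case-sensitive by default
```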

Special Characters

For anything beyond a direct match, special characters make the pattern more flexible. The pattern /ca*t/ looks for a single 'c' followed by zero or more 'a's followed by a 't'; the * character means zero or more of the preceding character. The string 'The caaaat is back!' would be a match for /ca*t/.
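Trying /ca*t/ against a few strings shows what "zero or more" allows (the variable name is illustrative):

```javascript
const catPattern = /ca*t/;

console.log(catPattern.test('The caaaat is back!'));     // true
console.log('The caaaat is back!'.match(catPattern)[0]); // 'caaaat'
console.log(catPattern.test('ct scan'));                 // true: zero 'a's is allowed
console.log(catPattern.test('coat'));                    // false: 'o' is not an 'a'
```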

Other common special characters include:
Assertions
^ : start of input (or start of each line with the m flag)
$ : end of input (or end of each line with the m flag)

Quantifiers
* : zero or more of the preceding character
? : zero or one of the preceding character
+ : one or more of the preceding character

Character Classes
. : any single character except line terminators such as \n
\s : any whitespace character
\d : any digit (0-9)
\w : any word character (ASCII letter, digit, or underscore)
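Here is a short sketch that exercises a few of the characters listed above; the regex names are illustrative, and each comment notes what the line should print.

```javascript
// Assertions: ^ and $ anchor the match to the start and end of the input.
const digitsOnly = /^\d+$/;
console.log(digitsOnly.test('2024'));    // true
console.log(digitsOnly.test('2024 AD')); // false: extra text after the digits

// Quantifiers: ? makes the 'u' optional; + requires at least one 'o'.
const colour = /colou?r/;
const goal = /go+al/;
console.log(colour.test('color'), colour.test('colour')); // true true
console.log(goal.test('gooooal'), goal.test('gal'));      // true false

// Character classes: \w word characters, \s whitespace, . any single character.
const twoWords = /\w+\s\w+/;
const hAnyT = /^h.t$/;
console.log('hi there!'.match(twoWords)[0]);        // 'hi there'
console.log(hAnyT.test('hot'), hAnyT.test('heat')); // true false
```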

Groups and Ranges

Character sets like /[abc]/ and ranges like /[a-c]/ are another kind of special syntax: a set of characters enclosed in square brackets matches any one of the enclosed characters. To check whether a string contains a vowel, the character set would be /[aeiou]/. You can also specify a range of characters with a hyphen, as in /[a-d]/. However, if the hyphen is the first or last character within the brackets, as in /[-abcd]/, it is interpreted as a literal hyphen that can be matched.

A ^ as the first character inside the brackets creates a negated character set, which matches any character not enclosed in the brackets. /[^aeiou]/ matches the first character that isn't one of the listed vowels, so searching 'aeiopup' matches 'p'.

const myReg = /[^aeiou]/;
const myStr = 'aeiopup';
myStr.match(myReg); // ['p', index: 4, input: 'aeiopup', groups: undefined];
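A few more sets and ranges, including the literal-hyphen case described above (the regex names are illustrative):

```javascript
const vowel = /[aeiou]/;
const aToD = /[a-d]/;
const hyphenOrAToD = /[-abcd]/;     // a leading hyphen is a literal '-'
const hexDigits = /^[0-9a-fA-F]+$/; // several ranges combined in one set

console.log(vowel.test('rhythm'));             // false: no vowels
console.log('xyzcab'.match(aToD)[0]);          // 'c' (first character in a-d, at index 3)
console.log('x-ray'.match(hyphenOrAToD)[0]);   // '-'
console.log(hexDigits.test('1e3F'), hexDigits.test('1g')); // true false
```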

Escaping to Literally Use Those Special Characters

If you need to search for a special character in its plain, non-special capacity (just a plain old .), escape it by placing a backslash in front of it: /\./.

Another way to literally search for just a '.' would be to put the character within square brackets as described above (/[.]/).
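Comparing an unescaped dot with the two literal forms makes the difference visible (the regex names are illustrative):

```javascript
const anyChar = /1.5/;     // '.' matches any single character
const literalDot = /1\.5/; // escaped: matches only an actual '.'
const dotInSet = /1[.]5/;  // inside brackets, '.' is literal too

console.log(anyChar.test('1x5'));    // true
console.log(literalDot.test('1x5')); // false
console.log(literalDot.test('1.5')); // true
console.log(dotInSet.test('1x5'), dotInSet.test('1.5')); // false true
```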

Flags

For additional functionality, you can add optional flags to a regular expression. Flags can be used separately or together in any order. Newer JavaScript engines also support the d and v flags, but these six are the ones you'll see most often:
g : global search (find all matches instead of stopping at the first)
i : case-insensitive search
m : multi-line search (^ and $ match at the start and end of each line)
s : allows . to match newline characters (ES2018)
u : 'unicode'; treat the pattern as a sequence of Unicode code points
y : perform a 'sticky' search that matches only starting at the current position (lastIndex) in the target string

//syntax
const regex = /pattern/flags;

// match any characters not in the character set,
// case-insensitive, across the entire string
const myReg = /[^aeiou]/ig;
// or, equivalently, with the constructor
const myRegFromConstructor = new RegExp('[^aeiou]', 'ig');

const myStr = 'aeiopuPs';
myStr.match(myReg); // ['p', 'P', 's']
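To see what the m, s, and i flags change, here is a sketch that runs the same patterns with and without each flag against a two-line string (the variable name is illustrative):

```javascript
const text = 'first line\nSecond line';

// Without m, ^ only matches at the very start of the input.
console.log(text.match(/^\w+/g));  // ['first']
// With m, ^ also matches right after each newline.
console.log(text.match(/^\w+/gm)); // ['first', 'Second']

// Without s, . does not match the newline; with s it does.
console.log(/line.Second/.test(text));  // false
console.log(/line.Second/s.test(text)); // true

// i makes the match case-insensitive.
console.log(/second/.test(text));  // false
console.log(/second/i.test(text)); // true
```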

String and RegExp Methods

The methods that use regular expressions are:

RegExp methods: test() and exec()
String methods: match(), replace(), search(), and split()

RegExp Methods

The test() method returns a boolean after testing for a match in the string parameter.

// syntax 
regexObj.test(string);

const str = 'Noodles are my favorite foods';
const regex = /noodles/i; 
regex.test(str); // true
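One caveat worth a sketch: when a regular expression has the g flag, test() also reads and updates lastIndex, so calling it repeatedly on the same regex object can give surprising results. The names below are illustrative.

```javascript
const sentence = 'Noodles are my favorite foods';
const globalNoodle = /noodles/gi;

console.log(globalNoodle.test(sentence)); // true, and lastIndex is now 7
console.log(globalNoodle.test(sentence)); // false: the search resumed at index 7
console.log(globalNoodle.lastIndex);      // 0: reset after the failed search

// Without the g flag, test() ignores lastIndex and is repeatable.
const plainNoodle = /noodles/i;
console.log(plainNoodle.test(sentence), plainNoodle.test(sentence)); // true true
```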

The exec() method executes a search for a match in the string parameter. It returns a results array if a match is found, or null if there is no match. Regular expressions with the global (g) or sticky (y) flag are stateful: they store in lastIndex the position where the previous match ended. That makes exec() useful for iterating over multiple matches, unlike String.prototype.match(), which (with the g flag) returns only the matching strings.

The MDN docs warn against placing a regular expression literal inside the while condition: a new RegExp object would be created on every iteration, resetting lastIndex, so if there is a match the loop never ends. Also be sure to use the global flag; without it, lastIndex never advances and the loop will likewise run forever.

// syntax
regexObj.exec(string);

const str = 'I really enjoy eating noodles and more noodles';
const regex = new RegExp('noodle', 'g');
let arr;

while ((arr = regex.exec(str)) !== null) {
  console.log(`Found ${arr[0]}! Next iteration starts at index ${regex.lastIndex}.`);
}
// Found noodle! Next iteration starts at index 28.
// Found noodle! Next iteration starts at index 45.
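Besides the matched text, the array that exec() returns carries the match position and any capturing groups (the parts of the pattern wrapped in parentheses). A small sketch, with an illustrative pattern that captures the word after "more":

```javascript
// Parentheses create capture group 1: the word that follows "more ".
const afterMore = /more (\w+)/;
const result = afterMore.exec('I really enjoy eating noodles and more noodles');

console.log(result[0]);    // 'more noodles' (the full match)
console.log(result[1]);    // 'noodles' (capture group 1)
console.log(result.index); // 34 (where the match starts)
console.log(afterMore.exec('no match here')); // null
```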

String Methods

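The match() method returns an array containing all of the matches when the regular expression has the global (g) flag. Without the g flag, it returns the same result as exec(): the first match along with its index and any capturing groups. In both cases it returns null if no match is found.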

// syntax
string.match(regexp);

const str = 'I really enjoy eating noodles and more noodles';
const regex = new RegExp('noodle', 'g');
str.match(regex); // ['noodle', 'noodle']
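Here is a side-by-side sketch of match() with and without the g flag, plus the no-match case (the variable names are illustrative):

```javascript
const meal = 'I really enjoy eating noodles and more noodles';

// Without g: the first match plus details, like exec().
const first = meal.match(/noodle/);
console.log(first[0]);    // 'noodle'
console.log(first.index); // 22

// With g: every matched string, but no index or group details.
console.log(meal.match(/noodle/g)); // ['noodle', 'noodle']

// No match at all: null, not an empty array.
console.log(meal.match(/pizza/g)); // null
```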

Then there's matchAll(), which returns an iterator of all results matching a string against a regular expression, including capturing groups. With matchAll() you can skip the exec() while loop and use more convenient iteration tools like for...of, the spread operator, or Array.from(). The regular expression must have the global flag, or matchAll() throws a TypeError.

const str = 'I really enjoy eating noodles and more noodles';
const regex = new RegExp('noodle', 'g');
const matches = str.matchAll(regex); 

for (const match of matches) {
  console.log(`Found ${match[0]}! Start = ${match.index} End = ${match.index + match[0].length}.`);
}
// Found noodle! Start = 22 End = 28.
// Found noodle! Start = 39 End = 45.
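Capturing groups are where matchAll() really pays off. A sketch, assuming a made-up order string in a "name xQuantity" format, that spreads the iterator into an array and pulls out each group; it also checks the TypeError thrown when the g flag is missing:

```javascript
const order = 'noodles x2, dumplings x10, tea x1';

// Each result from matchAll() carries its capture groups and index.
const items = [...order.matchAll(/(\w+) x(\d+)/g)].map((m) => ({
  item: m[1],
  qty: Number(m[2]),
}));
console.log(items);
// [{ item: 'noodles', qty: 2 }, { item: 'dumplings', qty: 10 }, { item: 'tea', qty: 1 }]

// Without the g flag, matchAll() throws instead of returning an iterator.
let threw = false;
try {
  order.matchAll(/x(\d+)/);
} catch (err) {
  threw = err instanceof TypeError;
}
console.log(threw); // true
```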

The search() method tests for a match in the string. If successful it will return the index of the first match, or -1 if no match is found.

// syntax 
str.search(regex);

const str = 'Pizza in the Morning, Pizza in the Evening...'
const regex1 = /[a-z]/g; 
const regex2 = /[!]/g;
str.search(regex1); // 1
str.search(regex2); // -1
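Because search() only reports where the first match starts, the g flag makes no difference to it. A few more lookups on the same sentence (the variable name is illustrative):

```javascript
const headline = 'Pizza in the Morning, Pizza in the Evening...';

console.log(headline.search(/Morning|Evening/)); // 13: 'Morning' comes first
console.log(headline.search(/pizza/i));          // 0: case-insensitive match at the start
console.log(headline.search(/,\s/));             // 20: the comma followed by a space
console.log(headline.search(/Night/));           // -1
```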

replace() executes a search for a match in the string and replaces the matched substring with a replacement substring. It returns a new string, so the original is not mutated. You can set the pattern with either a regular expression or a string; however, if the pattern is a string, only the first occurrence is replaced.

// syntax
str.replace(regex|substr, newSubstr)

const str = "when pizza's on a bagel you can have pizza anytime.."
const regex = /pizza/gi;
str.replace(regex, 'noodle'); // "when noodle's on a bagel you can have noodle anytime.."
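A sketch of three other ways to call replace(): a plain string pattern (first occurrence only), capture groups reused in the replacement with $1 and $2, and a function that computes each replacement. The sample strings are illustrative.

```javascript
const menu = "when pizza's on a bagel you can have pizza anytime..";

// A string pattern replaces only the first occurrence.
console.log(menu.replace('pizza', 'noodle'));
// "when noodle's on a bagel you can have pizza anytime.."

// Capture groups can be reused in the replacement string with $1, $2, ...
console.log('Lovelace, Ada'.replace(/(\w+), (\w+)/, '$2 $1'));
// "Ada Lovelace"

// A function can compute each replacement from the matched text.
console.log(menu.replace(/pizza|bagel/g, (m) => m.toUpperCase()));
// "when PIZZA's on a BAGEL you can have PIZZA anytime.."
```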

split() uses a string or regular expression to break or split up a string into an array of substrings that are separated at each instance of the specified separator.

When the regular expression contains capturing parentheses, the captured separators are included in the resulting array. Without capturing parentheses, the separators are omitted.

// syntax
str.split(separator, [limit]) // limit is optional:
// the maximum number of substrings to return.
// Any leftover text is not included in the array at all.

const str = "Eat 5 servings of fruits and vegetables a day";

// with capturing parentheses around \d, any
// matched digits are included in the returned array
const withDigits = str.split(/(\d)/);
// ["Eat ", "5", " servings of fruits and vegetables a day"]

// without capturing parentheses, any matched digits are omitted
const withoutDigits = str.split(/\d/);
// ["Eat ", " servings of fruits and vegetables a day"]
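A regular expression separator also helps with inconsistent delimiters. Here is a sketch, assuming a made-up list separated by commas or semicolons with uneven spacing, that also shows the optional limit:

```javascript
const messy = 'noodles,  pizza ;bagels;   tea';

// Split on a comma or semicolon plus any surrounding whitespace.
console.log(messy.split(/\s*[,;]\s*/)); // ['noodles', 'pizza', 'bagels', 'tea']

// The optional limit caps how many pieces are returned.
console.log(messy.split(/\s*[,;]\s*/, 2)); // ['noodles', 'pizza']
```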

Testing Your Regular Expressions

Of course there are a lot more special characters and categories than I could list above. Some resources I've found to be helpful when building and testing regular expressions are Rubular (Ruby) and RegExr (JavaScript and PHP), although most programming languages will have similar syntax.

For a quick reference on a specific pattern or construct, the MDN Docs Cheatsheet is a handy guide.

Resources
RegExp
Regular expressions
