DEV Community

John Au-Yeung

Manipulating Strings with Regular Expressions

Regular expressions are patterns that let us search for and match text within strings. They are a very useful tool for finding strings that contain what we want, or for checking that a string follows a certain pattern for validation purposes. Some use cases for regular expressions include checking whether an email input is typed in the correct format, finding and replacing parts of a string that match a certain pattern, locating capitalized words, and finding strings that look like phone numbers.

Here is a simple example of a regular expression:

[^@]+@[^\.]+\..+

The one above is a simple regular expression for identifying email addresses. It looks for an at sign with at least one character before it, then checks that the text after the at sign contains a period with at least one character on each side of it.

To write regular expressions in JavaScript, we either write them as a literal, which is a pattern like the one above wrapped in forward slashes, or pass the pattern as a string into the constructor of the RegExp object. A regular expression literal is compiled when the script loads, while one created with the RegExp constructor is compiled at run time, which is useful when the pattern isn't known in advance. We can create a RegExp object as follows:

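const emailRegex = new RegExp('[^@]+@[^.]+\\..+', 'g');

The first argument of the constructor is the regular expression for the email address, written as a string, so the backslash that escapes the period has to be doubled. The g in the second argument means that the regular expression will search for matches globally, that is, throughout the whole string.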
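
As a rough sketch of the two forms side by side, the following builds the same email pattern both ways. The variable names and the sample domain value are made up for illustration:

```javascript
// Literal form: the pattern sits between forward slashes.
const literalRegex = /[^@]+@[^.]+\..+/;

// Constructor form: the pattern is a string, so each backslash is doubled.
const constructedRegex = new RegExp('[^@]+@[^.]+\\..+');

// Both objects hold the same pattern text.
console.log(literalRegex.source === constructedRegex.source); // true

// The constructor is handy when part of the pattern is only known at run time.
const domain = 'example'; // hypothetical value, e.g. taken from user input
const domainRegex = new RegExp('@' + domain + '\\.');
console.log(domainRegex.test('me@example.com')); // true
```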

With regular expressions, we can search for strings in a case-insensitive manner. A simple example would be looking for a word regardless of how its first letter is capitalized. For example, if we want to look for the word ‘dog’ whether or not it is capitalized, we can write /[Dd]og/ to match both dog in all lowercase and Dog starting with an uppercase letter. To ignore case for every letter, we can add the i modifier instead, as in /dog/i. A regular expression that looks for a particular sequence of characters is called a simple pattern.
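
A short sketch of the difference between a character class for the first letter and ignoring case altogether (the variable names are illustrative):

```javascript
const classPattern = /[Dd]og/; // accepts "dog" and "Dog" only
const flagPattern = /dog/i;    // the i modifier ignores case for every letter

console.log(classPattern.test('Dog')); // true
console.log(classPattern.test('DOG')); // false
console.log(flagPattern.test('DOG'));  // true
```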

We can try out regular expressions in the JavaScript console with the test method, which is available on both regular expression literals and RegExp objects. For example, to check whether a string looks like an email address, we can write:

// The pattern is a string here, so the backslash before the period is doubled.
// No g modifier: with g, test would resume from where the previous match ended.
const emailRegex = new RegExp('[^@]+@[^.]+\\..+');
emailRegex.test('abc@123.com') // true
emailRegex.test('abc') // false

With regular expression literals, test works the same way:

/[^@]+@[^\.]+\..+/.test('abc@123.com') // true
/[^@]+@[^\.]+\..+/.test('123') // false
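
One caveat when calling test more than once on the same object: a regular expression with the g modifier remembers where its last match ended in its lastIndex property, and the next test call resumes from there. The sketch below, with illustrative variable names, shows the difference:

```javascript
const globalRegex = /[^@]+@[^.]+\..+/g;
const plainRegex = /[^@]+@[^.]+\..+/;

console.log(globalRegex.test('abc@123.com')); // true
console.log(globalRegex.lastIndex);           // 11, the end of the match
console.log(globalRegex.test('abc@123.com')); // false, the search resumed at index 11
console.log(globalRegex.lastIndex);           // 0, reset after the failed match

// Without g, every call starts from the beginning of the string.
console.log(plainRegex.test('abc@123.com')); // true
console.log(plainRegex.test('abc@123.com')); // true
```

For simple validation it is usually safer to leave the g modifier off.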

Characters for Building Regular Expressions

As we can see, regular expressions use their own characters and rules to search for strings. With them we can look for letters, substrings, and specific patterns. Below is a list of regular expression characters that we can combine to form regular expressions that look for specific things in strings:

  • \ — escapes special characters so that we can look for the literal character that comes after the backslash in a regular expression
  • ^ — finds the beginning of a string
  • $ — finds the end of a string
  • * — finds the character before it 0 or more times
  • + — finds the character before it 1 or more times
  • ? — finds the character before it 0 or 1 time
  • . — finds any single character except the newline character
  • a|b — finds a or b
  • {x} — finds exactly x occurrences of the character preceding this
  • [abc] — finds any one of the characters in the brackets
  • [^abc] — finds any character other than the characters in the brackets
  • [\b] — search for backspace
  • \b — find word boundary
  • \B — find non word boundary
  • \d — find digit character
  • \D — find non-digit character
  • \n — find line feed character
  • \s — find single white space character, which includes space, tab, form feed, and line feed
  • \S — find single non-white space character
  • \t — find tab character
  • \w — find any alphanumeric character including underscore
  • \W — find any character that is not alphanumeric or an underscore
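
To make a few of the entries above concrete, here is a small sketch that tries some of them out. The sample patterns and strings are made up for illustration:

```javascript
const threeDigits = /^\d{3}$/;   // ^ and $ anchor the match, {3} means exactly 3
const optionalU = /colou?r/;     // ? makes the u optional
const oneOfAbc = /^[abc]$/;      // a single character from the set
const notAbc = /^[^abc]$/;       // a single character outside the set
const wholeWordCat = /\bcat\b/;  // \b marks the word boundaries

console.log(threeDigits.test('123'));          // true
console.log(threeDigits.test('1234'));         // false
console.log(optionalU.test('color'));          // true
console.log(optionalU.test('colour'));         // true
console.log(oneOfAbc.test('b'));               // true
console.log(oneOfAbc.test('ab'));              // false, only one character is allowed
console.log(notAbc.test('b'));                 // false
console.log(wholeWordCat.test('a cat sat'));   // true
console.log(wholeWordCat.test('concatenate')); // false

const words = 'a1 b22'.match(/\w+/g); // runs of letters, digits, or underscores
console.log(words); // ['a1', 'b22']
```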

Writing Code with Regular Expressions

Regular expressions have modifiers, also called flags, that let us change how the search is carried out. We pass modifiers as the second argument of the RegExp constructor or append them to the end of a regular expression literal. The 3 most commonly used modifiers in JavaScript are g, i, and m. g is for global, which means the whole string is searched for every match rather than stopping at the first one. i means case insensitive, so the search ignores case when it looks for matches. m is for multi-line matching, which lets the start and end characters (^ and $) match at the beginning and end of each line rather than only at the beginning and end of the whole input string.
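
A minimal sketch of how the 3 modifiers change the result of a search, using a made-up 3-line string:

```javascript
const text = 'Cat\ncat\ndog';

const firstOnly = text.match(/cat/i);    // no g: only the first match, with details
const allMatches = text.match(/cat/gi);  // g and i: every match, in any case
const lineStarts = text.match(/^\w+/gm); // m: ^ matches at the start of each line
const stringStart = text.match(/^\w+/g); // no m: ^ matches only at the very start

console.log(firstOnly[0]); // 'Cat'
console.log(allMatches);   // ['Cat', 'cat']
console.log(lineStarts);   // ['Cat', 'cat', 'dog']
console.log(stringStart);  // ['Cat']
```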

There are 2 methods that are available on regular expression objects in JavaScript. One is test, which checks if a string matches the regular expression object that the test method is called on. It returns true if a match is found and false otherwise. There’s also the exec method, which looks for a match of the regular expression it is called on and returns an array with the matched text and any captured groups, or null if there is no match.

So if we want to check whether a string matches a certain regular expression pattern, we use the test method. If we want to retrieve the text that matches a given regular expression pattern, we use the exec method.
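
The sketch below contrasts the two methods on the same input. The year and month pattern and the sample strings are hypothetical, chosen only to show what each method returns:

```javascript
const datePattern = /(\d{4})-(\d{2})/; // hypothetical pattern with 2 capture groups

// test only answers yes or no.
console.log(datePattern.test('Released 2019-11')); // true

// exec returns the match details, or null when nothing matches.
const result = datePattern.exec('Released 2019-11');
console.log(result[0]);    // '2019-11', the whole match
console.log(result[1]);    // '2019', the first capture group
console.log(result[2]);    // '11', the second capture group
console.log(result.index); // 9, where the match starts

console.log(datePattern.exec('no date here')); // null
```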

There are a few string methods that take regular expressions as arguments. They are the match, search, replace, and split methods. The match method gets the parts of a string that match the regular expression object we pass in as the argument. null is returned if there are no matches; otherwise, an array is returned with the information about the matches. For example, if we have the following string, then we can use the match method to look for email addresses:

'abc@123.com abc@abc.com'.match(/[^@]+@[^\.]+\..+/g)

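In the code above, we search for email addresses globally in the string, and an array of the substrings that match the regular expression is returned. Note that [^@] and .+ also match spaces, so this pattern matches the whole string as a single result, ['abc@123.com abc@abc.com'], rather than 2 separate addresses.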
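
If we want each address as its own match, one option is to keep the pattern from crossing whitespace. The second pattern below is an assumed variant for illustration, not a complete email validator:

```javascript
const text = 'abc@123.com abc@abc.com';

// Original pattern: the match runs to the end of the string.
const greedyResult = text.match(/[^@]+@[^\.]+\..+/g);
console.log(greedyResult); // ['abc@123.com abc@abc.com']

// Assumed variant: \s is excluded and \S+ stops at the next space.
const separateResult = text.match(/[^@\s]+@[^.\s]+\.\S+/g);
console.log(separateResult); // ['abc@123.com', 'abc@abc.com']

// match returns null when nothing is found.
console.log('no addresses'.match(/[^@\s]+@[^.\s]+\.\S+/g)); // null
```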

The search method searches a string for a match of the given regular expression. If a match is found, it returns the index where the first match starts; otherwise, -1 is returned. For example, to search for an email in the string above, we can write:

'abc@123.com abc@abc.com'.search(/[^@]+@[^\.]+\..+/)

Then we get 0, since the match starts at the first character of the string.
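
As a small sketch of the other outcomes, here is search returning a non-zero index and -1. The sample string and the whitespace-aware email pattern are assumptions made for illustration:

```javascript
const message = 'Contact: abc@123.com';

// The address starts after the 9 characters of 'Contact: '.
const emailIndex = message.search(/[^@\s]+@[^.\s]+\.\S+/);
console.log(emailIndex); // 9

// There is no run of 5 digits in the string.
const zipIndex = message.search(/\d{5}/);
console.log(zipIndex); // -1
```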

The replace method searches a string for substrings matching the given regular expression and then replaces them with the string we specify. For instance, we can use replace like in the following code:

'abc@123.com abc@abc.com'.replace(/[^@]+@[^\.]+\..+/g, 'email')

We get 'email' as the returned string, since the regular expression matches the entire string, including the space between the 2 addresses, and the replace method replaces that whole match with 'email'.
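
A sketch of a few other ways to use replace, assuming a whitespace-aware variant of the email pattern so that each address is replaced on its own:

```javascript
const text = 'abc@123.com abc@abc.com';
const emailPattern = /[^@\s]+@[^.\s]+\.\S+/g; // assumed variant, for illustration

// Each address is replaced separately.
const replaced = text.replace(emailPattern, 'email');
console.log(replaced); // 'email email'

// $1 in the replacement string refers to the first capture group.
const masked = text.replace(/[^@\s]+@([^.\s]+\.\S+)/g, '***@$1');
console.log(masked); // '***@123.com ***@abc.com'

// A function can compute each replacement from the matched text.
const upper = text.replace(emailPattern, (match) => match.toUpperCase());
console.log(upper); // 'ABC@123.COM ABC@ABC.COM'
```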

The split method can split a string using a regular expression as the delimiter. For example, we can write:

const str = 'The 1 quick 2 brown fox jump.';
const splitStr = str.split(/(\d)/);

console.log(splitStr); // gets ["The ", "1", " quick ", "2", " brown fox jump."]

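to split a sentence into an array of strings, using the digits as delimiters. Because the \d is wrapped in parentheses, which form a capturing group, the digits themselves are also included in the returned array.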
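
For comparison, here is a sketch of the same split without the capturing group, and with the surrounding spaces treated as part of the delimiter (the variable names are illustrative):

```javascript
const sentence = 'The 1 quick 2 brown fox jump.';

// No capturing group: the digits are dropped from the result.
const withoutDigits = sentence.split(/\d/);
console.log(withoutDigits); // ['The ', ' quick ', ' brown fox jump.']

// Capturing group: the digits are kept as their own items.
const withDigits = sentence.split(/(\d)/);
console.log(withDigits); // ['The ', '1', ' quick ', '2', ' brown fox jump.']

// Optional spaces around the digit become part of the delimiter.
const trimmed = sentence.split(/\s*\d\s*/);
console.log(trimmed); // ['The', 'quick', 'brown fox jump.']
```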

Common Regular Expressions

Below are some common regular expression patterns. Common ones include patterns for usernames, passwords, URLs, IP addresses, and HTML tags.

Usernames usually must contain a minimum number of characters, with some characters allowed and others not. For example, we can allow only usernames that are 5 characters or longer and that consist of alphanumeric characters, underscores, and dashes by using /^[a-zA-Z0-9_-]{5,}$/. This regular expression allows uppercase and lowercase letters, underscores, dashes, and digits.
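
A minimal sketch of the username check wrapped in a helper function. The function name isValidUsername and the sample inputs are made up for illustration, and the digit range is written with a plain hyphen:

```javascript
const usernamePattern = /^[a-zA-Z0-9_-]{5,}$/;

// Hypothetical helper: returns true when the whole name fits the pattern.
function isValidUsername(name) {
  return usernamePattern.test(name);
}

console.log(isValidUsername('john_doe-1')); // true
console.log(isValidUsername('john'));       // false, shorter than 5 characters
console.log(isValidUsername('john doe'));   // false, spaces are not allowed
```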

Passwords are another thing that often has rules restricting how we set them. We can use the following regular expression to check if a password has 10 or more characters consisting of uppercase and lowercase letters, underscores, dashes, and digits: /^[a-zA-Z0-9_-]{10,}$/.

Web URLs can get quite complex, and matching them may be a problem. They can contain multiple optional parts like http, www, and periods separating multiple parts. We can use something like:

/^(http(s?)?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

It matches an optional http:// or https:// prefix, followed by lowercase alphanumeric characters separated by periods, and an optional path.
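
Here is a quick sketch of what the pattern accepts and rejects. The sample URLs are made up, and the last one shows a limitation: the path part does not allow a question mark, so URLs with query strings are rejected.

```javascript
const urlPattern = /^(http(s?)?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

console.log(urlPattern.test('https://thewebdev.info/'));     // true
console.log(urlPattern.test('example.com'));                 // true, the prefix is optional
console.log(urlPattern.test('http://example.com/a/b.html')); // true
console.log(urlPattern.test('not a url'));                   // false
console.log(urlPattern.test('http://example.com/?q=1'));     // false, ? is not allowed
```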

IP addresses are another common thing that needs to be validated. One example is /^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/ from https://code.tutsplus.com/tutorials/8-regular-expressions-you-should-know--net-6149, which validates that a string is an IPv4 address by checking that each number is between 0 and 255, that the numbers are separated by dots, and that there are exactly 4 of them.
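
A short sketch that runs the pattern against a few sample addresses, with the ranges written using plain hyphens (the variable name and inputs are illustrative):

```javascript
const ipv4Pattern = /^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/;

console.log(ipv4Pattern.test('192.168.1.1'));     // true
console.log(ipv4Pattern.test('255.255.255.255')); // true
console.log(ipv4Pattern.test('256.1.1.1'));       // false, 256 is out of range
console.log(ipv4Pattern.test('1.2.3'));           // false, only 3 numbers
```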

HTML tags often need to be validated by things like text editors and browsers. The regular expression /^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/ checks for both self-closing and normal tags. It looks for the less than sign of the opening tag followed by some letters, which form the tag name, and then any attributes. For normal tags, it checks that a greater than sign closes the opening tag, then matches optional content with (.*), and then checks for the closing tag with <\/\1>, where \1 refers back to the tag name captured at the start. For self-closing tags like img, br, or hr, it looks for spaces followed directly by />, using \s+\/>.
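
The following sketch tries the pattern on a few short, made-up inputs and uses exec to read the captured tag name and content:

```javascript
const tagPattern = /^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/;

console.log(tagPattern.test('<p>hello</p>'));   // true
console.log(tagPattern.test('<br />'));         // true
console.log(tagPattern.test('<p>hello</div>')); // false, the closing tag does not match
console.log(tagPattern.test('<br/>'));          // false, the pattern expects a space before />

// Group 1 holds the tag name and group 3 holds the content.
const parts = tagPattern.exec('<p>hello</p>');
console.log(parts[1]); // 'p'
console.log(parts[3]); // 'hello'
```

A pattern like this only suits simple, single tags; it is not a substitute for a real HTML parser.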

Regular expressions are very handy when we need to search for and manipulate strings. We can use them to simplify a lot of string manipulation and validation, especially when we need to validate complex patterns like emails and URLs.
