DEV Community

Basic Regex

Kevin Downs
I'm a Philadelphia based Full Stack Software Engineer. I'm a Linguist turned Full Stack Engineer with a background in Healthcare and Customer Service looking for opportunities as a Junior Developer.

Regex, short for regular expression, is a useful tool for finding patterns in strings. Regular expressions can be used with string methods such as search() and replace(), as well as in input validation to check that a value matches a specific pattern. If you are like me, you may have come across regex before while trying to manipulate substrings and been scared off by the seemingly confusing syntax. Well, good news! Regex isn't nearly as complicated as it looks, and it is a great tool for writing clean and concise pattern matches when working with strings. Below I'm going to lay out the basics of regex in a hopefully simple-to-understand manner.

Note: I will be using JavaScript for the purposes of this post, though the concepts apply in almost any language. Also, this guide focuses only on the basics of regex, so I will not be talking about more advanced patterns like lookaheads and capture groups.

Methods - test() and match()

The first thing I want to talk about is the two methods that I will be using: test() and match(). You can use regex with a wide variety of built-in string methods, but we're going to keep it simple today. The two methods are called differently: test() is a method of the regex and takes a string as its argument, while match() is a method of the string and takes a regex as its argument. The other main difference between the two is the return value.

test(), as the name implies, tests a regex pattern against a string and returns true if it finds a match and false if it does not.

match() is very similar, except it returns an array of the matched substrings if a match is found and null if not.

let regex = /Hello/;
let string = "Hello";

regex.test(string);  // true
string.match(regex);  // ["Hello"]

Note that regex patterns can be either stored in a variable or just input directly as an argument. I think storing them in variables looks cleaner so I will be using them that way in this guide.
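
As a quick sketch of the two styles mentioned in the note above, the snippet below passes patterns inline instead of storing them first, and also shows what each method returns when nothing matches. The string and variable names are made up for illustration.

```javascript
// Hypothetical sample string, used only for this illustration.
const greeting = "Hello there";

// A regex literal can be used directly, without storing it in a variable.
const foundHello = /Hello/.test(greeting);      // true
const foundGoodbye = /Goodbye/.test(greeting);  // false

// match() returns null, not an empty array, when there is no match.
const noMatch = greeting.match(/Goodbye/);      // null

console.log(foundHello, foundGoodbye, noMatch);
```

Because match() can return null, it is usually worth checking the result before reading anything from it.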

Literal Patterns

The simplest pattern you can match is a literal one. You can see an example of this in the code snippet above, where we are just matching the string "Hello". To create a literal regex pattern, all you have to do is put the text you want to match between two forward slashes: //.

let regex = /javascript/;
let string = "I am a javascript programmer.";

regex.test(string);  // true

As you can see above, we are checking to see if the substring "javascript" exists within the string "I am a javascript programmer." Pretty simple, right? Let's get a little bit more complicated. What if we had multiple different languages we wanted to check for? We could use the "or" symbol | to test if any of the languages we specify are within the string we want to test. If we use it with match() instead of test(), we can also get the specific value that was matched.

let regex = /javascript|ruby|java/;
let js = "I am a javascript programmer.";
let ruby = "I am a ruby programmer.";
let java = "I am a java programmer.";

js.match(regex);  // ["javascript"]
ruby.match(regex);  // ["ruby"]
java.match(regex);  // ["java"]
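
One detail worth knowing about |: the alternatives are tried from left to right, so the order can matter when one option is the beginning of another. A small sketch of that behavior, with illustrative variable names:

```javascript
const jsSentence = "I am a javascript programmer.";

// "java" is listed first, so it wins as soon as it matches.
const shortFirst = jsSentence.match(/java|javascript/);

// "javascript" is listed first, so the longer word is matched.
const longFirst = jsSentence.match(/javascript|java/);

console.log(shortFirst[0]);  // "java"
console.log(longFirst[0]);   // "javascript"
```

This is why the pattern above lists javascript before java.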

Flags - i and g

So far we have some very basic literal patterns that we can match. This is great, but regex is case sensitive and only returns the first match found. Often we will want to match regardless of case and we will want to get all of the instances of our match. This is where regex flags come in. They can be added to the end of a regex pattern to denote rules for the entire pattern.

Two of the most commonly used flags are i to denote case insensitivity and g to denote that you want every match in the string. It is also possible to combine flags together to denote multiple rules on your pattern.

let string = "The fox jumps over the dog at the park.";

// This pattern will return the first case insensitive match
let caseRegex = /the/i;
string.match(caseRegex);  // ["The"]

// This pattern will return all case sensitive matches
let multRegex = /the/g;
string.match(multRegex);  // ["the", "the"]

// Combined will return all matches regardless of case
let caseMultRegex = /the/ig;
string.match(caseMultRegex);  // ["The", "the", "the"]
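
The g flag also changes the shape of what match() gives back. As a rough sketch (variable names are my own): without g, the returned array holds the first match along with extra details such as its position, while with g it holds only the matched strings.

```javascript
const foxSentence = "The fox jumps over the dog at the park.";

// No g flag: the first match, plus details such as where it was found.
const firstOnly = foxSentence.match(/the/);

// g flag: every match, but only the matched text.
const everyMatch = foxSentence.match(/the/g);

console.log(firstOnly[0]);       // "the"
console.log(firstOnly.index);    // 19
console.log(everyMatch.length);  // 2
```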

Wildcard - .

Now that we have covered literal patterns and flags, let's start talking about special characters. This is where the power of regex starts to shine. In a pattern we can use the . character as a wildcard. It stands in for any single character except a line break. Say you wanted to match any three-letter word that starts with "b" and ends with "g". Take a look at the snippet below to see how we could use this.

let regex = /b.g/;
let bugString = "Look at this bug";
let bagString = "Look at this bag";

bugString.match(regex);  // ["bug"]
bagString.match(regex);  // ["bag"]
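
To make the limits of the wildcard a bit more concrete, here is a small sketch using test(). The test strings are made up for illustration.

```javascript
const wildcard = /b.g/;

const matchesLetter = wildcard.test("big");    // true
const matchesSpace = wildcard.test("b g");     // true: a space counts as a character
const matchesDigit = wildcard.test("b4g");     // true: so does a digit
const matchesNothing = wildcard.test("bg");    // false: . needs exactly one character
const matchesNewline = wildcard.test("b\ng");  // false: . does not match a line break by default

console.log(matchesLetter, matchesSpace, matchesDigit, matchesNothing, matchesNewline);
```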

Multiple Characters - [], -, +, *, and {}

Now that we've seen the simplest special character, the wildcard, let's talk a bit about some other special characters. The characters that we talk about in this section will allow us to select multiple characters in some form or another.

Surrounding a set of characters with [] will match for any of the characters within. This can be useful for example if you want to find all of the vowels in a string.

let vowels = /[aeiou]/g;
let string = "Hello World!";

string.match(vowels);  // ["e", "o", "o"]

The - character can be used inside of [] to denote a range of characters that we want to match. Say for example we want to match all of the numbers in a string.

let numbers = /[0-9]/g;
let string = "The value of pi is 3.14";

string.match(numbers);  // ["3", "1", "4"]
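
Several ranges can sit side by side inside one pair of brackets. The sketch below, with a made-up string, matches any uppercase letter or digit:

```javascript
const roomLabel = "Room 4B, Floor 12";

// A-Z and 0-9 are two separate ranges inside the same set.
const upperOrDigit = /[A-Z0-9]/g;

const labelMatches = roomLabel.match(upperOrDigit);

console.log(labelMatches);  // ["R", "4", "B", "F", "1", "2"]
```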

The + and * characters are very similar in that they both let you specify how many times in a row the preceding character may appear. + specifies that the character appears one or more times in succession, while * specifies zero or more times. Let's look at some examples to clarify.

// This pattern specifies one or more
let plusRegex = /s+/g;
let plusString = "Where is Mississippi?";

plusString.match(plusRegex);  // ["s", "ss", "ss"]

// This pattern specifies zero or more
let starRegex = /ya*/g;
let starString = "I said yaaas yesterday.";

// "yesterday" both starts and ends with a "y", so it contributes two matches
starString.match(starRegex);  // ["yaaa", "y", "y"]
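
These symbols also work after a set in [], not just after a single character. As a small sketch (the variable names are my own), + can pull out whole numbers instead of single digits, and the last two lines show where + and * differ:

```javascript
const piSentence = "The value of pi is 3.14";

// One or more digits in a row are returned as a single match.
const digitRuns = piSentence.match(/[0-9]+/g);

// "yes" has a "y" followed by zero "a" characters.
const plusOnYes = /ya+/.test("yes");  // false: + needs at least one "a"
const starOnYes = /ya*/.test("yes");  // true: * is fine with zero

console.log(digitRuns);  // ["3", "14"]
console.log(plusOnYes, starOnYes);
```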

The final symbol I want to talk about here is {}. It is similar to + and * except that it allows you to specify a range or exact number of times you want a character to repeat. You can specify a min, min and max, or exact number.

let timidPirate = "Aargh";
let normalPirate = "Aaaargh";
let excitedPirate = "Aaaaaaaaaaaaaargh";

// Each pattern starts with ^ (start of the string) and ends with "r" so that
// the whole run of a's is counted. A bare /a{4}/i would also match four of
// the a's inside a longer run, so the excited pirate would pass too.

// Specify exact number - we want a normal pirate
let exactRegex = /^a{4}r/i;

exactRegex.test(timidPirate);  // false
exactRegex.test(normalPirate);  // true
exactRegex.test(excitedPirate);  // false

// Specify minimum number - we don't want any timid pirates
let minRegex = /^a{4,}r/i;

minRegex.test(timidPirate);  // false
minRegex.test(normalPirate);  // true
minRegex.test(excitedPirate);  // true

// Specify min and max number - we only want timid and normal pirates
let rangeRegex = /^a{2,4}r/i;

rangeRegex.test(timidPirate);  // true
rangeRegex.test(normalPirate);  // true
rangeRegex.test(excitedPirate);  // false
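
Like + and *, the {} symbol can follow a set in []. The sketch below uses a made-up string to show a min and max count, and also that an exact count will happily match inside a longer run of characters:

```javascript
const orderNote = "Order 12345 shipped in 2021";

// Runs of four or five digits.
const fourOrFive = orderNote.match(/[0-9]{4,5}/g);

// Exactly two digits at a time: longer runs are split into pairs,
// and the leftover "5" is too short to match.
const digitPairs = orderNote.match(/[0-9]{2}/g);

console.log(fourOrFive);  // ["12345", "2021"]
console.log(digitPairs);  // ["12", "34", "20", "21"]
```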

Shorthand - \w, \d, and \s

Sometimes we want to be able to specify a group of characters, say all digits. Regex provides us with a few shorthand characters that allow us to do so in a single character.

\w allows us to match any alphanumeric character, including the underscore. Its inverse, \W, matches any character that is not alphanumeric or an underscore.

\d matches for all digit values (0-9). Similarly \D matches for all non digit values.

\s matches for all whitespace values (spaces, tabs, line breaks). You can probably guess that \S matches all non whitespace values.

let string = "I am 31!";

// Alphanumeric and non alphanumeric
let wordRegex = /\w/g;
let nonWordRegex = /\W/g;

string.match(wordRegex);  // ["I", "a", "m", "3", "1"]
string.match(nonWordRegex);  // [" ", " ", "!"]

// Digit and non digit
let digitRegex = /\d/g;
let nonDigitRegex = /\D/g;

string.match(digitRegex);  // ["3", "1"]
string.match(nonDigitRegex);  // ["I", " ", "a", "m", " ", "!"]

// Whitespace and non whitespace
let spaceRegex = /\s/g;
let nonSpaceRegex = /\S/g;

string.match(spaceRegex);  // [" ", " "]
string.match(nonSpaceRegex);  // ["I", "a", "m", "3", "1", "!"]
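
The shorthand characters become more useful once they are combined with +. A minimal sketch, reusing the same sentence under a different variable name; split() is another built-in string method that accepts a regex:

```javascript
const ageString = "I am 31!";

// Runs of word characters instead of single characters.
const wordRuns = ageString.match(/\w+/g);

// Runs of digits give us the whole number.
const numberRuns = ageString.match(/\d+/g);

// Split the string wherever there is one or more whitespace characters.
const pieces = ageString.split(/\s+/);

console.log(wordRuns);    // ["I", "am", "31"]
console.log(numberRuns);  // ["31"]
console.log(pieces);      // ["I", "am", "31!"]
```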

Conclusion

That's really all there is to basic regex. With the tools I talked about here, you can begin to mix and match to start creating your own patterns. There are a few more concepts that are a bit more complicated, and if you'd like to continue exploring the topic of regex, I'd encourage you to take a look at them for even more powerful pattern matching.
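
As one possible example of mixing and matching, the sketch below combines a literal character, a shorthand, a set with ranges, a repeat count, and flags. The strings and variable names are invented for illustration.

```javascript
const post = "Loving #regex and #JavaScript today";

// A literal "#" followed by one or more word characters.
const hashtags = post.match(/#\w+/g);

// A literal "#" followed by exactly six hex digits, in either case.
const hexColor = /#[0-9a-f]{6}/i;
const sixDigits = hexColor.test("#FFaa00");  // true
const threeDigits = hexColor.test("#FFF");   // false: only three hex digits

console.log(hashtags);  // ["#regex", "#JavaScript"]
console.log(sixDigits, threeDigits);
```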

Resources for more learning:
Learn Regular Expressions (Regex)
RegExr: Learn, Build, & Test RegEx
