DEV Community

Elise Marie LaBonte


Cracking into Regex

Regular expressions (regex) are patterns used to search text for specific sequences of characters. In this tutorial, we'll explore how to write regex using an example I've written which passes latitude but not longitude.

Regex is one of those topics that makes my eyes glaze over as soon as I look at the docs, but I can also say that there are few things more satisfying than having a successfully written block of regex. Not only can it shorten the amount of JS that you write in your functions, there are just some things that can't be done without it.


Below is a regex which will pass latitude written in a variety of different formats, but which will not pass longitude.

/(?<coordinates>\d{2}[d°\.\-]\s?\d{2}[m'\.\-]\s?\d{2}[s\"\.\- ]\s?)(?<hemisphere>(north)|(south)|(n|s))$/i


Here is a breakdown of this regex.

<coordinates> and <hemisphere> both mark the "names" of their respective blocks within the regex.


The first group, named 'coordinates', matches two digits, directly followed by a specified symbol, followed by an optional whitespace character. This pattern then repeats twice more, for three parts in total. The only difference between the three parts is the letter that can follow the digits: a D in the first (degrees), an M in the second (minutes), and an S in the third (seconds). The second group, named 'hemisphere', matches different options that signify the southern or northern hemisphere.
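To see the breakdown above in action, here is a small sketch that runs the regex against a few inputs. The sample strings are made up for illustration and are not an exhaustive test of every format.

```javascript
// The latitude regex from this post, unchanged.
const latitudeRegex = /(?<coordinates>\d{2}[d°\.\-]\s?\d{2}[m'\.\-]\s?\d{2}[s\"\.\- ]\s?)(?<hemisphere>(north)|(south)|(n|s))$/i;

// Hypothetical sample inputs, chosen for illustration only.
const samples = [
  "42d 13m 15s N",     // letters as separators
  "42°13'15\" south",  // symbols as separators, hemisphere spelled out
  "42-13-15 n",        // dashes as separators
  "71° 03' 32\" W",    // longitude: ends in W
  "42° 13' 15\"",      // no hemisphere at all
];

// test() returns true when the regex finds a match in the string.
const results = samples.map((input) => latitudeRegex.test(input));
console.log(results); // [ true, true, true, false, false ]
```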


The sections below explain each piece of this regex.

Regex Components

Anchors

Anchors specify where the engine searches. They are special characters that match a position in the text rather than a character. A ^ is used just before a character/group to specify that it comes at the beginning of the text being searched, while a $ is used directly following a character/group to specify that it comes at the end.
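As a quick illustration of both anchors, here is a minimal sketch. The patterns and words are hypothetical and are not part of the latitude regex.

```javascript
// ^ pins the match to the start of the string, $ pins it to the end.
const startsWithCat = /^cat/;
const endsWithCat = /cat$/;

const anchorResults = {
  catalogStart: startsWithCat.test("catalog"), // "cat" is at the start
  bobcatStart: startsWithCat.test("bobcat"),   // "cat" is not at the start
  bobcatEnd: endsWithCat.test("bobcat"),       // "cat" is at the end
  catalogEnd: endsWithCat.test("catalog"),     // "cat" is not at the end
};

console.log(anchorResults);
// { catalogStart: true, bobcatStart: false, bobcatEnd: true, catalogEnd: false }
```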

In our example code, the $ at the very end means that our coordinate must end in one of the following options: 'n', 's', 'north', or 'south', in any capitalization (thanks to the i flag).
ex.
((north)|(south)|(n|s))$

This ensures two things:

  1. a coordinate which does not specify a hemisphere will not match.
  2. a coordinate which ends in an "east/west" (i.e. longitude) signifying hemisphere will not match.

Quantifiers

Quantifiers describe how many of a certain character to search for. The following symbols are used as quantifiers: * (zero or more occurrences), + (one or more occurrences), ? (zero or one occurrence), {n} (exactly n occurrences), {n,z} (between n and z occurrences, inclusive).

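In our example, we used quantifiers to specify the number of digits at the degree, minute, and second position in the latitude. (This also helps rule out some longitude inputs, since longitude is often expressed with a three-digit degree. Note that because the pattern has no ^ anchor at the start, a three-digit degree can still match from its second digit, so the hemisphere check does most of the work.)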
ex.
\d{2}
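Here is a short sketch of a few quantifiers. The patterns are wrapped in ^ and $ (an addition for this demo) so that the whole string has to fit the pattern.

```javascript
// {2}: exactly two digits.
const twoDigits = /^\d{2}$/;
// {2,3}: two or three digits.
const twoOrThreeDigits = /^\d{2,3}$/;
// ?: the whitespace between "a" and "b" is optional (zero or one).
const optionalSpace = /^a\s?b$/;

const quantifierResults = [
  twoDigits.test("42"),          // true
  twoDigits.test("420"),         // false: three digits
  twoOrThreeDigits.test("420"),  // true
  twoOrThreeDigits.test("4200"), // false: four digits
  optionalSpace.test("ab"),      // true: zero spaces
  optionalSpace.test("a b"),     // true: one space
  optionalSpace.test("a  b"),    // false: two spaces
];

console.log(quantifierResults);
```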

Grouping Constructs

Items that match a search can be grouped and named. Parentheses help us with this. A name is not necessary for a group, but it can be specified by using the following syntax: (?<name>abc). Every capturing group, named or not, is also numbered from left to right, starting at 1.

In our example code, the first group is named 'coordinates' while the second is named 'hemisphere'.
ex.
(?<coordinates>.....)

(?<hemisphere>.....)
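In JavaScript, named groups can be read from the groups property of a match result. The following is a sketch using a made-up input string.

```javascript
// The latitude regex from this post, unchanged.
const latitudeRegex = /(?<coordinates>\d{2}[d°\.\-]\s?\d{2}[m'\.\-]\s?\d{2}[s\"\.\- ]\s?)(?<hemisphere>(north)|(south)|(n|s))$/i;

// Hypothetical input for illustration.
const match = "42° 13' 15\" North".match(latitudeRegex);

// The coordinates group ends with \s?, so it can capture a trailing
// space; trim() removes it.
const coordinates = match.groups.coordinates.trim();
const hemisphere = match.groups.hemisphere;

console.log(coordinates); // 42° 13' 15"
console.log(hemisphere);  // North

// Named groups keep their numeric index too: coordinates is group 1.
console.log(match[1] === match.groups.coordinates); // true
```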

Bracket Expressions

Bracket expressions treat the items inside as individual characters. The engine matches any individual character within the brackets. So, a search of [fc]at would match the words fat and cat, but not sat.

In our example code, we use brackets to match the different characters that can follow the degrees, minutes, and seconds of a latitude.

ex.
[d°\.\-]
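Both the [fc]at example and the degree bracket can be tried out with a short sketch. The ^ and $ anchors are added here only to keep the demo strict, and the input strings are hypothetical.

```javascript
// Any one character from the brackets, followed by "at".
const fatOrCat = /^[fc]at$/;
const wordResults = ["fat", "cat", "sat"].map((word) => fatOrCat.test(word));
console.log(wordResults); // [ true, true, false ]

// The degree bracket from the latitude regex: d, °, a period, or a dash.
const degreePart = /^\d{2}[d°\.\-]$/i;
const degreeResults = ["42d", "42°", "42.", "42-", "42x"].map((text) =>
  degreePart.test(text)
);
console.log(degreeResults); // [ true, true, true, true, false ]
```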

Character Classes

Character classes are indicated using a bracket expression. They can specify a single character or a range of characters. For example, [a-j] would match any single lowercase letter from a to j, and [2-7] would match any single digit from 2 to 7.
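The two ranges above can be checked with a minimal sketch like this one (the anchors are added so that only a single character is accepted):

```javascript
// A range matches one character that falls between its two endpoints.
const aToJ = /^[a-j]$/;
const twoToSeven = /^[2-7]$/;

const rangeResults = [
  aToJ.test("c"),       // true
  aToJ.test("k"),       // false: k comes after j
  twoToSeven.test("5"), // true
  twoToSeven.test("8"), // false: 8 is above 7
];

console.log(rangeResults); // [ true, false, true, false ]
```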

The OR Operator

A '|' can be used to match the expression on either side of it. Wrapping the options in a set of parentheses limits how far the OR reaches. Each option can be a single character or several.

In our example code, we use the OR Operator to specify a hemisphere from a few different options.
ex.
((north)|(south)|(n|s))
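Here is a sketch of the hemisphere options on their own. The ^ anchor and the i flag are added for this demo so that each input has to be exactly one of the options.

```javascript
// Matches "north", "south", "n", or "s" and nothing else.
const hemisphereOnly = /^((north)|(south)|(n|s))$/i;

const orResults = ["north", "S", "east", "nor"].map((text) =>
  hemisphereOnly.test(text)
);

console.log(orResults); // [ true, true, false, false ]
```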

Flags

Flags are placed after the closing '/' of a regex to specify search options. Some of the available flags are: g (global), i (case-insensitive), m (multi-line), and u (unicode).

In our example code, we use a /i at the very end of our regex which allows for a case-insensitive search.
ex.
((north)|(south)|(n|s))$/i
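To show what a flag changes, here is a small sketch comparing the same pattern with and without a flag. The strings are hypothetical.

```javascript
// Without i, letter case must match exactly.
const caseSensitive = /north$/;
const caseInsensitive = /north$/i;

console.log(caseSensitive.test("42 NORTH"));   // false
console.log(caseInsensitive.test("42 NORTH")); // true

// Without g, replace() only changes the first match.
const firstOnly = "42-13-15".replace(/-/, " ");
const allMatches = "42-13-15".replace(/-/g, " ");

console.log(firstOnly);  // 42 13-15
console.log(allMatches); // 42 13 15
```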

Character Escapes

When a character must be used which has a functional meaning within a regex, we can use a character escape. These are represented by a '\' or backslash. It is important to note that a '\' specifies a character escape, while a '/' or slash marks the start and end of a regex.

In our example code, we use a backslash to specify both digits (\d) and white space (\s), as well as a literal period (\.) and dash (\-) inside the brackets.
ex.
\d{2}
[m'\.\-]\s?
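The difference an escape makes is easiest to see with the period, which matches any character unless it is escaped. This is a minimal sketch with made-up inputs.

```javascript
// An unescaped . matches any single character.
const anyCharacter = /^1.5$/;
// An escaped \. matches only a literal period.
const literalPeriod = /^1\.5$/;

const escapeResults = [
  anyCharacter.test("1x5"),  // true
  literalPeriod.test("1x5"), // false
  literalPeriod.test("1.5"), // true
  /^\d\s\d$/.test("4 2"),    // true: digit, whitespace, digit
];

console.log(escapeResults); // [ true, false, true, true ]
```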

Author

Elise LaBonte is a developer based in Weymouth, Massachusetts. She has been coding since 2020.

https://github.com/eliselabonte
