Explain Regular Expressions Like I'm Five

Savvas Stephanides ・ 5 min read

About

Browsing Twitter, especially in the #100DaysOfCode and #CodeNewbie hashtags, you'd be sure to soon find someone struggling with Regular Expressions, or "regex" - and for good reason. Even experienced software developers are in the same boat. I'm with you. Regex still makes me dizzy even after years of using it.

Thus, here is my attempt at an "Explain Like I'm Five" for regex:

Okay kids, let's begin.

Regular expressions are a way of finding specific parts of something written. A bit like finding a specific part of a story book, or a certain word in a song.

Actually let's do this now: let's begin with a random song:

Twinkle twinkle little star,
How I wonder what you are!
Up above the world so high!
Like a diamond in the sky!

Let's find some words:

1: Find the word "star" in the song

Twinkle twinkle little [star]👈,
How I wonder what you are!
Up above the world so high!
Like a diamond in the sky!

Here it is, right there! On the first line of our song. That was easy!

Now let's try something else:

2: Find every character that's not a letter!

Twinkle twinkle little star[,]👈
How I wonder what you are[!]👈
Up above the world so high[!]👈
Like a diamond in the sky[!]👈

Now that looked a little bit more complex than our first exercise. But it wasn't too difficult, was it?

The reason you found it slightly more difficult was because you weren't looking for a specific word this time. You were looking for something more general. You were looking for a... PATTERN!

You know patterns, right? They're on the shirt you're wearing, outside on the trees and leaves. They're everywhere!

Now let's try one more:

3: Find every word in the song that is 3 letters or less:

Twinkle twinkle little star,
[How]👈 [I]👈 wonder what [you]👈 [are]👈!
[Up]👈 above [the]👈 world [so]👈 high!
Like [a]👈 diamond [in]👈 [the]👈 [sky]👈!

Whoa! Now that was quite a bit more involved, wasn't it? Go ahead and try it yourself!

Code talk

Now that you've familiarised yourself with the concept of "patterns", let's talk code. For this article, we're going to be coding in JavaScript, but the expressions we use here work the same way in most languages!

So, say you need to express some complex patterns in code.

Find the word "star"

Firstly, let's find the word star in the "Twinkle Twinkle Little Star" song, and replace it with "⭐". You probably already know how to do this. It's quite simple:

First let's store our poem as a variable:

var poem = `Twinkle twinkle little star,
How I wonder what you are!
Up above the world so high!
Like a diamond in the sky!`

Now let's replace our text using the replace() function:

poem = poem.replace("star", "⭐")
console.log(poem)

This will be the output:

Twinkle twinkle little ⭐,
How I wonder what you are!
Up above the world so high!
Like a diamond in the sky!

Hurray 🎉🎉. Just what we need!
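If you only want to know whether a word is there, or where it is, you don't even need to replace anything. Here is a small sketch using two built-in string methods, includes() and indexOf(); the variable names hasStar and position are just made up for this example:

```javascript
const poem = `Twinkle twinkle little star,
How I wonder what you are!
Up above the world so high!
Like a diamond in the sky!`;

// includes() answers "is this word in the text?" with true or false.
const hasStar = poem.includes("star");

// indexOf() answers "where does it start?" (counting from 0),
// or gives -1 if the word is not there at all.
const position = poem.indexOf("star");

console.log(hasStar);  // true
console.log(position); // 23
```

Both methods look for one exact piece of text, which is why they stop being enough once we start hunting for patterns.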

Find every capital letter in the song

Now we're starting to look for patterns, not just certain words. We could possibly iterate through every letter in every word and compare it to every capital letter in the English alphabet, but that's painful to even think about. Let's instead use a magical tool called REGULAR EXPRESSIONS!

Basically you need a way to tell your application "find any letter between A to Z (capitals)". The regular expression to express this is this:

[A-Z]

That's it! Now let's use JavaScript to replace every capital letter with a "❤️":

var poem = `Twinkle twinkle little star,
How I wonder what you are!
Up above the world so high!
Like a diamond in the sky!`

poem = poem.replace(/[A-Z]/g, "❤️")
console.log(poem)

And here's the output:

โค๏ธwinkle twinkle little star,
โค๏ธow โค๏ธ wonder what you are!
โค๏ธp above the world so high!
โค๏ธike a diamond in the sky!

Find every small letter in the song

In the exact same way, we can find all small letters, but the expression this time is this:

[a-z]

Let's use JavaScript to replace all small letters with "🐶":

var poem = `Twinkle twinkle little star,
How I wonder what you are!
Up above the world so high!
Like a diamond in the sky!`

poem = poem.replace(/[a-z]/g, "🐶")
console.log(poem)

Output:

T๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ,
H๐Ÿถ๐Ÿถ I ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ ๐Ÿถ๐Ÿถ๐Ÿถ ๐Ÿถ๐Ÿถ๐Ÿถ!
U๐Ÿถ ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ ๐Ÿถ๐Ÿถ๐Ÿถ ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ ๐Ÿถ๐Ÿถ ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ!
L๐Ÿถ๐Ÿถ๐Ÿถ ๐Ÿถ ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ๐Ÿถ ๐Ÿถ๐Ÿถ ๐Ÿถ๐Ÿถ๐Ÿถ ๐Ÿถ๐Ÿถ๐Ÿถ!

I hope these make sense by now.

A couple of notes

Before we continue to our final example, let's clarify a few things:

  • Notice how the letters in the regular expression are inside square brackets []? In regex, this simply means "any one character from this set of characters":

    • [A-Z] means any letter A-Z
    • [a-z] means any letter a-z
    • [0-9] means any digit 0-9
    • [A-Za-z0-9] means any one character that is a capital letter, a small letter or a digit
  • Notice how in the JavaScript code, the regex starts with / and ends with /g? The two slashes mark where the regex starts and ends, and the g after the closing slash is a flag that means "find every match in the text" (rather than just the first one). There are more flags you can use. For example, i makes the search case-insensitive.
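To see these notes in action, here is a small sketch. The sample text "Room 42, room 7" is made up for this example (it's not from the song), and so are the variable names:

```javascript
// Made-up sample text, chosen because it has digits and mixed capitals.
const text = "Room 42, room 7";

// [0-9] matches one digit at a time; g collects all of them.
const digits = text.match(/[0-9]/g);

// Without the g flag, only the first match gets replaced.
const firstOnly = text.replace(/[0-9]/, "#");

// With the g flag, every match gets replaced.
const all = text.replace(/[0-9]/g, "#");

// Flags can be combined: g for "every match", i for "ignore capitals".
const rooms = text.match(/room/gi);

console.log(digits);    // [ '4', '2', '7' ]
console.log(firstOnly); // Room #2, room 7
console.log(all);       // Room ##, room #
console.log(rooms);     // [ 'Room', 'room' ]
```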

Final example: Find words that are 3 letters or less and replace them with "🍕".

This is more complex, but I'll explain. The expression for this pattern is this:

\b[A-Za-z]{1,3}\b

I can see you shaking your head and gasping so let's break this down:

  • First, the familiar territory. Notice the [A-Za-z] there? If you remember, this means any letter, capital or small. So far so good, right?
  • Next to it, you see {1,3}. This simply means the pattern before it should be repeated between 1 and 3 times. Basically anywhere 1 to 3 letters appear next to each other. So, the words we need!
  • Lastly, there's \b at each end. This simply means "word boundary". In other words, ignore parts of longer words that happen to contain 1 to 3 letters.

In summary, the pattern above basically means: "Find runs of 1 to 3 capital or small letters that are surrounded by word boundaries". Exactly what we need.
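To see why the \b parts matter, here is a sketch that runs the pattern with and without them and lists what each one finds. The variable names shortWords and chunks are made up for the example:

```javascript
const poem = `Twinkle twinkle little star,
How I wonder what you are!
Up above the world so high!
Like a diamond in the sky!`;

// With word boundaries: only whole words of 1 to 3 letters are found.
const shortWords = poem.match(/\b[A-Za-z]{1,3}\b/g);
console.log(shortWords);
// [ 'How', 'I', 'you', 'are', 'Up', 'the', 'so', 'a', 'in', 'the', 'sky' ]

// Without \b: long words get chopped into pieces of up to 3 letters,
// so "Twinkle" turns into "Twi", "nkl" and "e".
const chunks = poem.match(/[A-Za-z]{1,3}/g);
console.log(chunks.slice(0, 3)); // [ 'Twi', 'nkl', 'e' ]
```

The first list is exactly the set of words we marked by hand in exercise 3.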

Let's now use JavaScript to replace these small words with "🍕"!

var poem = `Twinkle twinkle little star,
How I wonder what you are!
Up above the world so high!
Like a diamond in the sky!`

poem = poem.replace(/\b[A-Za-z]{1,3}\b/g, "🍕")
console.log(poem)

And here's the output:

Twinkle twinkle little star,
๐Ÿ• ๐Ÿ• wonder what ๐Ÿ• ๐Ÿ•!
๐Ÿ• above ๐Ÿ• world ๐Ÿ• high!
Like ๐Ÿ• diamond ๐Ÿ• ๐Ÿ• ๐Ÿ•!

🎉🎉 WHOOP WHOOP! 🎉🎉 We made it!

That is all for now

I hope all this makes sense. I've only scratched the surface because there's a WHOLE lot more to regex, but I hope the basics make sense enough to get you started. Let me know how you found this article and happy regexing!

To learn more about regular expressions, here's a very useful cheat sheet.

Savvas Stephanides

@savvasstephnds

Coding as a relationship between human and human, not human and machine. Fitness enthusiast

Discussion

"Find every character that's not a letter" isn't just those 4 characters:

var poem = `Twinkle twinkle little star,
How I wonder what you are!
Up above the world so high!
Like a diamond in the sky!`

poem = poem.replace(/[^A-Za-z]/g, "❤️")
console.log(poem)

/*
result:
Twinkle❤️twinkle❤️little❤️star❤️❤️How❤️I❤️wonder❤️what❤️you❤️are❤️❤️Up❤️above❤️the❤️world❤️so❤️high❤️❤️Like❤️a❤️diamond❤️in❤️the❤️sky❤️
*/
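If the goal is to get only the four marks pointed at in exercise 2, one possible approach (a sketch, not code from the article) is to leave spaces and line breaks out as well. Inside the square brackets, ^ means "not", and \s stands for any whitespace character:

```javascript
const poem = `Twinkle twinkle little star,
How I wonder what you are!
Up above the world so high!
Like a diamond in the sky!`;

// [^A-Za-z\s] = any one character that is NOT a letter
// and NOT whitespace (spaces and line breaks).
const marks = poem.match(/[^A-Za-z\s]/g);

console.log(marks); // [ ',', '!', '!', '!' ]
```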