DEV Community

Joseph Maurer

Why Regular Expressions Are Super Powerful, But A Terrible Coding Decision

We’ve all been there. You have a string input and need a fast, efficient way to parse something important out of it. Your options are limited, since manually scanning a string for a pattern is tedious and often inefficient as the string gets larger. So what do you do? You turn to regular expressions! What’s the problem with that? Let’s explore.

Impossible To Read or Debug

A huge assumption made when writing a regular expression is that the schema you are programming against won’t change. If it does, the regular expression may have to be rewritten to keep producing the same usable output. Say you are tasked with fixing a regular expression that broke because the schema changed. You first have to understand how the regex worked with the old schema, and then how the schema changed. Only then can you rewrite the regular expression to account for the new input. That’s a tedious, error-prone process, and it gets sharply harder as the expression grows longer and more complex. I would hate to be the only one in charge of fixing this 6.2 KB monster that validates RFC 822 email addresses.

Regex Abuse

A common use case for regular expressions is something like the following:

This regular expression tries to emulate a parser, ripping useful information out of a structured data format like JSON into named capture groups. The benefit (in C# at least) is that you can then refer to exactly what the regular expression matched.

The downside is that you are using the wrong tool for the job. As much as it might seem like a quick and easy solution, it causes more problems than it solves. Parsing JSON, XML, or even HTML with regular expressions is a terrible idea, and parsing these formats is mostly a solved problem. Check out this Python HTML parser. Using a tool like this will make your code easier to write and easier to maintain in the future.
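To make the parser option concrete, here is a minimal sketch using `html.parser` from Python's standard library. The `LinkCollector` class, the `extract_links` helper, and the sample markup are illustrative assumptions, not something taken from the linked tool.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs with the surrounding quotes already removed.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Hypothetical helper: return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = "<p>See <a class=\"ext\" href=\"https://example.com/a\">A</a> and <A HREF='/b'>B</A>.</p>"
print(extract_links(sample))
```

The parser copes with attribute order, quoting style, and tag case, which are exactly the variations that tend to break a hand-written pattern.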

Balancing Act

I know most of this article has been bashing regular expressions, but there are real benefits to using them when they are used correctly. All developers and engineers should learn basic regular expressions, because they’ll produce better, more flexible, more maintainable code with them. When used responsibly, regular expressions are a huge net positive. For example, writing a regular expression to validate a phone number is relatively straightforward:
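The article's original pattern is not reproduced here. As a rough sketch, assuming a 10-digit North American number with no country code (the format, `PHONE_RE`, and `is_valid_phone` are all assumptions made for illustration), such a check might look like this in Python:

```python
import re

# Assumed format: 10 digits, optional parentheses around the area code,
# and an optional "-", "." or space between the groups.
# Known looseness: the parentheses are matched independently, so an
# unbalanced "(555-123-4567" is also accepted.
PHONE_RE = re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}")


def is_valid_phone(text):
    """Return True if the whole string matches the assumed phone format."""
    return PHONE_RE.fullmatch(text) is not None


for candidate in ["555-123-4567", "(555) 123-4567", "555-1234"]:
    print(candidate, is_valid_phone(candidate))
```

Using `fullmatch` rather than `search` means stray characters before or after the number cause the check to fail.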

Conclusion

Regular expressions are extremely powerful and useful in the right situation. When abused or used in the wrong situations, they can lead to ugly and unmaintainable code. So use them wisely!

Comment below with your opinion on regular expressions and whether you use them regularly!

Top comments (8)

Ben Sinclair

Anything that's parsable data should be parsed with a parser. That makes sense.

But regular expressions don't need to be complicated just because they're complex; you can write them across several lines and add named groups and comments to every part. That's probably easier to maintain than a great list of individual conditions, like, if (string.startsWith('foo') && string.endsWith('bar')) { and so on.
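As a rough illustration of the multi-line style described above, here is a sketch using Python's `re.VERBOSE` flag. The version-string format, `VERSION_RE`, and `parse_version` are hypothetical and chosen only to show the layout.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and treats "#" as the
# start of a comment, so each part can sit on its own documented line.
VERSION_RE = re.compile(
    r"""
    (?P<major>\d+)            # major version: one or more digits
    \.
    (?P<minor>\d+)            # minor version
    \.
    (?P<patch>\d+)            # patch version
    (?:-(?P<label>[a-z]+))?   # optional pre-release label, e.g. -beta
    """,
    re.VERBOSE,
)


def parse_version(text):
    """Return the named groups as a dict, or None if the text does not match."""
    match = VERSION_RE.fullmatch(text)
    return match.groupdict() if match else None


print(parse_version("1.2.3-beta"))
print(parse_version("1.2"))
```

Each named group documents itself, and a later change to one part of the format touches a single line of the pattern.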

Joseph Maurer

Writing regular expressions is an art for sure. I’ve seen ones that are written extremely well (multiple lines, named capture groups, etc.), but that doesn’t help if the input schema keeps changing and the regular expression has to be rewritten each time.

Ben Sinclair

If that's the case, then nothing helps!

Nico Zerpa (he/him)

Great article, Joseph!

As you said, using regular expressions to extract data from JSON or HTML means using the wrong tool for the job. I'd like to explain why.

Regular expressions are useful when the string you are searching has a regular structure. A good example is ISO 8601 dates, like "2021-10-07". Every string in this format begins with four digits for the year, followed by two digits for the month and two for the day, with a hyphen separating each group.
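The date format described above maps almost directly onto a pattern. This is a minimal sketch, assuming only the calendar-date form of ISO 8601 (`YYYY-MM-DD`); `ISO_DATE_RE` and `parse_iso_date` are illustrative names.

```python
import re
from datetime import date

# Four digits, hyphen, two digits, hyphen, two digits.
ISO_DATE_RE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")


def parse_iso_date(text):
    """Return a date for a YYYY-MM-DD string, or None if it is not valid."""
    match = ISO_DATE_RE.fullmatch(text)
    if match is None:
        return None
    try:
        return date(int(match["year"]), int(match["month"]), int(match["day"]))
    except ValueError:
        # The shape was right but the values are impossible, e.g. month 13.
        return None


print(parse_iso_date("2021-10-07"))
print(parse_iso_date("2021-13-40"))
```

The regex only checks the shape of the string. Whether the digits form a real calendar date is left to `datetime.date`, which is a job a pattern handles poorly.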

On the other hand, JSON and HTML are not regular. In the JSON example that Joseph posted in the article, if you add another property to the JSON, the regular expression won't find anything. But a JSON parser wouldn't have any problems getting the userID property.
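The article's JSON example is not reproduced here, so the following sketch uses a hypothetical payload and pattern to show the same failure mode. Only the `userID` property name comes from the discussion. Everything else is an assumption.

```python
import json
import re

# Hypothetical pattern that assumes "userID" is the only property in the object.
USER_ID_RE = re.compile(r'\{"userID":\s*(?P<user_id>\d+)\}')


def user_id_via_regex(text):
    """Return the user ID if the text matches the assumed layout, else None."""
    match = USER_ID_RE.fullmatch(text)
    return int(match["user_id"]) if match else None


def user_id_via_parser(text):
    """Return the user ID by parsing the text as JSON."""
    return json.loads(text)["userID"]


original = '{"userID": 42}'
extended = '{"name": "Ada", "userID": 42}'  # same data plus one new property

print(user_id_via_regex(original), user_id_via_parser(original))
print(user_id_via_regex(extended), user_id_via_parser(extended))
```

The pattern stops matching as soon as the object gains a property, while `json.loads` returns the same value for both inputs.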

I use regular expressions, especially when I have to find a very specific row in a database. In my code, I use them for very simple things, for example searching for content at the beginning or end of a string. I feel comfortable using them.

Sean Allin Newell

Your phone number is missing its country code 😛😜

I always think "regex => quick-n-dirty" whereas "parser => long-n-clean".

Unfortunately, parsers are often where there be dragons 🐉.

Massimo Artizzu

I love regular expressions. But they're indeed a write-only language.

That's why a while ago I developed RE-build, which sooner or later I'll update with new, non-spaghetti source code, and possibly TypeScript... 😬

Klayton Cavalcante

When creating a regex pattern, most of the time I use regex101.com for testing and customizing it. It's an awesome tool and helps a lot to understand the performance and the usage of each rule.

Grunchy

Anybody who has issued a command such as "ls *.deb", or "dir *.exe" has used regex, or a form of it.
My criteria are 1. how fast it is to run and 2. how easily something undesired can pass through (how buggy it is).
My impression of regex is this is something somebody once dreamed up (wildcards * and ?) and realized they could add more stuff, and more stuff, and then just went berserk with added functionality but kept going because who's gonna stop them?
I have no idea what runs faster, string parsers or regex. Maybe it doesn't even matter anymore.
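The wildcard comparison at the start of this comment can be made concrete. Shell-style patterns such as `*.deb` are a separate, simpler syntax, but Python's standard `fnmatch` module can translate one into an equivalent regular expression. The sketch below assumes nothing beyond the standard library, and the file names are made up.

```python
import fnmatch
import re

glob_pattern = "*.deb"

# fnmatch.translate returns a regular expression string equivalent to the glob.
# The exact text of that string varies between Python versions.
equivalent_regex = re.compile(fnmatch.translate(glob_pattern))

names = ["pkg.deb", "setup.exe", "notes.deb.txt"]

# Both approaches should select the same names.
by_glob = [name for name in names if fnmatch.fnmatchcase(name, glob_pattern)]
by_regex = [name for name in names if equivalent_regex.match(name)]

print(by_glob)
print(by_regex)
```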