DEV Community

Regex for lazy developers

Ilya Ermoshin on January 04, 2023

Regular expressions are a text processing system based on a special pattern notation system. Simply put, it provides programmers with the ability t...
 
Tiago Rangel

Why all that? ChatGPT can do regexes for me! 😂👍

Liu Yongliang

I was recently toying around with making a regex-based Markdown parser and went to ChatGPT for rescue when the pattern got complex. While I was amazed at how well (and how fast!!) ChatGPT can come up with a solution that fits the description, and how well I was able to iterate with ChatGPT to correct some corner cases, I didn't manage to get a perfect regex that matches all my use cases.

I will definitely ask for ChatGPT's help on regex, but the principles are still very fundamental and I should probably invest time to learn them myself 😄

P.S. Great post @sineni! (and a nice reminder for me to spend time figuring out advanced regex syntax 😢)

ArsalanNury

So now that **ChatGPT** can do programming for us, we should go home :))))

Nerro

Can ChatGPT debug, fix errors and do critical thinking?

Tiago Rangel

Yeah he can!

Pierre Vahlberg

It's a language model AI. It can't think; it can guess very accurately based on what it has read, but reasoning is not what it does. Maybe something else will in the future, though.

Nerro

How?
I have seen ChatGPT do a lot of things, but not a single instance where it fixes errors.

Tiago Rangel

Haha, I have tried, and ChatGPT's not very good at coding (no good CSS), but it's quite good at regexes!

Frederick Price

Even if it makes mistakes, you can use it as a faithful assistant and only correct it slightly. It can easily help create a game in Unity, for example. And you can create pictures in Midjourney. This greatly simplifies the work.

Tiago Rangel

That's true!

Charly S. • Edited

Oh of course! ChatGPT! Hadn't thought of this :) That AI is really good for these snippets.

Deotyma

Believe me, ChatGPT makes mistakes very often ;-).

Tiago Rangel

I believe you, I've seen him making up a lot of things!

Pierre Vahlberg

Really nice article and really well written!

Some feedback, if you don't mind!

I find you mixed up the concept of a match a little, calling it match, selection and substring interchangeably. It reads more clearly if you are consistent 😊

It's worth mentioning modifiers, IMO. They make a big difference to regexes; the ones I use most are multiline, global and case insensitive.
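
To illustrate those three modifiers, here is a small JavaScript sketch. The sample text is made up for the example; only the `i`, `g` and `m` flags themselves are standard.

```javascript
// Sample input, invented for this example.
const text = 'Error: disk full\nerror: retry\nok';

// No flags: ^ anchors only at the start of the whole string,
// and the match is case sensitive, so "Error" is not matched.
const plain = text.match(/^error/);

// i (case insensitive): "Error" at the start now matches.
const insensitive = text.match(/^error/i);

// g (global) + i + m (multiline): ^ matches at the start of every line,
// and match() returns every match instead of only the first.
const all = text.match(/^error/gim);

console.log(plain);          // null
console.log(insensitive[0]); // Error
console.log(all);            // [ 'Error', 'error' ]
```

Without `m`, the second line would never match, because `^` would only anchor at position 0 of the string.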

Regarding regex itself: I have used it extensively, and written and debugged it a lot over the past 15 years. Now, is it really a useful tool if it is hardly understood by anyone but the Regex Master? And is it maybe used a little too often if people feel the need to learn this cryptic language?

For me, something like email validation makes for a good case, but a very limited use case, given that it does in fact, mostly, involve some really quirky character rules.

Syntax validation, sure: a much more specific task, so fine.

I think that these cases are OK, but solutions like regex should be avoided to the greatest extent possible, since veeery few can read and understand a regex in a reasonable time, much less debug it.

In your real life example above, what you received could instead have been validated with something like:

  • check string length
  • split string into chunks by '&'
  • split each pair into key values by '='
  • validate each key and value using some config map or class

This piece of code would not require any regex skills and would be readable by anyone, even someone porting it to another language, if you don't write complete junk code 😊 The win here is code that can be read, discussed, refactored, maintained and stepped through, rather than code that is compact and portable, which is a long-term win I would take any day.
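
The steps listed above could be sketched like this in JavaScript. Everything specific here is an assumption for illustration: the `MAX_LENGTH` limit, the `rules` map and the per-key checks are hypothetical, and the key names are taken from the query string quoted elsewhere in this thread.

```javascript
// Assumed upper bound on the input size (hypothetical value).
const MAX_LENGTH = 200;

// Small helpers that need no regex.
const isDigits = (v) => v.length > 0 && [...v].every((c) => c >= '0' && c <= '9');
const isDecimal = (v) => {
  const parts = v.split('.');
  return parts.length <= 2 && parts.every(isDigits);
};

// Hypothetical config map: expected key -> validator for its value.
const rules = new Map([
  ['t', (v) => v.length === 15 && v[8] === 'T'],
  ['s', isDecimal],
  ['fn', isDigits],
  ['i', isDigits],
  ['fp', isDigits],
  ['n', isDigits],
]);

function validateQuery(input) {
  // 1. check string length
  if (input.length === 0 || input.length > MAX_LENGTH) return false;

  const seen = new Set();
  // 2. split string into chunks by '&'
  for (const chunk of input.split('&')) {
    // 3. split each pair into key and value by '='
    const parts = chunk.split('=');
    if (parts.length !== 2) return false;
    const [key, value] = parts;

    // 4. validate each key and value using the config map
    const rule = rules.get(key);
    if (!rule || seen.has(key) || !rule(value)) return false;
    seen.add(key);
  }
  // Every expected key must be present exactly once.
  return seen.size === rules.size;
}

const sample = 't=20181125T142800&s=850.12&fn=8715000100011785&i=86841&fp=1440325305&n=1';
console.log(validateQuery(sample));      // true
console.log(validateQuery('t=1&s=abc')); // false
```

Each step can be stepped through in a debugger, and a new field only means one more entry in the map.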

Hope this is not received as harsh criticism, since my intent is to share my experience 😊

Btw, a real life example from me! My colleague spent 4 work days writing an HTML-parsing regex to read markup to render a Netlify component. On day 4 I was dragged in, walked him through it and solved it within an hour. He still has no clue how it works, but he does not dare to touch it again. Unfortunately there was no other option in that case, since Netlify required a regex to execute at all (derp). That's a bad solution IMO; I would have preferred to use the document.querySelector API for this admin code snippet 🤷‍♂️

Keep writing and posting and thanks for an overall good read!

Michael

Love RegExp and have been doing them in all sorts of ways since my first foray into Perl (so the comic was especially funny to me). Nice writeup, including the Groups, which is where I always get more power.

Single line ones are easy, doing multiple line ones in different languages always shows the nuances in how things work. Great writeup to get the important stuff in one place!

React Hunter

Thanks for a great article!
Regex is simple and important for string validation.

Ilya Ermoshin • Edited

Thank you for your feedback) This is my first article in English and I was a little worried before publishing 😅

fruntend

Congratulations 🥳! Your article hit the top posts for the week - dev.to/fruntend/top-10-posts-for-f...
Keep it up 👍

Keri Medeiros

Super helpful- thanks for sharing!

Ravavyr

wow, this is for "lazy developers"? i don't wanna know what non-lazy devs would go through lol.

Btw, i like using regex101.com to create new ones, and i've written down some of the ones i've used over the years, just to have my own mini library that i understand for my regex needs. I recommend you all do that, just to reduce how much googling you gotta do when that need for a regex comes up like once a year. I'm fairly certain most of us don't need regexes more often than that, unless you're in a particular job that happens to need them a lot.

I still don't understand half the stuff you wrote about, but it's a great write up on regex :)

chivanos

Thank you!

Vincenzo Buttazzo

A shame the example was not provided in Perl, since we are talking about Perl Compatible Regular Expressions:

'testdata' =~ /^(?:8|\+7)\d{10}$/

In Perl (and JavaScript, among a few other languages) you can write a regexp without quoting it like a string, because it's part of the language.
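
For comparison, a minimal JavaScript sketch of the same pattern as a regex literal. The two phone numbers are invented test values, and the `new RegExp` line is only there to show the extra escaping a string-based pattern needs.

```javascript
// Same pattern as the Perl one-liner, written as a JavaScript regex literal.
const phone = /^(?:8|\+7)\d{10}$/;

console.log(phone.test('testdata'));     // false
console.log(phone.test('+79261234567')); // true
console.log(phone.test('89261234567'));  // true

// Built from a string instead, every backslash has to be doubled.
const fromString = new RegExp('^(?:8|\\+7)\\d{10}$');
console.log(fromString.source === phone.source); // true
```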

Siddharth

Cool guide! Although, in your example for search param parsing, there are actually simpler options.

You could split the string by the & and again by the = sign.

But there's an even simpler way in JavaScript

console.log(Object.fromEntries(new URLSearchParams('t=20181125T142800&s=850.12&fn=8715000100011785&i=86841&fp=1440325305&n=1')));
/* => {
  "t": "20181125T142800",
  "s": "850.12",
  "fn": "8715000100011785",
  "i": "86841",
  "fp": "1440325305",
  "n": "1"
} */

I get that you are trying to demonstrate the usefulness of RegExp, but I thought I'd show you this so you know a simpler way to achieve this :)

jassler

Just skimming through the article, I'm noticing a couple of mistakes:

  • \s matches [ \t\r\n\f], not just the space character
  • .lock neither matches lock nor 4Lock
  • In Condition Length you mistyped {3.5}, which should be {3,5}
  • Some of the examples seem language specific. For instance, I haven't seen (\k) before, nor do I get it to work with js. Might be good to specify which language supports which feature.
  • The description for (?: I find strange. I would've simply said that this is a non-capturing group - not sure what it has to do with logical brackets.
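
The points above can be checked with a short JavaScript sketch. The sample strings are invented, and the named-group form `(?<name>...)` with `\k<name>` is simply how JavaScript spells a named backreference; other languages may differ.

```javascript
// \s covers tab, newline, carriage return and form feed, not only the space.
const samples = ['a b', 'a\tb', 'a\nb', 'a\rb', 'a\fb'];
const allWhitespaceMatch = samples.every((s) => /a\sb/.test(s));

// "." needs one character before "lock", and matching is case sensitive,
// so neither "lock" nor "4Lock" matches, while "clock" does.
const dotResults = ['lock', '4Lock', 'clock'].map((s) => /.lock/.test(s));

// {3,5} means "3 to 5 repetitions".
const rangeResults = ['12', '1234', '123456'].map((s) => /^\d{3,5}$/.test(s));

// Named group plus backreference: the closing quote must equal the opening one.
const quoted = /(?<quote>['"]).*?\k<quote>/;
const quoteResults = ['say "hi"', 'say "hi\''].map((s) => quoted.test(s));

// (?:...) groups without capturing: only the month ends up in the result.
const monthMatch = '2023-01'.match(/(?:\d{4})-(\d{2})/);

console.log(allWhitespaceMatch);  // true
console.log(dotResults);          // [ false, false, true ]
console.log(rangeResults);        // [ false, true, false ]
console.log(quoteResults);        // [ true, false ]
console.log(monthMatch.slice(1)); // [ '01' ]
```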

Otherwise good write-up. I hope I could be instructive :)

EECOLOR

Thank you for writing this guide!

I would add a few suggestions:

  • Try to only use regular expressions if nothing else is available; they are notoriously hard to read and very error-prone.
  • Always add a comment explaining what the regular expression matches.
  • Don't use them to match against 'non-regular' stuff. A good indicator when regular expressions don't work is when you need to match 'open' and 'close' characters / phrases.
  • Always consider using a parser first (for example when matching the query string, JSON, html or any other structured data). So do not match ?a=1&b=c, first parse it and after that you could check the values in { a: 1, b: 'c' } (if needed with a regex).
  • Regular expressions are very susceptible to ReDoS attacks. Before using them in production, please read up on the patterns that are dangerous (Regular expression Denial of Service - ReDoS).
  • If you use a regular expression in production on text from a user, first check the length of the input.
  • Always try to match the input as precisely as you can. If you need to match the id in a URL like articles/my-id, use articles/([^/]+) rather than articles/(.*?)(/|$). But again, splitting it on / first (parsing) makes sense here.
  • If possible, anchor your regular expression at the start (using ^) to reduce ReDoS trouble.
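
A few of these suggestions, sketched in JavaScript. The `MAX_INPUT_LENGTH` value and the function names are hypothetical, chosen only for this example; `URLSearchParams` is the standard parser available in browsers and Node.js.

```javascript
// Assumed limit on user input before any regex runs (hypothetical value).
const MAX_INPUT_LENGTH = 2048;

// Matches "articles/<id>". Anchored at the start, and uses a negated
// character class instead of a lazy ".*?".
const ARTICLE_ID = /^articles\/([^/]+)/;

function articleId(path) {
  if (typeof path !== 'string' || path.length > MAX_INPUT_LENGTH) return null;
  const match = ARTICLE_ID.exec(path);
  return match ? match[1] : null;
}

// Parser-first alternative: split on "/" and inspect the pieces.
function articleIdBySplit(path) {
  const [section, id] = path.split('/');
  return section === 'articles' && id ? id : null;
}

// Parse the query string first, then (if needed) check single values.
const params = Object.fromEntries(new URLSearchParams('?a=1&b=c'));
const aIsNumeric = /^\d+$/.test(params.a); // matches digits only

console.log(articleId('articles/my-id'));          // my-id
console.log(articleId('articles/my-id/comments')); // my-id
console.log(articleIdBySplit('articles/my-id'));   // my-id
console.log(params, aIsNumeric);                   // { a: '1', b: 'c' } true
```

Note that parsed query values are strings, so the check on `a` runs against `'1'`, not the number 1.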

In short: be very careful when you use regular expressions in production code, they can have a big cost in the areas of:

  • Security
  • Maintainability
  • Performance
  • Readability
 
Leober Ramos

Another use I found for it when I was just starting to discover RegExp was making a crawler bot that looked for information on a book download website, so I could download the books from the command line.

So another use can be Web Scraping.

ArsalanNury

it was really helpful but unfortunately for now

I can't write regex without seeing docs

John Peters

How would we write the expressions using multiple lines?

Siddharth

Time to shamelessly plug my old package! I had this exact same problem, so I made a little node.js library which helps me write regexes in a more readable way. It's called betteregex and here's a demo:

// Comparing two ways of writing RFC2822-like email validation regex

// The normal way: small but cryptic
const emailRegex = /[a-z\d!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z\d!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z\d](?:[a-z\d-]*[a-z\d])?\.)+[a-z\d](?:[a-z\d-]*[a-z\d])?/g

// The betteregex way: longer but straightforward
const { regex } = require('betteregex')
const anythingInEmailRegex = '[a-z0-9!#$%&\'*+/=?^_`{|}~-]+';

const readableEmailRegex = regex`
    // Match one or more alphabet, numbers, one of allowed special characters or tildes
    ${anythingInEmailRegex}

    // Open group
    (?:
        // Match a literal dot
        \.
        // Same as before
        ${anythingInEmailRegex}
    // Close group, match zero or more, greedy
    )*

    // The @
    @

    // Open group
    (?:
        // Provider name (gmail etc.)
        [a-z0-9](?:[a-z0-9-]*[a-z0-9])?
        // The dot
        \.
    // Close group
    )+

    /*
        The ending extension
        May not match everything because extensions are (mostly) letters
    */
    [a-z0-9](?:[a-z0-9-]*[a-z0-9])?
${'g'}`

As you can see, you can write regexes multiline, with spaces and even comments! You can also reuse regexes to write new regexes. The possibilities are endless here!
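
For anyone who would rather not add a dependency, a similar effect is possible in plain JavaScript. This is a different technique from betteregex, sketched under the assumption that joining the `.source` of smaller regexes is acceptable; the names `localChars` and `label` are made up for the example.

```javascript
// Building blocks, each a small regex that can be read on its own.
const localChars = /[a-z\d!#$%&'*+/=?^_`{|}~-]+/; // characters allowed before the @
const label = /[a-z\d](?:[a-z\d-]*[a-z\d])?/;     // one dot-separated domain label

// Compose the full pattern across several commented lines.
const composedEmailRegex = new RegExp(
  [
    localChars.source,              // first chunk of the local part
    `(?:\\.${localChars.source})*`, // zero or more ".chunk" repeats
    '@',                            // the @
    `(?:${label.source}\\.)+`,      // one or more "label." groups
    label.source,                   // the ending extension
  ].join(''),
  'g'
);

console.log('mail me at a.b@example.com now'.match(composedEmailRegex));
// [ 'a.b@example.com' ]
```

The trade-off is that backslashes inside the string parts have to be doubled, which the tagged-template approach avoids.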

FJones

Do you want to match across multiple lines, match in a multiline string, or format the expression to be written across multiple lines (for readability, for example)?

John Peters

Actually I'm asking about the Regex string itself. I find long Regex patterns to be awful.

Clay Ferguson

This site has helped me a lot in the past:

regex101.com/