DEV Community

Ben Halpern

How do you regex?

I made a thread yesterday called How do you feel about regex?

Lots of fun discussion. Lots of people weighed in on their tools and tactics, and I'd love to move the conversation further that way.

How do you go about using regex?

  • What tools do you use? (if not from full memory)
  • How do you encapsulate/label/comment regex?
  • What types of problems do you most solve with regex?

Looking forward to any and all comments!

Discussion (24)

Daniel Starner

For building & explaining regular expressions, I always use regex101.com/. It's a great tool that really breaks down how your regular expression will work, which is very useful when the expression gets complex. It also has multi-language support to match the regex flavors across different major languages & distributions.

For learning regex, I usually just Google and dive through StackOverflow posts similar to what I'm trying to do. If I write a regex that is more complicated than just some alphanumeric grouping, I try to comment it with some human text such as "capture the ID from the post slug" or something of the sort...not very in-depth, but 😅

I most commonly use regular expressions when I am validating string structure, which could be form inputs, hostname information, or CSV data. If I am parsing strings, my first go-to for basic cases is some combination of .split() and .join() on the resulting array, as it's a bit easier to reason about and requires no understanding of regular expressions, just some basic logic. I tend to fall back to regular expressions if the string parsing is more complicated, has more than one match, or requires some more advanced substitution.
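A minimal sketch of that split-first, regex-second habit, in JavaScript. The slug value and the names `SLUG_ID`, `idFromSplit`, and `idFromRegex` are made up for illustration; they are not from the comment above.

```javascript
// Hypothetical input: a post slug that ends in a numeric ID.
const slug = "how-do-you-regex-4821";

// Basic case: split on "-" and work with the resulting array, no regex needed.
const parts = slug.split("-");
const idFromSplit = parts[parts.length - 1];
const titleFromSplit = parts.slice(0, -1).join(" ");

// Same extraction with a regex, carrying a short human-readable comment:
// capture the ID from the post slug (the digits after the last hyphen)
const SLUG_ID = /-(\d+)$/;
const match = SLUG_ID.exec(slug);
const idFromRegex = match ? match[1] : null;

console.log(idFromSplit, titleFromSplit, idFromRegex);
```

Both routes give the same ID here; the regex starts to pay off once the input can have optional parts or several matches.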

Ben Halpern Author

For learning regex, I usually just Google and dive through StackOverflow posts similar to what I'm trying to do.

Just curious, have you ever considered taking on a dedicated approach to learning Regex — like a book or course?

Definitely not a judgment, just a curiosity.

Daniel Starner

No judgements found!

I tend to learn on the fly as I'm trying to implement something, so it probably means I am going to immediately use the idea if I am learning something new concerning regular expressions. This is also predicated on the fact that I've been using them with regular consistency for 4-5 years so I have a firm foundation in the basics. If I am looking something up, it's usually related to Named Groups (I always forget the exact syntax), Lookarounds, and Conditions. The latter two only get used in very rare & complex scenarios, so I don't bother committing them to memory fully.

When I was just getting started, I just spent a day on regex101.com/ trying to build different expressions to match different strings to see how it would work. This learning was compounded by the fact that Django used to exclusively use regular expressions for route matching, so it "forced" me to work with them.

Derek Enos

I usually work from memory and try to use as many named capturing groups as possible because I find that they provide basic inline documentation of the pattern itself, and a more expressive way of accessing the groups on the match result:

const regex = /(?<first>[^\-])-(?<second>[^\-])-(?<rest>.+)/

const { groups } = regex.exec("1-2-3-4-5")

groups.first
'1'
groups.second
'2'
groups.rest
'3-4-5'
Jeremy Friesen

For tools, I like to use Ruby to write my regex. I often refer to the documentation. I invariably pair the regex with lots of local tests.

Below is a basic test. I save the code to a file (e.g. regexp.rb) and then run ruby regexp.rb.

THE_REGEX = %r{\d+}
def the_matcher(text)
  THE_REGEX.match?(text)
end

if __FILE__ == $0
  [
    ["123", true],
    ["abc", false],
  ].each do |text, expected|
    actual = the_matcher(text)
    if actual == expected
      puts "SUCCESS: for #the_matcher(#{text.inspect})"
    else
      puts "FAILURE: Expected #the_matcher(#{text.inspect}) to return #{expected}, got #{actual}"
    end
  end
end

I favor these basic tests while I'm writing regular expressions because they are super fast to run. They are also easy to port later into the application's test suite (which, depending on how the application tests are constructed, might be faster or slower to run).

For types of problems, I've used it for:

InHuOfficial

Well I did use regex quite heavily to build a basic JS syntax highlighter (instead of doing it properly with tokenisation)

Other than that I just tend to use them for validation of input!

Aleksandr Hovhannisyan

I like using regexr.com/ to test my regex and make sure it behaves correctly for edge cases. It's a wonderful tool!

These days, I mainly use regex at the editor level (e.g., in VS Code) to mass-replace certain patterns with other patterns when it's not possible to easily rename them using built-in editor shortcuts. For example, in early 2021, I migrated my site from Jekyll to 11ty, and as part of that migration, I had to convert a bunch of my Liquid shortcodes to use a new syntax. Since I had hundreds of matches, I relied on regex to mass-replace them rather than doing it by hand.

More recently, I also learned about the HTML pattern attribute, which accepts any valid regex to validate a form input, and have been using it where appropriate for client-side validation. For example, in a recent project, I used the pattern ^[a-zA-Z0-9-](?:(,\s*)?[a-zA-Z0-9-])*$ to match a comma-separated list of identifiers, with potential spaces after the commas. It seemed difficult to arrive at this solution initially, but then I realized that it was just a more complex case of the slug regex pattern I used here: npmjs.com/package/is-slug.
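As a rough check of the pattern quoted above, here is a sketch that runs it as a plain JavaScript RegExp against a few inputs. The constant name `IDENTIFIER_LIST` and the sample strings are assumptions for illustration. Browsers may compile the `pattern` attribute with different flags than a bare RegExp literal, so it is worth confirming the behaviour in the target browser too.

```javascript
// The pattern from the comment above, as a flag-less RegExp literal.
const IDENTIFIER_LIST = /^[a-zA-Z0-9-](?:(,\s*)?[a-zA-Z0-9-])*$/;

// Hypothetical sample inputs: the first three should match, the rest should not.
const samples = ["foo", "foo,bar", "foo, bar-baz", "foo,", "foo,,bar", ""];

// Pair each sample with whether the whole string matches.
const results = samples.map((s) => [s, IDENTIFIER_LIST.test(s)]);

for (const [sample, ok] of results) {
  console.log(JSON.stringify(sample), ok ? "matches" : "does not match");
}
```

A trailing comma and a doubled comma are both rejected because every comma must be followed by an identifier character.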

While I find it easy to compose basic regex patterns, I do think it's harder to read regex, especially for complex patterns like the one above, and especially if I'm reading other people's regex.

Ben Halpern Author

For Ruby regex I've always used Rubular as a lightweight reference tool. I do much more from knowledge/memory than I used to, but I'm still mostly a user of search and tools like this for anything complicated.

Alex Lohr

I just write them down; usually they work. I comment more complex RegExp by splitting up the parts of it in a comment, e.g. for a simple example:

// Matches cookies in document.cookie
// $1: key `([^=]+)` one or more not `=`
// separator `=`
// $2: value `([^;]+)` one or more not `;`
// Modifier 'g' (global)
const cookieMatcher = /([^=]+)=([^;]+)/g;
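One possible way to use that matcher, sketched with `String.prototype.matchAll`. The `cookieString` value is a hypothetical stand-in for `document.cookie` so the snippet also runs outside a browser. Note that from the second cookie onward, `$1` still includes the leading "; " separator, so this sketch trims it.

```javascript
// Same pattern as above: $1 is the key, $2 is the value.
const cookieMatcher = /([^=]+)=([^;]+)/g;

// Hypothetical stand-in for document.cookie.
const cookieString = "theme=dark; session=abc123";

const cookies = {};
for (const [, key, value] of cookieString.matchAll(cookieMatcher)) {
  // Strip the "; " that precedes every key except the first one.
  cookies[key.replace(/^[;\s]+/, "")] = value;
}

console.log(cookies);
```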
Piyush Kumar Baliyan

I use Regexr (regexr.com) to quickly test out my regexes. It supports JS-style regex and is very handy since it has features like live preview and an explanation of your regex.

For explaining regex, I mostly try to create simple and small regex, and assign it to some const (or function) to make it understandable.
e.g. I'll never do if (/regex/.test(str)), but always const EMAIL_REGEX = /regex/, and then my code.
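A small sketch of that naming habit. The pattern itself is a deliberately loose, hypothetical email check chosen for illustration, not the one the commenter uses, and `isEmail` is an assumed helper name.

```javascript
// Hypothetical, deliberately loose check: something@something.tld with no spaces.
const EMAIL_REGEX = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Wrapping the test in a named function documents the intent at the call site.
function isEmail(str) {
  return EMAIL_REGEX.test(str);
}

console.log(isEmail("ben@example.com")); // passes the loose check
console.log(isEmail("not an email"));    // fails: contains spaces and no "@"
```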

For solving problems, I sometimes use it to modify data structures on the fly and to refactor code.
I also recently wrote a post about it: dev.to/admitkard/regexp-cheatsheet...

Riccardo Bernardini
  • Tools? None, most of the time. For the most complex regexes, I wrote myself a nice "regexp generator" library that allows me to "build" a regexp in a more readable way.
  • Problems? It depends on the language, actually. In Ruby I use them mostly to parse simple line-based text files. In Ada I use them more sparingly, usually when I need some kind of lexical analyzer. The reason for this difference is that in Ruby using regexps is much easier: you just write something like /[a-z][0-9]+/, whereas in Ada it is a bit more involved.
drsensor
  1. ask copilot
  2. got dumb answer? so $askagain
  3. test that! curl_cat_whatever $something | rg -n -w $regex
  4. bug? visualize regex as railroad diagram (I'm still searching CLI similar to npeg but for regex while being a stand-alone binary)
  5. ditch regex and just use PEG 😂
Vicente Reyes

I wrote (WIP) an article, which I shared with my colleagues, that explains regex. These colleagues have little to no knowledge of how regex works, hence the urge to write it and help them get started, since we're the last people who work on clients' projects before giving them back. Our work includes making sure the data's clean and consistent. notion.so/vicentereyes/Introductio...

Matt Ellen

I mostly use regex for searching files for something. A string I sort of remember, but not quite, or if I want to double check the "find all instances of" type thing in an IDE.

For whatever reason I find grep easier to use than "find in files" of most IDEs.

I don't usually have to look up how anything works, because I've learnt that now, with the exception of look ahead and look behind, and sometimes I forget which is the end anchor and which is the start anchor.
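For anyone who forgets the same things, a quick reminder sketch in JavaScript syntax (other flavours differ in places): `^` anchors at the start, `$` at the end, `(?=...)` is a lookahead and `(?<=...)` is a lookbehind. The sample strings and constant names are made up for illustration.

```javascript
const startAnchor = /^err/;      // ^ : match only at the start of the string
const endAnchor = /\.log$/;      // $ : match only at the end of the string
const lookahead = /\d+(?=px)/;   // digits, but only if followed by "px"
const lookbehind = /(?<=\$)\d+/; // digits, but only if preceded by "$"

console.log(startAnchor.test("error.log"));    // starts with "err"
console.log(endAnchor.test("error.log"));      // ends with ".log"
console.log("12px".match(lookahead)[0]);       // "px" is checked but not consumed
console.log("cost: $42".match(lookbehind)[0]); // "$" is checked but not consumed
```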

The problem I hit most often is forgetting what I have to escape. For example, what I have to escape in emacs is different to what I have to escape in bash.

Bernd Wechner

My contexts for RE use, in order of frequency of late:

  • Python
  • Bash
  • .NET (C#)

What tools do I use? In order again:

  • My IDE (as in I just write the thing, been writing REs since the '80s so pretty familiar with them)
  • The documentation for the tool (because different flavours trip me up from time to time of course)
  • General on-line search (which generally takes me to the first)
  • On-line testing and diagnostic tools if I can't work out why my RE isn't matching when I think it should, including:

What types of problems?

  • RE problems, doh! ;-).
  • More seriously, the class of problem REs are ideal for, which includes primarily:
    • Any spot need in Python or C# where I have to detect or extract specific patterns in a string
    • Using CLI tools in bash or writing shell scripts, it's not long before an RE in a grep or sed or such is called for.

Yeah, could be an age or generation thing but I call them REs not regexes so much.

Essentially REs fill the gap between:

  • The very basic string find, extract and split tools that many languages provide, as Excel does for example, and as the basic string types in many modern languages do
  • Full on grammar parsing.

In between these two extremes is a rich territory of spot pattern testing and string manipulation that a terse pattern definition language serves, and the one that essentially came to dominate is called "regular" ;-), probably mainly because from its earliest inceptions it was designed and intended to be supported by diverse tools in the *nix landscape of the day.

Corey McCarty

If at all possible, avoid using regex in code. If you must include regex in code then you should make efforts to explain what the different parts do. All of that said, I have come across some examples of entry validation in the front-end where the best approach was to request the validations from the back-end depending on the locale, and then pass back regex strings. This keeps you from having hard-coded regex in the front-end, but means that the back-end has a plethora of regex patterns stored based on the field being validated and the locale. For this, think of a list of phone number validation regexes by locale code being in one place. This is an attempt to avoid sending information to the back-end just to perform validation, but as a safety precaution, the back-end should still perform the validation to ensure that there's been no tampering with the validation steps.
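A rough sketch of what the front-end side of that arrangement could look like. Everything here is assumed for illustration: the `phonePatternsByLocale` payload, the `validatePhone` helper, and both patterns, which are simplified and not real phone-number rules.

```javascript
// Hypothetical payload returned by the back end: locale code -> regex source string.
// The patterns are simplified placeholders, not production phone validation.
const phonePatternsByLocale = {
  "en-US": "^\\d{3}-\\d{3}-\\d{4}$",
  "fr-FR": "^0\\d(?: \\d{2}){4}$",
};

// Returns true or false when a pattern exists for the locale, and null when
// the front end has no rule and must defer entirely to the back end.
function validatePhone(locale, value) {
  const source = phonePatternsByLocale[locale];
  if (source === undefined) return null;
  return new RegExp(source).test(value);
}

console.log(validatePhone("en-US", "555-123-4567"));
console.log(validatePhone("fr-FR", "01 23 45 67 89"));
console.log(validatePhone("de-DE", "030 123456"));
```

As the comment says, this only saves a round trip; the back end would still re-run the same check on submit.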

When I use regex it is usually for searching in my code or local file system, and I use Regexr as both a reference and a test tool. One instance that I used during Advent of Code was to find any commented/uncommented print calls in the code before commit.
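The comment does not show the actual search pattern, so the following is only a guess at its shape, assuming Python-style source where a commented-out call starts with `#`. The name `PRINT_CALL` and the sample lines are hypothetical.

```javascript
// Assumed shape: optional indentation, optional "#" comment marker, then "print(".
const PRINT_CALL = /^\s*(?:#\s*)?print\(/;

// Hypothetical lines from a source file about to be committed.
const lines = [
  "total = 0",
  "    print(total)",
  '# print("debug")',
  "printer.run()",
];

// Keep only the lines that contain a live or commented-out print call.
const hits = lines.filter((line) => PRINT_CALL.test(line));

console.log(hits);
```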

scottshipp

Some people, when confronted with a problem, think 'I know, I’ll use regular expressions.' Now they have two problems.

—Jamie Zawinski

Hence, I only use regex when I have to. And I usually just end up using the built-in language features for it.

Calin Baenen
  • What tools do you use? (if not from full memory)

regex101, as stated yesterday.
Otherwise I try to do things from memory, or look shiz up.

  • How do you encapsulate/label/comment regex?

I don't.

  • What types of problems do you most solve with regex?

ParseJS with "abstract tokens", that'd allow for you to make a programming language with usable identifiers.

lepinekong

My approach is to decompose a regex iteratively and write down the mental process behind it; otherwise I just learn and forget, learn and forget, "ad vitam aeternam", like I used to in the past :D Example below. I'm using a FigJam document (free with figma.com) to create code notes for this, with the help of a plugin I'm building to generate the whole expression from the parts. It's especially important to be able to understand a regex you wrote weeks or months before, so I also note matching and non-matching samples above each regex part. I also embed the regexr.com/ playground in the FigJam doc. In the future, by improving the plugin, I will be able to have a direct real-time playground while playing with the parts.

That's how I'm now much more confident to write my own regex without googling which is by the way an increase in productivity ;)

Alt text

Sherry Day

I like this approach from the other thread

I love using regular expressions. And try to adopt the following pattern:

  1. Write a method for evaluating the regular expression.
  2. Bombard that method with tests, both matching and non-matching cases.
  3. Profit!

Regular expressions can be quite dense to read, so I want to use the tests to highlight my expectations around what I wrote.
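A minimal JavaScript sketch of that method-plus-tests pattern. The regex, the `theMatcher` name, and the cases are placeholders for illustration; a real suite would use the project's own test runner.

```javascript
// Step 1: a method that evaluates the regular expression.
const THE_REGEX = /\d+/;
function theMatcher(text) {
  return THE_REGEX.test(text);
}

// Step 2: bombard it with matching and non-matching cases.
const cases = [
  ["123", true],
  ["a1b", true],
  ["abc", false],
  ["", false],
];

// Collect every case whose actual result differs from the expectation.
const failures = cases.filter(([text, expected]) => theMatcher(text) !== expected);

for (const [text, expected] of cases) {
  const status = theMatcher(text) === expected ? "SUCCESS" : "FAILURE";
  console.log(`${status}: theMatcher(${JSON.stringify(text)}) expected ${expected}`);
}
```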

Waylon Walker

I regex in all the terminal dev tools: vim, nvim, grep, ag, rg.

DevFranPR

This VSC plugin, with regex rules in another window, is a life saver: marketplace.visualstudio.com/items...

Michael Kohl

I never use them myself but I had good experience with using VerbalExpressions to get people into regex.

Rafi

I use the CLI tool grex; it generates a regex from example text to match.