lionel-rowe

Posted on Jan 27, 2021 • Edited on Feb 1, 2021

Introducing fancy-regex - JS/TS regexes with whitespace, comments, and interpolation!

#typescript #javascript #showdev

Regexes in JavaScript are a fantastic tool for matching text, manipulating strings, performing validations, and a myriad of other tasks (just don’t try to use them to parse HTML!). Thanks to ever-improving Unicode support with the u flag and Unicode property escapes, regexes in JavaScript have never been more powerful.

However, one area where JavaScript regexes have consistently fallen behind is developer experience. In Ruby, interpolation of variables into regexes is supported by default, and support for multiline whitespace indentation and comments can easily be enabled with the x flag. In JavaScript, meanwhile, you end up clumsily joining strings together using the RegExp constructor, using lots of double-escaped backslashes. Without comments or proper indentation, it’s no wonder that people say regexes are a write-only language.
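
To make the pain point concrete, here is a small sketch of the RegExp constructor approach using only built-in JavaScript. The pattern and the variable names (digits, clumsy, broken) are made up for illustration.

```javascript
// Goal: build /^\d{3}-\w+$/ with the digit count taken from a variable.
const digits = 3

// Every backslash has to be doubled inside an ordinary string literal.
const clumsy = new RegExp('^\\d{' + digits + '}-\\w+$')
console.log(clumsy.test('123-abc')) // true

// Forgetting to double them fails silently: "\d" in a string is just "d".
const broken = new RegExp('^\d{' + digits + '}$')
console.log(broken.test('123')) // false
console.log(broken.test('ddd')) // true
```

The second pattern throws no error, which is what makes this kind of bug easy to miss.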

That’s why I built fancy-regex! This tiny npm package uses the power of tagged template literals to provide a developer experience very similar to Ruby’s /#{regex}/x.

Let’s take a quick look at a couple of examples:

const myFancyRegex = regex`.{${4 + 1}}`

Simple enough. If you don’t need to use any flags, the regex function is directly callable on template strings.

If you do need flags, you can pass them to regex first:

const myCaseInsensitiveRegex = regex('i')`
    ^
        abc

        ${myFancyRegex}  # seamlessly interpolate other regexes

        \w\d\b\0\\       # look Mom, no double escaping! 
        ... 
        \r\n\t\x20       # use "\x20" to match a literal space
    $
`

The compiled regex here is /^abc.{5}\w\d\b\0\\...\r\n\t\x20$/i — hopefully you’ll agree that the commented and indented version is a lot more readable!

If you like, you can pass the flags in an options object instead:

const myRegexWithOptions = regex({
    unicode: true,
    global: true,
})`
    ^
        💩+    # with unicode enabled, this matches by codepoint
    $
`

Here, the compiled regex is /^💩+$/gu.
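
To see why matching by codepoint matters, here is a quick comparison using plain regex literals (no fancy-regex involved). The names withoutU and withU are just for illustration.

```javascript
// "💩" is a single codepoint, but two UTF-16 code units.
console.log('💩'.length) // 2

const withoutU = /^💩+$/  // "+" applies only to the second code unit
const withU = /^💩+$/u    // "+" applies to the whole codepoint

console.log(withoutU.test('💩💩')) // false
console.log(withU.test('💩💩'))    // true
```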

Because fancy-regex uses the raw template string under the hood, the only things you’ll need to escape that you wouldn’t in a regex literal are backticks (`) and the sequence ${. Whitespace and hash symbols (#) must also be escaped if you don’t want them to be removed.
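
For the curious, here is a rough sketch of how a tag function can be built on raw template strings. It is not the actual fancy-regex source: miniRegex and stripLayout are hypothetical names, the sketch always needs a call like miniRegex() before the template, and it takes shortcuts (for example, it would also strip a # inside a character class).

```javascript
// Hypothetical sketch only: `miniRegex` and `stripLayout` are made-up names,
// and this is not the actual fancy-regex implementation.

// Remove unescaped whitespace and "# ..." comments; keep escaped pairs intact.
function stripLayout(rawChunk) {
    return rawChunk.replace(/\\[\s\S]|#.*|\s+/g, (token) =>
        token.startsWith('\\') ? token : '',
    )
}

function miniRegex(flags = '') {
    return (strings, ...values) => {
        // strings.raw holds the text exactly as typed, so "\w" stays "\w".
        const source = strings.raw
            .map((chunk, i) => {
                const value = values[i]
                const interpolated =
                    i >= values.length ? ''
                    : value instanceof RegExp ? value.source
                    : String(value)
                return stripLayout(chunk) + interpolated
            })
            .join('')
        return new RegExp(source, flags)
    }
}

const inner = miniRegex()`.{${4 + 1}}`
const outer = miniRegex('i')`
    ^
        abc
        ${inner}   # interpolated as its source
        \w\d\b\0\\ # backslashes arrive untouched
    $
`
console.log(outer.source)
```

Because the tag only ever sees the raw text, a backslash followed by whitespace or # survives as an escaped pair, while bare whitespace and comments are dropped.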

On the flip side, you no longer need to escape forward slashes, which makes URL matching even easier than before!

regex`https://dev\.to/top/(week|month)`
// compiles to /https:\/\/dev\.to\/top\/(week|month)/

This is the first npm package I’ve ever published, so I’d love to hear your feedback on v0.X.X! 🧡🧡🧡

Latest comments (5)

Sundeep • Jan 28 '21

See also github.com/slevithan/xregexp which provides these features and many more like named backreferences, recursive matching, etc.

lionel-rowe • Jan 28 '21 • Edited

Nice! Named capture groups and named backreferences are now supported in vanilla JavaScript too, with great support among evergreen browsers (no IE, though).

My goal with fancy-regex is to be a really lightweight wrapper around native regexes. It will never have all the features of something like XRegExp, certainly not anything like recursive matching.
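
For reference, a short sketch of the native named capture groups and named backreferences mentioned above, using plain regex literals. The patterns and names here are illustrative only.

```javascript
// Named capture groups: read matches by name instead of by index.
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
const { year, month, day } = '2021-01-28'.match(dateRe).groups
console.log(year, month, day) // 2021 01 28

// Named backreference: \k<word> must repeat whatever (?<word>...) captured.
const doubledWord = /\b(?<word>\w+) \k<word>\b/
console.log(doubledWord.test('the the cat')) // true
console.log(doubledWord.test('the cat'))     // false
```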

𒎏Wii 🏳️‍⚧️ • Jan 28 '21

*Laughs in LPeg*

lionel-rowe • Jan 28 '21

I guess LPeg is a Lua thing, hadn't heard of it before. The docs here look pretty gnarly, but might just be because I'm more familiar with regex syntax 😂

I like the idea of lpeg.P(string) being an exact string match though. Currently, interpolated strings are treated as regex fragments like what you'd pass to the native RegExp constructor, so regex`a${'.'}` compiles to /a./, rather than /a\./. I think I'll keep this behavior as it's more intuitive, but might be worth adding a helper function for creating exact string matches.
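
A minimal sketch of what such a helper could look like, shown with the native RegExp constructor. The name exact is hypothetical and not part of fancy-regex. The escaping pattern is the commonly used one for regex metacharacters and assumes the result is used outside a character class.

```javascript
// Hypothetical helper: escape regex metacharacters so the text matches literally.
function exact(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Unescaped fragment: "." matches any character.
console.log(new RegExp('^a' + '.' + '$').test('ab'))        // true

// Escaped fragment: "." matches only a literal dot.
console.log(new RegExp('^a' + exact('.') + '$').test('ab')) // false
console.log(new RegExp('^a' + exact('.') + '$').test('a.')) // true
```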

yvesnrb • Jan 28 '21

This rocks!
Will probably use it in the future.