DEV Community

Introducing fancy-regex - JS/TS regexes with whitespace, comments, and interpolation!

lionel-rowe
Updated on ・2 min read

Regexes in JavaScript are a fantastic tool for matching text, manipulating strings, performing validations, and a myriad of other tasks (just don’t try to use them to parse HTML!). Thanks to ever-improving Unicode support with the u flag and Unicode property escapes, regexes in JavaScript have never been more powerful.

However, one area where JavaScript regexes have consistently fallen behind is developer experience. In Ruby, interpolation of variables into regexes is supported by default, and support for multiline whitespace indentation and comments can easily be enabled with the x flag. In JavaScript, meanwhile, you end up clumsily joining strings together inside the RegExp constructor, with lots of double-escaped backslashes. Without comments or proper indentation, it’s no wonder that people say regexes are a write-only language.
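To make the pain concrete, here is a small sketch of the native approach. The pattern and the `digitCount` variable are made up for illustration; only the built-in `RegExp` constructor is used.

```javascript
// Illustrative only: build /^\d{5}\s*-\s*\w+$/ from a variable, natively.
const digitCount = 5

// Every backslash is doubled, because the string literal consumes
// one level of escaping before the RegExp constructor sees the text.
const clumsyRegex = new RegExp('^\\d{' + digitCount + '}\\s*-\\s*\\w+$')

console.log(clumsyRegex.source) // ^\d{5}\s*-\s*\w+$
console.log(clumsyRegex.test('12345 - abc')) // true
```

The pattern works, but there is nowhere to put a comment, and the doubled backslashes make it hard to see what the regex actually matches.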

That’s why I built fancy-regex! This tiny npm package uses the power of tagged template literals to provide a developer experience very similar to Ruby’s /#{regex}/x.

Let’s take a quick look at a couple of examples:

const myFancyRegex = regex`.{${4 + 1}}`

Simple enough. If you don’t need to use any flags, the regex function is directly callable on template strings.

If you do need flags, you can pass them to regex first:

const myCaseInsensitiveRegex = regex('i')`
    ^
        abc

        ${myFancyRegex}  # seamlessly interpolate other regexes

        \w\d\b\0\\       # look Mom, no double escaping! 
        ... 
        \r\n\t\x20       # use "\x20" to match a literal space
    $
`

The compiled regex here is /^abc.{5}\w\d\b\0\\...\r\n\t\x20$/i — hopefully you’ll agree that the commented and indented version is a lot more readable!

If you like, you can pass the flags in an options object instead:

const myRegexWithOptions = regex({
    unicode: true,
    global: true,
})`
    ^
        💩+    # with unicode enabled, this matches by codepoint
    $
`

Here, the compiled regex is /^💩+$/gu.
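If you're wondering why the unicode flag matters for that pattern, this native-only sketch compares the same regex with and without `u`. The variable names are just for illustration.

```javascript
// Without the u flag, 💩 is seen as two UTF-16 code units,
// so the + quantifier only applies to the second (low surrogate) unit.
const withoutUnicode = /^💩+$/

// With the u flag, 💩 is a single code point and + repeats the whole character.
const withUnicode = /^💩+$/u

console.log(withoutUnicode.test('💩'))   // true
console.log(withoutUnicode.test('💩💩')) // false
console.log(withUnicode.test('💩💩'))    // true
```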

Because fancy-regex uses the raw template string under the hood, the only things you’ll need to escape that you wouldn’t in a regex literal are backticks (`) and the sequence ${. Whitespace and hash symbols (#) must also be escaped if you don’t want them to be removed.
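To give a feel for how a tag function can work with the raw template string, here is a deliberately simplified sketch. It is not the fancy-regex source: `sketchRegex` is a hypothetical name, it ignores flags entirely, and its comment and whitespace rules are my own assumptions for the sake of the example.

```javascript
// Hypothetical, simplified tag function. NOT the real fancy-regex implementation.
function sketchRegex(strings, ...values) {
    // strings.raw holds the template text exactly as typed, so "\d" stays "\d".
    const cleaned = strings.raw.map((chunk) =>
        // Keep backslash-escaped characters; drop "# comment" runs and bare whitespace.
        chunk.replace(/\\[\s\S]|#.*|\s+/g, (match) =>
            match.startsWith('\\') ? match : '',
        ),
    )

    let source = cleaned[0]
    values.forEach((value, i) => {
        // Interpolated regexes contribute their source; anything else is stringified.
        source += (value instanceof RegExp ? value.source : String(value)) + cleaned[i + 1]
    })

    return new RegExp(source)
}

const inner = /.{5}/
const sketched = sketchRegex`
    ^
        abc
        ${inner}   # interpolated regex
        \w\d       # no double escaping
    $
`

console.log(sketched.source) // ^abc.{5}\w\d$
```

Because the escaped-character branch comes first in the alternation, sequences like `\w` survive untouched while everything from an unescaped `#` to the end of the line is dropped.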

On the flip side, you no longer need to escape forward slashes, which makes URL matching even easier than before!

regex`https://dev\.to/top/(week|month)`
// compiles to /https:\/\/dev\.to\/top\/(week|month)/

This is the first npm package I’ve ever published, so I’d love to hear your feedback on v0.X.X! 🧡🧡🧡

Discussion (5)

Sundeep

See also github.com/slevithan/xregexp which provides these features and many more like named backreferences, recursive matching, etc.

lionel-rowe Author • Edited

Nice! Named capture groups and named backreferences are now supported in vanilla JavaScript too, with great support among evergreen browsers (no IE, though).

My goal with fancy-regex is to be a really lightweight wrapper around native regexes. It will never have all the features of something like XRegExp, certainly not anything like recursive matching.
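For anyone who hasn't used the native named groups mentioned above, here's a quick sketch. The pattern and sample text are made up for illustration; it uses only built-in syntax.

```javascript
// (?<word>...) names a capture group; \k<word> refers back to whatever it captured.
const repeatedWord = /\b(?<word>\w+)\s+\k<word>\b/

const match = repeatedWord.exec('this is is a test')

console.log(match[0])          // is is
console.log(match.groups.word) // is
```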

yvesnrb

This rocks!
Will probably use it in the future.

DarkWiiPlayer

*Laughs in LPeg*

lionel-rowe Author

I guess LPeg is a Lua thing, hadn't heard of it before. The docs here look pretty gnarly, but might just be because I'm more familiar with regex syntax 😂

I like the idea of lpeg.P(string) being an exact string match though. Currently, interpolated strings are treated as regex fragments like what you'd pass to the native RegExp constructor, so regex`a${'.'}` compiles to /a./, rather than /a\./. I think I'll keep this behavior as it's more intuitive, but might be worth adding a helper function for creating exact string matches.
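A helper like that could look roughly like the sketch below. `escapeForRegex` is a hypothetical name, not something fancy-regex exports, and the example sticks to the native RegExp constructor to show the difference between a fragment and an exact match.

```javascript
// Hypothetical helper: backslash-escape every character that is special in a regex.
function escapeForRegex(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Treated as a regex fragment, "." matches any character.
const fragmentDot = new RegExp('a' + '.')

// Escaped first, "." only matches a literal dot.
const exactDot = new RegExp('a' + escapeForRegex('.'))

console.log(fragmentDot.test('ab')) // true
console.log(exactDot.source)        // a\.
console.log(exactDot.test('ab'))    // false
console.log(exactDot.test('a.'))    // true
```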