bob.ts

Posted on Aug 5, 2019 • Edited on Nov 16, 2020

Regular Expressions And Template Literals

#javascript #regex #templateliteral #webdev

Setup

Somewhere along the line, I heard a comment about template literals being a great tool for making regular expressions a bit easier to read. I started this article with the idea that I wanted to see if that was true and come up with some examples of this type of use.

Given the glimmer of an idea, I started a new project. This is an exercise ... plain and simple. This pattern "could" be used in a production environment, but I am in no way recommending that.

There are probably some vetted tools out there that can do this for the front-end. Please list some of these in the comments, if you know of them; if only for the sake of my readers.

Previous Work With Regular Expressions

Having worked on a project for a client where I had to recreate a script parser and engine for a 30-year-old, mainframe-driven client language, I had a lot of respect for Regular Expressions. I learned a lot (translate that into ... a lot of poor code was written and refactored). After two major refactors, I had a working set of code ... and HUNDREDS of Regular Expressions to make things work.

I used every trick I knew to make the Parser Regular Expression Service more readable. I abstracted and combined together all sorts of interesting patterns, knowing that someday this code would be managed by someone else.

Having struggled with this, using Template Literals this way sounded very efficient and clean. Certainly, something that deserved some research.

What I Want To Do ...

First, I found a regular expression; something like this. I want to take this ...

Matches text avoiding additional spaces

// ^[\s]*(.*?)[\s]*$

And, generate it from something more legible, like this ...

const code0001 = `
  /* Matches text avoiding additional spaces
  */
  ^       // Beginning of line
  [\\s]*  // Zero or more whitespace
  (.*?)   // Any characters, zero to unlimited,
          //   lazy (as few times as possible, expanding as needed)
  [\\s]*  // Zero or more whitespace
  $       // End of line
`;

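NOTE here that the \s still needs to be escaped. It seems odd, but a template literal processes backslash escapes the same way an ordinary string does, so a lone \s would reach the RegExp constructor as a plain s.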
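One way around the doubled backslashes, sketched here rather than taken from the original code, is the built-in String.raw tag, which leaves backslashes exactly as typed. The names cookedPattern and rawPattern are illustrative only.

```javascript
// Untagged template literal: escapes are processed, so the backslash is doubled.
const cookedPattern = `^[\\s]*(.*?)[\\s]*$`;

// String.raw keeps backslashes as typed, so no doubling is needed.
const rawPattern = String.raw`^[\s]*(.*?)[\s]*$`;

console.log(cookedPattern === rawPattern); // true

const trimmed = new RegExp(rawPattern).exec('   hello world   ');
console.log(trimmed[1]); // "hello world"
```

The trade-off is that a literal backtick becomes awkward to include in a String.raw template, which matters for patterns such as the email example further down.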

Beginning

First, I needed to get rid of comments ...

// Borrowed function: stripComments uses the regex from
// https://stackoverflow.com/a/47312708
function stripComments(stringLiteral) {
  return stringLiteral
    // '$1' restores the character captured just before a // comment
    .replace(/\/\*[\s\S]*?\*\/|([^:]|^)\/\/.*$/gm, '$1');
}

Run against code0001, the function above essentially produces ...

"

  ^    
  [\s]*
  (.*?)
  [\s]*
  $    
"

Now I need to get rid of the line breaks and spaces (yes, I know there can be a space in a regex pattern, but I'm choosing to ignore that for simplicity's sake in this exercise). To remove the unneeded characters ...

// Starting Demo Code Here
function createRegex(stringLiteral) {
  return stripComments(stringLiteral)
    // \s already covers \r and \n, so one class removes all whitespace
    .replace(/\s+/g, '');
}

Which then gives me the ability to do this ...

const code0001regex = new RegExp(createRegex(code0001));

//           ORIGINAL FROM ABOVE: /^[\s]*(.*?)[\s]*$/
// GENERATED code0001regex value: /^[\s]*(.*?)[\s]*$/
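The comparison above can be checked mechanically. The following is a minimal, self-contained sketch: the two helpers are restated so it runs on its own, and the names generated and original are illustrative.

```javascript
// Helpers restated so the snippet runs on its own.
function stripComments(stringLiteral) {
  // '$1' puts back the character captured just before a // comment.
  return stringLiteral
    .replace(/\/\*[\s\S]*?\*\/|([^:]|^)\/\/.*$/gm, '$1');
}

function createRegex(stringLiteral) {
  // Drop every remaining whitespace character, line breaks included.
  return stripComments(stringLiteral).replace(/\s+/g, '');
}

const code0001 = `
  /* Matches text avoiding additional spaces
  */
  ^       // Beginning of line
  [\\s]*  // Zero or more whitespace
  (.*?)   // Any characters, zero to unlimited,
          //   lazy (as few times as possible, expanding as needed)
  [\\s]*  // Zero or more whitespace
  $       // End of line
`;

const generated = new RegExp(createRegex(code0001));
const original = /^[\s]*(.*?)[\s]*$/;

console.log(generated.source === original.source); // true
console.log(generated.exec('  padded text  ')[1]); // "padded text"
```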

Let's Take A Look ...

The code0001 I defined above has been reworked for legibility (it is now much easier to home in on what this regex pattern is going to do) ...

// /^[\s]*(.*?)[\s]*$/
const code0001 = `
  ^       // Beginning of line
  [\\s]*  // Zero or more whitespace

  (.*?)   // Any characters, zero to unlimited,
          //  lazy (as few times as possible, expanding as needed)

  [\\s]*  // Zero or more whitespace
  $       // End of line
`;

code0002
Matches any valid HTML tag and the corresponding closing tag ... here, I've tried to show a bit more advanced indenting (both in the code and in the supporting comments).

// <([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)
const code0002 = `
  <               // Literal
  ([a-z]+)        // Group: First Tag (one or more)
  (               // Group
    [^<]+           // Match (one or more) NOT <
  )*              // Group-END: Zero or more times
  (?:             // Group-NON-CAPTURE
    >               // Literal
    (.*)<\\/\\1>    // Up to and including SLASH and First Tag group above
    |\\s+\\/>       // OR spaces and close tag
  )               // Group-END
`;

code0003
Matches any valid hex color inside text.

// \B#(?:[a-fA-F0-9]{6}|[a-fA-F0-9]{3})\b
const code0003 = `
  \\B#              // Non-word boundary, Literal #
  (?:               // Group-NON-CAPTURE
    [a-fA-F0-9]{6}    // 1st alternative
    |[a-fA-F0-9]{3}   // 2nd alternative
  )                 // Group-END
  \\b               // Word boundary
`;
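Because this pattern is meant to find colors inside text, it is usually wanted with the global flag. As a sketch (not part of the original demo code), flags can be passed as the second argument to the RegExp constructor; the helpers are restated so the snippet is self-contained, and sampleCss is made-up input.

```javascript
// Helpers restated so the snippet runs on its own.
function stripComments(stringLiteral) {
  // '$1' puts back the character captured just before a // comment.
  return stringLiteral
    .replace(/\/\*[\s\S]*?\*\/|([^:]|^)\/\/.*$/gm, '$1');
}

function createRegex(stringLiteral) {
  return stripComments(stringLiteral).replace(/\s+/g, '');
}

const code0003 = `
  \\B#              // Non-word boundary, Literal #
  (?:               // Group-NON-CAPTURE
    [a-fA-F0-9]{6}    // 1st alternative
    |[a-fA-F0-9]{3}   // 2nd alternative
  )                 // Group-END
  \\b               // Word boundary
`;

// Flags are supplied separately from the pattern text.
const hexColor = new RegExp(createRegex(code0003), 'g');

const sampleCss = 'a { color: #1A2b3C; background: #fff; margin: #12; }';
const found = sampleCss.match(hexColor);
console.log(found); // [ '#1A2b3C', '#fff' ]
```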

code0004
Matches any valid email inside text.

// \b[\w.!#$%&'*+\/=?^`{|}~-]+@[\w-]+(?:\.[\w-]+)*\b
const code0004 = `
  \\b                           // Word boundary
  [\\w.!#$%&'*+\\/=?^\`{|}~-]+  // Character in this list (and word), one to unlimited
  @                             // Literal
  [\\w-]+                       // One to unlimited word and character "-"
  (?:                           // Group-NON-CAPTURE
    \\.[\\w-]+                    // Literal ".", one to unlimited word and character "-"
  )*                            // Group-END (zero or more)
  \\b                           // Word boundary
`;

code0005
Strong password: Minimum length of 6, at least one uppercase letter, at least one lowercase letter, at least one number, at least one special character.

// (?=^.{6,}$)((?=.*\w)(?=.*[A-Z])(?=.*[a-z])
// ... (?=.*[0-9])(?=.*[|!"$%&\/\(\)\?\^\'\\\+\-\*]))^.*
const code0005 = `
  (?=           // Group-POSITIVE-LOOKAHEAD
    ^             // BOL
    .{6,}         // Six or more characters, except line terminators
    $             // EOL
  )             // Group-POSITIVE-LOOKAHEAD-END
  (             // Group
    (?=.*\\w)     // Group-POSITIVE-LOOKAHEAD
                  // Any Characters, zero to unlimited
                  // Any Word

    (?=.*[A-Z])   // Group-POSITIVE-LOOKAHEAD
                  // Any Characters, zero to unlimited
                  // Any Character (A-Z)

    (?=.*[a-z])   // Group-POSITIVE-LOOKAHEAD
                  // Any Characters, zero to unlimited
                  // Any Character (a-z)

    (?=.*[0-9])   // Group-POSITIVE-LOOKAHEAD
                  // Any Characters, zero to unlimited
                  // Any Character (0-9)

    (?=.*[|!"$%&\\/\\(\\)\\?\\^\\'\\\\\\+\\-\\*])
                  // Group-POSITIVE-LOOKAHEAD
                  // Any Characters, zero to unlimited
                  // Any Character in the list
  )             // Group-END
  ^             // BOL
  .*            // Match Any Characters, zero to unlimited
`;

code0006
SSN — Social Security Number (simple)

// ^((?<area>[\d]{3})[-][\d]{2}[-][\d]{4})$
const code0006 = `
  ^                   // BOL
  (                   // Group
    (?<area>            // Group-NAMED area
      [\\d]{3}            // 3-Digits
    )                   // Group-NAMED-END
    [-]                 // Literal, Dash
    [\\d]{2}            // 2-Digits
    [-]                 // Literal, Dash
    [\\d]{4}            // 4-Digits
  )                   // Group-END
  $                   // EOL
`;
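The named group is what makes this pattern convenient to consume. A short sketch, using the pattern as a literal and a made-up sample number, of reading the area part through the standard groups property on the match result:

```javascript
// Same pattern as code0006, written as a literal for brevity.
const ssn = /^((?<area>[\d]{3})[-][\d]{2}[-][\d]{4})$/;

const match = ssn.exec('123-45-6789');
console.log(match.groups.area); // "123"

// Input without the dashes does not match at all.
console.log(ssn.test('123456789')); // false
```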

Conclusions

This whole article is a different take on generating Regular Expressions using JavaScript's template literals. This was an experiment. A successful one, I believe.

This exercise also points out that writing tests against the regex can become much easier as the pattern becomes more understandable.
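As an illustration of that point, each commented requirement in code0005 maps naturally onto one test case. This is a rough sketch with made-up sample passwords; the pattern is written as a literal to keep it short, and a regex generated by createRegex would be exercised the same way.

```javascript
// Pattern from code0005, written as a literal to keep the sketch short.
const strongPassword =
  /(?=^.{6,}$)((?=.*\w)(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[|!"$%&\/\(\)\?\^\'\\\+\-\*]))^.*/;

// One case per commented requirement (sample inputs are made up).
const cases = [
  ['Ab1!',     false], // shorter than six characters
  ['abcdef1!', false], // no uppercase letter
  ['ABCDEF1!', false], // no lowercase letter
  ['Abcdefg!', false], // no digit
  ['Abcdef12', false], // no special character
  ['Abcdef1!', true],  // satisfies every lookahead
];

// Collect any case whose result differs from the expectation.
const failures = cases.filter(
  ([input, expected]) => strongPassword.test(input) !== expected
);

console.log(failures.length === 0 ? 'all cases pass' : failures);
```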

The regex generated here is much easier to read and reason about, which was the goal. This is a pattern I could get behind if there was a need for a number of regex templates within a project.

Top comments (6)

srobfr • Aug 6 '19

Nice trick.
PHP has a dedicated modifier for this kind of regex usage. See php.net/manual/en/reference.pcre.p...

Just a remark though : trying to parse HTML tags using regex is a bad idea (mandatory reading : stackoverflow.com/questions/173234...).

bob.ts • Aug 6 '19

I know the idea is bad; as I said, this is a code example (simply research) and I needed something to work with.

Thanks for the comments!

Rémy 🤖 • Aug 6 '19

That's an interesting take, although I wonder how you could make comments more constructive. Right now you're simply describing things out loud, maybe there is a smarter story to tell?

Also I would be interested to get your opinion on this other take that I'm currently working on. Could also lead to something going in your direction.

Xowap / nsre

Non-String Regular Expressions

NSRE (Non-String Regular Expressions) is a new spin on regular expressions. It's really abstract, even compared to regular expressions as you know them, but it's also pretty powerful for some uses.

Here's the twist: what if regular expressions could, instead of matching just character strings, match any sequence of anything?

from nsre import *

re = RegExp.from_ast(seq('hello, ') + (seq('foo') | seq('bar')))
assert re.match('hello, foo')

The main goal here is matching NLU grammars when there are several possible interpretations of a single word; however, there are a lot of other things that you could do. You just need to understand what NSRE is and apply it to something.

Note — This is inspired by this article from Russ Cox which explains how Thompson NFA work, except that I…

bob.ts • Aug 6 '19

Thanks for the comments.

I'll take a second look at my comments in the code. This was simply an experiment within JavaScript ... I'm not sure I'd be much help with your project. In fact, I'm pretty sure your project is out of my league.

Rémy 🤖 • Aug 6 '19

I think you are undervaluing what you did here; regex documentation and comprehension is a major issue in everyday developer life. That's a great insight you had and I think you can push this much further :)

Chad Windham • Aug 5 '19

WOW this was really cool! Thanks so much for sharing, I'd never heard/thought of using template literals for regex but getting to see it laid out like that makes a lot of sense!