DEV Community

dev.to staff
dev.to staff

Posted on

Daily Challenge #232 - Regex Pattern

Implement a function that should return a valid regular expression. This is a pattern which is normally used to match parts of a string. In this case, it'll be used to check if all the characters given in the input appear in a string.

Your input will be a string of upper and lower case alphabetical characters. You should output a regex pattern as a string.

Example

abc = 'abc'
pattern = regex_contains_all(abc)
st = 'zzzaaacccbbbzzz'
bool(re.match(pattern, st), f"Testing if {st} contains all characters in {abc} with your pattern {pattern}")

will return True

Tests

abc = 'abc'
pattern = regex_contains_all(abc)
bool(re.match(pattern, 'bca'))
bool(re.match(pattern, 'baczzz'))
bool(re.match(pattern, 'ac'))
bool(re.match(pattern, 'cb'))

Good luck!


This challenge comes from albertogcmr on CodeWars. Thank you to CodeWars, who has licensed redistribution of this challenge under the 2-Clause BSD License!

Want to propose a challenge idea for a future post? Email yo+challenge@dev.to with your suggestions!

Top comments (11)

Collapse
 
qm3ster profile image
Mihail Malo • Edited

// Javscript

const regex_contains_all = chars =>
  new RegExp(
    chars
      .split("")
      .map(c => `(?=.*${c})`)
      .join("")
  )

// Test

const abc = 'abc'
const re = regex_contains_all(abc)
;[
  re.test('bca'),
  re.test('baczzz'),
  re.test('ac'),
  re.test('cb')
]

This uses "positive lookahead" patterns to match multiple patterns to the same spans.
This allows me to avoid listing all the permutations.
That is not to say that the runtime will be anything but atrocious.

Truly a oneliner

const regex_contains_all = chars => new RegExp(chars.replace(/./g,'(?=.*$&)'))
Collapse
 
kenbellows profile image
Ken Bellows • Edited

Winner right here. Lookaheads are the one regex feature I have not spent nearly enough time investigating. From what I've seen they dramatically broaden the scope of what you can do with regexes

Collapse
 
craigmc08 profile image
Craig McIlwrath

I was interested in how the positive lookahead compares to a list of all permutations. I did some basic benchmarking and determined that your solution gives what appears to be logarithmic time, which seems strange. This could be an error of my testing. Whatever is the case with that, listing all permutation is certainly much, much slower. Here's a chart: (y axis is logarithmic scale and in nanoseconds):

graph

The data for "All Permutations" isn't complete. 7 characters is all the node.js regex engine can handle (any longer and it the regex is too large). I only did 6 because I was getting impatient waiting.

Collapse
 
qm3ster profile image
Mihail Malo

Wow!
How long are the strings you test against?
I'd be interested in ridiculously long strings which include a lot of some of the letters, but not others, and how finally including one of them at the beginning affects the runtime.

Thread Thread
 
craigmc08 profile image
Craig McIlwrath

Each test was on 100k strings of length [30..39] composed of random lowercase letters. The patterns were slices of increasing length from the beginning of the alphabet ('a', 'ab', 'abc', etc.). The same test cases were used for all data points.

If I have some time tomorrow, I may do some more tests with longer strings/placement of the required letters. I'll update you!

Maybe I'll add in a regex-less implementation for comparison too.

Collapse
 
quoll profile image
Paula Gearon

I don't consider myself a regex expert, but this is a feature I've never used and had forgotten about. Very nice, thank you!

Collapse
 
craigmc08 profile image
Craig McIlwrath

I'm not sure why you would want to implement this as a regular expression. There's no compact way (that I could think of) to represent those rules as a regular language. Here's my solution, in javascript.

function permutations(list) {
  if (list.length === 1) return [list];
  const xs = [];
  for (let i = 0; i < list.length; i++) {
    const tails  = permutations(list.filter((y, j) => j !== i));
    xs.push(...tails.map(ys => [list[i], ...ys]));
  }
  return xs;
}

function regexContainsAll(chars) {
  const regex = permutations(chars.split(''))
    .map(pattern => `^.*${pattern.join('.*')}.*$`)
    .join('|');
  return regex;
}

For example, input 'abc' produces the string 'a.*b.*c|a.*c.*b|b.*a.*c|b.*c.*a|c.*a.*b|c.*b.*a'.

As a side note, why do all of these challenges have "tests" that don't include expected results? It doesn't help me understand the question if they don't tell me what the function is supposed to do...

Collapse
 
fluffynuts profile image
Davyd McColl
function regex_contains_all(chars, caseSensitive) {
  return new RegExp(`^[${chars}]*$`, caseSensitive ? undefined : "i");
}
  • it's case-insensitive by default -- the question isn't clear about case sensitivity
  • the stated tests look like they are meant for Python, so here's a self-contained test suite:
function assert(condition) {
    if (!condition) {
        throw new Error("FAIL");
    }
}

var abc = 'abc'
var pattern = regex_contains_all(abc)
assert("bca".match(pattern) !== null);
assert("baczzz".match(pattern) === null);
assert("ac".match(pattern) !== null);
assert("cb".match(pattern) !== null);
Collapse
 
qm3ster profile image
Mihail Malo

You did "string consists only of", not "string includes all of".
The intended result matrix is:

assert(pattern.test("zbca"))
assert(pattern.test("zbaczzz"))
assert(!pattern.test("zac"))
assert(!pattern.test("zcb"))
Collapse
 
quoll profile image
Paula Gearon

Can I just check...
This looks for every character in the requirement string must be there, and it doesn't matter which order, right? So my understanding is that the tests should give:

abc = 'abc'
pattern = regex_contains_all(abc)

bool(re.match(pattern, 'bca')) => true
bool(re.match(pattern, 'baczzz')) => true
bool(re.match(pattern, 'ac')) => false
bool(re.match(pattern, 'cb')) => false

Collapse
 
quoll profile image
Paula Gearon

Assuming the above is OK, then here's my Clojure solution:

(require '[clojure.string :as s])
(defn permute [a]
  (or (seq (mapcat #(map (partial cons %) (permute (remove #{%} a))) a)) [[]]))

(defn regex-contains-all [input]
  (str "(" (s/join ")|(" (map (partial s/join ".*") (permute input))) ")"))

Apologies for writing code that looks like line noise.

Given an input of "abc" then it will generate:

(regex-contains-all "abc")
"(a.*b.*c)|(a.*c.*b)|(b.*a.*c)|(b.*c.*a)|(c.*a.*b)|(c.*b.*a)"

This gives the test results that I describe in the previous comment.