Asking for review on non-string regular expressions

Rémy 🤖 (xowap) ・1 min read

I have an idea I'd like to push forward: "non-string regular expressions". I've explained it as best I could in the repo.

GitHub: Xowap / nsre

Non-String Regular Expressions

NSRE (Non-String Regular Expressions) is a new spin on regular expressions. It's really abstract, even compared to regular expressions as you know them, but it's also pretty powerful for some uses.

Here's the twist: what if regular expressions could, instead of matching just character strings, match any sequence of anything?

from nsre import *
re = RegExp.from_ast(seq('hello, ') + (seq('foo') | seq('bar')))
assert re.match('hello, foo')

The main goal here is matching NLU grammars when there are several possible interpretations of a single word, but there are a lot of other things you could do with it. You just need to understand what NSRE is and apply it to something.

Note: This is inspired by this article from Russ Cox, which explains how Thompson NFAs work, except that I…
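To make the "sequence of anything" idea more concrete, here is a minimal sketch in plain Python. It is not the nsre implementation and it does not use the nsre API: the node names (`Item`, `Concat`, `Either`, `Star`), the `tagged` helper, and the token data are all assumptions made up for illustration. It borrows only the general spirit of the set-based simulation, tracking a set of reachable positions instead of backtracking, and it shows how a token with several interpretations could be matched.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet, Sequence

Predicate = Callable[[Any], bool]


# Hypothetical AST node names for this sketch; they are not the nsre API.
@dataclass(frozen=True)
class Item:
    test: Predicate  # matches exactly one element of the sequence


@dataclass(frozen=True)
class Concat:
    left: Any
    right: Any


@dataclass(frozen=True)
class Either:
    left: Any
    right: Any


@dataclass(frozen=True)
class Star:
    inner: Any


def ends(node: Any, starts: FrozenSet[int], items: Sequence[Any]) -> FrozenSet[int]:
    """Return every position reachable by matching `node` from any start."""
    if isinstance(node, Item):
        return frozenset(
            i + 1 for i in starts if i < len(items) and node.test(items[i])
        )
    if isinstance(node, Concat):
        return ends(node.right, ends(node.left, starts, items), items)
    if isinstance(node, Either):
        return ends(node.left, starts, items) | ends(node.right, starts, items)
    if isinstance(node, Star):
        reached = frontier = starts
        while frontier:  # stop once no new position is discovered
            frontier = ends(node.inner, frontier, items) - reached
            reached = reached | frontier
        return reached
    raise TypeError(f"unknown node: {node!r}")


def full_match(node: Any, items: Sequence[Any]) -> bool:
    # The pattern matches if the end of the sequence is reachable from 0.
    return len(items) in ends(node, frozenset({0}), items)


def tagged(tag: str) -> Item:
    # Each token is a set of possible tags; any one of them may satisfy the test.
    return Item(lambda interpretations: tag in interpretations)


# Made-up tokens for "book a flight": "book" could be a verb or a noun.
tokens = [{"verb", "noun"}, {"determiner"}, {"noun"}]
command = Concat(tagged("verb"), Concat(Star(tagged("determiner")), tagged("noun")))

print(full_match(command, tokens))        # True
print(full_match(command, tokens[::-1]))  # False
```

Because each element is tested by a predicate rather than compared to a character, the same matcher works on sets of tags, dicts, or any other objects.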

I'm looking for all kinds of feedback:

  • Do you understand what this is?
  • Do you see applications for this?
  • Does the API look nice?
  • What features would you want to see around that?
  • What would you want before using this in production?

Thanks!

Discussion (3)

rootfs.ext2.gz

I had a brief look at it, and whilst I'm predominantly a Java developer, I have done some Python scripting in the past so I can look at the code and feel comfortable with it.

I'll try and answer your questions:

Do I understand what this is?

I think so - I believe it's a way to search for a value in complicated objects, and those values and/or objects may or may not be strings.

It feels like it is making Regex more human readable.

Do I see applications for this?

Kind of. I saw right at the bottom that the performance was quoted as being "terrible", so it's not something I'd happily use in a production environment. But I can imagine it would be super excellent for searching for a piece of data in a complex JSON or XML object, and that would be super dandy if I say so myself, but only if the performance is decent.

Does the API look nice?

This is where being predominantly a Java developer will probably fail me. It actually reminds me a lot of old-school Java, in that it's very verbose to express something simple. What I imagine would be nice is something like

re.on(datatype).match(expression), with a limited set of expressions available for each datatype. But I expect that would take longer to code, and maintaining a codebase like that would be hell.

But then again I'm not an expert in Python

What features would I want to see around that?

Mainly efficient, easy-to-understand regex formatting with various data types, like JSON, XML, CSS, perhaps even just a massive String which represents a text file.

What would I want before using this in production?

Mostly speed, to be honest, and possibly support for the above data types? But that might be a stretch. A nicer API is a nice-to-have, but I'd rather not enforce that on any Python developer, as someone who is interested in the code but isn't an expert in the language.


Honestly I think what you made is cool, even if it is in another language that I don't use! 😂 Oh well. It's a good start and I think it definitely has promise!

Keep up the good work! 👍

anpos231

Maybe I fail to understand, but how is this:

from nsre import *

re = AnyNumber(
    Symbol(KeyHasValue("type", "image")) + Maybe(KeyHasValue("type", "caption"))
) + Range(KeyHasValue("type", "text"), min=1)

assert re.match(
    [
        {"type": "image", "url": "https://img1.jpg"},
        {"type": "image", "url": "https://img2.jpg"},
        {"type": "image", "url": "https://img3.jpg"},
        {"type": "caption", "text": "Image 3"},
        {"type": "image", "url": "https://img4.jpg"},
        {"type": "caption", "text": "Image 4"},
        {"type": "image", "url": "https://img5.jpg"},
        {"type": "text", "text": "Hello"},
        {"type": "text", "text": "Foo"},
        {"type": "text", "text": "Bar"},
    ]
)

Better than this:

[
  {"type": "image", "url": "https://img1.jpg"},
  {"type": "image", "url": "https://img2.jpg"},
  {"type": "image", "url": "https://img3.jpg"},
  {"type": "caption", "text": "Image 3"},
  {"type": "image", "url": "https://img4.jpg"},
  {"type": "caption", "text": "Image 4"},
  {"type": "image", "url": "https://img5.jpg"},
  {"type": "text", "text": "Hello"},
  {"type": "text", "text": "Foo"},
  {"type": "text", "text": "Bar"},
]
  .filter(x => (x.type === "image") || (x.type === "caption"))
  .filter(x => x.text)
  .map(x => x.text.length)
megazear7

The power of regular expressions, in my opinion, is being able to encapsulate complex logic in a few characters with a standardized syntax. Being able to do the same type of thing on an array of JavaScript objects might open up some options, but stream operations like map and reduce are already powerful and flexible, so this would need to provide something different, cleaner, simpler, or more concise. That said, I do understand what it is doing, and sometimes it's good to build tools and then see what unforeseen things they can do once you have them to mess around with.
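For comparison, here is a rough sketch of how the order-sensitive part of the image/caption/text example above could be checked with Python's built-in `re` module. It is a workaround, not nsre: it assumes every item can be reduced to a single character, and the `CODES` table and `matches_layout` function are hypothetical names. Unlike a chain of filter and map calls, the pattern cares about where each item sits in the sequence.

```python
import re

# Hypothetical one-letter codes for the item types used in the example above.
CODES = {"image": "i", "caption": "c", "text": "t"}

# (image, optional caption) repeated, then at least one text item.
PATTERN = re.compile(r"(?:ic?)*t+")


def matches_layout(items):
    # Unknown types map to "?" so they can never match the pattern.
    encoded = "".join(CODES.get(item.get("type"), "?") for item in items)
    return PATTERN.fullmatch(encoded) is not None


good = [
    {"type": "image", "url": "https://img1.jpg"},
    {"type": "caption", "text": "Image 1"},
    {"type": "text", "text": "Hello"},
]
bad = [good[1], good[0], good[2]]  # caption placed before its image

print(matches_layout(good))  # True
print(matches_layout(bad))   # False
```

The encoding step is the weak point: it only works while each item maps cleanly to one symbol, which is presumably the restriction that matching on arbitrary objects is meant to lift.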
