DEV Community

Prathamesh More

How can I extract words from a string using regular expressions?

I have searched the internet but couldn't find a solution, and I'm confused.

I have this string

("type": "Person" AND ("specialFields.email": "prathameshmore@gmail.com" OR "specialFields.address": "Jaysingpur"))

How can I extract the data like this?

[
  '(',
  '"type": "Person"',
  'AND',
  '(',
  '"specialFields.email": "prathameshmore@gmail.com"',
  'OR',
  '"specialFields.address": "Jaysingpur"',
  ')',
  ')'
]

Can anyone please help with this?

Top comments (19)

Mihail Malo • Edited

That looks like tokenisation, except it's strange that each key-value pair is a single token.
I'd rather expect a result like

[
  '(',
  '"type"',
  ':',
  '"Person"',
  'AND',
  '(',
  '"specialFields.email"',
  ':',
  '"prathameshmore@gmail.com"',
  'OR',
  '"specialFields.address"',
  ':',
  '"Jaysingpur"',
  ')',
  ')'
]
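
A rough sketch of a tokenizer that yields this finer-grained list, assuming strings never contain an escaped double quote. The function name `tokenize` is made up for illustration. It uses a sticky regex so that unrecognised input raises an error instead of being skipped:

```javascript
// Hypothetical helper: splits a query into parens, colons, keywords and quoted strings.
function tokenize(query) {
  // One alternative per token kind. The sticky flag (y) forces each match
  // to start exactly at lastIndex, so nothing can be skipped silently.
  const re = /[()]|:|AND|OR|"[^"]*"/y;
  const tokens = [];
  let pos = 0;
  while (pos < query.length) {
    // Whitespace between tokens carries no meaning here, so step over it.
    if (/\s/.test(query[pos])) {
      pos++;
      continue;
    }
    re.lastIndex = pos;
    const match = re.exec(query);
    if (!match) {
      throw new SyntaxError(`Unexpected input at position ${pos}`);
    }
    tokens.push(match[0]);
    pos = re.lastIndex;
  }
  return tokens;
}

console.log(tokenize('("type": "Person" AND "specialFields.address": "Jaysingpur")'));
```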

However, if the grammar is really simple, you might get away with parsing it in one pass to

{
  _t: "AND",
  members: [
    {_t: "EQ", k: "type", v: "Person"},
    {
      _t: "OR",
      members: [
        {_t: "EQ", k: "specialFields.email", v: "prathameshmore@gmail.com"},
        {_t: "EQ", k: "specialFields.address", v: "Jaysingpur"}
      ]
    }
  ]
}
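
A minimal recursive-descent sketch that produces a tree of this shape. Everything here is an assumption made for illustration: the name `parseQuery`, the rule that one parenthesised group may not mix AND with OR, and the simplification that strings contain no escaped quotes:

```javascript
// Hypothetical parser: query string -> { _t, members } / { _t: "EQ", k, v } tree.
function parseQuery(query) {
  // Simplified tokenizer: anything the regex does not recognise is skipped,
  // so a stricter tokenizer would be needed for good error messages.
  const tokens = query.match(/[()]|:|AND|OR|"[^"]*"/g) || [];
  let i = 0;
  const next = () => tokens[i++];
  const unquote = (token) => {
    if (!/^"[^"]*"$/.test(token ?? "")) {
      throw new SyntaxError(`Expected a quoted string, got ${token}`);
    }
    return token.slice(1, -1);
  };

  // term := "(" group ")" | string ":" string
  function parseTerm() {
    if (tokens[i] === "(") {
      i++;
      const node = parseGroup();
      if (next() !== ")") throw new SyntaxError("Expected )");
      return node;
    }
    const k = unquote(next());
    if (next() !== ":") throw new SyntaxError("Expected :");
    const v = unquote(next());
    return { _t: "EQ", k, v };
  }

  // group := term (("AND" | "OR") term)*, with a single operator per group
  function parseGroup() {
    const members = [parseTerm()];
    let op = null;
    while (tokens[i] === "AND" || tokens[i] === "OR") {
      const current = next();
      if (op && op !== current) {
        throw new SyntaxError("Mixing AND and OR needs parentheses");
      }
      op = current;
      members.push(parseTerm());
    }
    // A group with a single member collapses to that member.
    return op ? { _t: op, members } : members[0];
  }

  const tree = parseGroup();
  if (i !== tokens.length) throw new SyntaxError("Unexpected trailing input");
  return tree;
}

console.log(JSON.stringify(parseQuery('("type": "Person" AND ("a": "1" OR "b": "2"))'), null, 2));
```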

However, in any case, RegEx can only be used to parse small parts of this, if at all.
In particular, you must consider key and value strings that contain the characters [](),: or the keywords AND and OR, as well as an escaped "\"" (or however else the double quote character is escaped in your query format).

Prathamesh More

("\w+.?\w*":"\w+@?.?") This is working now.

How can I get the brackets, AND, and OR?

Prathamesh More

I want to use a stack to create a mongoose filter.

Mihail Malo
  1. I updated my post, you were just too quick to reply :D
  2. What is "Stack"?
Prathamesh More • Edited

I am trying to create this output

{
 "$and": [
    { "type": "Nanoheal" },
    {
      "$or": [
        { "specialFields.address": "Jaysingpur" },
        { "specialFields.email": "prathameshmore@gmail.com" }
      ]
    }
  ]
}

It works when I use a string like this:

const query = `'('"type": "Nanoheal"'AND'('"specialFields.email": "prathameshmore@nanoheal"'OR'"specialFields.address": "Jaysingpur"')'')'`;
let operands = query.split("'").filter(Boolean);

// Output

[
  '(',
  '"type": "Nanoheal"',
  'AND',
  '(',
  '"specialFields.email": "prathameshmore@nanoheal"',
  'OR',
  '"specialFields.address": "Jaysingpur"',
  ')',
  ')'
]

I want to keep the query simple on the user's side, like this:

("type": "Person" AND ("specialFields.email": "prathameshmore@gmail.com" OR "specialFields.address": "Jaysingpur"))

Prathamesh More

The stack is used to create this object

{
 "$and": [
    { "type": "Nanoheal" },
    {
      "$or": [
        { "specialFields.address": "Jaysingpur" },
        { "specialFields.email": "prathameshmore@gmail.com" }
      ]
    }
  ]
}
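
For reference, a minimal sketch of how a stack could turn the token list into a filter of this shape. The names `buildFilter` and `finish` are hypothetical. It assumes each key-value pair arrives as a single token that is valid JSON once wrapped in braces, and that a group uses only one operator (if operators are mixed, the last one seen wins):

```javascript
// Hypothetical helper: closes a frame into either { $and: [...] }, { $or: [...] }
// or, when the group has no operator, its only item.
function finish(frame) {
  return frame.op ? { [frame.op]: frame.items } : frame.items[0];
}

// Hypothetical helper: token list -> nested $and / $or filter object.
function buildFilter(tokens) {
  // The bottom frame collects the finished top-level expression.
  const stack = [{ op: null, items: [] }];
  for (const token of tokens) {
    const top = stack[stack.length - 1];
    if (token === "(") {
      stack.push({ op: null, items: [] });
    } else if (token === ")") {
      if (stack.length < 2) throw new SyntaxError("Unbalanced )");
      const frame = stack.pop();
      stack[stack.length - 1].items.push(finish(frame));
    } else if (token === "AND" || token === "OR") {
      top.op = token === "AND" ? "$and" : "$or";
    } else {
      // '"type": "Person"' becomes { type: "Person" } via JSON.parse.
      top.items.push(JSON.parse(`{${token}}`));
    }
  }
  if (stack.length !== 1) throw new SyntaxError("Unbalanced (");
  return finish(stack[0]);
}

const tokens = [
  "(", '"type": "Nanoheal"', "AND",
  "(", '"specialFields.address": "Jaysingpur"', "OR",
  '"specialFields.email": "prathameshmore@gmail.com"', ")", ")"
];
console.log(JSON.stringify(buildFilter(tokens), null, 2));
```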
Prathamesh More

Any idea?

[
'(',
'"type": "Person"',
'AND',
'(',
'"specialFields.email": "prathameshmore@gmail.com"',
'OR',
'"specialFields.address": "Jaysingpur"',
')',
')'
]

Mihail Malo • Edited

One possible RegEx that creates these matches is

const re = /(?:[()]|AND|OR|"[^"]*":\s*"[^"]*")/g;
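
A short usage sketch, assuming the query is held in a variable called `query` (a name chosen here for illustration). With the `g` flag, `String.prototype.match` returns every match as a flat array:

```javascript
const re = /(?:[()]|AND|OR|"[^"]*":\s*"[^"]*")/g;

const query =
  '("type": "Person" AND ("specialFields.email": "prathameshmore@gmail.com" OR "specialFields.address": "Jaysingpur"))';

// match() returns null when nothing matches, so fall back to an empty array.
const tokens = query.match(re) || [];
console.log(tokens);
```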

try it out here

However, regex is often brittle for parsing, and produces uninformative errors (or worse, silently skips or corrupts data).
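
One way to at least notice skipped input is to check that nothing except whitespace is left once all matches are removed. This is only a sketch, and `tokenizeStrict` is a made-up name:

```javascript
// Hypothetical wrapper: same regex as above, but refuses input it cannot fully cover.
function tokenizeStrict(query) {
  const re = /(?:[()]|AND|OR|"[^"]*":\s*"[^"]*")/g;
  // Whatever survives the replace was not recognised by any alternative.
  const leftover = query.replace(re, "").trim();
  if (leftover !== "") {
    throw new SyntaxError(`Unrecognised input: ${leftover}`);
  }
  return query.match(re) ?? [];
}

try {
  tokenizeStrict('("type": "Person" NOT "a": "b")');
} catch (err) {
  console.log(err.message);
}
```

The error still does not say where the problem is, which is part of why a real tokenizer tends to be the better long-term choice.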

Prathamesh More

Let me check. Thanks, man.

Prathamesh More

Thanks, man. It works. You saved my job.

Mihail Malo

What language are you using? Javascript?
Are you on the NodeJS runtime?
Can you use libraries from npm?

Prathamesh More • Edited

Yes, I am using JS and Node.js.

Yes, I can use libraries from npm. Can you suggest some?

I am a junior developer; I joined one month ago. You saved me.

Thanks, man.

Prathamesh More

Look at the output now:

Operands [
  '(',
  '"type":"Nanoheal"',
  'AND',
  '(',
  '"specialFields.email":"prathameshmore@gmail.com"',
  'OR',
  '"specialFields.address":"Jaysingpur"',
  ')'
]
Generate [
  '"specialFields.address":"Jaysingpur"',
  'OR',
  '"specialFields.email":"prathameshmore@gmail.com"'
]
Filter {"$or":[{"specialFields.address":"Jaysingpur"},{"specialFields.email":"prathameshmore@gmail.com"}]}
Prathamesh More

Filter is the final output.

Mihail Malo

Nah, you're good.

But for correct treatment of strings, for example if "ty\"pe": "Pe\"rson" is allowed, you should look at using a proper tokenizer such as moo: github.com/no-context/moo#states (see how they match string escapes there!)
You can then take tokens from this tokenizer yourself, or hand it to, for example, nearley: nearley.js.org/docs/tokenizers

I suggest you read the documentation for these two libraries later today when you have time; you will then be a head above most people at tasks where custom text formats need to be parsed.
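
To show what escape handling means in practice, here is a plain-regex sketch that does not use moo. It assumes backslash is the escape character, as in JSON. The string pattern accepts either an ordinary character or a backslash followed by any character:

```javascript
// A quoted string: any run of non-quote, non-backslash characters or escape pairs.
const STR = String.raw`"(?:[^"\\]|\\.)*"`;
const re = new RegExp(`[()]|AND|OR|${STR}\\s*:\\s*${STR}`, "g");

// String.raw keeps the backslashes exactly as a user would type them.
const query = String.raw`("ty\"pe": "Pe\"rson" AND "a": "b")`;
const tokens = query.match(re) || [];
console.log(tokens);

// Because the pair syntax is JSON-compatible, JSON.parse can decode the escapes.
console.log(JSON.parse(`{${tokens[1]}}`));
```

The simpler "[^"]*" pattern would stop at the first escaped quote and split the pair in the wrong place.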

Prathamesh More

Thanks, Mihail, for your valuable time.

Prathamesh More • Edited

How can I modify the regex so that it supports >, <, <=, >=, = and !=?

E.g.

(("assetType": "Application" AND "assetType"> "Application" ) OR ("assetType": "AccessKey" OR "assetType": "Google"))
Prathamesh More

This is working

(?:[()]|AND|OR|"[^"]*"\s*:*>*[<|>=|<=|!=|=]*\s*"[^"]*")
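
One caveat with this pattern: `[<|>=|<=|!=|=]` is a character class, so it matches any run of the single characters `<`, `|`, `>`, `=` and `!` rather than the listed operators, and would also accept something like "a" :>|| "b". A tighter sketch uses alternation and lists the two-character operators first. The helper `toCondition` and its operator table are assumptions for illustration, mapping onto MongoDB's `$eq`, `$ne`, `$gt`, `$gte`, `$lt` and `$lte`:

```javascript
const re = /[()]|AND|OR|"[^"]*"\s*(?:<=|>=|!=|[:<>=])\s*"[^"]*"/g;

const query =
  '(("assetType": "Application" AND "assetType"> "Application" ) OR ("assetType": "AccessKey" OR "assetType": "Google"))';
const tokens = query.match(re) || [];
console.log(tokens);

// Hypothetical mapping from query operators to MongoDB comparison operators.
const MONGO_OPS = { ":": "$eq", "=": "$eq", "!=": "$ne", ">": "$gt", ">=": "$gte", "<": "$lt", "<=": "$lte" };

// Hypothetical helper: '"assetType"> "Application"' -> { assetType: { $gt: "Application" } }
function toCondition(pair) {
  const m = /^("[^"]*")\s*(<=|>=|!=|[:<>=])\s*("[^"]*")$/.exec(pair);
  if (!m) throw new SyntaxError(`Not a key/operator/value pair: ${pair}`);
  return { [JSON.parse(m[1])]: { [MONGO_OPS[m[2]]]: JSON.parse(m[3]) } };
}

console.log(toCondition(tokens[4]));
```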
stereobooster

Use a parser, for example pegjs.org/online

Expression
  = "(" _ head:(Expression / Pair) _ ")" {
      return head;
    } /
    "(" _ head:(Expression / Pair) _ operator:("AND" / "OR") _ tail: (Expression / Pair) _ ")" {
      return { [operator]: [head, tail]};
    }

Pair
  = head:String _ ":" _ tail:String {
      return [head, tail];
    }

String "string"
  = "\"" [^\"]+ "\"" { return text(); }

_ "whitespace"
  = [ \t\n\r]*