DEV Community

How can I extract words from strings using regular expressions?

pprathameshmore profile image Prathamesh More ・1 min read

I have searched the internet but couldn't find a solution. I'm just confused.

I have this string

("type": "Person" AND ("specialFields.email": "prathameshmore@gmail.com" OR "specialFields.address": "Jaysingpur"))

How can I extract data from it like this?

[
  '(',
  '"type": "Person"',
  'AND',
  '(',
  '"specialFields.email": "prathameshmore@gmail.com"',
  'OR',
  '"specialFields.address": "Jaysingpur"',
  ')',
  ')'
]

Can anyone please help with this?

Discussion (20)

qm3ster profile image
Mihail Malo • Edited

That looks like tokenisation, except it's strange that the key-value pairs are one token.
I'd rather expect a result like

[
  '(',
  '"type"',
  ':',
  '"Person"',
  'AND',
  '(',
  '"specialFields.email"',
  ':',
  '"prathameshmore@gmail.com"',
  'OR',
  '"specialFields.address"',
  ':',
  '"Jaysingpur"',
  ')',
  ')'
]
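
A minimal sketch of a tokenizer that produces this finer-grained token list. The function name `tokenize` is hypothetical, and it assumes that strings never contain an escaped double quote. Unlike a plain global match, it throws on input it does not recognise instead of skipping it.

```javascript
// Hypothetical helper: splits a query into fine-grained tokens.
// Assumption: strings contain no escaped double quotes.
function tokenize(query) {
  // One alternative per token kind. The `y` (sticky) flag forces each match
  // to start exactly where the previous one ended, so nothing is skipped.
  const tokenRe = /\s*(?:([()])|(:)|(AND|OR)\b|("[^"]*"))/y;
  const tokens = [];
  let pos = 0;
  while (pos < query.length) {
    tokenRe.lastIndex = pos;
    const m = tokenRe.exec(query);
    if (m === null) {
      // Trailing whitespace is fine; anything else is an error.
      if (query.slice(pos).trim() === "") break;
      throw new SyntaxError(`Unexpected input at position ${pos}`);
    }
    // Exactly one of the four capture groups is set for each match.
    tokens.push(m[1] || m[2] || m[3] || m[4]);
    pos = tokenRe.lastIndex;
  }
  return tokens;
}

console.log(tokenize('("type": "Person" AND "specialFields.address": "Jaysingpur")'));
```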

However, if the grammar is really simple, you might just get away with parsing it in one pass to

{
  _t: "AND",
  members: [
    {_t: "EQ", k: "type", v: "Person"},
    {
      _t: "OR",
      members: [
        {_t: "EQ", k: "specialFields.email", v: "prathameshmore@gmail.com"},
        {_t: "EQ", k: "specialFields.address", v: "Jaysingpur"}
      ]
    }
  ]
}
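
A rough sketch of such a one-pass parser, written as a small recursive descent. The name `parseQuery` and the grammar in the comments are assumptions, not part of the original query format: strings have no escaped quotes, and AND and OR may not be mixed inside one pair of parentheses.

```javascript
// Hypothetical one-pass parser producing the {_t, members} shape shown above.
// Assumptions: no escaped quotes inside strings, and AND/OR are not mixed
// inside one pair of parentheses.
function parseQuery(query) {
  // Note: a global match silently skips anything it does not recognise.
  const tokens = query.match(/[()]|:|AND|OR|"[^"]*"/g) || [];
  let i = 0;

  const isString = (t) => typeof t === "string" && t.startsWith('"');
  const unquote = (t) => t.slice(1, -1);

  function expect(value) {
    if (tokens[i] !== value) {
      throw new SyntaxError(`Expected ${value} but found ${tokens[i]}`);
    }
    i++;
  }

  // term := "(" expr ")" | string ":" string
  function parseTerm() {
    if (tokens[i] === "(") {
      i++;
      const inner = parseExpr();
      expect(")");
      return inner;
    }
    const key = tokens[i++];
    expect(":");
    const value = tokens[i++];
    if (!isString(key) || !isString(value)) {
      throw new SyntaxError("Expected a key-value pair");
    }
    return { _t: "EQ", k: unquote(key), v: unquote(value) };
  }

  // expr := term (("AND" | "OR") term)*
  function parseExpr() {
    const members = [parseTerm()];
    let op = null;
    while (tokens[i] === "AND" || tokens[i] === "OR") {
      if (op !== null && tokens[i] !== op) {
        throw new SyntaxError("Mixing AND and OR requires parentheses");
      }
      op = tokens[i++];
      members.push(parseTerm());
    }
    // A group with a single member collapses to that member.
    return op === null ? members[0] : { _t: op, members };
  }

  const ast = parseExpr();
  if (i !== tokens.length) throw new SyntaxError(`Unexpected token ${tokens[i]}`);
  return ast;
}
```

Calling `parseQuery` on the query from the question should give the nested AND/OR object above, with the full email address as the value.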

In any case, RegEx can only be used to parse small parts of this, if at all.
In particular, you must consider key and value strings that contain the characters [ ] ( ) , and :, the keywords AND and OR, and an escaped "\"" (or however else the double quote character is escaped in your query format).
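
To illustrate the escaped-quote problem, here is a small comparison of two string patterns. It assumes a double quote inside a string is escaped with a backslash, which the original query format may or may not do.

```javascript
// Assumption: a double quote inside a string is escaped with a backslash.
const naiveString = /"[^"]*"/;            // stops at the first double quote
const escapedString = /"(?:[^"\\]|\\.)*"/; // treats backslash + any char as one unit

// String.raw keeps the backslashes exactly as typed.
const input = String.raw`"ty\"pe": "Pe\"rson"`;

console.log(input.match(naiveString)[0]);   // "ty\"
console.log(input.match(escapedString)[0]); // "ty\"pe"
```

The naive pattern cuts the key off at the escaped quote, so every token after it is misaligned.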

pprathameshmore profile image
Prathamesh More Author

("\w+.?\w*":"\w+@?.?") This is working now.

How can I get the brackets, AND and OR?

pprathameshmore profile image
Prathamesh More Author

I wanted to use a Stack to create a Mongoose filter.

qm3ster profile image
Mihail Malo
  1. I updated my post, you were just too quick to reply :D
  2. What is "Stack"?
pprathameshmore profile image
Prathamesh More Author • Edited

I am trying to create this output

{
 "$and": [
    { "type": "Nanoheal" },
    {
      "$or": [
        { "specialFields.address": "Jaysingpur" },
        { "specialFields.email": "prathameshmore@gmail.com" }
      ]
    }
  ]
}

It's working when I am using a string like this

 let operands = query.split("'").filter(Boolean);

//Output

 [
  '(',
  '"type": "Nanoheal"',
  'AND',
  '(',
  '"specialFields.email": "prathameshmore@nanoheal"',
  'OR',
  '"specialFields.address": "Jaysingpur"',
  ')',
  ')'
]

const query = `'('"type": "Nanoheal"'AND'('"specialFields.email": "prathameshmore@nanoheal"'OR'"specialFields.address": "Jaysingpur"')'')'`;

I want to make the query simple from the user side.

("type": "Person" AND ("specialFields.email": "prathameshmore@gmail.com" OR "specialFields.address": "Jaysingpur"))

pprathameshmore profile image
Prathamesh More Author

The stack is used to create this object

{
 "$and": [
    { "type": "Nanoheal" },
    {
      "$or": [
        { "specialFields.address": "Jaysingpur" },
        { "specialFields.email": "prathameshmore@gmail.com" }
      ]
    }
  ]
}
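
One way the stack idea could look, as a sketch only. The helper name `buildFilter` is hypothetical; it assumes the coarse token array from the question (each key-value pair is one token) and that AND and OR are not mixed within one group.

```javascript
// Hypothetical helper: folds the token array into a MongoDB-style filter.
// Assumption: AND/OR are not mixed within one parenthesised group.
function buildFilter(tokens) {
  const mongoOp = { AND: "$and", OR: "$or" };
  // Each frame collects the operands and operator of one group.
  const stack = [{ op: null, items: [] }];

  // A group without an operator collapses to its single operand.
  const finish = (frame) =>
    frame.op === null ? frame.items[0] : { [mongoOp[frame.op]]: frame.items };

  for (const token of tokens) {
    const top = stack[stack.length - 1];
    if (token === "(") {
      stack.push({ op: null, items: [] });
    } else if (token === ")") {
      if (stack.length < 2) throw new SyntaxError("Unbalanced )");
      const done = stack.pop();
      stack[stack.length - 1].items.push(finish(done));
    } else if (token === "AND" || token === "OR") {
      top.op = token;
    } else {
      // A '"key": "value"' token is valid JSON once wrapped in braces.
      top.items.push(JSON.parse(`{${token}}`));
    }
  }
  if (stack.length !== 1) throw new SyntaxError("Unbalanced (");
  return finish(stack[0]);
}

const tokens = [
  "(", '"type": "Person"', "AND",
  "(", '"specialFields.email": "prathameshmore@gmail.com"', "OR",
  '"specialFields.address": "Jaysingpur"', ")", ")",
];
console.log(JSON.stringify(buildFilter(tokens)));
```

The resulting object can be passed to a Mongoose query as a filter. The operands of `$or` come out in input order, which does not change the meaning of the filter.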
pprathameshmore profile image
Prathamesh More Author

Any idea?

[
'(',
'"type": "Person"',
'AND',
'(',
'"specialFields.email": "prathameshmore@gmail.com"',
'OR',
'"specialFields.address": "Jaysingpur"',
')',
')'
]

qm3ster profile image
Mihail Malo • Edited

One possible RegEx that creates these matches is

const re = /(?:[()]|AND|OR|"[^"]*":\s*"[^"]*")/g;

try it out here

However, regex is often brittle for parsing and produces uninformative errors (or worse, silently skips or corrupts data).
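
A short usage sketch of the pattern above, including one case where it silently drops input. The `leftovers` helper is a hypothetical guard, not something from the original post.

```javascript
const re = /(?:[()]|AND|OR|"[^"]*":\s*"[^"]*")/g;

const good = '("type": "Person" AND ("specialFields.email": "prathameshmore@gmail.com" OR "specialFields.address": "Jaysingpur"))';
console.log(good.match(re));

// NOT is not part of the pattern, so it is dropped without any error.
const bad = '("type": "Person" AND NOT "type": "Robot")';
console.log(bad.match(re));
// [ '(', '"type": "Person"', 'AND', '"type": "Robot"', ')' ]

// Hypothetical guard: remove every match and see whether anything is left.
function leftovers(query) {
  return query.replace(re, "").trim();
}

console.log(leftovers(good)); // ""
console.log(leftovers(bad));  // "NOT"
```

Checking `leftovers` before using the matches is a cheap way to reject queries the pattern does not fully understand.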

pprathameshmore profile image
Prathamesh More Author

Let me check. Thanks, man.

pprathameshmore profile image
Prathamesh More Author

Thanks, man. It works. You saved my job.

qm3ster profile image
Mihail Malo

What language are you using? JavaScript?
Are you on the Node.js runtime?
Can you use libraries from npm?

pprathameshmore profile image
Prathamesh More Author • Edited

Yes, I am using JS and Node.js.

Yes, I can use libraries from npm. Can you suggest some?

I am a junior developer. Joined 1 month ago. You saved me.

Thanks, man.

pprathameshmore profile image
Prathamesh More Author

Look at the output now:

Operands [
  '(',
  '"type":"Nanoheal"',
  'AND',
  '(',
  '"specialFields.email":"prathameshmore@gmail.com"',
  'OR',
  '"specialFields.address":"Jaysingpur"',
  ')'
]
Generate [
  '"specialFields.address":"Jaysingpur"',
  'OR',
  '"specialFields.email":"prathameshmore@gmail.com"'
]
Filter {"$or":[{"specialFields.address":"Jaysingpur"},{"specialFields.email":"prathameshmore@gmail.com"}]}
pprathameshmore profile image
Prathamesh More Author

Filter is the final output.

qm3ster profile image
Mihail Malo

Nah, you're good.

But for correct treatment of strings, for example if "ty\"pe": "Pe\"rson" is allowed, you should look at using a proper tokenizer, for example moo: github.com/no-context/moo#states (see how they match string escapes there!)
You can then consume tokens from this tokenizer yourself, or pass it to a parser toolkit, for example nearley: nearley.js.org/docs/tokenizers

I suggest you read the documentation for these two libraries later today when you have time. You will then be a head above most people in tasks where custom text formats need to be parsed.

pprathameshmore profile image
Prathamesh More Author

Thanks, Mihail, for your valuable time.

pprathameshmore profile image
Prathamesh More Author • Edited

How can I modify the regex so that it supports >, <, <=, >=, = and !=?

E.g.

(("assetType": "Application" AND "assetType"> "Application" ) OR ("assetType": "AccessKey" OR "assetType": "Google"))
pprathameshmore profile image
Prathamesh More Author

This is working

(?:[()]|AND|OR|"[^"]*"\s*:*>*[<|>=|<=|!=|=]*\s*"[^"]*")
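
One caveat with the pattern above: `[<|>=|<=|!=|=]` is a character class, so it accepts any mix of the characters `<`, `>`, `=`, `!` and `|`, and because every part is optional it also matches two strings with no operator between them. A possible stricter variant, as a sketch under the same no-escaped-quotes assumption (the names `comparisonRe` and `pairRe` are hypothetical):

```javascript
// Longer operators come first so "<=" is not read as "<" followed by "=".
const operator = "(?:<=|>=|!=|[:<>=])";
const comparisonRe = new RegExp(
  `[()]|AND|OR|"[^"]*"\\s*${operator}\\s*"[^"]*"`,
  "g"
);

const query =
  '(("assetType": "Application" AND "assetType"> "Application" ) OR ("assetType": "AccessKey" OR "assetType": "Google"))';
console.log(query.match(comparisonRe));

// Splitting one pair token into key, operator and value with capture groups.
const pairRe = /^"([^"]*)"\s*(<=|>=|!=|[:<>=])\s*"([^"]*)"$/;
const [, key, op, value] = '"assetType"> "Application"'.match(pairRe);
console.log(key, op, value); // assetType > Application

// Exactly one operator is required, so this no longer matches.
console.log('"a" "b"'.match(comparisonRe)); // null
```

Like the original pattern, this still skips unrecognised text silently, so the same caution about regex-based parsing applies.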
stereobooster profile image
stereobooster

Use a parser, for example pegjs.org/online

Expression
  = "(" _ head:(Expression / Pair) _ ")" {
      return head;
    } /
    "(" _ head:(Expression / Pair) _ operator:("AND" / "OR") _ tail: (Expression / Pair) _ ")" {
      return { [operator]: [head, tail]};
    }

Pair
  = head:String _ ":" _ tail:String {
      return [head, tail];
    }

String "string"
  = "\"" [^\"]+ "\"" { return text(); }

_ "whitespace"
  = [ \t\n\r]*