ibrahim ali

Finally learning Regex

On the first day of the Bootcamp phase of Operation Spark, one of our first practice problems was to transform a given string into dash case. While the solution was a simple .split.join chain, I went online and found the .replace method, which led me to regular expressions, shortened to regex. I ended up watching a 45-minute video on regex and, having only the most basic knowledge of JavaScript at the time, came out way more confused than I went in. Since then, whenever I've researched a problem involving complex string manipulation and the solution called for regex, I've opted out, preferring the previously mentioned .split.join, working through the string char by char, or literally anything else but the dreaded regex. But now, 13 weeks into my Operation Spark journey, I've decided to finally tackle my regex anxieties and add another skill to my programming repertoire.

"A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of characters that specifies a search pattern. Usually, such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation." -Wikipedia.

Regex originated in the 1950s, when mathematician Stephen Cole Kleene formalized regular languages, a class of formal languages that can be described by a small set of rules. Computer scientists later built this idea into early text editors and compilers. Decades later the same rules still apply: they are built into most programming languages and run behind the scenes of search engines, document editors, and many other applications.

In this post, I'll give a rundown of basic regex syntax and special characters. Here is a nifty website for practicing regular expressions in real time.

https://regexr.com/

The most basic syntax for a regex in JavaScript is a pair of forward slashes. Whatever is typed between the slashes is the pattern the regex searches for when it is executed.

/regex/ // matches the literal text "regex"

The exception is the expression flags, which go right after the second forward slash.

//the global flag
/the/g

The global flag, represented by g, applies the regex to every match in the string; without it, the search stops after the first match.
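
To see the difference the global flag makes, here is a minimal sketch using the .replace method mentioned earlier. The sample sentence and variable names are made up for illustration.

```javascript
// Hypothetical sample sentence, chosen only for illustration.
const sentence = "the cat chased the dog around the yard";

// Without the g flag, replace() stops after the first match.
const firstOnly = sentence.replace(/the/, "a");

// With the g flag, every match in the string is replaced.
const everyMatch = sentence.replace(/the/g, "a");

console.log(firstOnly);  // "a cat chased the dog around the yard"
console.log(everyMatch); // "a cat chased a dog around a yard"
```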

The lowercase i makes the search case-insensitive.

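The lowercase m is for multiline mode, which lets ^ and $ match at the start and end of each line, and the lowercase s is for single-line (dotAll) mode, which lets the period also match newline characters.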

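/the/gi // global and case-insensitive: matches "the", "The", "THE", ...

/the/gm // global and multiline: ^ and $ match at every line break
/the/gs // global and dotAll: . also matches newline characters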

There are a couple of other flags, but these are the most basic and frequently used ones.
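
Here is a small sketch of how the i, m, and s flags change a search in JavaScript. The two-line sample string is an assumption for illustration, and the ^ anchor (start of string or line) is something this post doesn't otherwise cover.

```javascript
// Hypothetical two-line string; \n is the line break.
const text = "The end\nthe start";

// i: ignore case, so both "The" and "the" match.
const anyCase = text.match(/the/gi);       // ["The", "the"]

// Without m, ^ only matches at the very start of the string,
// where the text is "The" (capital T), so nothing is found.
const noMultiline = text.match(/^the/g);   // null

// With m, ^ also matches right after each line break.
const lineStarts = text.match(/^the/gm);   // ["the"]

// Without s, the period refuses to match the line break.
const noDotAll = /end.the/.test(text);     // false

// With s, the period matches the line break as well.
const withDotAll = /end.the/s.test(text);  // true

console.log(anyCase, noMultiline, lineStarts, noDotAll, withDotAll);
```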

Then there are special characters. These are really the bread and butter of regex. Each one does one specific thing, and chained together they become really powerful tools for string manipulation.

The plus operator, +, matches one or more of the character placed before it. For instance, if I wanted to find runs of e's, such as the two in a row in "street", placing the plus operator right after the "e", as in /e+/, will check for that.
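
A quick sketch of the plus operator in action, using a made-up list of words. Note that the + goes after the character it repeats.

```javascript
// Hypothetical word list for illustration.
const plusSample = "street, bed, grd, beeep";

// e+ matches each run of one or more "e" characters.
const runsOfE = plusSample.match(/e+/g);    // ["ee", "e", "eee"]

// ee+ requires at least two e's in a row: one "e", then one or more.
const streetHasDouble = /ee+/.test("street"); // true
const bedHasDouble = /ee+/.test("bed");       // false

console.log(runsOfE, streetHasDouble, bedHasDouble);
```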

The optional operator, represented by the question mark, ?, makes the character placed before it optional. In /ow?/, the question mark is placed right after the "w", so the regex will look for an "o" and also match the "w" if it's there.
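
As a rough illustration of the optional operator, assuming a made-up sample string, the pattern /ow?/ matches an "o" whether or not a "w" follows it:

```javascript
// Hypothetical sample string for illustration.
const optionalSample = "ow, no, owl, go";

// "o" is required; the "w" right after it is matched only if present.
const optionalMatches = optionalSample.match(/ow?/g);

console.log(optionalMatches); // ["ow", "o", "ow", "o"]
```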

The star operator, *, will match any number of the preceding character in a row, including none at all. In /re*/, the regex searches for an "r" followed by any number of "e"s.
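
A small sketch of the star operator with the pattern /re*/, run against a made-up string. Because the star also allows zero repetitions, a lone "r" still counts as a match.

```javascript
// Hypothetical sample string for illustration.
const starSample = "r re ree tree";

// "r" followed by zero or more "e" characters.
const starMatches = starSample.match(/re*/g);

console.log(starMatches); // ["r", "re", "ree", "ree"]
```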

The period character, ., matches any single character except a line break. In /o./, the period after the "o" matches any two-character sequence that starts with "o".
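
Here is a minimal sketch of the period with the pattern /o./, again on a made-up string. The period is not limited to letters, and it needs some character to be there, so an "o" at the very end of the string is skipped.

```javascript
// Hypothetical sample string for illustration.
const dotSample = "on, ox, o!, o";

// "o" followed by exactly one character of any kind (except a line break).
const dotMatches = dotSample.match(/o./g);

console.log(dotMatches); // ["on", "ox", "o!"]
```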

The \w matches word characters: letters, digits, and the underscore. The \s matches whitespace. The capital version of each letter inverts the meaning: \W matches anything that is not a word character, and \S matches anything that is not whitespace.
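
To compare the four shorthands side by side, here is a sketch on a made-up phrase that includes punctuation. The punctuation helps show that \W and \s (and likewise \S and \w) are not the same thing.

```javascript
// Hypothetical phrase for illustration.
const phrase = "hi, there_2!";

// \w+ : runs of letters, digits, and underscores.
const wordRuns = phrase.match(/\w+/g);     // ["hi", "there_2"]

// \s : whitespace characters only.
const spaces = phrase.match(/\s/g);        // [" "]

// \W : anything that is NOT a word character (punctuation and spaces).
const nonWord = phrase.match(/\W/g);       // [",", " ", "!"]

// \S+ : runs of anything that is NOT whitespace.
const nonSpaceRuns = phrase.match(/\S+/g); // ["hi,", "there_2!"]

console.log(wordRuns, spaces, nonWord, nonSpaceRuns);
```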

Inside the curly braces, you can indicate how many times the preceding character should repeat. In /\S{2,3}/, the capital \S matches any non-whitespace character, and the 2 and 3 inside the curly braces limit each match to a run of between 2 and 3 of those characters.
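
A sketch of the curly-brace quantifier on a made-up sentence. One thing worth noticing: /\S{2,3}/ happily matches the first three characters of a longer word. The second pattern uses \b (a word boundary, which this post doesn't cover) as one possible way to keep only whole words of 2 to 3 characters.

```javascript
// Hypothetical sentence for illustration.
const countSample = "a to the four";

// Runs of 2 to 3 non-whitespace characters; "four" yields "fou".
const shortRuns = countSample.match(/\S{2,3}/g);      // ["to", "the", "fou"]

// Assumption: \b word boundaries restrict matches to whole words.
const shortWords = countSample.match(/\b\w{2,3}\b/g); // ["to", "the"]

console.log(shortRuns, shortWords);
```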

Square brackets match any one of the characters placed inside them. Say I want to check for anything that has an "o" with either a "g" or a "p" after it. The pattern /o[gp]/ accomplishes that.
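
The "o" followed by "g" or "p" search can be sketched like this, with a made-up list of words:

```javascript
// Hypothetical word list for illustration.
const bracketSample = "dog, top, oh, opal";

// "o" followed by exactly one character from the set: g or p.
const bracketMatches = bracketSample.match(/o[gp]/g);

console.log(bracketMatches); // ["og", "op", "op"]
```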

Inside the square brackets I can also use the dash, -, to check for a range of characters. So with /o[f-r]/ I can check for anything that has the letter "o" followed by any letter between "f" and "r".
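
A minimal sketch of a character range, using made-up two-letter samples. Letters outside "f" through "r" (like "s", "x", and "a") are not matched.

```javascript
// Hypothetical two-letter samples for illustration.
const rangeSample = "of on or os ox oa";

// "o" followed by one lowercase letter from f through r.
const rangeMatches = rangeSample.match(/o[f-r]/g);

console.log(rangeMatches); // ["of", "on", "or"]
```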

The capture group, represented by parentheses, groups part of the pattern together and captures whatever it matched. For example, if I wanted to look for an "o" followed by either an "s" or a "t", I would wrap the "s" and the "t" in parentheses and place a vertical bar, |, between them to denote "or": /o(s|t)/.
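
Here is a sketch of the "o" followed by "s" or "t" search, with made-up sample words. With the g flag, .match returns only the full matches; without it, the result also carries what the group captured at index 1.

```javascript
// Hypothetical word list for illustration.
const groupSample = "most, lot, oh";

// "o" followed by either "s" or "t".
const groupMatches = groupSample.match(/o(s|t)/g); // ["os", "ot"]

// Without g, index 0 is the full match and index 1 is the captured group.
const single = "lot".match(/o(s|t)/);
const fullMatch = single[0]; // "ot"
const captured = single[1];  // "t"

console.log(groupMatches, fullMatch, captured);
```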

All of these can be chained together to allow for many dynamic uses, and these basic special characters will let you do most of what you want to accomplish with regex.
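
To bring it back to the dash case problem from the start, here is one possible sketch that chains a few of these pieces together. The toDashCase name is hypothetical, and I'm assuming "dash case" means lowercase words joined by dashes; the original practice problem may have had different rules.

```javascript
// Hypothetical helper; assumes dash case = lowercase words joined by "-".
function toDashCase(input) {
  return input
    .trim()                // drop leading and trailing whitespace
    .toLowerCase()
    .replace(/\s+/g, "-"); // every run of whitespace becomes one dash
}

const viaRegex = toDashCase("  Finally   Learning Regex ");

// The .split.join version works when words are separated by single spaces.
const viaSplitJoin = "Finally Learning Regex".toLowerCase().split(" ").join("-");

console.log(viaRegex);     // "finally-learning-regex"
console.log(viaSplitJoin); // "finally-learning-regex"
```

The \s+ with the g flag is what lets the regex version cope with uneven spacing, which a plain split on a single space would turn into extra dashes.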

sources:
https://en.wikipedia.org/wiki/Regular_expression

https://www.youtube.com/watch?v=rhzKDrUiJVk&t=896s
