benjaminolmsted

Ruby Regex

I sort of black out when I get hit with a regex string. This seems suboptimal. This is me learning about regex. The Rubular tool is indispensable: https://rubular.com/

Regular Expressions are used to match patterns in text. Why might we want to do this?
Data Validation - is a string a valid email address or phone-number?

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

(Well, actually, email validation should be done through a confirmation link with a token.)

Web scraping

Any text parsing needs like syntax highlighting, data wrangling, or search

Constructing Regular Expressions

They can be made several ways. With literals:

/pattern/ or %r{pattern}

They can also be constructed using Regexp.new with a string, and Regexp.union with a list of strings or an array of strings.
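Here is a small sketch of those construction styles side by side. The sample patterns and variable names are my own, picked only for illustration.

```ruby
# Three ways to build the same pattern: a literal "a", then "b", then "c".
literal   = /abc/
percent_r = %r{abc}             # handy when the pattern itself contains slashes
from_new  = Regexp.new('abc')   # built from a string at runtime

# Regexp.union builds an alternation and escapes plain strings for us.
either = Regexp.union('cat', 'dog')    # a list of strings
dotted = Regexp.union(['a.b', 'c'])    # an array of strings

puts literal.source == from_new.source   # same underlying pattern text
puts either.match?('hotdog')             # matches the "dog" alternative
puts dotted.match?('axb')                # the dot was escaped, so "axb" does not match
```

Because Regexp.union escapes strings, it is a reasonably safe way to build a pattern out of arbitrary text without worrying about metacharacters.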

As for the contents of the pattern, the following table is an excellent starting place. I couldn't make it cleaner, so here it is in its entirety.

image

Using Regular Expressions

=~ is Ruby's basic pattern-matching operator.

It returns the index of the first match in the string, or nil if no match is present.

#match returns a MatchData object, or nil if no match is present.
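A quick sketch of both, using a made-up sample string:

```ruby
text = 'The year 1999 was long ago'

index = text =~ /1999/      # index where the first match starts
miss  = text =~ /xyz/       # nil, because nothing matches

m = text.match(/1999/)      # a MatchData object (nil if nothing matched)

puts index          # 9
puts miss.inspect   # nil
puts m[0]           # the matched text: 1999
puts m.pre_match    # everything before the match
puts m.post_match   # everything after the match
```

The MatchData object carries more context than the bare index, which is why #match tends to be more useful once we care about what matched and not only where.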

In the String class, there are several methods that make use of regular expressions:

#gsub substitutes a replacement string for every occurrence of the given regular expression

#sub substitutes only the first occurrence of the supplied regex

#partition splits the string into three: the part before the match, the match, and the part after the match

#scan produces an array of the matches or passes them to a block

#split splits a string on the given pattern
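To see how these differ, here they all are on one sample string. The string and variable names are just mine for the example.

```ruby
s = 'one fish, two fish'

all_swapped   = s.gsub(/fish/, 'cat')   # replaces every match
first_swapped = s.sub(/fish/, 'cat')    # replaces only the first match
parts         = s.partition(/, /)       # [before, match, after]
found         = s.scan(/fish/)          # every match, as an array
pieces        = s.split(/, /)           # the text between matches

puts all_swapped     # one cat, two cat
puts first_swapped   # one cat, two fish
p parts              # ["one fish", ", ", "two fish"]
p found              # ["fish", "fish"]
p pieces             # ["one fish", "two fish"]

# scan also accepts a block and yields each match in turn
count = 0
s.scan(/fish/) { |_match| count += 1 }
puts count           # 2
```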

What can we put in the pattern?

Characters!

Individual characters represent themselves literally

/abc/ will match 'a' followed by 'b' followed by 'c', exactly, but not something like "abdc"

/[abc]/ will match a or b or c

image

Here we match one of [aeiou] followed by a single t

Like all metacharacters, if we want to match literal brackets in our pattern, we have to escape them with backslashes: \[ and \].

image

Here we match the footnote markers from Wikipedia by using the special character \d, which matches any digit.
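Since the screenshots don't carry over as text, here is my rough reconstruction of the same ideas in code. The word list and the sentence are invented, and the exact patterns in the images may have differed.

```ruby
# Literal characters match themselves, in order.
puts 'xxabcxx'.match?(/abc/)   # true
puts 'abdc'.match?(/abc/)      # false: the "d" breaks up the sequence

# A character class matches exactly one character from the set.
words = %w[cat cot cut at tt]
vowel_then_t = words.grep(/[aeiou]t/)
p vowel_then_t                 # ["cat", "cot", "cut", "at"]

# Escaped brackets match literal "[" and "]"; \d matches a single digit.
sentence = 'Ruby[1] was released[2] in 1995'
markers = sentence.scan(/\[\d\]/)
p markers                      # ["[1]", "[2]"]
```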

In addition to literal characters, there are many special characters we can use to build our patterns.

/\w/ - A word character ([a-zA-Z0-9_])

/\W/ - A non-word character ([^a-zA-Z0-9_])

/\d/ - A digit character ([0-9])

/\D/ - A non-digit character ([^0-9])

/\h/ - A hexdigit character ([0-9a-fA-F])

/\H/ - A non-hexdigit character ([^0-9a-fA-F])

/\s/ - A whitespace character: /[ \t\r\n\f\v]/

/\S/ - A non-whitespace character: /[^ \t\r\n\f\v]/

A nice convention we see here is that by upcasing the letter, we get the complementary set of characters.
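A small sketch to make the lowercase/uppercase pairing concrete. The sample string is arbitrary; I picked it so each class has something to match and something to skip.

```ruby
sample = 'a1 _!'

p sample.scan(/\w/)   # ["a", "1", "_"]
p sample.scan(/\W/)   # [" ", "!"]
p sample.scan(/\d/)   # ["1"]
p sample.scan(/\D/)   # ["a", " ", "_", "!"]
p sample.scan(/\h/)   # ["a", "1"]
p sample.scan(/\H/)   # [" ", "_", "!"]
p sample.scan(/\s/)   # [" "]
p sample.scan(/\S/)   # ["a", "1", "_", "!"]

# Every character lands in exactly one half of each pair.
pairs = [[/\w/, /\W/], [/\d/, /\D/], [/\h/, /\H/], [/\s/, /\S/]]
complementary = pairs.all? do |lower, upper|
  sample.each_char.all? { |ch| lower.match?(ch) ^ upper.match?(ch) }
end
puts complementary    # true
```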

image

We can also specify the number of times that a character occurs using quantifiers.

* - Zero or more times

+ - One or more times

? - Zero or one times (optional)

{n} - Exactly n times

{n,} - n or more times

{,m} - m or fewer times

{n,m} - At least n and at most m times (no space after the comma, or the braces are matched literally)

image

Here we find all double vowels.
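Here is each quantifier run against the same made-up string, so the differences are easy to compare. For the double vowels I am assuming a pattern like /[aeiou]{2}/, which matches any two vowels in a row; the pattern in the screenshot may have been different.

```ruby
s = 'ac abc abbc abbbc'

p s.scan(/ab*c/)      # ["ac", "abc", "abbc", "abbbc"]  zero or more b
p s.scan(/ab+c/)      # ["abc", "abbc", "abbbc"]        one or more b
p s.scan(/ab?c/)      # ["ac", "abc"]                   optional b
p s.scan(/ab{2}c/)    # ["abbc"]                        exactly two b
p s.scan(/ab{2,}c/)   # ["abbc", "abbbc"]               two or more b
p s.scan(/ab{,2}c/)   # ["ac", "abc", "abbc"]           at most two b
p s.scan(/ab{1,2}c/)  # ["abc", "abbc"]                 one or two b

# Two vowels in a row (assumed stand-in for the screenshot's pattern).
vowel_pairs = 'bookkeeper see a moon'.scan(/[aeiou]{2}/)
p vowel_pairs         # ["oo", "ee", "ee", "oo"]
```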

Parentheses can be used to create capture groups, which allow us to access parts of the match later in the pattern (with backreferences such as \1), and also after we match, by using the match variables $1, $2, $3...
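A minimal sketch of both uses of capture groups. The date string and the doubled-vowel pattern are my own examples.

```ruby
date = '2024-01-15'

# After a successful match, $1, $2, $3 hold the first three captures.
if date =~ /(\d{4})-(\d{2})-(\d{2})/
  year, month, day = $1, $2, $3
end
puts year    # 2024
puts month   # 01
puts day     # 15

# The same captures are available on the MatchData object.
m = date.match(/(\d{4})-(\d{2})-(\d{2})/)
p m.captures   # ["2024", "01", "15"]

# Inside the pattern, \1 refers back to whatever group 1 captured,
# so this only matches the same vowel twice in a row.
doubled = 'bookkeeper'.match(/([aeiou])\1/)
puts doubled[0]                        # oo
puts 'boat'.match?(/([aeiou])\1/)      # false: "oa" is two different vowels
```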

There is a lot to parentheses in regex. I'll leave it here for now.

References

https://en.wikipedia.org/wiki/Regular_expression

https://ruby-doc.org/core-2.4.1/Regexp.html

https://ruby-doc.org/core-2.4.0/MatchData.html
