If you are like me and possess an incomplete understanding of regular expressions (regex), then trying to read the documentation on string manipulation might have left you pulling your hair out. It often feels like these resources are written entirely in regex, and if you left your Rosetta Stone in your other wallet, you may find yourself befuddled. This post will dive mainly into Ruby's `gsub` method. However, because `gsub` gets a lot of its power from regex, this post will use regex and I will explain it when used. For more detailed regex information, check out these great resources:
gsub... What?
Okay, what is `gsub`? It's a string manipulation method in Ruby; the `g` stands for global and the `sub` stands for substitution. Put simply, `gsub` looks through a string for every match of a pattern, replaces each match with a replacement, and then returns the modified string. Ruby also has a `sub` method that does the same thing but replaces only the first match. Both `gsub` and `sub` are non-destructive; to modify the original string, Ruby offers `gsub!` and `sub!`. `gsub` can be called with two arguments, or with one argument and a block:
```ruby
string.gsub(search_pattern, replacement)
# or
string.gsub(search_pattern) do |match|
  # do something fancy with each match; the block's return value is the replacement
end
```
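The sketch below makes the differences between `sub`, `gsub` and `gsub!` concrete. The variable names are only for illustration. It also shows what happens when `gsub` is given a pattern with neither a replacement nor a block: it hands back an Enumerator over the matches.

```ruby
word = 'cheese'.dup # .dup guarantees a mutable copy even if string literals are frozen

# sub replaces only the first match; gsub replaces every match.
first_only = word.sub('e', '3')   # => "ch3ese"
every_one  = word.gsub('e', '3')  # => "ch33s3"
puts word                         # prints cheese: neither call changed it

# The bang version modifies the receiver in place...
word.gsub!('e', '3')
puts word                         # prints ch33s3

# ...but returns nil when nothing was replaced, so avoid chaining on it blindly.
result = word.gsub!('z', 'x')
p result                          # prints nil

# With a pattern and no replacement or block, gsub returns an Enumerator of matches.
matches = 'cheese'.gsub(/e/).to_a # => ["e", "e", "e"]
```

The `nil` return from `gsub!` is the usual surprise. Calling `.upcase` on the result of a `gsub!` that found nothing raises a `NoMethodError`.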
For the search pattern, `gsub` accepts either a string (stuff in between `' '` or `" "`), which is matched literally, or a regex (in Ruby, regex literals are usually written between `/ /`). The replacement is normally a string.
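Here's a quick sketch of why the string/regex distinction matters, using a made-up version string. A `.` in a string pattern is just a dot, while in a regex it means "any character". The `%r{}` and `Regexp.new` lines show two other ways Ruby lets you build a regex, in case slashes get awkward.

```ruby
version = '1.2.3'

# A string pattern is matched literally: only actual dots are replaced.
literal = version.gsub('.', '-')    # => "1-2-3"

# In a regex, an unescaped . means "any character", so everything is replaced.
any_char = version.gsub(/./, '-')   # => "-----"

# Escaping the dot gives back its literal meaning.
escaped = version.gsub(/\./, '-')   # => "1-2-3"

# %r{} and Regexp.new are alternative ways to build the same regex.
same_regex = version.gsub(%r{\.}, '-')
built      = version.gsub(Regexp.new(Regexp.escape('.')), '-')
```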
gsub... Why?
Why bother with `gsub`? Well, because the world is messy and, by extension, so is data. Let's say you're collecting phone numbers from a database that didn't properly validate its data format, and every so often you get 123.456.7899 or (123)456-7899 instead of 1234567899. Wouldn't it be nice to let Ruby find and fix this for you, rather than waiting for your program to crash and then reformatting by hand? I, for one, say yes!
gsub... How?
Alright, now we're down to the good stuff. How do I make Ruby do all of my string-related bidding? First, the case of a simple string substitution.

```ruby
"cheese".gsub('e', '3')
# => "ch33s3"
"cheese".gsub(/[^e]/, '3')
# => "33ee3e"
```

The regex `[^e]` simply means "any single character that isn't an e". With this first use case, we already have the ability to fix our phone number format.
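```ruby
phone_number = '(123)456-7899'
phone_number.gsub(/[().-]/, '')
# => "1234567899"
```

This bit, `[().-]`, is a character class. It matches any one of the symbols inside the brackets, which `gsub` then replaces with nothing! Inside brackets, `.` is just a literal dot. The hyphen goes last on purpose. Placed between two other characters, as in `[()-.]`, it would define a range (here `)` through `.`), which quietly also matches `*`, `+` and `,`.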
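If all you ever want is the digits, a broader alternative is `\D`, meaning "any character that is not a digit". This sketch assumes the sample numbers below, which are made up for illustration. It also assumes you really do want to throw away everything else, including spaces or a leading `+`.

```ruby
# Hypothetical sample inputs in the formats mentioned above.
raw_numbers = ['123.456.7899', '(123)456-7899', '123 456 7899', '1234567899']

# \D matches any non-digit character, so this keeps the digits only.
cleaned = raw_numbers.map { |number| number.gsub(/\D/, '') }
p cleaned # prints ["1234567899", "1234567899", "1234567899", "1234567899"]
```

The trade-off is that `\D` will not complain about garbage. Something like `"call 123-456-7899 ext. 2"` comes out as a single long run of digits.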
Let's say you have some string data that contains prices, but you are expanding your market to Japan. Whatever will you do? `gsub` to the rescue!
```ruby
data = 'Plane tickets are $200'
conversion = 104.72 # yen/dollar according to Google at time of writing
data.gsub(/\d+/) { |amount| amount.to_f * conversion } # the Float result is converted to a string
# => "Plane tickets are $20944.0"
```
You're probably thinking, "Okay, the number is correct, but it still says dollars, so... not super helpful." That's fair. If only there were some way to swap one character for another, hmm... `gsub` to the rescue again! We can take advantage of method chaining.
```ruby
data.gsub(/\d+/) { |amount| amount.to_f * conversion }.gsub('$', '¥')
# => "Plane tickets are ¥20944.0"
```
Ruby looks for every run of one or more digits (`\d+`), converts each one to a float, and multiplies it by our conversion factor. Once every run of digits has been converted, the resulting string is passed to a second `gsub`, which replaces every `$` with `¥`.
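One catch: `\d+` treats the two halves of a price like `$4.50` as separate numbers and converts both. Here is a sketch of a single-pass alternative. It assumes prices always look like `$` followed by digits with an optional decimal part (no thousands separators like `$1,200`) and that you want whole-yen amounts. It captures the number after the `$` and builds the yen string in the block.

```ruby
conversion = 104.72 # yen/dollar rate from the example above

data = 'Plane tickets are $200, snacks are $4.50'

# Match a $ followed by digits with an optional decimal part, capturing the number.
converted = data.gsub(/\$(\d+(?:\.\d+)?)/) do
  dollars = Regexp.last_match(1).to_f
  # Yen prices are usually written without decimals, so round to a whole number.
  "¥#{(dollars * conversion).round}"
end
puts converted # prints Plane tickets are ¥20944, snacks are ¥471
```

Because the `$` is part of the match, swapping the symbol and converting the number happen together. There is no second `gsub` that could accidentally touch a `$` that wasn't part of a price.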
This is all well and good, but by now you must be wondering, "Will `gsub` help me pretend to be a spy?" Yes... Yes, it can!
As a second argument, `gsub` can also take a hash. Each piece of text matched by the first argument is looked up as a key in the hash, and the corresponding value is swapped in. It's like a super-secret decoder ring!
```ruby
substitutions = {
  'a' => '@',
  'e' => '3',
  'i' => '!',
}
phrase = 'I am hiding and I have come for your cheese'
phrase.gsub(/[aei]/, substitutions)
# => "I @m h!d!ng @nd I h@v3 com3 for your ch33s3"
```
The regex here, `[aei]`, matches any one of those characters. Notice that the capital I's were unchanged, because we were only searching for lowercase letters. Be careful, though. If `I` were added to the search but not to the hash, the lookup would come back empty and each I would be replaced with an empty string, in other words deleted.
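Here is a small sketch of that gotcha, plus one possible fix. The fix relies on `gsub` looking each match up in the hash. That means a hash with a default block (a standard `Hash.new` feature) can hand back the matched text itself for any character we didn't list.

```ruby
substitutions = { 'a' => '@', 'e' => '3', 'i' => '!' }

# I is in the search but not in the hash: the lookup returns nil,
# which gsub turns into an empty string, so every I disappears.
dropped = 'I am hiding'.gsub(/[aeiI]/, substitutions)
p dropped # prints " @m h!d!ng"

# One fix: a hash whose default block returns the key (the matched text) unchanged.
keep_unknown = Hash.new { |_hash, key| key }
keep_unknown.update(substitutions)
kept = 'I am hiding'.gsub(/[aeiI]/, keep_unknown)
p kept # prints "I @m h!d!ng"
```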
One more `gsub` use case to explore before we part ways. By putting parts of the search pattern in parentheses, we can group them and then reference those groups in the replacement to perform specific manipulations. Because the groups sit next to each other in the pattern, the text they match must also appear next to each other in the string. If nothing matches, `gsub` simply returns an unchanged copy of the original string.
```ruby
phrase = "cheese"
phrase.gsub(/(h)([e]+)/, '{\1}<\2>')
# => "c{h}<ee>se"
```
In our search pattern, we look first for `h` and then for one or more e's, `[e]+`. We can then reference these groups as `\1`, `\2` and so on, where the number is the group's position counted by its opening parenthesis from the left. Group numbers can be a bit tricky to figure out, so here's one more example:
```ruby
phrase.gsub(/(h)([e]+)(k)([0-9])([^aeiou])/...
```
Here `h` is the first group and would be referenced as `'\1'`, `[e]+` is the second (`'\2'`), and so on until `[^aeiou]`, which would be group 5, referenced as `'\5'`. When referencing groups, it's best to use single quotes. In double quotes, Ruby treats `"\1"` as an escape sequence (the character with code 1) before `gsub` ever sees it, so you would have to write `"\\1"` instead.
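The short sketch below compares the quoting styles. It also includes a nested group to show that numbering follows the opening parentheses from left to right.

```ruby
phrase = 'cheese'

# In single quotes, \1 stays a backslash followed by 1, which gsub reads as group 1.
single = phrase.gsub(/(h)/, '[\1]')    # => "c[h]eese"

# In double quotes, "\1" is an octal escape for the character with code 1,
# so gsub never sees a group reference.
broken = phrase.gsub(/(h)/, "[\1]")    # => "c[\u0001]eese"

# Doubling the backslash inside double quotes restores the group reference.
fixed = phrase.gsub(/(h)/, "[\\1]")    # => "c[h]eese"

# Nested groups: (h) and (e+) open after the outer group, so they are 2 and 3.
nested = phrase.gsub(/((h)(e+))/, '<\1|\2|\3>') # => "c<hee|h|ee>se"
```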
We can also give the groups names to reference like so:
```ruby
phrase = "cheese"
phrase.gsub(/(?<h>h)(?<es>[e]+)/, '{\k<h>}<\k<es>>')
# => "c{h}<ee>se"
```
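Named groups also work with the block form of `gsub`. In this sketch, the block reads the groups by name from the current match and swaps their order.

```ruby
phrase = 'cheese'

# In the block form, Regexp.last_match (also available as $~) is the MatchData
# for the current match, and it can be indexed by group name.
swapped = phrase.gsub(/(?<h>h)(?<es>e+)/) do
  match = Regexp.last_match
  "#{match[:es]}#{match[:h]}" # the block's return value becomes the replacement
end
p swapped # prints "ceehse"
```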
Group names are assigned with `(?<group_name>search_term)` and referenced with `\k<group_name>`. We can do some other fun manipulations this way. For instance, we can rearrange terms, just in case we need to generate some Pig Latin.
```ruby
phrase = "cheese"
phrase.gsub(/([^aeiou]+)(\w+)/, '\2\1ay')
# => "eesechay"
```
This looks for one or more characters that aren't vowels, `[^aeiou]+`. These are the consonants, but note that this class also matches spaces, digits and punctuation. This becomes group `\1`. Then it looks for one or more word characters, `\w+`, which becomes group `\2`. It then switches the order of the groups and adds `ay`! Note: this only Pig Latin-ifies single words that don't start with a vowel.
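To handle a whole sentence, one possible approach is to anchor the match at the start of each word with `\b` and spell out the consonants explicitly. This sketch makes several simplifying assumptions:

* only ASCII letters are handled;
* `y` always counts as a consonant;
* words that start with a vowel are left alone (real Pig Latin rules vary);
* capitalization isn't adjusted.

`pig_latin` is just a hypothetical helper name.

```ruby
# Hypothetical helper: move each word's leading consonants to the end and add "ay".
# [b-df-hj-np-tv-z] is every ASCII letter except a, e, i, o and u;
# the /i flag makes it match uppercase letters too.
def pig_latin(text)
  text.gsub(/\b([b-df-hj-np-tv-z]+)([a-z]*)/i, '\2\1ay')
end

p pig_latin('cheese')             # prints "eesechay"
p pig_latin('hello world')        # prints "ellohay orldway"
p pig_latin('I am hiding cheese') # prints "I am idinghay eesechay"
```

Listing the consonants instead of using `[^aeiou]` keeps spaces and punctuation out of group `\1`. The `\b` stops the pattern from starting a match partway through a word.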
Wrap Up
Hopefully, this has demystified one of Ruby's more powerful string manipulation methods. A lot of fancy stuff can be done with `gsub` even without regex, although learning some regex will kick it up a notch.