Regex Words, Vowels, Consonants, and Sentences in Ruby

A.J. Romaniello ・ 5 min read

A word counter is a common feature in web applications. With Ruby, we can use regex to pull a lot of information out of our text in very few lines of code. I wanted to write a readable and comprehensive guide showing how well Ruby handles operations like this. Feel free to look at the code and test it out on repl.it.

Creating a TextAnalyzer Class

The first step is to move all of the methods that run regexes against the text and extract information from it into their own class. In this instance we are going to call it TextAnalyzer.

Our class might look something like this at the beginning: it is initialized with a text attribute and normalizes that text to uppercase (this will help us down the road; lowercase would work just as well).

class TextAnalyzer
  attr_reader :text

  def initialize(text)
    # upcase so we only ever have to match uppercase letters
    @text = text.upcase
  end
end

Adding Our Basic Methods

Counting Words

Since our text is one long string, we simply need to split it on whitespace and count the size of the resulting array. That gives us the number of words in our string. We don't care about punctuation or digits here, only spaces.

Our method would look something like:

def word_count
  text.split(' ').size
end
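As a quick sanity check, here is a minimal sketch of how String#split behaves when it is given a single space. The sample string is made up for illustration. With ' ' as the separator, Ruby splits on runs of whitespace and ignores leading whitespace, so double spaces and newlines don't produce empty "words".

```ruby
# hypothetical sample text with leading spaces, a double space, and a newline
sample = "  RUBY  IS\nFUN "

words = sample.split(' ')
puts words.inspect # prints ["RUBY", "IS", "FUN"]
puts words.size    # prints 3
```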

Counting Characters

First we need to think about our prerequisite:

  • A character is any text within a string (including spaces, punctuation, digits, etc)

Since we don't care what the actual character is, we can just split the text after every character, which would look like:

def chars
  text.split('')
end

# and to count would be simply
def character_count
  chars.size
end
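Ruby also ships with built-ins that cover this case. As a small sketch (the sample string is just an illustration), String#chars returns the same array as split(''), and String#length gives the count directly:

```ruby
# hypothetical sample text: letters, punctuation, a space, and digits
sample = "HI, 42!"

same_array = sample.split('') == sample.chars
puts same_array          # prints true
puts sample.chars.size   # prints 7
puts sample.length       # prints 7
```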

Counting Letters

Again, let's think about our prerequisites for a letter:

  • Can't be punctuation
  • Can't be a digit
  • Can't be whitespace

Using a sweet tool called Rubular we can test our regex before implementing it. The regex \W matches any non-word character, but digits and underscores count as word characters, so \W alone would let them through. Since our text is already uppercase, the negated class [^A-Z] matches everything that is not an (ASCII) letter, and with String#gsub we can replace those matches with empty strings ('').

All together it would look something like:

  # strips everything that is not an uppercase letter
  def letters
    text.gsub(/[^A-Z]/, '')
  end

  # a little helper method to make the string into an array
  def letters_array
    letters.split('')
  end

  # counts the letters in our long string
  def letter_count
    # don't forget to split our string into an array first!
    letters_array.size
  end
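One thing worth checking on Rubular or in irb is how each character class treats digits and underscores, since our prerequisites rule out digits. A minimal sketch, with a made-up sample string:

```ruby
# hypothetical sample text containing digits, an underscore, spaces, and punctuation
sample = "R2_D2 SAYS HI!"

# \W removes punctuation and whitespace but keeps digits and underscores
non_word_stripped = sample.gsub(/\W/, '')
puts non_word_stripped # prints R2_D2SAYSHI

# [^A-Z] removes everything except uppercase ASCII letters
letters_only = sample.gsub(/[^A-Z]/, '')
puts letters_only # prints RDSAYSHI
```

If digits should not be counted as letters, a negated class such as [^A-Z] is the safer choice.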

Counting Consonants and Vowels

Again, let's check our prerequisites for a vowel:

  • Must be A, E, I, O, or U

And for consonants:

  • Must not be a vowel

For both operations we can make use of the String#scan method that Ruby provides, once again using regex to find either all vowels or all consonants. The regex we would use is a character class: square brackets [ ] containing the letters we want to search for.

All together the methods would look something like:

  # counts the vowels from our #letters string
  def vowel_count
    letters.scan(/[AEIOU]/).size
  end

  # counts the consonants from our #letters string
  def consonant_count
    # adding ^ at the start of the character class
    # matches anything except the given characters
    letters.scan(/[^AEIOU]/).size
  end
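As an aside, String#count accepts the same kind of character set (including ^ for negation), so the counts can be cross-checked without a regex. This is only an alternative sketch, not the approach used in this guide. The letters string below is a hand-written stand-in for the output of #letters.

```ruby
# stand-in for #letters: "Hey! Isn't ruby amazing?!?" upcased, letters only
letters = "HEYISNTRUBYAMAZING"

vowels = letters.count('AEIOU')      # characters in the set
consonants = letters.count('^AEIOU') # characters NOT in the set

puts vowels     # prints 6
puts consonants # prints 12

# both approaches agree with the scan-based versions
puts vowels == letters.scan(/[AEIOU]/).size      # prints true
puts consonants == letters.scan(/[^AEIOU]/).size # prints true
```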

It is important to note that because of our operations in our letters method, we don't need to check for punctuation, digits, or whitespace.

Finding the Most Common Letter(s)

To find the most common letter, many people build a large while loop or a confusing regex (example). Ruby is simple, expandable, and flexible, and we should code that way.

In the case of a tie we wouldn't want our #most_common_letters method to return just the first or last letter. We want to return all of the tied letters and let whoever calls our method decide which one to choose (whether it's the first, the last, or somewhere in between!).

  # finds the most common letter(s) in our #letters string
  def most_common_letters
    # a default of 0 lets us increment letters we haven't seen yet
    char_hash = Hash.new(0)
    letters_array.each do |c|
      char_hash[c] += 1
    end

    # finds the highest occurrence
    # could use char_hash.max_by { |k, v| v } to get the max and character at the same time
    # although we would rather return ALL in the case of a tie
    max = char_hash.values.max

    # returns all in the case of a tie
    char_hash.map { |k, v| { k => v } unless v < max }.compact
  end

I've found the best way to do this is to create a hash that keeps track of each unique letter as a key and increases its value by 1 every time that letter occurs. The method then returns the most common letter(s), each paired with its count.
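For comparison, the same counting idea can be sketched with Enumerable#tally, assuming Ruby 2.7 or newer. This is an alternative, not the method used above, and it returns a single hash of the tied letters rather than an array of one-pair hashes. The sample strings are made up.

```ruby
# hypothetical helper: returns a hash of every letter that shares the top count
def top_letters(letters)
  counts = letters.chars.tally # e.g. {"B"=>2, "A"=>4, ...}
  max = counts.values.max
  counts.select { |_letter, count| count == max }
end

single_winner = top_letters("BANANABREAD")
tie = top_letters("AABB")

puts single_winner.inspect # prints {"A"=>4}
puts tie.keys.inspect      # prints ["A", "B"]
```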

Counting the Most Common Letter

Now that we can find our most common letters along with their frequency, it is up to us how we choose our winner in the case of a tie (remember that we are returning an array of hashes, one per letter).

Let's say we want the first of the tied letters: we could call most_common_letters.first. Since this returns a hash (the letter with its frequency), we still need to pull out the letter and the frequency separately.

Our code would look something like this:

  # holds the key, value pair for the most common letter
  def most_common
    # gets the first match from our array 
    most_common_letters.first
  end

  # gets the letter from the most common hash
  def most_common_letter
    most_common.keys.first
  end

  # gets the most common letter's count
  def most_common_letter_count
    most_common.values.first
  end
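The usage example in the next section also calls a sentence_count method, which none of the snippets above define. The following is only a sketch of one possible version, not necessarily the author's. It assumes a sentence is a run of text that ends in one or more of ., ! or ?, and it repeats the initializer so that it runs on its own.

```ruby
class TextAnalyzer
  attr_reader :text

  def initialize(text)
    @text = text.upcase
  end

  # hypothetical method: counts runs of text that end in ., ! or ?
  def sentence_count
    text.scan(/[^.!?]+[.!?]+/).size
  end
end

analyzer = TextAnalyzer.new("Hey! Isn't ruby amazing?!?")
puts analyzer.sentence_count # prints 2
```

Under this assumption, abbreviations such as "Dr." count as sentence endings and trailing text without closing punctuation is not counted, so treat it as a rough estimate.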

Using Our TextAnalyzer

Now we are all set up and ready to go. Displaying the results is as easy as initializing with the string and calling the associated methods.

text_to_analyze = "Hey! Isn't ruby amazing?!?"
text = TextAnalyzer.new(text_to_analyze)

display_string = <<-STR 
  Word Count: #{text.word_count}
  Sentence Count: #{text.sentence_count}
  Character Count: #{text.character_count}
  Letter Count: #{text.letter_count}
  Vowel Count: #{text.vowel_count}
  Consonant Count: #{text.consonant_count}
  Most Common Letter: #{text.most_common_letter} used #{text.most_common_letter_count} times.
STR

puts display_string

Conclusion

Ruby is awesome! It lets us use regex and hashes to operate on strings and count occurrences under many different conditions. If you find yourself looping over a string to find certain characters, you might want to use regex!

Combining regex with other Ruby features such as hashes and loops makes it extremely powerful for keeping track of occurrences or sorting by any set of prerequisites.

  • View the completed project on repl.it

A Reminder on how to Regex

One way I love to work out the regex for the string I am working with is to ask myself:

What do I want after the regex? What is going into the regex? Should I look for included characters or excluded characters?

It is very helpful to list out your requirements for the string you want after the regex.
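To make the "included or excluded" question concrete, here is a small sketch with a made-up phone number. The requirement is "only the digits", and it can be met either by collecting the characters we want or by stripping the ones we don't:

```ruby
# hypothetical input: we want only the digits out of this string
phone = "(555) 123-4567"

# included characters: collect every digit
digits_by_inclusion = phone.scan(/\d/).join

# excluded characters: remove everything that is not a digit
digits_by_exclusion = phone.gsub(/\D/, '')

puts digits_by_inclusion # prints 5551234567
puts digits_by_exclusion # prints 5551234567
```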
