A Quick Intro to Ruby's Set Collection, Part 1: Creating Sets, and Adding/Removing Objects

Isa Levine ・8 min read

A Quick Intro to Ruby's Set (3 Part Series)

1) A Quick Intro to Ruby's Set Collection, Part 1: Creating Sets, and Adding/Removing Objects 2) A Quick Intro to Ruby's Set, Part 2: Object Lookups and Efficiency 3) BONUS: A Quick Intro to Ruby's SortedSet

Recently, both my personal projects and code challenges have involved situations that call for collecting a unique list of objects. My usual go-to strategy is a combination of Ruby's array collection, plus the Array.uniq method.

This strategy works, but isn't always optimal based on the situation. For instance, it's helpful that Array.uniq will remove duplicates while also maintaining object order--but that's only helpful if object order mattered in the first place!
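As a quick illustration of that array-plus-uniq approach (the `words` array here is made up for the example, not taken from the project):

```ruby
words = ["heavy", "armor", "heavy", "shield", "armor"]

# uniq returns a new array with duplicates removed,
# keeping the first occurrence of each element in its original position
unique_words = words.uniq

p unique_words  # prints ["heavy", "armor", "shield"]
p words.length  # prints 5, the original array is untouched
```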

Luckily, there are alternatives! Ruby has a Set collection that functions very similarly to an array, but with a few crucial differences:

  • All objects in a Set are guaranteed unique (arrays can have duplicates)
  • Objects in a Set are not ordered (arrays are ordered by index)
  • Sets are built on top of Hashes for super-fast object lookup (arrays are just dynamic arrays under-the-hood)

Or, summarized by RubyGuides' article on the Set class:

A set is a class that stores items like an array…

But with some special attributes that make it 10x faster in specific situations!

On top of that:

All the items in a set are guaranteed to be unique.

10x faster, huh? Sounds good to me!

Overview

In this first article, we'll introduce the following basics of Ruby's Set class:

  1. How to create a set
  2. Adding objects to sets
  3. Removing objects from sets

Use case: Turning text into a set of unique one-word strings

For our use case, we'll be using one that I encountered in a personal project: turning a long string into a set of unique one-word strings.

My Dungeons & Dragons character creation tool, Friendly Character Generator, scans a newly-generated character for one-word search tags that are then used to find matching backstory snippets in the database.

For instance, the character Big Sword Knight has a skill that contains the word "armor":

"When you choose this domain at 1st level, you gain proficiency with heavy armor."

The database has the following backstory snippet, which should be found by querying the search tag "armor":

"Due to their obsession with infinite levels of constitution and unkillability, they sleep (and shower) in their full armor, and they tend to reek--remember to do laundry before any stealth missions!"

So, to accomplish this, we need to turn the string “When you choose this domain at 1st level, you gain proficiency with heavy armor.” into a set of unique, one-word strings that we can use to query the database.

Why is a Set appropriate here? There are a few advantages:

  1. Uniqueness -- since we want to query the database for a given word one time only, the set guarantees we won't repeat any words (even if we add more words to it later!)
  2. Efficient Lookups -- if we do need to add more words, it is extremely fast to check if one (or more!) words already exist within the set
  3. Unordered -- the order of words we query doesn't matter, so no need for an array's ordering!
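Here's a rough sketch of those three points, using a made-up `tags` set rather than the real project data:

```ruby
require 'set'

tags = Set["heavy", "armor"]

# 1. Uniqueness: adding a word that is already present changes nothing
tags << "armor"
p tags.size               # prints 2

# 2. Efficient lookups: include? is a hash lookup, not a scan of every element
p tags.include?("heavy")  # prints true
p tags.include?("sword")  # prints false

# 3. Unordered: two sets with the same members are equal,
#    no matter what order the members were added in
p tags == Set["armor", "heavy"]  # prints true
```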

Sidenote: Text normalization

I really enjoy string manipulation, so I decided to use some fancy Regex and basic string methods to normalize the string--removing punctuation, separating hyphenated words, making everything lowercase, etc.

For funsies, here's the code used for text normalization:

string = "When you choose this domain at 1st level, you gain proficiency with heavy armor."

regex1 = /(-)|(--)|(\.\.\.)|(_)/          # separate words joined by hyphens, double hyphens, underscores, and ellipses
regex2 = /([.,:;?!"'`@#$%^&*()+={}-])/    # remove all other punctuation
string.gsub!(regex1, " ")
string.gsub!(regex2, "")
string.downcase!

puts string
#=> when you choose this domain at 1st level you gain proficiency with heavy armor
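If you'd rather not mutate the string in place, one alternative (a different technique from the gsub! approach above, and only a sketch) is to pull the words out directly with scan. The variable names here are illustrative. Note that it treats apostrophes and non-ASCII letters as separators, so it won't behave identically on every input:

```ruby
original = "When you choose this domain at 1st level, you gain proficiency with heavy armor."

# downcase returns a new string, and scan collects every run of
# ASCII letters and digits, so punctuation never makes it into the result
words = original.downcase.scan(/[a-z0-9]+/)

p words.first(3)  # prints ["when", "you", "choose"]
p words.last      # prints "armor"
```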

How to create a set

require 'set'

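Set has historically been an odd collection because Ruby made us explicitly require it. As of Ruby 3.2 it is available without the require, but including the line is harmless and keeps the code working on older versions: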

require 'set'

Start with an array

After text normalization, let's assume we start with this string:

string = "when you choose this domain at 1st level you gain proficiency with heavy armor"

The most direct way to create a set is by converting an array, so let's go ahead and .split(" ") the string where there are spaces:

array = string.split(" ")
#=> ["when", "you", "choose", "this", "domain", "at", "1st", "level",
#    "you", "gain", "proficiency", "with", "heavy", "armor"]

Now we have a few ways to create a set from this array:

Set.new(array)

We can use the Set class's .new method, and pass our array into it:

set = Set.new(array)
#=> #<Set: {"when", "you", "choose", "this", "domain", "at", "1st", 
#   "level", "gain", "proficiency", "with", "heavy", "armor"}>

Voila! The duplicate string "you" only appears once in our new set--and we can rest assured that no more duplicates can be added!

If no argument is passed, an empty set will be created.
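A small sketch of both of those cases. The block form of Set.new isn't covered above, but it is part of the standard Set API: when a block is given, each element is run through it before being stored. The variable names are just for illustration:

```ruby
require 'set'

# No argument: an empty set, ready to be filled later
empty = Set.new
p empty.empty?  # prints true

# With a block: each element is passed through the block before being stored
lengths = Set.new(["when", "you", "this"]) { |word| word.length }
p lengths.size  # prints 2, since the only lengths are 4 and 3
```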

array.to_set

We can also use the enumerable method .to_set on our array to create a new set:

set = array.to_set
#=> #<Set: {"when", "you", "choose", "this", "domain", "at", "1st",
#   "level", "gain", "proficiency", "with", "heavy", "armor"}>

Super-easy, same result!
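Since .to_set comes from Enumerable, it isn't limited to arrays. A minimal sketch, chaining it straight off of split and also calling it on a range (the `digits` name is made up for the example):

```ruby
require 'set'

string = "when you choose this domain at 1st level you gain proficiency with heavy armor"

# split with no argument splits on whitespace; to_set drops the duplicate "you"
set = string.split.to_set
p set.size  # prints 13

# Any Enumerable works, including a range
digits = (1..3).to_set
p digits == Set[1, 2, 3]  # prints true
```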

Set[]

The last way we can create a set is by simply writing out the array with Set prepended to it:

Set["when", "you", "choose", "this", "domain", "at", "1st", "level",
    "you", "gain", "proficiency", "with", "heavy", "armor"]
#=> #<Set: {"when", "you", "choose", "this", "domain", "at", "1st",
#   "level", "gain", "proficiency", "with", "heavy", "armor"}>

But, in our case, we're violating the DRY doctrine by literally rewriting the array! This obviously works better for situations where you want to start by writing out the Set by hand.

Now we can enumerate with .each!

Once we have our set created, we can use .each to cycle through the objects, just like with arrays! The main difference is that a set gives us no index integer, and its order is not something to rely on. Since the Set class is built on top of Hash, .each will in practice pass the objects in the order they were inserted, but the Set docs describe the collection as unordered.

set.each do |str|
    puts str
end

#=> when
#   you
#   choose
#   this
#   domain
#   at
#   1st
#   level
#   gain
#   proficiency
#   with
#   heavy
#   armor

It may look ordered, but the Set API makes no promise about that particular order. From the Ruby docs on Sets:

Set implements a collection of unordered values with no duplicates.

So, just don't count on that same order happening again.
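To make the "no index" point concrete, here's a short sketch. It assumes nothing beyond the standard Set class, and the variable names are illustrative:

```ruby
require 'set'

set = Set["when", "you", "choose"]

# Set instances have no index operator, so set[0] would raise NoMethodError
p set.respond_to?(:[])  # prints false

# If a position is really needed, convert to an array first
words = set.to_a
p words.length          # prints 3

# Equality ignores order entirely
p Set[1, 2, 3] == Set[3, 2, 1]  # prints true
```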

Adding objects to sets

Let's say we want to add the name of our character, Big Sword Knight, to our set. We have the name stored as three normalized, one-word strings in an array:

name_array = ["big", "sword", "knight"]

Now we want to add each of these to our set. We can add them one-by-one, or all-at-once. We also have an option to return nil if a given object cannot be added, due to being a duplicate.

NOTE: We could actually replace our name_array with another set! Since we can use the .each method on both arrays and sets, it wouldn't break the code we're about to write. BUT, we're using an array just to easily differentiate the objects.

.add

The most straightforward way is to use .add to add an object to our set:

set.add("knight")
#=> #<Set: {"when", "you", "choose", "this", "domain", "at", "1st",
#  "level", "gain", "proficiency", "with", "heavy", "armor", "knight"}>

We can also iterate through name_array and repeat the operation with each string it contains:

name_array.each do |str|
    set.add(str)
end
#=> #<Set: {"when", "you", "choose", "this", "domain", "at", "1st",
#   "level", "gain", "proficiency", "with", "heavy", "armor", "big",
#   "sword", "knight"}>

In the console output, we can see that the new strings appear at the end of the set--BUT REMEMBER THAT THESE OBJECTS ARE NOT ORDERED!

We can also use .add to add the entire array, as sets can contain different object types:

set.add(name_array)
#=> #<Set: {"when", "you", "choose", "this", "domain", "at", "1st", 
#   "level", "gain", "proficiency", "with", "heavy", "armor", 
#   ["big", "sword", "knight"]}>

Bam! Whole array in the set. If you iterate through sets containing different types, remember to build in handling for that!
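One possible way to handle that mix, sketched with a made-up `mixed` set: wrap the lone strings so every item can be treated as a list, then flatten as you go.

```ruby
require 'set'

mixed = Set["armor", ["big", "sword", "knight"]]

# Wrap lone strings in an array so flat_map can treat every item the same way
words = mixed.flat_map { |item| item.is_a?(Array) ? item : [item] }

p words.sort  # prints ["armor", "big", "knight", "sword"]
```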

<< (shovel)

The same behavior as .add can be achieved with its alias, the shovel << operator:

name_array.each do |str|
    set << str
end
#=> #<Set: {"when", "you", "choose", "this", "domain", "at", "1st",
#   "level", "gain", "proficiency", "with", "heavy", "armor", "big",
#   "sword", "knight"}>

NOTE: With both .add and <<, the set is left unchanged when the object is already present, and the set itself is still returned, so there is no signal that the addition was skipped.
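A quick sketch of what .add and << hand back, using a small made-up set. Both return the set itself, which is also what makes chaining possible:

```ruby
require 'set'

set = Set["heavy", "armor"]

# add returns the set itself, even when the object was already present
result = set.add("armor")
p result.equal?(set)  # prints true
p set.size            # prints 2

# Because << also returns the set, calls can be chained
set << "big" << "sword"
p set.size            # prints 4
```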

.merge

You can simplify the .each loop to add individual objects from another collection to the set by using .merge. This will pass the objects one-by-one, and is a lot more readable.

set.merge(name_array)
#=> #<Set: {"when", "you", "choose", "this", "domain", "at", "1st", 
#   "level", "gain", "proficiency", "with", "heavy", "armor", "big", 
#   "sword", "knight"}>

Just like .add and << above, .merge returns the set itself either way, so it gives no signal when an object was already present.
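To see the difference between .add and .merge side by side, here's a minimal comparison starting from two identical one-word sets (`added` and `merged` are names made up for this example):

```ruby
require 'set'

name_array = ["big", "sword", "knight"]

added  = Set["armor"].add(name_array)    # stores the array as one object
merged = Set["armor"].merge(name_array)  # stores each string separately

p added.size                # prints 2
p merged.size               # prints 4
p merged.include?("sword")  # prints true
p added.include?("sword")   # prints false
```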

.add?

In contrast to the .add, <<, and .merge methods, .add? will return nil if an object cannot be added to a set (and returns the set itself when the object is added):

set.add?("armor")
#=> nil

This is especially useful if you need to do things like keep track of how many objects are rejected from the set:

duplicate_counter = 0
duplicate_array = ["proficiency", "with", "heavy", "armor"]

duplicate_array.each do |str|
    # add? returns nil when str is already in the set
    duplicate_counter += 1 if set.add?(str).nil?
end

puts duplicate_counter
#=> 4
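The same tally can be written without a manual counter by leaning on Enumerable's count. This is just one possible variation, and the `candidates` array (with one genuinely new word) is made up for the example:

```ruby
require 'set'

set = Set["proficiency", "with", "heavy", "armor"]
candidates = ["proficiency", "with", "heavy", "armor", "shield"]

# add? returns nil for a duplicate, so count tallies the rejections
duplicate_counter = candidates.count { |str| set.add?(str).nil? }

p duplicate_counter       # prints 4
p set.include?("shield")  # prints true, the new word was still added
```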

Note: These methods mutate the original set!

All of the above methods will mutate the original set by changing its contents. This differentiates them from the Set union method | (also aliased as + and .union), which returns a new set and leaves the original set unchanged.
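For comparison, a short sketch of the non-mutating | on a small made-up set:

```ruby
require 'set'

set = Set["heavy", "armor"]
name_array = ["big", "sword", "knight"]

# | accepts any enumerable and returns a brand-new set
combined = set | name_array

p combined.size  # prints 5
p set.size       # prints 2, the original is unchanged
```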

Removing objects from sets

Now, let's say we want to remove some words unrelated to our character's backstory, like prepositions and pronouns:

remove_array = ["when", "you", "this", "at", "with"]

.delete

.delete is the opposite of .add, and will remove the object passed to it from the set (if it exists):

set.delete("when")
#=> #<Set: {"you", "choose", "this", "domain", "at", "1st", "level", 
#   "gain", "proficiency", "with", "heavy", "armor"}>

We can also iterate through remove_array to repeat the .delete operation:

remove_array.each do |str|
    set.delete(str)
end
#=> #<Set: {"choose", "domain", "1st", "level", "gain", "proficiency", 
#   "heavy", "armor"}>

As with .add, passing a collection like an array to .delete will try to delete the array as a single object, rather than deleting its individual contents:

set.delete(remove_array)
#=> #<Set: {"when", "you", "choose", "this", "domain", "at", "1st",
#   "level", "gain", "proficiency", "with", "heavy", "armor"}>

Unfortunately, there is no equivalent shovel for .delete.
:'(

.subtract

.subtract is the opposite of .merge, and will simplify the .each loop by calling .delete on each object within a passed collection:

set.subtract(remove_array)
#=> #<Set: {"choose", "domain", "1st", "level", "gain", "proficiency", 
#   "heavy", "armor"}>

Keep in mind, .delete and .subtract return the set itself either way, so they give no signal when a given object was not in the set to begin with.
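A small sketch of that behavior, using a shortened made-up set and a word ("shield") that was never in it:

```ruby
require 'set'

set = Set["when", "you", "choose"]

# delete returns the set whether or not the object was present
p set.delete("shield").equal?(set)  # prints true
p set.size                          # prints 3

# subtract quietly skips anything that isn't in the set
set.subtract(["when", "you", "shield"])
p set.to_a                          # prints ["choose"]
```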

.delete?

.delete? will return nil if an object cannot be removed from the set (and returns the set itself when it can):

set.delete?("shield")
#=> nil

Y'know, if you like that sort of thing.
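One place it could come in handy: finding out which words in a removal list were never in the set at all. A rough sketch, with a made-up set and removal list:

```ruby
require 'set'

set = Set["when", "you", "choose", "armor"]
remove_array = ["when", "you", "shield"]

# delete? returns nil when there was nothing to remove
missing = remove_array.select { |str| set.delete?(str).nil? }

p missing   # prints ["shield"]
p set.size  # prints 2
```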

Note: These methods mutate the original set, too!

Just like the addition methods above, these deletion methods will all mutate the original set. For methods that make a copy of the set to mutate and return, see the Set "minus" method - (also aliased as .difference).
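And a matching sketch for the non-mutating "minus" method, again on a small made-up set:

```ruby
require 'set'

set = Set["when", "you", "choose", "armor"]
remove_array = ["when", "you"]

# - accepts any enumerable and returns a new set without those objects
trimmed = set - remove_array

p trimmed == Set["choose", "armor"]  # prints true
p set.size                           # prints 4, the original is unchanged
```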

Conclusion

Now you're ready to create some sets in your Ruby code! You can practice by looking for opportunities to use sets instead of arrays in your own code. The main two questions you can ask are:

  • Do the objects need to be unique?
  • Does the object order matter?

In future articles, we'll continue by looking into:

  1. Object lookup in sets
  2. Checking for subsets within a set
  3. Splitting a set into subsets with the Set.divide method

Links and Sources

Excellent introduction to Ruby Sets by Al Scott

Ruby docs - Sets

RubyGuides intro to Sets

DotNetPerls - Ruby Set examples

Got any tips, tricks, or instances where you like to use sets? Please feel free to share in the comments below!
