I was recently doing a challenge on Exercism, on the Ruby track, and I struggled a lot, but when I ended up with a final solution, I was amazed at how much the power of Ruby let me simplify my code.
It's a pretty cool challenge, so, I'll lay out the premise first, and then you can try it out. If you're really into it, the Ruby tests for this exercise are in the Exercism Ruby repo. Most of the repos for the other languages have similar test suites for this exercise as well. This exercise is called Robot Name.
The Premise
You are creating a line of robots that each have a unique name. Their names all follow the pattern letter letter number number number
. A couple of valid names might be AB123
, YE801
, or RP100
. This means that there are 26 * 26 * 10 * 10 * 10
possible names. So here's the challenge.
- Write a class that creates Robots.
- Each Robot created must have a unique name that matches the pattern above.
- Robots must be able to be
reset
, wiping their memory and forgetting their name, receiving a new one that is still unique. Don't worry about recycling their name and returning it to the pool. We'll assume that once a name is taken, it's used up. - The name sequence must be random. That means the sequence must not be predictable.
- The Robot Class must have a
forget
method that makes it forget the current state of robots, resetting any shared state it may have.
Make sense? OK, if you're going to attempt this challenge, off you go. I'm going to share my journey below.
My Journey
1. Naive Guess and Check
The difficulty of this exercise is mainly in the fact that there are so many possibilities for names. Any attempt to build all of those strings and combinations through looping or appending to a list are just waaaay too slow. I tried a bunch of different approaches, and there were actually several false-start versions before I reached version 1. My first thought was to naively keep track of the names used, generate a random name, and check if it's in the list:
class Robot
LETTERS = ('A'..'Z').to_a
NUMBERS = ('0'..'9').to_a
@@taken_names = []
def self.forget
@@taken_names = []
end
attr_reader :name
def initialize
reset
end
def reset
name = ""
loop do
name = LETTERS.sample(2) + NUMBERS.sample(3)
break if ! @@taken_names.include?(name)
end
@name = name
end
end
This works great for the first bunch of robots created, but as soon as you get high enough that there start to be collisions, you quickly realize you've got an algorithm that could, in theory, run infinitely long searching for a needle in a 676,000-strand haystack!
But this is Ruby! There's got to be a better way! Maybe we should do it the other way, starting with a list of possible names and then popping them out, guaranteeing no collisions.
2. Popping Names Off a List
So that's a great thought, but how to build the list of names? Something like this?
@@names = LETTERS
.product(LETTERS, NUMBERS, NUMBERS, NUMBERS)
.map(&:join)
# => @@names = ['AA000', 'AA001' ... 'ZZ998', 'ZZ999']
OK, that works. The product
method creates a "Cartesian Product" between all of its arguments. For example:
[1, 2, 3].product([4, 5])
# => [
# [1, 4],
# [1, 5],
# [2, 4],
# [2, 5],
# [3, 4],
# [3, 5],
# ]
That giant product above creates a list like this:
[
['A', 'A', '0', '0', '0'],
['A', 'A', '0', '0', '1'],
...
That's why we join them all together into single strings via the .map(&:join)
method.
Startup time for our class (as well as forget
run time) is pretty long, but maybe that's allowable, seeing how much time it saves us on our algorithm. Right? Right?
Wrong. When our list is huge, randomly choosing an index and then popping that out takes FOR. EVER. Because, each time we pop an item out, all of the items behind that item have to figure out what to do with the gap that it left. This list of names is so huge, it's like trying to turn the Titanic. And how'd that work out for Leo?! (Too soon?)
I even tried generating a giant list of integers instead and converting each integer to my own custom numbering system that was (base 26, base 26, base 10, base 10, base 10), but that was more confusing and not any better.
class Robot
LETTERS = ('A'..'Z').to_a
NUMBERS = ('0'..'9').to_a
@@possible_names = LETTERS
.product(LETTERS, NUMBERS, NUMBERS, NUMBERS)
.map(&:join)
def self.forget
@@possible_names = LETTERS
.product(LETTERS, NUMBERS, NUMBERS, NUMBERS)
.map(&:join)
end
def initialize
reset
end
def reset
next_index = rand(0...@@possible_names.length)
@name = @@possible_names.pop(next_index)
end
end
This is Ruby! There must be a better way!
The Final Solution
As it turns out, there is. My fundamental idea of working from a pre-built list was the right idea, but I wasn't letting Ruby work for me enough. There were a lot of things I could improve upon.
First, the building of the list of names. I forgot how awesome ranges in Ruby are.
@@names = 'AA000'..'ZZ999'
That's right. Ruby knows how to increment each character in the string (even the letters) to fill in the gaps. I'll be honest, when this was pointed out to me by the Ruby Track Mentor after all of that work, I only cried for like 12 minutes.
Next, random access. Rather than randomly selecting each time, why not shuffle once, up front? But you can't shuffle a range in Ruby! Not a problem!
@@names = ('AA000'..'ZZ999').to_a.shuffle
This crying session only lasted 7 minutes.
Lastly, dealing with modifying this great giant list over and over. The solution? Don't. The best way to deal with a large dataset isn't to pop off of it. It's to let Ruby work for you and use an Enumerator. This lazily yields each element. It's similar to having a pointer point at the element you want and then moving the pointer to the next element, but way less work.
class Robot
POSSIBLE_NAMES = 'AA000'..'ZZ999'
@@names = POSSIBLE_NAMES.to_a.shuffle.each
def self.forget
@@names = POSSIBLE_NAMES.to_a.shuffle.each
end
attr_reader :name
def initialize
reset
end
def reset
@name = @@names.next
end
end
This way, you walk through the names until you run out. Handily, when the @@names
Enumerator runs out of names, a call to @@names.next
will raise a StopIteration
exception, telling the user that you're out of names. If you want, you could catch that exception and raise your own RobotNamesDepletedError
too. And a call to Robot.forget
renews the list of names with a new random shuffle.
What did you come up with? Did you try the exercise in another language? How did it turn out? Let me know.
Originally posted on assert_not magic?
Top comments (19)
Instead of:
You can just write:
I wondered about that. That totally makes sense. Thanks!
As yet another alternative approach... Taking either of your basic ideas (either storing all names in an
Array
, or anEnumerator
, or keeping a track oftaken_names
), you could use the regexp-examples ruby gem that I wrote a few years back to abstract the problem of "getting a valid-formatted robot name" -- for example:Then, if you have different robots with other naming rules, the rest of the cost would work out of the box. (And as a bonus, this regex will likely already exist in your code, to validate that the robot name is ok!)
Hi Tom! Thanks for sharing! How does this Gem handle collisions? One of the test cases generates all 676,000 robot names, so if calling
random_example
provides duplicates, we run into the same "theoretical infinite time complexity" issue, callingrandom_example
repeatedly until it provides the one robot name we haven't encountered yet.You could use
Regexp#random_example
in conjunction with thetaken_names
, as per your first solution in this blog post.Or -- with the potential performance issue of storing a large array in ruby memory (as with all your other examples that use
to_a
!) -- you could also useRegexp#examples
to end up with very similar solutions. (See the documentation.) For example:.
...Note that this would be a bad idea if the pattern got longer, so the number of possible results was much larger. All of your examples that use
to_a
, just like myRegexp#examples
code above, would soon freeze the system as the array length grows exponentially.Using
Regexp#random_example
- similar to your original implementation - would scale fine though.I really enjoyed reading this journey and seeing the trade offs you had to face. Thanks for sharing!
I agree w/ Phil.
Further, I've had the same cry sessions for similar reasons: I worked hard and long to do it one way and here comes a Ruby method or something that not only does it easier but with less lines of code.
🤦🏾♀️
Glad you liked it!
You can fix this by swapping the element with the last element, and then popping off the last element, which is always cheap:
This only works because you don't actually care about the order of values in
@@possible_names
!Another much more involved approach would be some kind of sparse Fenwick tree. That way, startup time is instantaneous, and you only use lots of memory when you've roughly used up every-other ID.
Woah cool! I’ll look into that. Thanks for the ideas! 😃
There's another, more direct solution to the "running
include?
on anArray
takes longer and longer as the array gets bigger" problem: use aSet
.Add
require 'set'
to the top of your file; the library ships with Ruby but isn't loaded by default.Then replace
@@taken_names = []
with@@token_names = Set.new
.Under the hood,
Set
uses aHash
to get constant-time lookups - source. See your favorite Ruby internals book for details on this - I'm a fan of Ruby Under A Microscope.BTW, I was curious about the pitfalls of representing names as integers and then converting them - so I put together some code. Note: I've been writing a lot of Elixir lately, and the style reflects some of that.
This would make using the
Set
approach even more efficient, sinceInteger
objects are 8 bytes instead of 40 bytes forString
s.Pretty cool! I didn't know that about ranges :)
Be careful... since self.names is a class method, then this implementation of
@names
is also a class variable.So both of these are using class level variables and writing to the global namespace:
In this last case, the Robot class is a singleton instance itself.
Class variables are not 'evil' as long as you know what you are doing. Best rule of thumb would be to consider a class variable generally as immutable. In this example, if we had 2 separate processes sharing the same service of make robot names... one could reset the names while the other is running. This would cause a bad side effect. However, if you simple used the class variable to store the initial full set of names and then randomized it with each instance would be safer. Class variable is being used as a constant, immutable value... while random is per instance.
Also, since the built array already exists inside the enumerable, reusing the array is slightly faster than rebuilding from a range each time:
Not even a Ruby dev, but this was a good read. Interesting challenge and nifty solution :D
Thanks! Yeah, when I first read through the challenge, I didn’t think it was going to be that hard. I was wrong.
I think Ruby is fundamentally interesting, simple and fun. If it weren't for Rails it would still be known as more of a delightful hobbyist language.
Yeah, and a pretty powerful scripting/automation language too, for server-side stuff. A lot more readable and maintainable than Bash, a lot of times.
This is a fun challenge. I wanted to see if I could figure out a way to do this without shuffle. The solution I settled on after much googling is to use a linear congruential generator algorithm, which is a psuedorandom number generator with the property that, given the right inputs, its output is a sequence of all integers in a range (0 .. m-1) in pseudorandom order with no repeats (until it exhausts the range, whereupon it begins the sequence again). In other words, it can be used as a "lazy shuffle" algorithm. Here's the implementation I came up with, as a Ruby Enumerator:
The "increment" value
C
could have been any number coprime withM
; I chose77777
purely for aesthetics.Note that if we changed
M.times do
toloop do
, the sequence would cycle infinitely.Once you have this, all that's left is to format each number as a robot name, and then some bookkeeping for the
forget!
andreset!
methods. Here's the final result:Just for kicks, here's the same thing in JavaScript: beta.observablehq.com/@jrunning/ro... (Too bad dev.to doesn't embed Observable notebooks.)