Today I was reading through an article on Dev.to by Ben Greenberg about an interview coding challenge, and I got hooked and had to try it for myself. I highly recommend you read his original post before you read this so you have some background. That being said, for the lazy ones, let me give you...
Some Background
The challenge is this (copied from the original post):
Sort the characters in the following string:
abcdefghijklmnopqrstuvwxyz_
by the number of times the character appears in the following text (descending):
… [String omitted for brevity]
Now take the sorted string, and drop all the characters after (and including) the _. The remaining word is the answer.
You can find the super long string in the comments section of his post (where I asked for it ðŸ˜) if you want to try it yourself. You should! And share your solution! Anyways, on with the story.
My Solution (Part 1)
If it's not abundantly clear, I'm about to share my solution. If you want to work it out for yourself without spoilers, go ahead and stop reading until you're done. I won't write anymore until you're finished. 😄
Finished? Great! ONWARD.
I decided to use Ruby because Ruby is fun and great. Initially, I just wanted to knock out the first solution I could think of -- regardless of how inefficient or slow it was -- and come back to clean it up after.
# I stored the huge string in a separate file called
# `blob.txt` so as to not clutter my main script
blob = File.read('blob.txt')
letters = 'abcdefghijklmnopqrstuvwxyz_'.chars
def naive_message(letters, blob)
letters
.sort_by { |letter| blob.count(letter) }
.reverse # to get desc. order
.join
.split('_')
.first
end
puts naive_message(letters, blob)
I won't spoil the answer, but let's just say that I "pray, plead, beseech, urge" you to try it on your own. Possibly in the past tense. I think the markdown format of the initial text blob may have skewed the number of "_" characters, causing my answer to have slightly more characters on the end than should actually be there. UPDATE: That's totally what happened. Accurate text blob here.
My Solution (Part 2)
Anyways, I looked back at my solution and thought to myself, "Ryan, that doesn't look very performant. I have to imagine that running blob.count(letter)
for each letter is the worst case performance for this scenario (27 'letters' * n chars in the blob, looping through the whole blob for each letter). It seems like it should be more efficient to do it the way Ben did it, which is by looping through the blob once and counting each letter along the way. So I tried that.
def efficient_message(letters, blob)
# counter = {a: 0, b: 0, c: 0...}
counter = letters.product([0]).to_h
# run through blob once only
blob.each_char { |c| counter[c] += 1 }
# sort and trim off everything after _
counter
.sort_by(&:last) # sort by the count
.map(&:first) # grab just the letter key into an array
.reverse
.join
.split('_')
.first
end
Not as pretty, in my opinion, but hopefully faster. Ruby, being interpreted, is slower than most compiled langauges, so this should help. (So I thought...)
Comparing Performance
Was this optimization worth it? I needed to find out. Luckily Ruby comes with an awesome Benchmarking library built-in. (Oh Ruby, what is there that you can't do?)
require 'benchmark'
# ... My code above
Benchmark.bmbm do |bm|
bm.report("Naive: ") { 1000.times { naive_message(letters, contents) } }
bm.report("Efficient: ") { 1000.times { efficient_message(letters, contents) } }
end
Benchmark
has a method called bmbm
that runs one trial run and then a second real run. This helps shake out any overhead performance drains from the garbage collector. And to my horror:
~\desktop> ruby .\secret_word.rb
Rehearsal -----------------------------------------------
Naive: 0.047000 0.000000 0.047000 ( 0.039974)
Efficient: 0.484000 0.000000 0.484000 ( 0.481631)
-------------------------------------- total: 0.531000sec
user system total real
Naive: 0.031000 0.000000 0.031000 ( 0.038011)
Efficient: 0.483000 0.000000 0.483000 ( 0.478715)
The "Efficient" version is ~10x slower than the "Naive" version! Noooo! "But, why?" you ask. "How can this be?" I had the same questions.
Algorithmic Profiling
Ruby has a built-in profiler, but a short Google search told me that there was a better option: ruby-prof
. After a quick gem install ruby-prof
, I was back at it again with the white vans. (Check out the Ruby-Prof Documentation to learn more).
require 'ruby-prof'
# ... The previous code
RubyProf.start
1000.times { naive_message(letters, contents) }
result = RubyProf.stop
RubyProf::FlatPrinter.new(result).print(STDOUT)
RubyProf.start
1000.times { efficient_message(letters, contents) }
result = RubyProf.stop
RubyProf::FlatPrinter.new(result).print(STDOUT)
I added titles below for clarity.
Naive:
===================
Measure Mode: wall_time
Thread ID: 3259000
Fiber ID: 20752200
Total: 0.066000
Sort by: self_time
%self total self wait child calls name
50.00 0.033 0.033 0.000 0.000 27000 String#count
18.18 0.059 0.012 0.000 0.047 1000 Enumerable#sort_by
10.61 0.007 0.007 0.000 0.000 136000 Integer#<=>
10.61 0.040 0.007 0.000 0.033 1000 Array#each
3.03 0.002 0.002 0.000 0.000 1000 Array#reverse
3.03 0.066 0.002 0.000 0.064 1000 Object#naive_message
1.52 0.001 0.001 0.000 0.000 1000 Array#first
1.52 0.001 0.001 0.000 0.000 1000 Array#join
1.52 0.001 0.001 0.000 0.000 1000 String#split
0.00 0.066 0.000 0.000 0.066 1 Global#[No method]
0.00 0.066 0.000 0.000 0.066 1 Integer#times
* indicates recursively called methods
Efficient:
==============
Measure Mode: wall_time
Thread ID: 3259000
Fiber ID: 20752200
Total: 0.688000
Sort by: self_time
%self total self wait child calls name
93.60 0.644 0.644 0.000 0.000 1000 String#each_char
2.04 0.025 0.014 0.000 0.011 1000 Enumerable#sort_by
1.16 0.008 0.008 0.000 0.000 136000 Integer#<=>
0.58 0.005 0.004 0.000 0.001 1000 Array#map
0.44 0.688 0.003 0.000 0.685 1000 Object#efficient_message
0.44 0.003 0.003 0.000 0.000 1000 Array#reverse
0.44 0.003 0.003 0.000 0.000 1000 Array#product
0.44 0.003 0.003 0.000 0.000 1000 Array#to_h
0.29 0.003 0.002 0.000 0.001 1000 Hash#each
0.29 0.002 0.002 0.000 0.000 1000 String#split
0.15 0.001 0.001 0.000 0.000 27000 Array#last
0.15 0.001 0.001 0.000 0.000 28000 Array#first
0.00 0.688 0.000 0.000 0.688 1 Global#[No method]
0.00 0.000 0.000 0.000 0.000 1000 Array#join
0.00 0.688 0.000 0.000 0.688 1 Integer#times
* indicates recursively called methods
From what I can tell, the String#count
method is super-optimized, and String#each_char
is a relatively expensive operation (it has to create an array the length of the blob!). So, in the long run, looping through the blob string a bunch of times using the faster String#count
ends up being more performant. So much for going through the trouble to generate an efficient solution.
Wrap Up
Anyways, I hope you're able to get your heart rate back to normal after such a roller-coaster ride. Be sure to share your solution on Ben's post. Also, since this was originally a kind of interview code puzzle, if you're someone that has interviewed people, I would love any feedback on my solution or the surrounding presentation! Is it similar to what you would expect from an applicant, or am I missing something important?
Thanks for reading!
Originally posted on assert_not magic?
Top comments (21)
This is great! I can't even begin to describe my love for Ruby. It is such a clean language.
Not fully optimized script to do the same in python
Nice! Thanks for sharing. :) Have you seen the built-in Counter class? This seems like a perfect use-case for it.
Bonus! On my machine, this version is even 5 times faster than the best Ruby version!
This is great. Will use counter effectively. Thanks for showing me counter(). :-)
I wondered if I could do better performance-wise, based on knowledge of the problem that the more general built-in classes couldn't take advantage of. Ended up with this:
Testing both implementations a few times suggests this way is faster, but only 1% or so.
Might try this in go, just for the practice...
Nice! I haven't used
fromkeys
before. That's a nice alternative to defaultdict (and you don't have to import anything!) if you already know the keys you need.Python is on my list of things to learn soon and can't help but remark how much it "looks" like Ruby without knowing much about it yet.
"...without knowing much about it yet." This is great about python. Nice challenge this one. :-)
Nice post. It's an interesting look at how algorithmic complexity can be trumped by real-world details.
But I think you'll find that the performance is not exactly in the
each_char
(which is probably lazily evalutated) but is caused by:each_char
O(1)
)Thanks, that’s super interesting! Good to consider for future block usage. I can’t think of a way to speed up the block setup and tear down, but I wonder if I’d pick any performance up by switching the hash keys to symbols. I’ll have to do some ‘sperimenting 👍🏻
Obvious (!!!) question to ask: What if the text does not fit into memory?
Distributed version in Scala with Spark:
Thanks for the amazing community. The way you welcome your new users are amazing. Hey Wait did you check this Solarmovies alternatives Thanks
Interesting challenge. I want to see how it can be gracefully implemented in Go, even it will be more code than Ruby, but it can be elegant. github.com/plutov/practice-go/tree...
I am deeply attracted by your post. It is really a nice and informative one. I will recommend it to my friends. Also, you can read this: Putlocker
Here is solution for same problem in Go - github.com/plutov/practice-go/blob...
Thanks for sharing this article. Also see the thetechbytes.net.
Thank you for telling us but can you tell me how to add images in footer of this site: youtech.ooo/
Some comments may only be visible to logged-in visitors. Sign in to view all comments.