DEV Community

Secret Message

Ryan Palo on October 17, 2017

Today I was reading through an article on Dev.to by Ben Greenberg about an interview coding challenge, and I got hooked and had to try it for mysel...

Read full post

Ben Greenberg • Oct 17 '17

This is great! I can't even begin to describe my love for Ruby. It is such a clean language.

Talha • Oct 17 '17 • Edited

    def give_me_word(text):
        ans=dict()
        the_text=""
        for i in text:
            if i in ans:
                ans[i]+=1
            else:
                ans[i]=1

        comparison=ans['_']
        sorted_list=sorted(ans.values(),reverse=True)
        rev_dict = dict((v,k) for k,v in ans.iteritems())

        for i in sorted_list:
            if(i>=comparison and rev_dict[i]!='_'):
                the_text=the_text+rev_dict[i]
            elif rev_dict[i]=='_':
                break

        return the_text

    text="the_text"

    print give_me_word(text)

Not fully optimized script to do the same in python

Ryan Palo • Oct 17 '17 • Edited

Nice! Thanks for sharing. :) Have you seen the built-in Counter class? This seems like a perfect use-case for it.

from collections import Counter

blob = "..."
letters = "abcdefghijklmnopqrstuvwxyz_"

def message(letters, blob):
    hist = Counter(blob)    # Counter takes any iterable and makes
                            # a histogram of its items!

    sorted_letters = ''.join(x[0] for x in hist.most_common())
    return sorted_letters.split('_')[0]

print(message(letters, blob))

Ryan Palo • Oct 17 '17

Bonus! On my machine, this version is even 5 times faster than the best Ruby version!

cat test.py | python -m timeit
100000000 loops, best of 3: 0.00844 usec per loop

Talha • Oct 17 '17

This is great. Will use counter effectively. Thanks for showing me counter(). :-)

edh_developer • Oct 18 '17

I wondered if I could do better performance-wise, based on knowledge of the problem that the more general built-in classes couldn't take advantage of. Ended up with this:

import string

longString="..."

def decode(msg):
    # Testing suggests this is a little faster than using defaultdict
    charCountHash = {}.fromkeys(string.ascii_lowercase,0)
    charCountHash['_'] = 0

    for ch in msg:
        charCountHash[ch] += 1

    c = charCountHash['_']
    resultHash = {charCountHash[k]: k for k in charCountHash.keys() \
        if charCountHash[k] > c }

    return ''.join( [ resultHash[index] for index in \
        sorted(resultHash.keys(), reverse=True) ] )

print(decode(longString))

Testing both implementations a few times suggests this way is faster, but only 1% or so.

Might try this in go, just for the practice...

Ryan Palo • Oct 18 '17

Nice! I haven't used fromkeys before. That's a nice alternative to defaultdict (and you don't have to import anything!) if you already know the keys you need.

Ben Greenberg • Oct 17 '17

Python is on my list of things to learn soon and can't help but remark how much it "looks" like Ruby without knowing much about it yet.

Talha • Oct 17 '17

"...without knowing much about it yet." This is great about python. Nice challenge this one. :-)

Andrew Buntine • Oct 17 '17

Nice post. It's an interesting look at how algorithmic complexity can be trumped by real-world details.

But I think you'll find that the performance is not exactly in the each_char (which is probably lazily evalutated) but is caused by:

The setup/teardown (stack frames, etc) for the block that is passed to each_char
The repeated hash lookups (although they are supposed to be O(1))

Ryan Palo • Oct 17 '17

Thanks, that’s super interesting! Good to consider for future block usage. I can’t think of a way to speed up the block setup and tear down, but I wonder if I’d pick any performance up by switching the hash keys to symbols. I’ll have to do some ‘sperimenting 👍🏻

Tobias Salzmann • Oct 19 '17 • Edited

Obvious (!!!) question to ask: What if the text does not fit into memory?

Distributed version in Scala with Spark:

def message(
  distributedBlob: RDD[Char], 
  chars: Set[Char] =  ('a' to 'z').toSet + '_'
)(implicit sc: SparkContext): String = {
  val frequencies = distributedBlob
    .filter(chars.contains)
    .countByValue()
  frequencies
    .toSeq
    .sortBy(- _._2)
    .map(_._1)
    .mkString
    .takeWhile('_' !=)
}

Mukesh Bhardwaj • Dec 22 '19

Thanks for the amazing community. The way you welcome your new users are amazing. Hey Wait did you check this Solarmovies alternatives Thanks

Alex Pliutau • Oct 18 '17

Interesting challenge. I want to see how it can be gracefully implemented in Go, even it will be more code than Ruby, but it can be elegant. github.com/plutov/practice-go/tree...