Isograms

#kata #challenge #codewars #python

Task description

An isogram is a word that has no repeating letters, consecutive or non-consecutive. Implement a function that determines whether a string that contains only letters is an isogram. Assume the empty string is an isogram. Ignore letter case.

isIsogram("Dermatoglyphics") returns true
isIsogram("aba") returns false
isIsogram("moOse") returns false

Task solution

Tests

Using the pytest framework we will write tests to cover the following cases:

Does a TypeError raise for invalid input
Do lower case, upper case and mixed case strings pass as expected?
What if an empty string is provided?

These conditions accounted for, the tests are as follows:

import pytest;
from isograms import is_isogram; # ./isograms.py

def invalid_input_throws():
  with pytest.raises(TypeError):
    is_isogram(1);

def happy_path_tests():
  assert is_isogram("Dermatoglyphics") == True;
  assert is_isogram("isogram") == True;
  assert is_isogram("aba") == False;
  assert is_isogram("moOse") == False;
  assert is_isogram("isIsogram") == False;
  assert is_isogram("") == True;

Implementation

Originally I went for a simple enough approach whereby I add each character into the variable cache. From here I loop over the string and for each character in the string I convert it to lowercase and if it is not in the cache, we add it in, otherwise if it is already in the cache, it cannot be an isogram and thus we return False. Naturally, if the loop ends without returning, we return True since it must be an isogram afterall. The implementation at first was like so:

def is_isogram(string: str) -> bool:
    cache = {};
    for char in string:
        lowerChar = char.lower();
        if not lowerChar in cache: cache[lowerChar] = True;
        else: return False;
    return True;

This implementation is basically pseudo-code code but we can do better. Since we have covered the red and green portions of the red, green, refactor cycle, it is time for the refactor part. For this I have implemented the following:

def is_isogram(string: str) -> bool:
  characters = [char.lower() for char in string];
  uniqueCharacters = set(characters);
  return True if len(characters) == len(uniqueCharacters) else False;

This implementation does exactly the same as the first version but in half the lined of code. I also think it is a bit more readable personally. In short, we are creating a characters array made up of the lower cased characters in the input string. Next we convert the array to a set, if you have never used a set before, it is basically just a collection or unique values and in this case, that means if our array has duplicate values, they will be removed. Finally, we do exactly as the last line of the function says: If the characters array is the same length as the uniqueCharacters set, then it must be an isogram and we must return True. Otherwise return False because it cannot be an isogram in the first place.

Conclusions

Python is a language I have a love/hate relationship with but one thing I love about it is that it is an expressive language. I especially love the way you can write conditionals like the one we have in the refactored implementation which are so human-readable that its almost silly. On the flip side, the module system and the inconsistency of parts of the languages implementation does irk me somewhat but overall it is a powerful language with a good community.

The challenge at hand was pretty simple to resolve but I enjoyed completing it never the less, especially when I had chance to look into the timeit library for execution timing tests and could see some variability in speed on each run but in almost all cases both implementations were pretty close, usually 7 seconds up to 8 seconds to run each version 1,000,0000 times.

See you in the next one!

Top comments (1)

James Robb • Dec 1 '19

Hi Andy, I totally agree with your sentiment about sets but noone will use them or understand them if noone else shows how they work, so avoidance isn't the answer I would say. I have seen lots of examples of code that creates arrays of unique values which totally bypass using sets which is ashame since it is a common thing in most languages. I am glad you noticed such a detail about the naming too, I always try to implement code in a readable, testable and understandable manner more than a short, one line hot-shot one even though as you said, doing so would be simple enough. 🤷‍♂️