DEV Community

loading...
Cover image for More Python Strings: Can You Solve This *More* Difficult String Problem?

More Python Strings: Can You Solve This *More* Difficult String Problem?

Whiteboarding with Erik
Explaining Data Structures and Algorithms problems in a way everyone can understand, in python. Currently on hiatus. All solutions here: https://github.com/erik-hei/whiteboarding-with-erik
Updated on ・5 min read

<< Week 15: Binary Trees | View Solution on GitHub | Week 17: Knapsack >>

Alphabet magnets
(Image: TurboSquid.com)

This is a problem that pops up now and again on various daily-coding-challenge platforms, like this problem on HackerRank: given a string, find all the possible combinations of letters within that string. This is sometimes called "substrings," or if a series of lists is returned, "power sets." Let's look at one example:

# Given a string, return a sorted list of all 
# unique possible substrings.
Enter fullscreen mode Exit fullscreen mode

For example, the string 'abc' would yield the list ['a', 'ab', 'abc', 'ac', 'b', 'bc', 'c'].

Let's start with defining our method. It takes in a string, which we'll call word.

def find_substrings(word):
Enter fullscreen mode Exit fullscreen mode

Next, let's think about our approach. We'll need a data structure to store all our new substrings. In the end, we'll have to return a list, but remember what it said in the prompt: unique substrings. So if we have the string 'aa', we don't want to add 'a' twice. What data structure does this sound like? If the word Set comes to mind, you're right! We've covered sets on this blog previously, but to recap, it's a data structure that stores a series of unique values.

To make a new set, we simply initiate the set object in Python.

def find_substrings(word):
  s = set()
Enter fullscreen mode Exit fullscreen mode

Now we reach the next phase of the implementation. How are we going to add each substring? Let's say we have the string 'abc'. We can start with the first letter, 'a', and add it to the set. Next, we want to add the letter 'b', along with 'b' concatenated to everything previously in the set, namely 'a' to make 'ab'. The two rules in each iteration are:

  1. Add the new letter. (e.g. 'b')
  2. Add the new letter concatented with every previous substring (e.g. 'ab')

However, we can simplify this into one rule. If we add an empty string to the set, we only need #2, since #1 is covered by adding the new letter to an empty string. Altogether, these are the substrings that would be added for each iteration:

Iteration Set Before Addition Letter to Add New Subsets
0 { '' } a 'a'
1 { '', 'a' } b 'b' 'ab'
2 { '', 'a', 'b', 'ab' } c 'c' 'ac' 'bc' 'abc'
Result: { '', 'a', 'b', 'ab', 'c', 'ac', 'bc', 'abc' }

Let's go about implementing this in code. As we established earlier, we need to start with an empty string in our set. Let's do that.

def find_substrings(word):
  s = set()
  s.add('')
Enter fullscreen mode Exit fullscreen mode

Next, we need to define our iteration. We want to look at each letter in the string. Simple enough, we use a for loop.

  for letter in word:
Enter fullscreen mode Exit fullscreen mode

Okay, so for each letter in the word, we need to loop through everything in the set and add new combos with that letter. However, if we try to change the set while we're looping through it, we'll get an error. So, we'll have to make a copy of it.

  for letter in word:
    new_set = s.copy()
Enter fullscreen mode Exit fullscreen mode

Remember that we have to use the Python .copy() method, or else we'll just create a reference to the original set. This way, we have an unmodified copy to refer back to.

Next, we loop through the elements in the immutable copy. For each substring, we want to add the current letter, and then push it onto the set.

  for letter in word:
    new_set = s.copy()
    for substr in new_set:
      s.add(substr + letter)
Enter fullscreen mode Exit fullscreen mode

Finally, we have our substrings. We just have to do some formatting to get it how the prompt wants it. First of all, we have this extra empty string element in the set. Let's remove that.

  s.remove('')
Enter fullscreen mode Exit fullscreen mode

Simple. Next, as you remember, we want to return a list, not a set. How do we cast something to a list? We iterate over each of its contents and wrap it in some square brackets, like so:

  s = [letter for letter in s]
Enter fullscreen mode Exit fullscreen mode

We can just reassign it to s, since we don't need that variable for anything else. Finally, we want to sort it. Simply call the .sort() method on our list.

s.sort()
Enter fullscreen mode Exit fullscreen mode

And that's it, all that's left is to return s. Altogether:

def find_substrings(word):
s = set()
s.add('')
for letter in word:
new_set = s.copy()
for substr in new_set:
s.add(substr + letter)
s.remove('')
s = [letter for letter in s]
s.sort()
return s
Enter fullscreen mode Exit fullscreen mode




Try it Out

Call the method find_substrings() on the string 'abc', and print the result. We should get:

['a', 'ab', 'abc', 'ac', 'b', 'bc', 'c']

Longer words will also work. 'abracadabra' will give us:

['a', 'aa', 'aaa', 'aaaa', 'aaaaa', 'aaaab'...

You get the idea. The point is that 'a' only appears once, and the list is sorted alphabetically.

Further Reading: Optimization

This was a conceptually simple solution, and the implementation of more optimal solutions won't be covered in this article. But how might we go about this?

Instead of using a set, keep the final output as a list, and only insert new strings in a sorted order. Whenever we find a new substring to add to it, we use a binary search to figure out where it should be added. If the string already exists in that point, there's no need to add it again. This saves us from the O(N log N) runtime of running sort().

This is one of those concepts that appears in different forms in assorted coding challenges. I hope this has added to your coding knowledge toolbox, and you'll be prepared the next time it comes up!

<< Week 15: Binary Trees | View Solution on GitHub | Week 17: Knapsack >>

Erik Heikkila is a Teaching Assistant at General Assembly. This blog is not associated with GA.

Discussion (2)

Collapse
mellen profile image
Matt Ellen

If ac is a substring of abc, why isn't ca?

Collapse
barryhobrien profile image
barryhobrien

To add to Matt's question, how would you change the program to get permutations of the letters in 'word' ? Thanks.