Posted on Jan 28, 2020 • Edited on Jun 12, 2020

Permutations, Combinations, & Subsets

#computerscience #algorithms #datastructures #csharp

This post is longer than usual & very heavy on problem solving, instead of the usual "learning about a DS/Algo, then writing an example". I don't expect to get back into this style of post again until I have finished the remaining curriculum. It just happens that the best way to really learn this material was to solve some problems after covering theory. I encourage you to stick with each problem even if it doesn't click the first time around.

Sets, Subsets, & Combinations

Resources

Explanation of sets and subsets (slideshow)

Takeaways

A set is a collection of elements.
A set with no elements is still a set and is known as an empty set.
A subset is any combination of elements from a set.
An empty set is a subset of any set. This means every set has an empty subset.
Sets are represented using the notation: {,}.
Example: {1,5} is a set. {}, {1}, {5}, & {1,5} are all of its possible subsets.
There is a formula for working out how many subsets a set has: A set with n elements has 2^n subsets.
Using our previous example ({1,5}): n = 2 as there are 2 elements in our set. 2^n = 2^2 = 4. So a set with 2 elements has 4 subsets. A set with 3 elements has 2^3 subsets, which is 8.
When dealing with strings, often we refer to subsets as combinations. The idea is the same as subsets:
- "AB" has 2^2 combinations (as there are 2 characters). So all 4 combinations of "AB" are: {}, {'A'}, {'B'}, {'A', 'B'}.

Permutations

Resources

*You may find that the book is freely available online as a PDF. However, I do not condone getting the book for free as it has been illegally published.

Takeaways

A permutation is essentially an ordered combination, except the total length of each permutation must equal the original input.
Finding all permutations of a string is sort of the same as saying "find all anagrams of a string" (except our permutations might not all be real words).
There are two types of permutation: with repetition & without repetition.
Simple permutation example: "AB" has 2 permutations: "AB" and "BA".
Repetition example: "AAB" has 6 permutations: "AAB", "ABA", "BAA", "BAA", "ABA", "AAB". Notice there are repeated characters, this is a permutation with repetition allowed (as each "A" is treated as distinct). Without repetition, the total permutations would be 3: "AAB", "ABA", "BAA".
To calculate the total number of permutations when repetition is allowed use the following formula: n! (! means factorial).
What does ! (factorial) mean in real terms? Well, we know that n is the number of elements, and ! just means we multiply a series of descending numbers.
Lets look at an example of n!: Given a string "ABC" how many characters can be placed in the first position? 3. How many can be placed in the second position? 2 (assume we have already placed a character at the first position). How many can be placed in the last position? 1. So n! for "ABC" is simply: 3x2x1 = 6. See how that works? We just take the number of available characters for each position, then multiply.
So how do we work out how many possible permutations there are when repetition is not allowed? Answer: n!/2!^k
What is k in that formula? k is the number of repeated elements in the input. Example: "AAB" has n!, or 3x2x1, permutations. So the total permutations are 6. Without repetition allowed the formula is: 3x2x1 / 2^1. k is 1 as "A" is the only repeated character. So the total permutations for "AAB" without repetition is 3x2x1 / 2^1 = 6 / 2 = 3. Here is a helpful example of working out the distinct permutations for "TOFFEE" (hint: k is 2)

Finding all permutations of a string

Resources (same as the resources already listed for permutations)

Takeaways

So now we know what a permutation of a string is, how do we generate all permutations of a string?
Start with a function that takes an input string, and a string or StringBuilder called current (which represents each permutation), we can distill the operation to the following:
- If you are past the last position in the input string:
  - print the string and return.
- Otherwise:
  - For each letter in the input string:
    - If it's marked as used (already in current), skip to the next letter
    - Else, place the letter in current
    - Mark the letter as used
    - Recursively call the current function to add the remaining letters to current
    - Mark the letter as unused and remove it from current
    - Continue iterating
The above will ensure we permute a string, with repetitions allowed. So how do we do it without repetition? And what if we want to ensure the result is in lexicographically sorted order (order based on the alphabetical order of the permutations component letters)?
One way to achieve this is to sort the input, and associate each repeated character with the number of times it is repeated.
For example: "BAA" would give us sorted key:values of: {A:2,B:1}.
Once this is done, the algorithm looks almost identical. The only difference is that we use a modified version of the input string (letters and their counts) and decrement the count of a letter when we have used it (when count == 0 we skip the letter).
As there are n! permutations our time complexity is O(n! n), and our space complexity is O(n!).
As always, the code for these two variations is at the end of this post. See PrintPermutations() and FindPermutationsNoDupes().

Finding the next permutation of an integer array

Resources

Takeaways

Problem statement: Given an input array of integers, find the next possible permutation and return the result.
This problem is slightly different to generating all possible permutations. For this problem we only need to find a single permutation, and it must be the next permutation that would come in lexicographical order.
The brute force approach is to generate all possible permutations, and when we hit a permutation that is the same as the input, return the permutation that follows.
This is obviously less than ideal: what if the given input is the last possible permutation? That's a lot of wasted effort. Time complexity would be O(n!) and space complexity would be O(n).
A better way is to first recognize a few key traits that allow us to form a solution:
- For any given input that is in descending order, no next permutation is possible. For example, no next permutation exists for [9, 5, 4, 3, 1]
- So, if we can find the inversion point of an input - we can calculate the next permutation
- The inversion point is the position in the array after which the elements stop descending (remember we said that if it continually descends, no further permutations are possible)
- Once we find the inversion point, we swap the element at that position with the next element in the array that is larger than it. Then we sort in ascending order, the elements after the inversion point.
- Because everything after the inversion point will inherently be sorted in descending order - we can just reverse this section of the array instead.
- The array will now be arranged such that it is ordered as the next permutation - meaning we can return the result.
How do we find the inversion point? We iterate over the input, starting at the last element. We know we have reached the inversion point when input[i] > input[i - 1]. When this holds true, everything after input[i - 1] (the inversion point) cannot be rearranged to produce a larger permutation, since the elements are in descending order (elements right of inversion point).
This improved approach has a time complexity of O(n) and a space complexity of O(1) (as the operation is in-place).
See FindNextPermutation() at the end of this post for the implementation.

Next permutation visualization

Finding all combinations/subsets of a string/integer array

Resources

Takeaways

This algorithm is very similar to the algorithm for generating permutations.
The main difference is that there is not the length condition (remember that a subset/combination can be empty and/or have less elements than the input set - whereas a permutation must be equal in length to the input).
The removal of this condition greatly simplifies our algorithm, as we no longer need to keep track of if we have already used an element/if it needs skipping (at least, in the same way).
We instead just keep track of/memoize our function's loop's current index and pass it to our our function when recurring.
Our loop for permutations always started at 0. Our loop for combinations will always start at a given index. This means our combinations function will take an index as an argument.
To generate all combinations/subsets in lexicographical order, we can first sort the input.
To prevent duplicates we can add a check right at the start of our loop. If the loop's current index (i) is not the same as the index passed into our function, and if element at the ith position is the same as the element at i - 1 (previous) - we skip the ith element (current element).
The algorithm distilled is:
- We start with an empty list (called current) that will represent each combination, an index of 0, and an empty result/output collection to return.
  - Add current (representing a combination) to the output/result
- For each element of the input, from a given index to the final index
  - Append the ith element to current
  - Recursively call the current function, passing it current and i + 1 (to move to the next element in the input)
  - After the recursive call, remove the last element from current (which is the one we added before the recursion) and continue with the loop (which repeats the process)
Finding all combinations has a space complexity of O(n) and a time complexity of O(2^n) (as there are 2^n possible combinations).
Try looking at the FindAllComibinations() function at the bottom of this post and compare it to the PrintPermutations() function. You will see the similarities and the differences I have explained, hopefully with greater clarity.

Finding all the letter combinations of a phone number

Resources

Takeaways

Problem statement: Write a function that takes an input array of integers. Each integer represents a number on a telephone keypad. You will notice each number on a phone's keypad represents a certain number of letters (except 1 and 0). For example: 2 represents ABC. Given the input array, return all of the possible "words" or combinations of letters that can be represented by the given input.
Algorithm for solving the above:
- Create a mapping array (mapping), where each index of the array represents the letters for that index. For example: ["0", "1", "abc", "def", ...]. mapping[2] would give us the letters for 2, and so on.
- Transform the input array into a string of digits called digits. Example: [2,3,4] would become "234".
- Our function will take digits, an empty string (current, which will represent each combination), a start index of 0 (i), and a result list (results).
- Inside our function:
  - If i is the same length as digits:
    - Add current to results, and return.
  - Else, find the letters in mapping that correspond to the integer at digits[i]
  - For each of the corresponding letters, add the letter to current and recursively call our function passing in digits, current, i + 1, and results
At the completion of the above algorithm, all combinations will be in results.
Time & space complexity are both O(3^n * 4^m). n is the number of digits that map to three characters (2,3,4, etc), and m is the number of digits that map to four characters (7,9).
How can we verify that the total number of combinations will be 3^n * 4^m? Example: "23" has 9 combinations which are as follows ["ad", "ae", "af", "bd", "be", "bf", "cd", "ce", "cf"]. So, n = 2 which means 3^n * 4^m = 3^2 = 9 (we ignore 4^m as m = 0).
Check out LetterCombinations() in the code at the end of this post to see the full implementation.

Word subsets (finding all words that contain a subset of letters)

Resources

Takeaways

Problem statement: Given two arrays A & B. Each array consists of n number of lowercase strings. A word b in B is a subset of a word a in A if every letter in b is present in a including multiplicity. For example: roo is a subset of room, but is not a subset of roar. We can say a word a in A is universal if every word in B is a subset of a. The challenge is to return all of the universal words in array A.
One way to solve this is to transform all of the words in B into a single word b.
Now we can check every word a in A, and if a is a superset of our single b (meaning b is a subset of a) then a is universal.
The way we achieve this in code is to transform our single word b into an array of integers. Each position in the array will represent a letter in the alphabet, and each integer in the array will be the maximum number of occurrences of each letter in b. For example: aab would be represented as [2,1]. The first index of our array is where we put the max occurrences of the letter a. As a occurs two times we store two at that index. b occurs only once, and belongs at the second index. Remember we are storing maximum occurrences, so if B = ["aab", "aab"] our resulting array would still be [2,1] - as we count the occurrences of each word in B, not the sum of all the occurrences.
Now that we have transformed B into a single word (lets call it bMax[]), represented by an integer array of maximum letter occurrences, we loop over A and for each word a do the same transformation (lets call this aMax[]).
After transforming a we now perform another loop of 26 iterations (the number of letters in the alphabet) and for each iteration do the following comparison aMax[i] < bMax[i]. If any of the characters in aMax[] (which represents a) appear less times than the same character in bMax[] (representing our singleb) then we know the word a is not universal. Otherwise, a is universal and we can add it to our results set.
After finishing our loop over every word a in A and doing the above operation each time, we can return our resulting list of universal words.
Check out WordSubsets() at the end of this post to see the implementation.
The time complexity of this solution is O(A + B). Space is O(A.Length + B.Length).

Subset sum (Find/count all subsets with a given sum)

Resources

Takeaways

Problem statement: Given a set of positive integers A and an integer S, find the subsets of A whose sum is equal to S.
Variations of this problem are the same in premise, but can ask you to return true if a single subset with sum S exists, ask you to return all the subsets that sum to S, or simply ask you to count the number of subsets that sum to S. In my implementations I either return or count the subsets in A that sum to S.
A brute force approach to this problem would be to generate all subsets for the given input, and check if any of them sum to S. As there are 2^n subsets the time complexity for this approach is O(2^n * n) - less than ideal.
A better approach is to use recursion, and for each element in A decide between two possibilities:
- Include the current item in the current subset and recur for the remaining items in A with the remaining sum.
- Or we exclude/skip the current item from the subset and recur for the remaining items, leaving the running total/sum unchanged.
It's important to note that during the recursion we keep a running total/sum that starts equal to S, each time we include a number in the current subset we subtract it's value from the running total and recur with the remainder. We know we have a complete subset when the running total is 0 - meaning our current subset's sum is equal to S.
We know that our current subset does not sum to S if we have run out of elements to recur for, or if our running total/sum is less than 0.
A small optimization we can make to the recursive approach is to always exclude/skip the current item if it is greater than our running total.
Example: if our running total is 15 and the current item is 20 - there is no reason to try and include it in the current subset, as it will make our running total negative.
A further optimization we can make to our recursive approach is to introduce some memoization.
Now, every time we recur we will store a key:value pair of the results. This means instead of recomputing the same thing over and over, we will instead first check if we have already done the same calculation. If we have, then we will return the memoized computation instead of continuing to recompute.
This saves a decent number of cycles. In my limited testing with an input array A = [2, 4, 6, 10, 16, 8, 1, 5, 7, 3, 44] and S = 16, the memoized recursive version did almost 50% less work than the non-memoized version. Of course, this will change based on the inputs - but it highlights how much more efficient a small change can be.
When solving this problem using recursion and memoization it is known as a top-down approach. Top-down means we start with the larger problem, then break it down into sub problems, ensuring we remember answers to our sub problems along the way (in case the same answer is needed again).
A final approach to solving this problem is using dynamic programming and a bottom-up approach. Bottom-up means we start by solving the smallest sub-problem first, then work our way up to the actual problem itself.
A common way to perform bottom-up solutions is using a 2D array as a matrix.
It is common to use a boolean array, or similar.
Our matrix will represent a table of our sub-problems and their solutions.
Dynamic programming is actually an entire other topic I was planning to cover in a later post - but I accidentally got some exposure to it with this problem.
To try to explain it in words is to do the solution injustice (and would make this post even longer), so I encourage you to instead check out the video in the resources section that explains more about tabulating the sub-problems. And if you are really curious about dynamic programming this is a gentle introduction.
The bottom-up has a space complexity of O(n * Sum). Both the top-down/bottom-up solutions have a run time of O(n * Sum).
Because our bottom-up approach doesn't have the recursion overhead it's typically more efficient in real terms.
Check out FindAllSubsetsWithSum(), CountSumSubsets(), CountSumSubsets2() at the end of this post for the implementations of recursion, recursion with memoization, and bottom-up dynamic programming (respectively).

Bonus

SortedDictionary: This is part of the C# standard library. Per Microsoft, it is a binary search tree with O(log n) retrieval. However, as binary search trees aren't typically balanced, there are some questions on exactly what tree data structure it actually uses. It would appear it could actually be using something closer to a Red-Black tree.
Memoization: Memoization is sort of like caching. We remember the results of a computation so we don't have to recompute the same thing over and over. We instead check the "cache" and see if we have already done the work previously. Here's a good article on memoization.
Dynamic Programming: Dynamic programming is commonly bottom-up, and is a sort of table-filling algorithm (using a matrix as our table). We fill up our table with solutions to sub-problems, and then use the resulting table to solve the problem at large. Here is a good article on bottom-up. And here is a good article on dynamic programming (which also compares top-down with bottom-up).

Below you will find implementations for all of the problems discussed:

As always, if you found any errors in this post please let me know!

DEV Community

Permutations, Combinations, & Subsets

Sets, Subsets, & Combinations

Permutations

Finding all permutations of a string

Finding the next permutation of an integer array

Finding all combinations/subsets of a string/integer array

Finding all the letter combinations of a phone number

Word subsets (finding all words that contain a subset of letters)

Subset sum (Find/count all subsets with a given sum)

Bonus

Top comments (0)