First before anything else: ⚠️ SPOILER ALERT! ⚠️ This blog post will discuss possible solutions to the first puzzle of Advent of Code. If you haven't completed it yet, and want to do it before looking into any solutions, don't read this blog post. Also if you're just a beginner in coding and you need to spend a lot of time figuring out how to solve the puzzle, this might get over your head and that's fine. I want to say that it's totally fine not to optimize your solution, since Advent of Code doesn't have any efficiency criteria for solutions.
This is my first year to participate Advent of Code, and I find it very enjoyable. I have a degree in CS, but since I graduated in 2018, I haven't done any coding puzzles. I solve the puzzles with some brute-force method first, but then I find myself thinking about if there's a more efficient solution. We have a group of people in our community that solve the puzzles, and when I mentioned the time complexity of my solution, someone said they had never thought analyzing the time complexity of their code. That's why I thought I'd write a blog post about big O notation and how to analyze your solution to understand it better. I use the first puzzle of Advent of Code to demonstrate how to analyze the code.
In the first part of the puzzle the assignment is the following:
Before you leave, the Elves in accounting just need you to fix your expense report (your puzzle input); apparently, something isn't quite adding up.
Specifically, they need you to find the two entries that sum to 2020 and then multiply those two numbers together.
The second part is the same, but with three numbers, that need to sum to 2020. I focus on the first part only, because I didn't optimize the solution of the second part, and also because it would make this blog post even longer. I used Python to solve the puzzle and will use that in the examples, too.
So big O notation is used for indicating the time complexity of an algorithm, and more specifically the worst case. It means that for example if you have a list of numbers, and you're searching for number 2 in the list, the worst case is that it's not in the list or it's the last element, and you have to go through the whole list. The time complexity is then O(n), where n is the number of elements in the list.
Time complexity shouldn't be thought of as the actual duration of the algorithm execution, but rather as how fast the duration grows in relation to input size. There are multiple classes that are typically used. I've listed some of them in the table and will show examples of them later.
|O(1)||Constant time complexity|
|O(log n)||Logarithmic time complexity|
|O(n)||Linear time complexity|
|O(n log n)||Log-linear time complexity|
|O(n^2)||Quadratic time complexity|
|O(n^c)||Polynomial time complexity|
So the easiest way to solve the puzzle is a brute force solution. As I said earlier, Advent of Code doesn't have any requirements for how efficient your solution is, it just wants an answer, so brute force is a completely valid approach to the problems. I first used the brute force solution to get the answer, and then later optimized my solution, so my first version of the code looked like this:
def find_two_numbers_that_sum(desired_sum, list_of_expenses): for expense in list_of_expenses: # Iterating over a list: O(n) for expense_to_compare in list_of_expenses: # Iterating over a list: O(n) if expense + expense_to_compare == 2020: # Comparison: O(1) return expense * expense_to_compare # Multiplication: O(1)
This is a very simple solution, and it works. The size of the list is 200, so it's not too slow to be solved like this. Let's get to the analysis of that solution then.
As I wrote as a comment to the code, iterating over a list is an O(n) solution. Because there are two loops, and the second one is inside the first one, they are multiplied. We multiply also the comparison complexity and the multiplication complexity, but since they're both constant, they don't affect the complexity of the solution. Thus the complexity of the solution is n*n = O(n^2). You can test this with a small list (for example of size 5) and calculate how many iterations it takes with that function. Remember to calculate the worst case, which is in this example that there are no two numbers that sum to 2020, and you go over the whole list.
But since there's an early exit if the two numbers are found, wouldn't it mean that it's better than O(n^2)? Unfortunately not. Because big O notation is about the worst case, and the worst case is that either there are no such numbers, or they are the last pair to compare, the time complexity is still the same.
How about if we never look into the same combination once? Wouldn't it make the solution faster? Well in practice the solution is faster, but the growth of the duration still remains on the same scale. If you get a pen and paper (which I suggest you have always when you try to solve a puzzle) and simulate the algorithm you notice that if you have a list of five numbers, the number of elements you check is 5 + 4 + 3 + 2 + 1 = 15. When there's 10 elements, the number of elements you check is 10 + 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = 55. The time complexity is roughly O((n^2 + n)/2), but because we ignore the constants, we have only O(n^2 + n) which is essentially the same as O(n^2). In the picture you see a similar growth in the number of iterations when the size of the list grows:
So you see that although the number of iterations is lower, the growth is as fast as with the simple version.
So after I had given the correct answer to Advent of Code and got a confirmation that I'd solved it, I started to think about how to optimize it. I came up with an idea that maybe I could sort the list of numbers and then use a binary search to find the number we were looking for. Python has a sorting function as built-in, so I didn't need to implement that myself, but I implemented binary search myself. I first implemented the iterative version of it but later refactored it to be recursive. Here's the code:
import math def recursive_binary_search(sublist, first_elem, desired_sum): if len(sublist) == 0: return beginning = 0 end = len(sublist) - 1 selected_elem = math.floor((end - beginning)/2) elem_sum = first_elem + sublist[selected_elem] if elem_sum == desired_sum: return first_elem * sublist[selected_elem] if elem_sum > desired_sum: end = selected_elem - 1 if elem_sum < desired_sum: beginning = selected_elem + 1 recursive_binary_search(sublist[beginning:end], first_elem, desired_sum) def find_subset_sum(expenses_list, desired_sum): sortedd = sorted(expenses_list) # Python built in sort algorithm, O(n log n) for elem in sortedd: # Iterating over the elements, O(n) binary_search_list = sortedd.copy() beginning=0 end = len(binary_search_list)-1 # Binary search, O(log n) return_val=recursive_binary_search(binary_search_list, elem, desired_sum) if return_val: print(return_val) return
Now recursion makes the analysis of the algorithm a bit more complex, but as binary search is a commonly known algorithm, we know its time complexity is O(log n). If you are familiar with logarithm, it's easier to understand, but basically with each iteration the size of the list that we need to go through halves, so in total, the number of iterations is log n.
What is the time complexity of the solution? First, we sort the elements, and it has O(n log n) complexity. Then we iterate over the elements, with time complexity O(n), and inside the loop, we perform the binary search. Let's start with calculating the complete time complexity of the loop. Because binary search is inside the loop, we multiply O(n) with O(log n). It results in O(n log n). Because sorting is not done inside the loop, we can just sum O(n log n) + O(n log n). The total complexity is thus 2 O(n log n), which is essentially the same as O(n log n). Thus the time complexity for the solution is O(n log n). It's good to note that this solution only works when there are two numbers that we need to find, so it doesn't help with part 2 of the puzzle.
After making the improvements described above, I went to see how other people had solved it. On Twitter, I saw some vague tweet about using a set data structure, but couldn't wrap my head around it. Then I came here to see if someone had posted their solution and found Christopher's post. The solution in the blog post is simple and elegant, and it was more efficient than the sort + binary search solution. You can check the blog post but I also explain it shortly here.
- Iterate over the elements
- Subtract the value of the element from 2020 to get the number you want to find
- Insert all other elements into a set
- Check if the set has the number your looking for, return the result of the multiplication
So this solution is very smart because it uses a set as a data structure. Inserting a new element, removing an existing one, and finding a value in a set has constant O(1) time complexity. This means that only step 1 has larger time complexity, O(n), and all the other steps have time complexity O(1). When we multiply all of these together, we get O(n).
So when you analyze your solutions for Advent of Code or some other puzzle (or even your work!), the most important thing is to remember that big O notation doesn't describe duration, it describes the growth of the number of iterations in relation to the size of an input. In practice, some optimizations might make your code faster, but the growth in relation to the input size doesn't change. When you calculate the complexity, you can define complexity for each row and then either sum them or multiply them, depending on whether they happen in a row or a nested manner.
If you want to compare your solutions with others, comparing time complexity instead of duration might be a good idea if it doesn't feel too heavy. It's because when you measure duration, the context is important. The duration depends on for example your computer, if you have to same data, if the data is in the same order, the programming language you're using, etc. If you haven't analyzed your solutions and their time complexity before, I think Advent of Code provides nice puzzles to start with!