Georgios Kampitakis


# Advent of Code: Investigating performance improvements in Go

I often enjoy trying to solve previous years' Advent of Code puzzles. If you don't know what Advent of Code is, make sure you check it out.

TLDR: Advent of Code is an Advent calendar of small programming puzzles for a variety of skill sets and skill levels that can be solved in any programming language you like.

This post is about puzzle 2021/day-6, which I was trying to solve. It's not very interesting as a puzzle, but I found it to be a really good example for exploring how to dive deep into your code's performance, find potential bottlenecks, and improve it with the tools that Go provides.

Each puzzle on Advent of Code contains two parts. You need to solve the 1st to move on to the 2nd, and usually the 2nd is a more complicated or demanding variation of the 1st. This is the approach we are going to follow in this post as well.

## Tools

The tools we are going to use are:

• Benchmarks, for measuring and analyzing the performance of Go code, with help from `go tool pprof`.
• Benchstat, which computes statistical summaries and compares benchmarks.

## Part 1

I am not going to repeat the problem here for brevity, but if you read the puzzle carefully, you can probably come up with a trivial solution, as I did the first time.

```go
func parseInput(s string) []int {
	data := strings.Split(strings.Split(s, "\n")[0], ",")
	fish := make([]int, len(data))

	for i, d := range data {
		fish[i] = MustAtoi(d)
	}

	return fish
}

func MustAtoi(s string) int {
	n, err := strconv.Atoi(s)
	if err != nil {
		panic(err)
	}
	return n
}

func Solve(input string, days int) int {
	fish := parseInput(input)

	for i := 0; i < days; i++ {
		tempFish := []int{}

		for j := range fish {
			if fish[j] == 0 {
				fish[j] = 6
				tempFish = append(tempFish, 8)
			} else {
				fish[j]--
			}
		}

		fish = append(fish, tempFish...)
	}

	return len(fish)
}
```

The approach for solving it is:

• parsing the input, turning it into a list of numbers, and then iterating for the number of `days`
• each day, decreasing each fish's timer; if a timer reaches zero, it resets to `6`
• and adding a new fish with timer `8` to a temporary list, which at the end of the day is appended to the main list.

That's pretty much it; it successfully solves Part 1 and it's fast enough.

```
ok advent-of-code/2021/go/day-6 0.226s
```
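As a quick sanity check, the naive approach can be exercised against the sample input from the puzzle description. The sketch below is a standalone adaptation (the helper name `solveNaive` is mine); the expected answers of 26 fish after 18 days and 5934 after 80 days come from the puzzle's own example:

```go
package main

import "fmt"

// solveNaive simulates every fish individually, like the naive solution above.
func solveNaive(timers []int, days int) int {
	fish := append([]int(nil), timers...) // copy, so the caller's slice is untouched

	for i := 0; i < days; i++ {
		spawned := 0
		for j := range fish {
			if fish[j] == 0 {
				fish[j] = 6
				spawned++
			} else {
				fish[j]--
			}
		}
		// each fish that hit zero spawns a new fish with timer 8
		for k := 0; k < spawned; k++ {
			fish = append(fish, 8)
		}
	}

	return len(fish)
}

func main() {
	sample := []int{3, 4, 3, 1, 2}
	fmt.Println(solveNaive(sample, 18)) // 26
	fmt.Println(solveNaive(sample, 80)) // 5934
}
```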

The purpose of this post though is not to just solve the puzzle.

Before updating the code we need a benchmark of the current implementation as a baseline.

Note: Always measure before optimizing.

```go
var sink int

func BenchmarkSolution(b *testing.B) {
	res := 0

	b.ResetTimer()
	b.ReportAllocs()

	for i := 0; i < b.N; i++ {
		res = Solve(input, 80)
	}

	sink = res
}
```

### Benchmarks

We can run the benchmark with:

```
go test ./... -benchmem -run=^\$ -v -bench ^Bench -memprofile naive.out -count=5 | tee naive.txt
```

Which will produce output similar to the following one:

```
pkg: advent-of-code/2021/go/day-6
BenchmarkSolution
BenchmarkSolution-10                 235           5210983 ns/op        31653556 B/op       1025 allocs/op
BenchmarkSolution-10                 231           5086091 ns/op        31653547 B/op       1025 allocs/op
BenchmarkSolution-10                 225           5146135 ns/op        31653549 B/op       1025 allocs/op
BenchmarkSolution-10                 231           5191141 ns/op        31653571 B/op       1025 allocs/op
BenchmarkSolution-10                 229           5104518 ns/op        31653558 B/op       1025 allocs/op
PASS
```

Note: with the `-memprofile` flag, memory profiling is enabled, which can help identify memory-related bottlenecks.

You can use the interactive CLI interface to navigate the profiling data:

```
go tool pprof naive.out
# and then list the data for the Solve function with
list Solve
```

From this view we can dig into the profiling data collected from the benchmark, and it's easy to spot the 3 big offenders that allocate memory.

Reducing unnecessary memory allocations leads to improved performance (e.g. less allocation overhead, less garbage collection pressure, more cache-friendly behaviour).

Normally, while investigating performance issues, you run the benchmarks against every code change to verify that you are actually improving your code, but for brevity we are going to discuss all the improvements and then run the benchmark to see the final result.

### Improvements

The first offender is `parseInput`, which allocates an N-sized integer slice, where N is the size of the puzzle's input. Instead of a slice of integers `[]int`, we can use a slice of bytes `[]byte`, so we can store the same information with less memory (and as a bonus we skip parsing from string to integer). We can do this thanks to the puzzle's constraints: input numbers are from `0` to `8`.

The second offender can be seen if we run `list parseInput`.

`data := strings.Split(strings.Split(s, "\n")[0], ",")` also allocates space that we throw away right after. Instead, it's better to parse char by char.

Here is the updated `parseInput` function:

```go
func parseInput(s string) []byte {
	fish := make([]byte, 0, len(s)/2)

	for i := 0; i < len(s); i++ {
		if s[i] == '\n' {
			break
		}
		if s[i] == ',' {
			continue
		}
		fish = append(fish, s[i])
	}

	return fish
}
```

The third offender is the `tempFish` slice. On every iteration we populate the slice only to throw it away on the next step and create a new one, so we keep reallocating memory. We can reuse the same slice across iterations by clearing it instead of creating a new one.
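The reuse trick relies on re-slicing to zero length, which resets the length but keeps the backing array, so subsequent appends reuse the already-allocated memory. A minimal standalone sketch:

```go
package main

import "fmt"

func main() {
	tempFish := make([]byte, 0, 4)
	tempFish = append(tempFish, '8', '8', '8')
	fmt.Println(len(tempFish), cap(tempFish)) // 3 4

	// Re-slicing to zero length clears the contents logically,
	// but the capacity (and the backing array) is retained.
	tempFish = tempFish[:0]
	fmt.Println(len(tempFish), cap(tempFish)) // 0 4
}
```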

The updated code for the `Solve` function:

```go
func Solve(input string, days int) int {
	fish := parseInput(input)
	tempFish := []byte{}

	for i := 0; i < days; i++ {
		for j := range fish {
			if fish[j] == '0' {
				fish[j] = '6'
				tempFish = append(tempFish, '8')
			} else {
				fish[j]--
			}
		}

		fish = append(fish, tempFish...)
		tempFish = tempFish[:0]
	}

	return len(fish)
}
```

### Results

Rerunning the benchmarks

```
go test . -benchmem -run=^\$ -v -bench ^Bench -count=5 -memprofile improved.out | tee improved.txt
```

generates the following, much improved, output:

```
pkg: advent-of-code/2021/go/day-6
BenchmarkSolution
BenchmarkSolution-10                 388           2945361 ns/op         2193282 B/op         45 allocs/op
BenchmarkSolution-10                 410           2950991 ns/op         2193280 B/op         45 allocs/op
BenchmarkSolution-10                 409           2917432 ns/op         2193278 B/op         45 allocs/op
BenchmarkSolution-10                 404           2968872 ns/op         2193277 B/op         45 allocs/op
BenchmarkSolution-10                 409           2917535 ns/op         2193281 B/op         45 allocs/op
PASS
```

The benchmark is already looking better, but `benchstat` can help answer "by how much":

```
benchstat naive.txt improved.txt
name         old time/op    new time/op    delta
Solution-10    5.20ms ± 7%    2.94ms ± 1%  -43.51%  (p=0.008 n=5+5)

name         old alloc/op   new alloc/op   delta
Solution-10    31.7MB ± 0%     2.2MB ± 0%  -93.07%  (p=0.008 n=5+5)

name         old allocs/op  new allocs/op  delta
Solution-10     1.02k ± 0%     0.04k ± 0%  -95.61%  (p=0.008 n=5+5)
```

So the new solution is 43.51% faster than the old one and requires 93.07% less memory to do the same job.

## Part 2

For the second part, the puzzle runs for 256 days. Even with the previous improvements, though, at least on my machine, I can't get a result (I gave up after waiting for 5 minutes).

It's time to think about the solution a bit more carefully and use a more efficient algorithm, as the current one is `O(days*fish)` and the number of fish increases on every iteration. We need to find a way to keep the amount of work per day constant.

An example input of the puzzle:

```
3,4,3,1,2,2,1,1 ....
```

We know that a fish can have a timer from `0` to `8`, and we can notice that all fish with the same timer are "synchronized". This means we can group all the fish that have the same timer, turning the `O(days*fish)` solution into `O(days*9)`, which is equal to `O(days)`.

We can achieve this by using a map, where the keys are the timer values, hence `0`-`8`, and the values are the number of fish with that timer. That would turn the above input into the following structure:

```
{
	0: 0,
	1: 3,
	2: 2,
	3: 2,
	4: 1,
	5: 0,
	6: 0,
	7: 0,
	8: 0
}
```

and the size of the map stays constant regardless of the number of fish. Every day we decrease the timers per group instead of one fish at a time.
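A single day of the grouped update can be traced on its own. The sketch below uses a hypothetical helper `step` (my own name, not from the solution) with the same map layout as above:

```go
package main

import "fmt"

// step advances the grouped fish counts by one day.
func step(fish map[uint8]int) map[uint8]int {
	next := make(map[uint8]int, 9)
	resetCount := fish[0]

	// every other timer decreases by one
	for k := uint8(1); k <= 8; k++ {
		next[k-1] = fish[k]
	}

	next[6] += resetCount // fish with timer 0 reset to 6...
	next[8] += resetCount // ...and each spawns a new fish with timer 8
	return next
}

func main() {
	// the grouped counts for the example input 3,4,3,1,2,2,1,1
	fish := map[uint8]int{1: 3, 2: 2, 3: 2, 4: 1}
	fmt.Println(step(fish)) // every group shifted down by one timer value
}
```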

We keep the findings from Part 1; here is the newly adjusted code:

```go
func Solve(input string, days int) int {
	fish := parseInput(input)
	tmpFish := make(map[uint8]int, 9)

	for i := 0; i < days; i++ {
		resetCount := 0

		for k, v := range fish {
			if k == 0 {
				resetCount += v
			} else {
				tmpFish[k-1] = v
			}
		}

		tmpFish[8] += resetCount
		tmpFish[6] += resetCount

		fish, tmpFish = tmpFish, fish
		clear(tmpFish)
	}

	counter := 0
	for _, v := range fish {
		counter += v
	}

	return counter
}

func parseInput(s string) map[uint8]int {
	fish := make(map[uint8]int, 9)

	for i := 0; i < len(s); i++ {
		if s[i] == '\n' {
			break
		}
		if s[i] == ',' {
			continue
		}
		fish[s[i]-'0']++
	}

	return fish
}
```

### Results

This solution returns a result (if you recall, 5 minutes were not enough to get one previously); running the benchmark against it produces the following output:

```
pkg: advent-of-code/2021/go/day-6
BenchmarkSolution
BenchmarkSolution-10               34027             33479 ns/op            2373 B/op         85 allocs/op
BenchmarkSolution-10               35720             33435 ns/op            2374 B/op         85 allocs/op
BenchmarkSolution-10               35521             33486 ns/op            2374 B/op         85 allocs/op
BenchmarkSolution-10               35900             33392 ns/op            2373 B/op         85 allocs/op
BenchmarkSolution-10               35878             33553 ns/op            2373 B/op         85 allocs/op
PASS
```

Running `benchstat` against the previous solution reveals even more impressive results, improving the performance of the code by up to 98.86% and using 99.89% less memory. (The number of allocations goes up, but each allocation is far smaller, so the total allocated memory still drops dramatically.)

```
benchstat improved.txt new_algo.txt
name         old time/op    new time/op    delta
Solution-10    2.94ms ± 1%    0.03ms ± 0%  -98.86%  (p=0.008 n=5+5)

name         old alloc/op   new alloc/op   delta
Solution-10    2.19MB ± 0%    0.00MB ± 0%  -99.89%  (p=0.008 n=5+5)

name         old allocs/op  new allocs/op  delta
Solution-10      45.0 ± 0%      85.0 ± 0%  +88.89%  (p=0.008 n=5+5)
```

## Bonus point

What if I told you there is one more one-line improvement that could give an increase in the code's performance by another 95.90% and reach zero memory allocations?

```
benchstat new_algo.txt new_algo_array.txt
name         old time/op    new time/op    delta
Solution-10    33.5µs ± 0%     1.4µs ± 2%   -95.90%  (p=0.008 n=5+5)

name         old alloc/op   new alloc/op   delta
Solution-10    2.37kB ± 0%    0.00kB       -100.00%  (p=0.008 n=5+5)

name         old allocs/op  new allocs/op  delta
Solution-10      85.0 ± 0%       0.0       -100.00%  (p=0.008 n=5+5)
```

You can try changing the data structure from `map[uint8]int` to `[]int`; the rest of the code will still work. Why is that, you might be wondering? You may have already guessed, but I'll leave it as an exercise: use the profiling data and the answer is there.

Hint: you can run the benchmarks with `-cpuprofile` instead of `-memprofile` and check the top entries with `go tool pprof`.
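For reference, here is a sketch of what the array-based version could look like. This is my own adaptation, not necessarily the final code from the post; it replaces the map with a fixed-size slice of 9 timer counts and shifts the counts with `copy`:

```go
package main

import "fmt"

func parseInput(s string) []int {
	fish := make([]int, 9) // one counter per timer value 0..8

	for i := 0; i < len(s); i++ {
		if s[i] == '\n' {
			break
		}
		if s[i] == ',' {
			continue
		}
		fish[s[i]-'0']++
	}

	return fish
}

func Solve(input string, days int) int {
	fish := parseInput(input)

	for i := 0; i < days; i++ {
		resetCount := fish[0]
		copy(fish, fish[1:]) // shift every timer group down by one
		fish[8] = resetCount // newly spawned fish
		fish[6] += resetCount // parents reset to 6
	}

	counter := 0
	for _, v := range fish {
		counter += v
	}

	return counter
}

func main() {
	// sample input from the puzzle description
	fmt.Println(Solve("3,4,3,1,2", 80))  // 5934
	fmt.Println(Solve("3,4,3,1,2", 256)) // 26984457539
}
```

With a 9-element slice the whole state fits in one small, stack-friendly allocation, there is no hashing, and iteration order is trivially sequential.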