Yuan Gao

Posted on Dec 5, 2020 • Edited on Dec 22, 2020

Advent of Code 2020: Day 03 using numpy and vectorized calculations

#adventofcode #python

In which we use more of that tasty numpy vectorized calculations to make quick work of the math.

Things mentioned in this post: numpy, vectorized calculations

The Challenge Part 1

Link to challenge on Advent of Code 2020 website

The challenge involves traversing a grid that is either empty, or contains a tree, using predefined movement rules through this grid. There is some information about how the grid of trees wraps as well.

Ultimately what this comes down to is figuring out what grid squares will be traversed, and counting the trees in it.

The naïve approach

The naive approach would be something along these lines:

Read the data in
Recreate a grid of the data to mimic the structure that it was in
Iteratively move a virtual tobogganist through each of its position in the grid, generating a list of 2D coordinates
Checking this list of 2D coordinates in the grid of data to see how many of them contain trees.

Or in other words: load the data, figure out what data to check, check the data.

I won't actually implement this one, but jump into the meaty stuff.

Optimizing the math

The thing is, we don't actually need to generate the 2D coordinates. If we loaded all the data into a 1D array rather than a 2D one, we would simply calculate the correct 1D offset in the data and check if the tree was in there. Think of it is taking our Christmas woollen sweater, and pulling on the string and unravelling the formerly 2D (or 3D I guess) sweater into a single 1D string. The interesting thing is that the design of that sweater is still on that 1D string. And if we knew the dimensions of all the parts of the sweater accurately enough, we could figure out exactly where each part of the pattern on the original sweater would end up on the 1D string. What a great way to spend Christmas!

The second thing is that we don't need to iteratively move the tobogganist. We know the simple geometric rules by which the movement works, so we could generate this list of (now 1D) coordinates directly using math!

So let's get started. First, let's import the data in as binary, using numpy:

data = np.fromfile("input.txt", dtype='uint8')

Output

array([46, 46, 35, 46, 35, 46, 46, 46, 35, 46, 35, 46, 35, 46, 35, 35, 46, ...

As we can see, this is the ascii representation of the trees. the number 46 is the "." character, while the number 35 is the "#" character. For convenience, we can get 35 by doing ord("#").

Next, let's generate all the vertical index numbers for each movement of the toboggan (323 is the height of my data)

index = np.arange(323)

Output

array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12, ...

Imagine these numbers are the vertical line that the tobogannist is on. When the index is 0, the tobogannist is in the top of the course. When the index is 1, the tobogannist has moved down one line

To calculate where the tobogannist is in the horizontal direction, since we know the tobogannist has moved by 3 across for every row, we can simply multiply our index by 3 to get the x position:

x_pos = index * 3

Output

array([  0,   3,   6,   9,  12,  15,  18,  21,  24,  27,  30,  33,  36, ...

However, the width of the data is only 31 wide, how can we be at a position of 36? The challenge rules say that the trees actually repeat ad infinitum to the right. While we could double up the data, we can achieve the same effect by just wrapping the tobogannist around to the other side. We can achieve this using the modulo (%) operator (31 is the width of my data)

x_pos = (index * 3) % 31

Output

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27, 30,  2,  5,  8, 11, 14, 17, ...

It can now be seen that the numbers have wrapped around, saving us from having to duplicate the actual data sideways.

Now, we actually now have two arrays, one for the x position in the grid, and one for the y position in the grid. Had our data been 2D, we could now simply look into each of those positions to see if there is a tree. Since our data is not 2D, we need to first map our x/y position into our 1D data space. This is simply

offsets = index * (width+2) + x_pos

Output

array([    0,    36,    72,   108,   144,   180,   216,   252,   288, ...

Note the width+2 the extra +2 is to account for the newline characters that are in my data. In some cases you may find only +1 i sneeded.

These numbers are now the indexes of data in our data file that we want to check for trees. We can actually do that all in one go using numpy:

data[idx]

Output

array([46, 46, 46, 46, 46, 46, 35, 46, 35, 35, 35, 35, 46, 35, 35, 35, 46,
       46, 46, 35, 35, 35, 46, 46, 46, 35, 35, 46, 46, 35, 35, 35, 35, 46,
       46, 35, 35, 46, 46, 35, 46, 35, 35, 35, 35, 46, 46, 35, 46, 46, 35,
       46, 46, 35, 35, 46, 46, 46, 35, 35, 46, 46, 46, 46, 35, 35, 35, 35,
       35, 46, 35, 46, 46, 46, 35, 46, 46, 46, 35, 35, 35, 35, 35, 35, 46,
       46, 46, 46, 46, 35, 35, 46, 35, 35, 35, 35, 35, 35, 35, 46, 35, 46,
       46, 35, 35, 35, 35, 46, 35, 46, 35, 35, 46, 35, 35, 35, 46, 46, 35,
       35, 46, 46, 35, 35, 35, 46, 35, 35, 35, 35, 35, 35, 35, 35, 46, 35,
       46, 35, 35, 46, 35, 46, 46, 35, 46, 46, 46, 35, 46, 46, 46, 35, 46,
       46, 46, 35, 35, 46, 46, 46, 46, 35, 35, 46, 35, 35, 46, 46, 46, 46,
       46, 35, 35, 46, 35, 35, 46, 46, 35, 46, 35, 46, 46, 35, 35, 35, 35,
       35, 35, 35, 46, 46, 46, 35, 35, 46, 35, 35, 35, 35, 46, 35, 35, 35,
       35, 35, 46, 35, 46, 35, 46, 35, 35, 35, 35, 35, 35, 46, 46, 35, 35,
       35, 35, 35, 35, 46, 46, 46, 46, 35, 35, 35, 35, 46, 46, 35, 46, 46,
       35, 46, 35, 46, 46, 46, 46, 46, 46, 46, 35, 35, 35, 35, 46, 35, 46,
       46, 46, 35, 46, 46, 35, 46, 35, 35, 35, 35, 35, 46, 46, 46, 35, 35,
       46, 46, 46, 35, 35, 35, 35, 35, 46, 35, 35, 35, 35, 35, 46, 46, 46,
       35, 46, 35, 46, 46, 35, 46, 35, 35, 46, 35, 46, 35, 35, 35, 35, 35,
       35, 46, 46, 46, 46, 35, 35, 35, 46, 35, 35, 46, 46, 35, 46, 35, 35],
      dtype=uint8)

That's it, this is every grid square that the tobogganist encounters. Every number "35" (ascii for the # symbol) is a tree. We just need to count these.

Again, numpy lets us check this with ease (remember, earlier tree was set to ord("#") or 32)

data[idx] == tree

Output

array([False, False, False, False, False, False,  True, False,  True,
        True,  True,  True, False,  True,  True,  True, False, False,
       False,  True,  True,  True, False, False, False,  True,  True,
       False, False,  True,  True,  True,  True, False, False,  True,
        True, False, False,  True, False,  True,  True,  True,  True,
       False, False,  True, False, False,  True, False, False,  True,
        True, False, False, False,  True,  True, False, False, False,
       False,  True,  True,  True,  True,  True, False,  True, False,
       False, False,  True, False, False, False,  True,  True,  True,
        True,  True,  True, False, False, False, False, False,  True,
        True, False,  True,  True,  True,  True,  True,  True,  True,
       False,  True, False, False,  True,  True,  True,  True, False,
        True, False,  True,  True, False,  True,  True,  True, False,
       False,  True,  True, False, False,  True,  True,  True, False,
        True,  True,  True,  True,  True,  True,  True,  True, False,
        True, False,  True,  True, False,  True, False, False,  True,
       False, False, False,  True, False, False, False,  True, False,
       False, False,  True,  True, False, False, False, False,  True,
        True, False,  True,  True, False, False, False, False, False,
        True,  True, False,  True,  True, False, False,  True, False,
        True, False, False,  True,  True,  True,  True,  True,  True,
        True, False, False, False,  True,  True, False,  True,  True,
        True,  True, False,  True,  True,  True,  True,  True, False,
        True, False,  True, False,  True,  True,  True,  True,  True,
        True, False, False,  True,  True,  True,  True,  True,  True,
       False, False, False, False,  True,  True,  True,  True, False,
       False,  True, False, False,  True, False,  True, False, False,
       False, False, False, False, False,  True,  True,  True,  True,
       False,  True, False, False, False,  True, False, False,  True,
       False,  True,  True,  True,  True,  True, False, False, False,
        True,  True, False, False, False,  True,  True,  True,  True,
        True, False,  True,  True,  True,  True,  True, False, False,
       False,  True, False,  True, False, False,  True, False,  True,
        True, False,  True, False,  True,  True,  True,  True,  True,
        True, False, False, False, False,  True,  True,  True, False,
        True,  True, False, False,  True, False,  True,  True])

The output of this is another array but this time containing either True or False depending on whether that grid had a tree or not. We can simply sum these up, again using numpy: np.sum(data[idx] == tree)

So, all together now, the whole code looks like:

import numpy as np
data = np.fromfile("input.txt", dtype='uint8')
across = 3
width = 31
height = 323
tree = ord("#")

index = np.arange(height)
x_pos = (index * across) % width
offsets = index * (width+2)
idx = x_pos + offsets
print("trees", np.sum(data[idx] == tree))

That was it! Again, if we were code golfing, we can compact this even more:

import numpy as np
print("trees", np.sum(np.fromfile("input.txt", dtype='uint8')[(np.arange(323)*3)%31+np.arange(323)*33] == 35))

It's unreadable, but it's compact.

The Challenge Part 2

In part 2, simply instead of having a fixed rule of "1 down, 3 across", there are several different cases with different rules. Because one of the rules goes "2 down", we have to generalize our code a bit more to account for this. Namely, we now need to use a separate value for index and y_pos:

index = np.arange(height/down).astype('uint32')
y_pos = index*down

Here, we divide the number of indices by how many down we go. For example if down=2, then we have half as many indexes to cover. The rest of the code is pretty much the same

import numpy as np
data = np.fromfile("input.txt", dtype='uint8')
rules = [[1, 1],  [3, 1],  [5, 1],  [7, 1],  [1, 2]]

tree = ord("#")
width = 31
height = 323

multiply = 1
for across, down in rules:
    index = np.arange(height/down).astype('uint32')
    y_pos = index * down
    x_pos = (index * across) % width
    idx = x_pos + y_pos * (width+2)
    trees = np.sum(data[idx] == tree)
    multiply *= trees

print("trees", multiply)

Pretty nifty eh?

Onward!

DEV Community

Advent of Code 2020: Day 03 using numpy and vectorized calculations

The Challenge Part 1

The naïve approach

Optimizing the math

The Challenge Part 2

Top comments (0)

Read next

Chrome extensions I use as a developer

Deploy Flask app using docker Compose

Migrar de SQLite a MySQL.

Testing AI-Powered Apps: Introducing LLM Test Mate