Advent of Code - Day 1

#sql #beginners #programming #adventofcode

The first challenge of the Advent of Code 2022 is out, and this year I decided to try to solve the proposed challenges only using T-SQL. I also want to share the solutions here, as I think they will provide a great learning experience for those who are new to SQL, and get a sense of how powerful it is. Here's the solution to the first one (I'll try to keep up with the challenges every day, but I really can't promise anything as it really depends on how much free time I'll have after work and family needs...)

The problem is about elves (of course!), food and calories. You start with a list of calories that elves are bringing with them. The list you are given as the starting point of this challenge has empty lines to separate the calories of each elf in their own inventory. Here's an example, simplified and moved to a spreadsheet for easier understanding:

Let's start importing the calories data. I just pasted the content from the Advent of Code website and the used the STRING_SPLIT function to turn the string into a table and then move the data into the final table I'll be using for this challenge:

drop table if exists dbo.ch01_input;
create table dbo.ch01_input
(
    id int identity not null primary key,
    calories int null
)
go

declare @calories nvarchar(max) = '
3264
4043

...<content from the Advent of Code file here>...

6438
1020';

insert into 
    dbo.ch01_input 
select 
    cast(nullif(replace([value], char(13), ''), '') as int) as calories 
from 
    string_split(@calories, char(10))
go

As you can see, I'm also converting the empty lines into NULL values, so that they can fit into a INT data type as calories values are just numbers. Choosing the right data type is good for data hygiene and to avoid expensive cast operations in future. Strings requires quite a lot of CPU power compared to numbers, and on the cloud, since your pay for what you use, you really want to optimize performances to reduce costs. Not that in this case it would matter as there are just a few thousand rows, but it is nonetheless a good habit to have. I do not want to overengineer, but just some common sense is always good to apply.

Part 1

The first challenge is to "Find the Elf carrying the most Calories. How many total Calories is that Elf carrying?"

That is easy! From the spreadsheet you can see that each elf has a nice set of values, so the problem can be easily solved if we could find a way to easily identify and work on these sets of values. In fact, as per challenge description: "Each Elf separates their own inventory from the previous Elf's inventory (if any) by a blank line". Too bad there are two thousand values so we can't do it visually. The first solution that can come to mind is to just start from the first row and process all of them in sequence, assigning the calories values to a new elf whenever a white line is found. That would work, but it just feels like a "brute force approach". We can do better: we can be smart.

We can use some simple math to identify all the sets of values in the provided list. All that is needed is to give each row a sequential number based on its ordinal position in the file, including the blank lines, and another sequential number excluding, this second time, the blank rows. Here's an example using the simplified spreadsheet representation:

N1 and N2 represent the ordinals given to each element, based on its order, as mentioned before.

Now, if you subtract the values in N2 from the values in N1, you'll get a group identifier:

Isn't that cool! In T-SQL this can be done easily, thanks to the row_number() function:

drop table if exists #part1;
select 
    group_id = id - row_number() over (order by id),
    * 
into
    #part1
from 
    dbo.ch01_input
where
    calories is not null

Now that #part1 contains the data with also the group identifier, a group by can give the answer:

select top(1)
    group_id, sum(calories) as totcalories
from
    #part1
group by
    group_id
order by
    totcalories desc

Part 2

Once the first part of the challenge is done, you'll have access to the second part. The question this time is: "Find the top three Elves carrying the most Calories. How many Calories are those Elves carrying in total?"

Again, pretty easy now that we have a group identifier. We just must take the top three and sum them up:

with cte as
(
    select top(3)
        group_id, sum(calories) as totcalories
    from
        #part1
    group by
        group_id
    order by
        totcalories desc
)
select sum(totcalories) as totaltop3 from cte

Done!

Gaps and Islands

The technique used to solve this problem is well known and my friend and SQL Guru Itzik Ben-Gan has some great articles on it and how it can be used to solve complex problems. Here's some references for you do dig into it: