Today's Advent of Code challenge is really interesting. Somehow easy, but with a couple of interesting discussion points.
Day 3: Rucksack Reorganization is about helping elves to organize and prioritize their supplies.
I imported the input data as usual, by coping it from the website to my Azure Data Studio query and then using STRING_SPLIT to import each single line - that represents the rucksack content - in its own row:
declare @input nvarchar(max) = 'QLFdFCdlLcVqdvFLnFLSSShZwptfHHhfZZZpSwfmHp rTJRjjbJTgzDJjdsRsfwtfNwtfmZpZNhmmzt ... jGrGqjJfqccrfqGcGplrJpFvzggqmCtMzmsMnvMvvCgm'; drop table if exists dbo.ch03_input; create table ch03_input ( id int identity not null primary key, items varchar(100) collate Latin1_General_BIN2 ); insert into dbo.ch03_input (items) select trim(replace([value], char(13), '')) from string_split(@input, char(10)) go
The items in the rucksacks are identified by a letter and the identifiers are case sensitive. For this reason, I used the
Latin1_General_BIN2 collation that will make sure I adhere to the requirements and get the best performance possible with strings, as explained in Day 2 challenge solution post.
In the first part of the challenge, the rucksack list is split in two, and you must find the item type - represented by its letter - that is in both lists. It is a string comparison problem: which letter of the first list is also in the second list?
The first step is to split the list into two list of the same size:
select *, len(items) as itemcount, left(items, len(items)/2) as comp1, right(items, len(items)/2) as comp2 into #step1 from ch03_input;
I also calculate the length of string as it will come useful in the next step, where I'll split the rucksack string into its letters and store each letter in its own row:
select top(100) row_number() over (order by a.object_id) as n into #n from sys.columns a cross join sys.columns b select *, substring(items, n, 1) as item into #step2 from #step1 s cross join #n n where n.n <= s.itemcount
Splitting a string in its letters is easy if you have a table with numbers, which is what I'm creating as first thing in the query above. Then I use that numbers table to generate a row for each letter in the string, via the
CROSS JOIN and for each row generate extract the
Nth letter of the string. The
WHERE clause uses
itemcount to make sure that I generate exactly one row for each letter in each string, and no more than that.
Then I need to find which item is in both compartments. This means checking if a letter is in a string and that can be done using
select distinct id, items, comp1, comp2, item, charindex(item, comp1, 1) as p1, charindex(item, comp2, 1) as p2 into #step3 from #step2 where charindex(item, comp1, 1) != 0 and charindex(item, comp2, 1) != 0 order by id
The query needs a
DISTINCT as there can be more than one item of the same type in the string, I need just only one per type. The fact that I need to use a
DISTINCT rings some bells (or bring some smell): I'm fairly sure I can refactor my code to do this operation earlier and avoid checking for a letter that has been found already. I'll do this later if I have enough free time. For now, I want to see if my solution works; after that I can optimize it.
Now that the items present in both compartments have been found, I have to assign to each item type the priority value as described in the challenge. Priorities values are based on alphabetical order, so I can use
ASCII to get the letter value and transform it to the corresponding priority value. Priority values are ordered differently than the ASCII order, so a
CASE statement is needed to apply the right transformations:
select item, case when item like '[a-z]' then ASCII(item) - ASCII('a') + 1 when item like '[A-Z]' then ASCII(item) - ASCII('A') + 27 end as priority, id, comp1, comp2 into #priorities from #step3 order by item
And now just summing all the priorities will give the answer:
select sum(priority) from #priorities
Answer is correct, so let's move to the next part of the challenge.
Elves gather in groups of three, and the goal is to find which item type is carried by everyone in the group.
The first step is to create a way to easily group the elves together. By using the existing
id columns and the modulo operator I can find when a new group begins. When the result of the modulo operation is equal to one:
I just need to know when a group starts, so I can set everything not equal to 1 to 0:
Now I can then use a simple and fast running total to generate a
group_id that will allow quick identification of all items in a single group. Amazing, isn't it?
Funny enough my SQL Guru friend Itzik mentioned this technique with the running total when we met yesterday evening. Funny that I would have needed it right the next day. Thanks Itzik!
The final query is the following:
select *, len(items) as itemcount, sum(case when (id % 3) != 1 then 0 else 1 end) over (order by id) as group_id into #step1 from ch03_input;
easy, fast, and elegant!
Once that a way to identify each group is there, the challenge is almost solved. It is just a matter of splitting the strings into their letters, as I did for part one too.
select *, substring(items, n, 1) as item into #step2 from #step1 s cross join #n n where n.n <= s.itemcount
Now I have everything I need to see which letter is present in all three rucksacks. As simple
GROUP BY filtering only those letters that appears exactly three times by using the
HAVING clause will give the answer:
with cte as ( select distinct id, group_id, item from #step2 ) select group_id, item into #step3 from cte group by group_id, item having count(*) = 3 order by group_id
The tricky part here is the
DISTINCT operator in the Common Table Expression. That
DISTINCT makes sure I can differentiate between an item appearing three times in the same rucksack vs an item appearing one item in all three rucksacks. We're interested only in the latter, and not in the first.
Now I just have to apply the same logic to get the priority value used before, calculate the overall total and I'll get the solution to Part 2. Challenge done.
The full solution is available here: yorek/aoc-2022
The technique to deal with islands of data is useful in so many practical uses case that I really recommend you to deep dive into it. Take advantage of the free book chapter on the subject available here: Gaps and islands
An alternative solution could have been a simple JOIN between the three rucksacks in the same group:
select distinct a.id as group_id, a.item as item into #step3 from #step2 a inner join #step2 b on a.id + 1 = b.id and a.item = b.item inner join #step2 c on b.id + 1 = c.id and b.item = c.item where a.id % 3 = 1 order by a.id go
That would work just fine, but it will only work for a group of exactly three elves. While perfectly fitting the requirement, I find the chosen solution gives more flexibility and is way more elegant and future proof. Or agile if you wish. It requires a bit of lateral thinking, which is always a good ability to exercise, so great to have a chance to use it. Given that the resulting query touches the table only once instead of, I also suspect it will also be faster. It is worth digging into it a bit more if you have time.