Advent Of Code 2020 in SQL – Day 6: “Custom Customs”

#adventofcode #sql

The puzzle of Day 6 requests us to group answers to a customs declaration form and analyse them.

Challenge one is the format we need to normalize into something we can work with in SQL. Each answer is a line, but several answers are grouped together and the groups are separated by an empty line.

So as a first step, we introduce a new column that indicates whether the row is a break or not and then use window functions to summarize all breaks from beginning to the current position, which will be our group-ID

base_data as (
    select
      rownum answer_id,
      column_value answers,
      case when column_value is null then 1 else 0 end is_break
    from table(
      aoc_file_loader.file_as_stringlist(
        aoc_file_loader.local_url('day_6_input.txt')
      )
    )
  ),
  grouped_data as (
    select
      answer_id,
      group_id,
      answers
    from (
      select
        answer_id,
        answers,
        is_break,
        sum(is_break) over (
          order by answer_id
          range between unbounded preceding
            and current row)
          group_id
      from base_data
    )
    where is_break = 0
  )
select * from grouped_data

It’s important that we filter away the is_break after we applied the window function!

The next step took me a while – my first attempt was to split the answer-rows with connect by level according to the length of the string, but this solution was terribly slow and took several minutes for only ~20 lines of input-data.

So my second approach was to provide a baseline of all characters a-z as result set we could cross-join against – and this worked pretty well:

with
  answer_pos as (
    select column_value c
    from table(sys.odcivarchar2list(
      'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'
      )
    )
  ),
  base_data as (
    select
      rownum answer_id,
      column_value answers,
      case when column_value is null then 1 else 0 end is_break
    from table(
      aoc_file_loader.file_as_stringlist(
        aoc_file_loader.local_url('day_6_input.txt')
      )
    )
  ),
  grouped_data as (
    select
      answer_id,
      group_id,
      answers
    from (
      select
        answer_id,
        answers,
        is_break,
        sum(is_break) over (
          order by answer_id
          range between unbounded preceding
            and current row)
          group_id
      from base_data
    )
    where is_break = 0
  ),
  distinct_one_answer_per_row as (
    select distinct
      group_id,
      answer_pos.c answer
    from grouped_data
      cross join answer_pos
    where instr(answers, answer_pos.c) > 0
  ),
  distinct_answers as (
    select
      group_id,
      listagg(answer, '') within group (order by answer) distinct_answers
    from distinct_one_answer_per_row
    group by group_id
  ),
  answer_length as (
    select
      group_id,
      length(distinct_answers) answer_length
    from distinct_answers
  )
select
  sum(answer_length)
from answer_length

Part two changes the rules a bit in a way that we should only count the answers that appear on all answered forms per group.

To get to this point we need to compare the number of total answers per group (no matter the answer) and the number of answers per group to a specific answer-position (character). If both match, this specific answer-position appears in all the answers per group.

with
  answer_pos as (
    select column_value c
    from table(sys.odcivarchar2list(
      'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'
      )
    )
  ),
  base_data as (
    select
      rownum answer_id,
      column_value answers,
      case when column_value is null then 1 else 0 end is_break
    from table(
      aoc_file_loader.file_as_stringlist(
        aoc_file_loader.local_url('day_6_input.txt')
      )
    )
  ),
  grouped_data as (
    select
      answer_id,
      group_id,
      answers
    from (
      select
        answer_id,
        answers,
        is_break,
        sum(is_break) over (
          order by answer_id
          range between unbounded preceding
            and current row)
          group_id
      from base_data
    )
    where is_break = 0
  ),
  distinct_one_answer_per_row as (
    select
      group_id,
      answer_pos.c answer,
      count(distinct answer_id) 
        over (partition by group_id) group_answers,
      count(*) 
        over (partition by group_id, answer_pos.c) this_pos_answered
    from grouped_data
      cross join answer_pos
    where instr(answers, answer_pos.c) > 0
  ),
  all_agreed_upon_answers as (
    select distinct
      group_id,
      answer
    from distinct_one_answer_per_row
    where group_answers = this_pos_answered
  ),
  distinct_answers as (
    select
      group_id,
      listagg(answer, '') within group (order by answer) distinct_answers
    from all_agreed_upon_answers
    group by group_id
  ),
  answer_length as (
    select
      group_id,
      length(distinct_answers) answer_length
    from distinct_answers
  )
select
  sum(answer_length)
from answer_length

A bit tricky, but fun and rewarding.

You can find the complete code in the github-repository.

DEV Community

Advent Of Code 2020 in SQL – Day 6: “Custom Customs”

Top comments (0)

Read next

PL/SQL Developer, interview questions

.NET8/EF8 - Exception: Microsoft.Data.SqlClient.SqlException (0x80131904): Incorrect syntax near '$

How to Delete Duplicate Data from a MySQL Database

Common Table Expression (CTE) in PLSQL | Part 1