Advent of Code 2020 - Day 4

In the spirit of the holidays (and programming), I’ll be posting my solutions to the Advent of Code 2020 puzzles here, at least one day after they’re posted (no spoilers!). I’ll be implementing the solutions in R because, well, that’s what I like! What I won’t be doing is posting any of the actual answers, just the reasoning behind them.

Also, as a general convention, whenever the puzzle has downloadable input, I’m saving it in a file named input.txt.

Day 4 - Passport Processing

Find the problem description HERE.

Part One - Take That, TSA!

Part one today puts us in the precarious position of hacking airport security. What fun! It’s for a good cause, though: faking our passport. From a coding perspective, it’s a data parsing problem with a fair bit of string processing thrown in. Given our input of oddly arranged key:value pairs, we need to first identify the text blocks that represent each passport, parse each block into a list of key:value pairs, then extract the key and value from each pair. Oh, we also need to check each set of keys and values to make sure all the keys except cid are present in each set.

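library(purrr)
library(stringr)

# Read the whole puzzle input into a single string (assumes `input.txt` is in
# the working directory, per the convention above)
real_input <- paste(readLines('input.txt'), collapse = '\n')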

# Helper function to determine whether a named list represents a 
# valid passport, where the names of the list items represent the
# keys or field labels and the list item values represent the values
# of the fields. Function checks for a value in each field of the named
# list if the key is in the list of `required_fields` keys. If there's a
# value there (not NULL), then it's valid.
is_valid <- function(creds) {
  required_fields <- c('byr', 'iyr', 'eyr', 'hgt', 'hcl', 'ecl', 'pid')
  valid_fields <- map_lgl(required_fields, ~ !is.null(creds[[.x]]))
  all(valid_fields)
}

# Helper function to transform an individual passport text block into
# a named list
parse_passport <- function(cred_str) {
  # Split the text block on any run of whitespace (spaces and/or newlines)
  cred_list <- unlist(str_split(cred_str, '\\s+'))

  # Sometimes the string split can yield empty strings as items. Get rid of
  # those
  cred_list <- cred_list[cred_list != '']

  tags <- map_chr(cred_list, ~ str_extract(.x, '\\w+(?=:)')) # Extract keys
  values <- map(cred_list, ~ str_extract(.x, '(?<=:).*$')) # Extract values
  names(values) <- tags # Name the list
  values # Return named list
}

# Split the input into blocks representing passports where there are two
# newline characters with optional whitespace between. Remove any empty
# strings generated from the output
all_passports <- unlist(str_split(real_input, '\\n\\s*\\n'))
all_passports <- all_passports[all_passports != '']

# Parse all the passport text blocks into named lists
all_passports_parsed <- lapply(all_passports, parse_passport)
valid_passports <- sapply(all_passports_parsed, is_valid) # Check validity
answer <- sum(valid_passports)

There’s probably room for improvement in how the input is divided into text blocks and how the individual key:value pairs are parsed out of each block, so that empty strings never show up in the results in the first place, as opposed to removing them after the fact.
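One possible direction for that improvement, as a rough sketch rather than the solution used above: trim the input before splitting, and extract the key:value pairs directly with a regex instead of splitting on whitespace. The function name parse_passport_alt and the sample input are made up for illustration.

```r
library(stringr)

# Hypothetical alternative to `parse_passport()`: match every key:value pair
# directly, so there are never any empty strings to clean up afterwards
parse_passport_alt <- function(cred_str) {
  # Each row of the match matrix is: full match, key, value
  pairs <- str_match_all(cred_str, '(\\w+):(\\S+)')[[1]]
  # Use the values as list items and the keys as their names
  setNames(as.list(pairs[, 3]), pairs[, 2])
}

# Made-up sample input with stray whitespace around and between the blocks
sample_input <- "\necl:blu pid:000000001\nbyr:1990 hgt:170cm\n\n  \nhcl:#a1b2c3 iyr:2015\n"

# Trim both ends first so the split can't yield leading/trailing empty strings
blocks <- unlist(str_split(str_trim(sample_input), '\\n\\s*\\n'))
parsed <- lapply(blocks, parse_passport_alt)
```

Because the regex only ever matches complete key:value pairs, the amount of whitespace between them stops mattering.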

Part Two - Busted?

Looks like the authorities are starting to get wise to our antics. Apparently it matters what’s in those passport fields. Who could have guessed? Probably no one. Now we need to apply better validation rules to each field, and we just so happen to know what those rules are! For this, we just need to modify the is_valid function to check not only for the presence of each field (other than ‘cid’) but also whether each field conforms to its required validation rules.

library(tidyr) # Provides `replace_na()`

# Named list of validation rules, where each element is named for a passport
# field and the value is a function that returns TRUE or FALSE depending on
# the validation rules for that field. Note that there's no function for 'cid' 
# because we won't be checking validity for that field.
validation_rule <- list(
  byr = function(x) { replace_na(as.integer(x) >= 1920 & as.integer(x) <= 2002, FALSE) },
  iyr = function(x) { replace_na(as.integer(x) >= 2010 & as.integer(x) <= 2020, FALSE) },
  eyr = function(x) { replace_na(as.integer(x) >= 2020 & as.integer(x) <= 2030, FALSE) },
  hgt = function(x) {
    # The value must be a number followed by 'cm' or 'in' and nothing else
    if (!str_detect(x, '^\\d+(cm|in)$')) return(FALSE)
    # Convert to a number so the range check isn't a string comparison
    num <- as.numeric(str_extract(x, '\\d+'))
    if (str_detect(x, 'in$')) num >= 59 & num <= 76 else num >= 150 & num <= 193
  },
  # Anchor both ends so extra trailing characters are rejected
  hcl = function(x) { str_detect(x, '^#[0-9a-f]{6}$') },
  ecl = function(x) { x %in% c('amb', 'blu', 'brn', 'gry', 'grn', 'hzl', 'oth') },
  pid = function(x) { str_detect(x, '^\\d{9}$') }
)

# Modified `is_valid()`. Now, for each field in required fields, we check to be
# sure there is a value in the `creds` list for that field and we also pass the
# value from the `creds` list to the function with the same name in the
# `validation_rule` list.
is_valid <- function(creds) {
  required_fields <- c('byr', 'iyr', 'eyr', 'hgt', 'hcl', 'ecl', 'pid')
  valid_fields <- map_lgl(
    required_fields,
    ~ if (is.null(creds[[.x]])) FALSE else validation_rule[[.x]](creds[[.x]])
  )
  all(valid_fields)
}

# The rest is exactly the same as before. We're still using the 
# `parse_passport()` function from part 1.
all_passports <- unlist(str_split(real_input, '\\n\\s*\\n'))
all_passports <- all_passports[all_passports != '']
all_passports_parsed <- lapply(all_passports, parse_passport)
valid_passports <- sapply(all_passports_parsed, is_valid)
answer <- sum(valid_passports)

Named lists are interesting in that they can behave as dictionaries (or hashmaps) to retrieve items by name, but they can also behave as lists (or arrays) if you ignore the names. Storing the unevaluated validation functions in a named list allows us to programmatically pick which function to use based on the name of the field.
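To make that dual behavior concrete, here is a small sketch with a made-up list of functions (checks and its two rules are hypothetical, not part of the solution above):

```r
# Hypothetical rules, stored as functions in a named list
checks <- list(
  is_even = function(x) x %% 2 == 0,
  is_positive = function(x) x > 0
)

# Dictionary-style: pick a function by name at run time, then call it
rule_name <- 'is_even'
by_name <- checks[[rule_name]](4)

# List-style: ignore the names and pick the second function by position
by_position <- checks[[2]](-2)

# Or apply every function in the list to the same value
all_results <- vapply(checks, function(f) f(4), logical(1))
```

The name used for the lookup can come from anywhere, including the keys of a parsed passport, which is what makes this pattern handy for per-field validation.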

Wrap-Up

Other than my dissatisfaction with the way my string-splitting worked out on the input values, Day 4 went very well! Storing functions in named lists is something I have found myself doing more often than I would have expected before I realized R supported it. I also want to point out the ~ f(.x) anonymous function syntax. That’s R ‘formula’ syntax, and the purrr package lets you use it for anonymous functions in the map-style (and related) functions. When you do, .x is the default name for the value passed to the anonymous function. I love this syntax and wish it were included in base R. If you found a different solution (or spotted a mistake in one of mine), please drop me a line!
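For anyone who hasn’t seen the ~ f(.x) shorthand before, here is a minimal side-by-side sketch (the variable names are just for illustration):

```r
library(purrr)

# The formula version: `.x` stands in for each element of the input
doubled_formula <- map_dbl(1:3, ~ .x * 2)

# The same thing written with a regular anonymous function
doubled_function <- map_dbl(1:3, function(x) x * 2)
```

Both calls should produce the same numeric vector; the formula version just saves some typing.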