Parsing Data in Rust with Nom

#python #adventofcode #parsing

This is my third year participating in Advent of Code, but the first using Rust! Since I’m new to the Rust ecosystem, I’ve been dependent on others to steer my third-party library selections. As an example, Day 15 (like most days) presented some interesting string parsing requirements. Luckily, I was guided toward an excellent parser combinator library, affectionately named nom, via Chris Biscardi¹.

Beacon exclusion zone

The Day 15 challenge requires you to track sensors, beacons, and their coordinates. The raw input for this looks like:

Sensor at x=2, y=18: closest beacon is at x=-2, y=15
Sensor at x=9, y=16: closest beacon is at x=10, y=16
Sensor at x=13, y=2: closest beacon is at x=15, y=3
Sensor at x=12, y=14: closest beacon is at x=10, y=16
Sensor at x=10, y=20: closest beacon is at x=10, y=16
Sensor at x=14, y=17: closest beacon is at x=10, y=16
Sensor at x=8, y=7: closest beacon is at x=2, y=10
Sensor at x=2, y=0: closest beacon is at x=2, y=10
Sensor at x=0, y=11: closest beacon is at x=2, y=10
Sensor at x=20, y=14: closest beacon is at x=25, y=17
Sensor at x=17, y=20: closest beacon is at x=21, y=22
Sensor at x=16, y=7: closest beacon is at x=15, y=3
Sensor at x=14, y=3: closest beacon is at x=15, y=3
Sensor at x=20, y=1: closest beacon is at x=15, y=3

While this text is parsable with regular expressions, or a combination of well-placed string splits, using a parsing library helps break things down in a structured way (which can sometimes be beneficial for part 2 challenges).
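For comparison, here is a rough sketch of what the string-split route could look like using only the standard library. The function names parse_line_with_splits and parse_xy are hypothetical, made up for this illustration, and the sketch returns plain tuples rather than structs.

```rust
/// Parse one report line into ((sensor_x, sensor_y), (beacon_x, beacon_y)).
/// Returns None if the line does not have the expected shape.
fn parse_line_with_splits(line: &str) -> Option<((i64, i64), (i64, i64))> {
    // Drop the fixed prefix, then split on the fixed middle section.
    let rest = line.strip_prefix("Sensor at ")?;
    let (sensor, beacon) = rest.split_once(": closest beacon is at ")?;
    Some((parse_xy(sensor)?, parse_xy(beacon)?))
}

/// Parse a fragment such as "x=-2, y=15" into (-2, 15).
fn parse_xy(fragment: &str) -> Option<(i64, i64)> {
    let (x, y) = fragment.split_once(", ")?;
    let x = x.strip_prefix("x=")?.parse::<i64>().ok()?;
    let y = y.strip_prefix("y=")?.parse::<i64>().ok()?;
    Some((x, y))
}
```

Calling parse_line_with_splits on the first sample line returns Some(((2, 18), (-2, 15))). It works, but the structure of the line is buried in a chain of prefix strips and splits, which is the part a parser combinator library makes explicit.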

Presuming we have structs for Sensor and Beacon that look like the ones below, we can start building out the parsing logic.

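// Debug is derived so the parsed values can be printed with dbg!.
#[derive(Debug)]
struct Sensor {
    x: i64,
    y: i64,
}

#[derive(Debug)]
struct Beacon {
    x: i64,
    y: i64,
}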

Parsing with Nom

First, we’ll parse out each line of input, along with the part of the line relevant to either a Sensor or a Beacon. Second, we’ll parse out the coordinates and populate them into instances of Sensor and Beacon.

For the first part, everything is contained in a function that takes the raw input as a string slice (&str) and returns an IResult. An IResult is a container for the result of a nom parsing function. The string slice component of the IResult is the remaining unparsed input, and the Vec<(Sensor, Beacon)> is our expected parsing result.

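// Imports assume nom 7; Parser brings the .map method used below into scope.
use nom::{
    bytes::complete::tag,
    character::complete::{self, line_ending},
    multi::separated_list1,
    sequence::{preceded, separated_pair},
    IResult, Parser,
};

fn map(input: &str) -> IResult<&str, Vec<(Sensor, Beacon)>> {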
    let (input, reports) = separated_list1(
        line_ending,
        preceded(
            tag("Sensor at "),
            separated_pair(
                position.map(|(x, y)| Sensor { x, y }),
                tag(": closest beacon is at "),
                position.map(|(x, y)| Beacon { x, y }),
            ),
        ),
    )(input)?;

    Ok((input, reports))
}
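To make the "remaining input plus parsed value" shape of an IResult more concrete, here is a minimal std-only sketch of the same idea. It is not how nom is implemented: ParseResult, literal, signed_int, and x_coordinate are hypothetical names for this illustration, and a plain String stands in for nom's error type.

```rust
// A simplified stand-in for nom's IResult: on success, the remaining
// input comes first and the parsed value second.
type ParseResult<'a, T> = Result<(&'a str, T), String>;

/// Hypothetical helper: consume a literal prefix, similar in spirit to `tag`.
fn literal<'a>(expected: &str, input: &'a str) -> ParseResult<'a, &'a str> {
    match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, &input[..expected.len()])),
        None => Err(format!("expected {expected:?}")),
    }
}

/// Hypothetical helper: consume an optionally negative integer.
fn signed_int(input: &str) -> ParseResult<'_, i64> {
    let digits_start = if input.starts_with('-') { 1 } else { 0 };
    let digits_len = input[digits_start..]
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len() - digits_start);
    let end = digits_start + digits_len;
    let value = input[..end].parse::<i64>().map_err(|e| e.to_string())?;
    Ok((&input[end..], value))
}

/// Chain the helpers to read "x=<int>", threading the remaining input along.
fn x_coordinate(input: &str) -> ParseResult<'_, i64> {
    let (rest, _) = literal("x=", input)?;
    signed_int(rest)
}
```

With this sketch, x_coordinate("x=-2, y=15") returns Ok((", y=15", -2)): the parsed value is -2, and ", y=15" is handed back for the next parser to consume. nom's combinators automate exactly this threading of the leftover input.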

Inside the map function, we start off with separated_list1, which breaks the input up into lines. Its first argument is line_ending, which matches line endings of both the \n and \r\n variety. Its second argument starts with preceded, which discards the "Sensor at " tag at the start of each line and hands the rest to separated_pair. separated_pair in turn parses out what is on either side of the ": closest beacon is at " tag and discards the tag itself. In this case, those are the coordinate pairs for Sensor and Beacon, respectively. To parse them, we’ll define another function called position.

The position function extracts the values of a coordinate pair. As you can see, it takes the same kind of argument as map and also returns an IResult. However, the types in the IResult are a bit different here. The second type parameter is a tuple holding the x and y coordinates, both i64.

fn position(input: &str) -> IResult<&str, (i64, i64)> {
    separated_pair(
        preceded(tag("x="), complete::i64),
        tag(", "),
        preceded(tag("y="), complete::i64),
    )(input)
}

Right away, we jump into separated_pair again. This parses out both sides of the ", " tag, while preceded isolates the value after either x= or y=. The second argument of preceded is another parsing function, character::complete::i64, which matches the coordinate integer value (including a leading minus sign).

Coming back to the map function, we (somewhat confusingly) call map on the position parser. This is the map method from nom’s Parser trait, not our own map function, and it transforms the parser’s output. That allows us to destructure the (x, y) tuple and use the values to construct the Sensor and Beacon struct literals.
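If the idea of mapping over a parser feels abstract, the following std-only sketch shows the general pattern: run a parser, then convert its output while leaving the remaining input alone. It only mirrors the idea, not nom’s implementation; map_output, xy, and parse_sensor are hypothetical names, and xy is a simplified stand-in for position that uses Option instead of IResult.

```rust
#[derive(Debug, PartialEq)]
struct Sensor {
    x: i64,
    y: i64,
}

/// Hypothetical combinator: run `parser`, then convert its output with `f`,
/// passing the remaining input through unchanged.
fn map_output<'a, A, B>(
    parser: impl Fn(&'a str) -> Option<(&'a str, A)>,
    f: impl Fn(A) -> B,
) -> impl Fn(&'a str) -> Option<(&'a str, B)> {
    move |input| parser(input).map(|(rest, value)| (rest, f(value)))
}

/// Hypothetical stand-in for `position`: reads "x=<int>, y=<int>" and
/// returns the rest of the input along with the (x, y) tuple.
fn xy(input: &str) -> Option<(&str, (i64, i64))> {
    let rest = input.strip_prefix("x=")?;
    let (x, rest) = rest.split_once(", y=")?;
    // The y value ends at the first character that is neither '-' nor a digit.
    let end = rest
        .find(|c: char| c != '-' && !c.is_ascii_digit())
        .unwrap_or(rest.len());
    let x = x.parse::<i64>().ok()?;
    let y = rest[..end].parse::<i64>().ok()?;
    Some((&rest[end..], (x, y)))
}

/// Turn the raw tuple into a Sensor, just as the closure passed to map does.
fn parse_sensor(input: &str) -> Option<(&str, Sensor)> {
    map_output(xy, |(x, y)| Sensor { x, y })(input)
}
```

Here parse_sensor("x=2, y=18: closest") returns Some((": closest", Sensor { x: 2, y: 18 })). The tuple-to-struct conversion lives in one small closure, which is the same role the closures play in position.map(...).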

Now, if we use the dbg! macro on the result of a call to map with test input, we should see something like:

map = [
    (
        Sensor {
            x: 2,
            y: 18,
        },
        Beacon {
            x: -2,
            y: 15,
        },
    ),
    (
        Sensor {
            x: 9,
            y: 16,
        },
        Beacon {
            x: 10,
            y: 16,
        },
    ),

// . . .

]

Look at that beautifully structured data!

Conclusion

Reasonably painless, and well-structured—that’s parsing data with Rust and Nom! If you’re interested in taking a closer look at Nom, I encourage you to review this handy list of its available parsers and combinators.

¹ I highly recommend checking out Chris’ phenomenal Advent of Code solution videos. I could not have dreamt of a better resource for getting up to speed quickly with Rust.
