DEV Community

Derp
Derp

Posted on • Updated on

Parser combinators in typescript with parser-ts

Consider the following strings
"13:15"
"1:15pm"
"quarter past 1pm"

These may be just strings to a computer but to a human, we can see that they all represent a time. Furthermore, we see that there is some structure to these strings. So whereas “quarter past 1pm” makes sense as a time. “Foohoogablahblahl” does not.

Alien saying "Foohoogablahblahl"

Now our first instinct might be reach for regex to parse these strings and extract the data out of them, but I'd advise against it. We will solve this problem with parser combinators using the parser-ts library.

Let's begin with the first example "13:15" which is the time in 24 hour format. Upon closer examination, we see that it is comprised of three parts <hours> <:> <minutes> each with its own set of rules.

Taking a look at the <hours> component, we know that there are some rules. First, it must be a number and that number must be between 0 & 23. So to begin with, we reach for the int parser from the parser-ts/lib/string module.

import { run } from "parser-ts/lib/code-frame";
import * as S from "parser-ts/lib/string";

console.log(run(S.int, "14")); //[1] Right 14
console.log(run(S.int, "asdf4")); //[2] Left "Expected: an integer"
console.log(run(S.int, "14aaaaa")); //[3] Right 14
Enter fullscreen mode Exit fullscreen mode

Let's break this down. On line [1], we ran the parser on the string "14" and we got back a Right instance containing the number 14. On line [2], we see that if the int parser is unable to extract out a number, we error out with a Left instance containing a helpful error message. On line [3], we see that we succeed in extracting out the number 14 and we abandon the rest of the string.

Phew, that was a lot, however we aren't done yet because if we try to do

console.log(run(S.int, "29")); // Right 29
Enter fullscreen mode Exit fullscreen mode

We succeed with the number 29 which isn't what we want because there isn't a 29th hour in 24 hour format. To solve this, we reach for our first combinator filter. Filter allows us to pass in a predicate so that we can filter out our invalid values. After this, we finally have our 24 hours parser.

const isValid24Hour = (hours: number) => hours >= 0 && hours <= 23;

const hours24Parser = P.expected(
  pipe(S.int, P.filter(isValid24Hour)),
  "24 hour clock hours must be between 0 & 23"
);

console.log(run(hours24Parser, "14")); // Right 14
console.log(run(hours24Parser, "32")); // Left "24 hour clock hours must be between 0 & 23"
Enter fullscreen mode Exit fullscreen mode

Nice! Next we need our colon parser which matches just the ":" character.

const colonParser = S.string(":");
Enter fullscreen mode Exit fullscreen mode

Finally, we apply a similar approach to the hours24Parser for our minutes parser

const isValidMinute = (minutes: number) => minutes >= 0 && minutes <= 59;

const minutesParser = P.expected(
  pipe(S.int, P.filter(isValidMinute)),
  "Minutes must be between 0 & 59"
);
Enter fullscreen mode Exit fullscreen mode

At this stage, we have our hours24Parser, our colonParserand our minutesParser. So the next challenge is to figure out a way to mash all three of these together. Now Parsers are monads which means that we can chain them together. To make life a bit simpler, we will be using the bind and bindTo convenience methods to store our extracted data.

const time24hoursParser = pipe(
  hours24Parser,
  P.bindTo("hours"), // store hours into an object with key 'hours'
  P.chainFirst(() => colonParser), // ignore and don't store the colon
  P.bind("minutes", () => minutesParser) //store minutes under the key 'minutes'

console.log(run(time24hoursParser, "00:24")); // Right { hours:0, minutes: 24}
);
Enter fullscreen mode Exit fullscreen mode

Excellent! Let's give this hours, minutes data object a type and call it a TimeObj

type TimeObj = {
  hours: number;
  minutes: number;
};
Enter fullscreen mode Exit fullscreen mode

Okay, let's take a breather now. Imagine you are an international amazing race participant and you have just woken up on a bus after three consecutive connecting flights. The bus is currently traveling in a tunnel and you have no idea what year it is let alone the month, day or time. Spying a fellow passenger with a wristwatch, you groggily come to and have the following conversation:

You: Hello fellow human. I appear to have regained consciousness after my intraplanetary travels. May I inquire the reading on your portal time tracking device?
Passenger: ooookay....
Passenger: <looks down at watch>
Passenger: uhhh, it is currently quarter past 1
You: May I ascertain the position of the sun relative to our position as well?
Passenger: huh? ohh
Passenger: quarter past 1...pm

Alien asking for time pretending to be human

So how do we parse the string "quarter past 1pm"? To do this, we will need to break down the structure of that string into a grammar.

<long-form-time> ::= <segment> <relative> <hours12><ampm>
<segment> ::= quarter | half
<relative> ::= past | to
<hours12> ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
<ampm> ::= am | pm
Enter fullscreen mode Exit fullscreen mode

Within our grammar, we allow sentences like "quarter past 1pm", "quarter to 5pm" and "half past 12pm".

So let's get into it. The first thing to tackle would be the string "quarter" and "half".

const quarterParser = pipe(
  S.string("quarter"),
  P.map(() => 15)
);
const halfParser = pipe(
  S.string("half"),
  P.map(() => 30)
);

const segmentParser = P.either(quarterParser, () => halfParser);

console.log(run(segmentParser, "half")); // Right 30
Enter fullscreen mode Exit fullscreen mode

We have a quarterParser to match on the string "quarter" and return back the number 15. We have a halfParser to match on the string "half" and return back the number 30. We then use the either combinator to try one and then the other if the first one fails.

Next up will be tackling the <relative> segment of our grammar. This one requires a bit of thought. When we see the word "past" in the sentence "quarter past 1pm", what does it represent?

It represents that the notion that we should add 15 minutes to whatever the hours component is. The type signature of this notion might look like:

(minutes: number) => (hours:number) => TimeObj
Enter fullscreen mode Exit fullscreen mode

So let's get to it

const minutesPast = (minutes: number) => (hours24: number): TimeObj => ({
    hours: hours24,
    minutes
});

const pastParser = pipe(
  S.string("past"),
  P.map(() => minutesPast)
);
Enter fullscreen mode Exit fullscreen mode

Let's break this down. First we define a function minutesPast which takes in the minutes, then the hours in 24 hour format and returns back our 24 hour time object representation.
Next we define the pastParser, this parser matches on the string "past" and carries within it the minutesPast function.

Similarly, we do the same for the toParser

const minutesTo = (minutes: number) => (hours24: number): TimeObj => {
  if (hours24 === 0) {
    return {
      hours: 23,
      minutes: 60 - minutes
    };
  } else {
    return {
      hours: hours24 - 1,
      minutes: 60 - minutes
    };
  }
};

const toParser = pipe(
  S.string("to"),
  P.map(() => minutesTo)
);
Enter fullscreen mode Exit fullscreen mode

We also need a way to parse our <hours12> segment and convert that into a 24 hour format

const isValid12Hour = (hours: number) => hours >= 1 && hours <= 12;
const num12Parser = P.expected(
  pipe(S.int, P.filter(isValid12Hour)),
  "12 hour clock hours must be between 1 & 12"
);

const amParser = pipe(
  S.string("am"),
  P.map(() => (hours12: number) => (hours12 === 12 ? 0 : hours12))
);

const pmParser = pipe(
  S.string("pm"),
  P.map(() => (hours12: number) => (hours12 === 12 ? 12 : hours12 + 12))
);

const amPmParser = P.either(amParser, () => pmParser);

const hours12Parser = P.seq(
  num12Parser,
  (hours12) =>
    pipe(
      amPmParser,
      P.map((f) => f(hours12))
    )
);

console.log(run(hours12Parser, "1am")) // Right 1
console.log(run(hours12Parser, "3pm")) // Right 15
console.log(run(hours12Parser, "14am")) // Left "12 hour clock hours must be between 1 & 12"
Enter fullscreen mode Exit fullscreen mode

Let's break this down. Using a similar technique to the hours24Parser earlier in this article, we create the num12Parer to only allow numbers between 1 & 12.
The amParser and pmParser are Parsers that match on either am or pm and contains a function that converts 12 hour time to 24 hour time. We then join the amParser and pmParser together using the either combinator.

The hours12Parser takes the num12Parser and combines it using seq which is a combinator that passes results from the first Parser to the next Parser if it succeeds. If it succeeds, we pass the parsed hours12 number to the amPmParser, as that contains a function inside, we use P.map to apply our hours12 number to the contained function getting back a number representing the hours in 24 hour time.

Phew! We now have all the parsers for our individual segments. Time to glue them altogether.

const longFormParser = pipe(
  segmentParser, // [1]
  P.chainFirst(() => S.spaces1), // [2]
  P.chain((minutes) =>
    pipe(
      P.either(toParser, () => pastParser),
      P.map((f) => f(minutes)) // [3]
    )
  ),
  P.chainFirst(() => S.spaces1), //ignore 1 or more spaces
  P.chain((f) =>
    pipe(
      hours12Parser, // [4]
      P.map((hours12) => f(hours12)) // [5]
    )
  )
);

console.log(run(longFormParser, "quarter to 12am")); //Right {hours:23, minutes:45}
console.log(run(longFormParser, "half past 12am")); //Right {hours:0, minutes: 30}
Enter fullscreen mode Exit fullscreen mode

Let's talk it out. [1] We start with our segmentParser to match on "quarter" or "half" and get back the number 15 or 30. [2] We then use chainFirst with S.spaces1 to ignore 1 or more whitespace characters. [3] We then use either our toParser or pastParser to get our function of the shape minutes -> hours -> TimeObj. Since we have our minutes already, we partially apply the minutes to our function [3]. After ignoring some more spaces, we use the hours12Parser to extract out the number of hours in 24 hour time [4]. Finally, we have the last argument for our function from [3] and we have the time represented in our TimeObj format.

With that, our alien friend now knows what time it is to help it win the amazing race!

To conclude, I leave you with a poem

A string is not only a string
Wherewith a data structure lies within
Regex don't you use
Use Parser combinators for the win!
~derp

Link to codesandbox:
https://codesandbox.io/s/time-parser-example-jwrwk3?file=/src/index.ts

Top comments (0)