DEV Community

Artemis
Artemis

Posted on • Edited on

Parsers are relative bimonads

Parsers are what now?!

Even to a Haskeller —proficient in the magics of functors and monads— the term of relative monads might be foreign; let's thus motivate their existence.

Recently I've been thinking about how parsing expression grammar specifies very little about how syntax errors are to be reported or aggregated.

Declaration ← Function / Struct / Module
Function ← "fn" Name "(" ...
Struct ← "struct" Name "{" ...
Module ← "module" Name "{" ...
Enter fullscreen mode Exit fullscreen mode

Take the above grammar fragment as an example; given the input struct {..., what error would one expect?

The best outcome is along the lines of "expected a name in the struct declaration", but how do we know to report that error? Does the grammar not say to parse Module if Struct does not match? So then the error should be "expected module but got struct". That's not very helpful... Alternatively we could aggregate all errors and report whatever the three different rules report.

While specifying what is to be done can be handled by the specification, how do we actually implement any of the above?

Take typical parser combinators in Haskell:

-- we could parameterize the input type
-- it is kept as String for simplicity's sake
newtype Parser err suc = Parser ( String -> (String, Either err suc) )
Enter fullscreen mode Exit fullscreen mode

There will be two basic combinators: chaining parsers (P1 P2 P3) and alternating them (P1 / P2 / P3). The former is implemented using simple applicative functors:

instance Applicative (Parser err) where
    pure suc = Parser $ \i -> (i, Right suc)
    (Parser p) <*> (Parser q) = Parser $ \i ->
        let (rest, f) = p i in
        let (final, suc) = q rest in
        (rest, f <*> suc)
Enter fullscreen mode Exit fullscreen mode

While the latter is implemented using the Alternative typeclass:

instance Alternative (Parser ()) where
    empty = Parser $ \i -> (i, Left ())
    (Parser p) <|> (Parser q) = Parser $ \i ->
        case p i of
            (_, Left _) -> q i
            (rest, suc) -> (rest, suc)
Enter fullscreen mode Exit fullscreen mode

However we already see some issues: as we cannot reliably construct values for all possible error types we have to use () for empty. This also makes our implementation of (<|>) only apply to parsers with said error type. Even if we managed to generalize (<|>) to all errors, we would still only get the most recent error that occurred, representing only "expected module" from the earlier example.

Instead we need the ability to aggregate our errors, just like we do with successful outputs in Applicative...
So let's implement Applicative for errors!

-- A newtype wrapper to change the order of type parameters on parser
newtype Flip f a b = Flip (f b a)

instance Applicative (Flip Parser suc) where
    pure :: err -> Flip Parser suc err
    pure err = Flip $ Parser $ \i -> (i, Left err)
    -- interestingly the implementation looks quite close to Alternative
    (<*>) :: Flip Parser suc (err -> err') -> Flip Parser suc err -> Flip Parser suc err'
    (Flip (Parser p)) <*> (Flip (Parser q)) = Flip $ Parser $ \i ->
        case p i of
            (_, Left f) -> case q i of
                (_, Left err) -> (i, f err)
                (rest, suc) -> (rest, suc)
            (rest, suc) -> (rest, suc)
Enter fullscreen mode Exit fullscreen mode

Wow! That's unwieldy.
As Haskell lacks the machinery to implement typeclasses on multiple different type parameters, the result is quite ugly. Thus we arrive at the Bifunctor typeclass. A result of exactly this problem on other types like tuples. If functors can be binary, why can't applicatives or even monads for that matter?

So let's do as true programmers do and look up those typeclasses on the internet:

35

Haskell's Data.Bifunctor is basically:

class Bifunctor f where
  bimap :: (a -> c) -> (b -> d) -> f a b -> f c d 

I could find a Biapply as well. My question is, why isn't there a complete bi-hierarchy (bierarchy?) like:

class Bifunctor f => Biapplicative f where

Following Sjoerd's answer, we finally arrive at relative bimonads.

A monad in category theory is an endofunctor, i.e. a functor where the domain and codomain is the same category. But a Bifunctor is a functor from the product category Hask x Hask to Hask. But we could try to find out what a monad in the Hask x

Due to their categorical roots, monads don't play well with such ad-hoc extensions. A monad is an endofunctor, meaning it maps types to types. A bifunctor is a functor from two types to one type. These two notions are incompatible, but a bifunctor can collapse two types into one, so our bimonad just needs to act on the output of some bifunctor!

This process of using functors to 'preprocess' our inputs is what relative monads are.

After some experimentation however it becomes apparent that we cannot do the same for applicatives, as the biap function does not compose for Either. Using a normal extension of applicatives into two parameters isn't ideal either, as bipure :: a -> b -> f a b does not give us the ability to specify whether the parser is an always failing or an always successful one. However, monads can express applicative composition using bind and return.
The same is true for relative bimonads.

Taking the typeclass from the stackoverflow thread linked:

class (Bifunctor j) => RelativeBimonad j m where
    bireturn :: j a b -> m a b
    bibind :: m a b -> (j a b -> m c d) -> m c d
Enter fullscreen mode Exit fullscreen mode

Implementing these typeclasses for our parser is relatively simple. The bifunctor that performs the preprocessing is simply the Either that we have in the definition of our parser.

instance RelativeBimonad Either Parser where
    bireturn :: Either err suc -> Parser err suc
    bireturn res = Parser $ \i -> (i, res)
    bibind :: Parser err suc 
           -> (Either err suc -> Parser err' suc') 
           -> Parser err' suc'
    bibind (Parser p) f = Parser $ \i ->
        let (rest, out) = p i in
        let (Parser q) = f out in
        q rest
Enter fullscreen mode Exit fullscreen mode

It is important to note, that bibind collapses the alternative combinator (<|>) (or as in PEG /) and the sequencing combinator (juxtaposition in PEG). As this is usually hard to reason about, I define two small helpers:

onSuc :: (suc -> Parser err suc') -> Either err suc -> Parser err suc'
onSuc f (Right suc) = f suc
onSuc _ (Left err) = bireturn $ Left err

onErr :: (err -> Parser err' suc) -> Either err suc -> Parser err' suc
onErr f (Left err) = f err
onErr _ (Right suc) = bireturn $ Right suc
Enter fullscreen mode Exit fullscreen mode

Which will allow us to write the long ago mentioned example as:

data Error = NoDeclErr
           | StructErr Part
           | FuncErr Part
           | ModErr Part
  deriving (Show)

data Part = Keyword
          | Name
          | Body
  deriving (Show, Eq)

data Decl = Struct String String
          | Func String String
          | Mod String String
  deriving (Show)

parseString err str = Parser $ \i -> case stripPrefix str i of
    Just r -> (r, Right str)
    Nothing -> (i, Left err)

-- as example is not a full grammar, this accepts any string surrounded by open, close
parseBody open close =
    bibind (parseString Body [open ])(onSuc (\_ ->
    bibind (parseUntil        close )(onSuc (\s ->
    bibind (parseString Body [close])(onSuc (\_ ->
    bireturn $ Right s
    ))))))
  where parseUntil char = Parser $ \i -> (dropWhile (/= char) i, Right $ takeWhile (/= char) i)

parseName = Parser $ \i -> case takeWhile isAlpha i of
    _ : _ -> (dropWhile isAlpha i, Right $ takeWhile isAlpha i)
    _     -> (i, Left Name)

whiteSpace = Parser $ \i -> (dropWhile isSpace i, Right $ ())

parseWhole mkWhole key open close =
    bibind (parseString Keyword key)(onSuc (\_ ->
    bibind  whiteSpace                     (\_ ->
    bibind  parseName               (onSuc (\s1 ->
    bibind  whiteSpace                     (\_ ->
    bibind (parseBody open close)   (onSuc (\s2 ->
    bireturn $ Right $ mkWhole s1 s2
    ))))))))

parseStruct = parseWhole Struct "struct" '{' '}'
parseFunc   = parseWhole Func "fn" '(' ')'
parseMod    = parseWhole Mod "module" '{' '}'

parseDecl =
    bibind parseFunc   (onErr (\e1 ->
    bibind parseStruct (onErr (\e2 ->
    bibind parseMod    (onErr (\e3 ->
    bireturn $ Left $ errSelect e1 e2 e3
    ))))))

-- selects the first non-keyword (non-immediate) error
errSelect e1 e2 e3
    | e1 /= Keyword = FuncErr   e1
    | e2 /= Keyword = StructErr e2
    | e3 /= Keyword = ModErr    e3
    | otherwise     = NoDeclErr

runParser :: Parser a b -> String -> Either a b
runParser (Parser p) i = let (_, o) = p i in o

main :: IO ()
main = print $ runParser parseDecl "struct Foo"
Enter fullscreen mode Exit fullscreen mode

Which will print Left (StructErr Body), correctly parsing the existing structure and choosing the correct error to report based on a simple user supplied heuristic.

It is very evident that unlike monads, bimonads do not have the same level of language support. If there was a notation similar to the 'do'-notation for bimonads, this code would look way cleaner. Though the bibind parser (onSuc f) pattern is just the normal monad instance, which we are not restricted from using.

The following Github Gist contains the example code with all boilerplate needed to run, as well as the typical implementation of Monad on parsers and rewriting of two rules using do-notation.

Corrections

I called the construct "relative bimonads", while in the literature bimonads are actually a different thing to what I describe. I picked the name to remind of bifunctors.

Top comments (0)