# Lecture 17: Introduction to Functional Parsing

A parser is a function/procedure that translates a `String` representing a value of type `a` into that value. It's tempting to propose the following as the definition of an abstract `Parser` type:

``` type Parser s = String -> s ```

The problem here is that if we're building a parser out of pieces, the pieces are going to be sub-parsers that consume part but not all of the input. This suggests the following:

``` type Parser s = String -> (s,String) ```

where we're returning a pair that consists of the result of parsing an initial segment of the input string, and the unparsed remainder. This may be reminiscent of the `State` monad, specialized to `String`. Such a definition would allow us to write code like this:

``` pairp :: Parser a -> Parser b -> Parser (a,b) pairp ap bp s = ((a,b),u) where (a,t) = ap s (b,u) = bp t ```

or even, if we somehow monadify `Parser` along the lines of `State`,

``` pairp :: Parser a -> Parser b -> Parser (a,b) pairp ap bp = do a <- ap b <- bp pure (a,b) ```

or, after putting our applicative thinking hats on, even

``` pairp = liftA2 (,) ```

both of which hint at things to come. Unfortunately, though, this definition of `Parser` isn't quite robust enough. Consider, e.g., the special case of trying to parse a simple arithmetic expression. Suppose we had

``` data Expression = Const Double | Add Expression Expression | Mul Expression Expression expressionParse :: Parser Expression ```

What should `expressionParse "1+2*3"` return? Obviously, we're looking for the result

• `(Add (Const 1) (Mul (Const 2) (Const 3)), "")`

• `(Const 1,"+2*3")`, or
• `(Add (Const 1) (Const 2), "*3")`?

All are plausible, in that they meet the contract for `expressionParse`, even though the last one is a bit problematic. We don't just have one right answer, it seems we have three! It's useful to think this as a non-deterministic calculation. So let's return a list:

``` type Parser s = String -> [(s,String)] ```

Of course, we're going to want to make `Parser` an instance of various standard type classes, so we'll use `newtype` as before:

``` newtype Parser s = Parser { runParser :: String -> [(s,String)] } ```

The result looks like a mash-up of a `State` and a `[]`. This is an interesting and productive observation.

Following Joroen Fokker, who first wrote this sort of functional parser, we now build some simple parsers, e.g., `satisfy` “shifts” the first character of input if it satisfies the argument predicate:

``` satisfy :: (Char -> Bool) -> Parser Char satisfy p = Parser \$ \s -> case s of [] -> [] a:as | p a -> [(a,as)] | otherwise -> [] ```

Note here the spiffy use of guards within patterns within a case statement. There's a really cute way to clean up that the last little bit, which is called the Fokker trick:

``` satisfy :: (Char -> Bool) -> Parser Char satisfy p = Parser \$ \s -> case s of [] -> [] a:as -> [(a,as) | p a] ```

How does this work? If `p a` is false, we end up returning the `[]` type's `empty` which is just `[]`, so `[(a,as) | p a]` is either `[(a,as)]` or `[]`, according to whether `p a` is `True` or `False`, respectively. It's a cute trick, and well worth remembering.

We can use `satisfy` to define a number of additional simple parsers

``` char :: Char -> Parser Char char c = satisfy (c==) alpha, digit, space :: Parser Char alpha = satisfy isAlpha digit = satisfy isDigit space = satisfy isSpace ```

*Exercise 17.1 Show by a series of principled transformations that we can define:

``` char :: Char -> Parser Char char = satisfy . (==) ```

The character predicates from `Data.Char` all beg to be turned into simple parsers, in similar fashion.

Next, we have a simple parser that recognizes a string:

``` string :: String -> Parser String string str = Parser \$ \s -> [(t,u) | let (t,u) = splitAt (length str) s, str == t] ```

Here, again, we use the Fokker trick, creating either a zero- or one-element list. A remarkable fact is that this pretty much concludes our effort to build primitive parsers, further progress is going to take the form of adding `Parser` to various standard type classes, leveraging the power of these simple functions.

We'll start by considering the task of writing a `Parser Bool`. We can start by considering a couple of primitive parsers for recognizing the strings `"True"` and `"False"`.

``` parseTrue = string "True" parseFalse = string "False" ```

An obvious first problem is that these parsers both have type `Parser String`, rather than `Parser Bool`. To get a `Parser Bool`, it's convenient to make `Parser` an instance of `Functor`.

``` instance Functor Parser where fmap f p = Parser \$ \s -> [(f a,t) | (a,t) <- runParser p s] ```

With the `Functor` instance in hand, we can write:

``` parseTrue = (const True) <\$> (string "True") parseFalse = (const False) <\$> (string "False") ```

This sort of thing (a combination of `const` and `(<\$>)`) happens a lot, so unsurprisingly there's an operator `(<\$) :: Functor f => a -> f b -> f a` in `Data.Functor` that does this.

and perhaps recognize a useful pattern here:

``` token :: String -> a -> Parser a token s a = a <\$ string s parseTrue, parseFalse :: Parser Bool parseTrue = token "True" True parseFalse = token "False" False ```

With this,

``` > runParser parseTrue "True" [(True,"")] ```

which isn't exactly what we want, but it's a solid step in the right direction.

Of course, we don't want to be able to parse `"True"` or parse `"False"`, we want to be able to parse a string that contains either. Somehow, we want to combine the two parsers into one parser, merging their outputs. There are a number of ways to approach this. We could make `Parser s` an instance of `Monoid`, or we could make `Parser` an instance of `Alternative`. Indeed, we could do both. There is an apparent cost associated with making `Parser` an instance of `Alternative`, and that is that `Alternative` requires `Applicative`. But a moment's consideration should suggest that we're going to want to combine the outputs of multiple parsers, as hinted at with the `pairp` example, and we know that the easiest way to provide generalizations of `fmap` to functions of greater arity is through `Applicative` anyway. So...

``` instance Applicative Parser where pure a = Parser \$ \s -> [(a,s)] af <*> aa = Parser \$ \s -> [ (f a,u) | (f,t) <- runParser af s , (a,u) <- runParser aa t ] instance Alternative Parser where empty = Parser \$ \s -> [] p1 <|> p2 = Parser \$ \s -> runParser p1 s ++ runParser p2 s ```

With this in hand, not only does our `pairp` example work, we can finish the `Parser Bool`:

``` parseBool :: Parser Bool parseBool = token "True" True <|> token "False" False ```

An alternative solution, well worth remembering in other contexts, is

``` parseBool = read <\$> (string "True" <|> string "False") ```

At this point, it's standard practice to introduce parser combinators `many` and `some`, which given a `Parser a` return a `Parser [a]`, where the `many` version returns a list of zero or more successful parses, and `some` a list of one or more parses. It's natural to write these via mutual recursion, as

``` many, some :: Parser a -> Parser [a] many p = some p <|> pure [] some p = liftA2 (:) p (many p) ```

Much to our surprise, attempting to do so results in an error: `many` and `some` are already defined in `Control.Applicative`, with the type signature

``` many, some :: Alternative f => f a -> f [a] ```

Moreover, a quick consideration of our implementation reveals that it only uses functions in the `Alternative` or `Applicative` type classes. Unsurprisingly, we've just reimplemented these standard functions! Likewise, it's often useful to have an `optional` parser combinator, which is used for elements that may be present, but it too is already available in `Control.Applicative`, simplifying our work considerably.

For example:

``` > runParser (many parseBool) "TrueTrue!" [([True,True],"!"),([True],"True!"),([],"TrueTrue!")] > runParser (some parseBool) "TrueTrue!" [([True,True],"!"),([True],"True!")] > runParser (optional parseBool) "TrueTrue!" [(Just True,"True!"),(Nothing,"TrueTrue!")] ```

We can now consider a mildly non-trivial parsing problem. Consider the type

``` data IntV = IntV Int deriving (Show) ```

This is just an `Int` in a box. Can we parse this given its natural syntax?

``` parseInt = read <\$> some digit skipSpaces = const () <\$> many space parseIntV = liftA3 (\_ _ i -> IntV i) (string "IntV") skipSpaces parseInt ```

For example:

``` > runParser parseIntV "123!" [] > runParser parseIntV "IntV 123!" [(IntV 123,"!"),(IntV 12,"3!"),(IntV 1,"23!")] ```

A nice consequence of writing `parseIntV` is that it enables us to add `IntV` to the `Read` class, via

``` instance Read IntV where readsPrec _ = runParser parseIntV ```

Which we can test with

``` > read "IntV 18" :: IntV IntV 18 ```

Sweet! Of course, we could have also achieved the ability to read `IntV` values like this by including `Read` in the `deriving` clause of the type definition, but explicitly defining the `Read IntV` instance allows us to choose a different syntax if desired. Note that the wildcard argument to `readsPrec` is a precedence level, which means that we can have different parsers associated with different parsing contexts.

There's a small gap here, in that the parsers we're building return multiple results, whereas `read` returns only a single result. The bridge is in terms of code that filters the result list for those pairs `(a,t)` where the unparsed string `t` consists only of whitespace, and then requires that there be only a single result of that form, e.g.,

``` parseWith :: Parser a -> String -> a parseWith p s = case [a | (a,t) <- runParser p s, all isSpace t] of [a] -> a [] -> error "no parse" _ -> error "ambiguous parse" ```

It's reasonable to argue that this might have been better handled by returning a value of type `Either String a` than by throwing an exception via `error`. When you write your own language, be sure to do it that way.

Historically, functional programmers approached parser combinators through `Monad` rather than through `Applicative` and friends. It's worth noting that we can make `Parser` into a monad, and this has the considerable advantage of making monadic `do`-notation available to us, although pending changes to GHC/Hackage will permit the use of restricted versions of `do` with `Applicative`, largely eliminating this advantage.

``` instance Monad Parser where p >>= g = Parser \$ \s -> [ (b,u) | (a,t) <- runParser p s , (b,u) <- runParser (g a) t ] instance MonadPlus Parser ```

With these definitions, we could have written

``` parseIntV :: Parser IntV parseIntV = do string "IntV" skipSpaces i <- parseInt pure \$ IntV i ```

which has the advantage of conceptual simplicity. That said, contemporary practice favors using `Applicative` based constructors over `Monad` based constructors, and the results argue for themselves in terms of concision if not always clarity.

*Exercise 17.2 Consider the following type declaration:

``` data ComplexInt = ComplexInt Int Int deriving (Show) ```

Use `Parser` to implement `Read ComplexInt`, where you can accept either the simple integer syntax `"12"` for `ComplexInt 12 0` or `"(1,2)"` for `ComplexInt 1 2`, and illustrate that `read` works as expected (when its return type is specialized appropriately) on these examples. Don't worry (yet!) about the possibility of minus signs in the specification of natural numbers.

This is cool stuff, and we can and will take this style of parsing very far. This isn't our father's Fortran, nor indeed our high school Java or Scheme.

With all of this work behind us, the end product is remarkably concise and powerful parser combinator library that requires less than a full page of code. Just to be clear here: we haven't written a parser in less than a full page of code, we've written a module for writing parsers in less than a full page of code. And we're not done yet.

### Ἀπὸ μηχανῆς θεός

Remember back when we remarked that original definition of `Parser` looked like a mashup of a `State` monad and a `[]` monad? We can breathe life into this observation by noting that `Control.Monad.State` ultimately defines `State` in terms of the monad transformer `StateT`. Looking at the definition of `StateT`, we have

``` newtype StateT s m a = StateT { runStateT :: s -> m (a,s) } ```

So, if we define

``` type Parser = StateT String [] ```

we at least get the type right. But here's where the wild magic begins: if we look at the instances provided by `StateT`, we'll see:

``` Functor m => Functor (StateT s m) (Functor m, Monad m) => Applicative (StateT s m) (Functor m, MonadPlus m) => Alternative (StateT s m) Monad m => Monad (StateT s m) MonadPlus m => MonadPlus (StateT s m) ```

In the current context, `m` is `[]`, and we know that `[]` is a `Functor`, a `Monad`, and a `MonadPlus`, and as such, our `Parser` type will be an instance of `Functor`, `Applicative`, `Alternative`, `Monad`, and `MonadPlus` for free! Moreover, once we have these instances, we can use combinators for these type classes to simplify our base definitions, e.g.,

``` string :: String -> Parser String string = mapM char ```

Pulling this all together, we have

• Parser.hs, via Monad Transformers, in 22 lines.

Not a parser, but a powerful module for writing parsers . Try that in C++ or Java ;-).

The sudden appearance of monad transformers here very much has the flavor of a deus ex machina, and you're all very much to be forgiven if you're not only saying to yourself, “I didn't see that coming,” but “Is he really expecting that I'm going to be able to pull this kind of wild rabbit out of my hat?” No. Or at least, not yet. In truth, I gave a predecessor version of this lecture for three consecutive years without invoking this wild magic, only realizing in the summer before the fourth year that not only could the previously mysterious monad transformers be used here, but they resulted in a `Parser` class that was both more concise and more powerful. It was a programmer's epiphany—to paraphrase the great Hungarian combinatorialist Paul Erdös, I felt as if I'd been granted a peak at a page of code from God's own git repository.

But that said, you're unlikely to find what you're not looking for. We'll do some more examples that use monad transformers later in the quarter, and will even take a quick peek at some of the wild magic that the Monad Transformer Library (mlt) uses to simplify both the coding and the use of monad transformers.

### Historical Note

The use of parser combinators to build backtracking parsers has a long history in functional programming. A particularly formative (and illustrative) article is Functional Parsers (PDF) by Joroen Fokker. Note that Fokker's code isn't actually Haskell—it's Gofer, a predecessor language. You shouldn't have any trouble with translation. Another good source for this material is Chapter 13 of Hutton (2nd edition).