Functional Parsing (continued)

Last time, we built up a nice simple functional Parser library. Though elegant, it turns out that our implementation — lists of potential results and backtracking — is not the most efficient.

ReadP library

The Text.ParserCombinators.ReadP library is a much faster and more full-featured parser implementation.

ReadP a ~= Parser a ~= String -> [(a, String)]

> :info ReadP
newtype ReadP a = R (forall b. (a -> P b) -> P b)

The type involves forall b… We’re not going to worry about this internal representation. ReadP works like Parser, but there are no (public) data constructors.

> :info ReadS    -- type alias for String -> [(a,String)]

Small differences in names, functionality, types compared to our Parser:

`Parser a`	`ReadP a`
`Parser`	`readS_to_P` (avoid using this!)
`runParser`	`readP_to_S`
`char`	`char`
`satisfy`	`satisfy`
`string`	`string`
`token`
`(<$>)`	`(<$>)`
`(<*>)`	`(<*>)`
`(>>=)`	`(>>=)`
`(<\|>) (Alternative)`	`(<\|>)`, `(+++)`
`many (Alternative)`	`many`
`some (Alternative)`	`many1`
`optional (Alternative)`	`optional` (works differently)
	`option`
	`(<++)` (left-biased sum)
	`munch` (greedy; similar to `string`)
	`sepBy`
	`between`
	`look` (lookahead efficiently)

Consider the difference between (many . satisfy) and munch:

> readP_to_S (R.many $ R.satisfy isAlpha) "asdf"
[("","asdf"),("a","sdf"),("as","df"),("asd","f"),("asdf","")]

> readP_to_S (R.many1 $ R.satisfy isAlpha) "asdf"
[("a","sdf"),("as","df"),("asd","f"),("asdf","")]

> readP_to_S (R.munch isAlpha) "asdf"
[("asdf","")]

Could implement munch for Parser:

munch :: (Char -> Bool) -> Parser String
munch p = parser $ \s -> [span p s]

Example: Parsing Lists

Let’s use ReadP to write a simple parser for a simple type of lists.

data List
  = Nil
  | ConsInt Int List

instance Show List where
  show Nil = "[]"
  show (ConsInt n ns) = show n ++ ":" ++ show ns

Once we build our parser, plug it into Read:

class Read a where
 -- readsPrec :: Int -> ReadS a
    readsPrec :: Int -> String -> [(a, String)]

The first argument is a precedence level. Our simple example parser will not track precedence; we will see examples of using precedence in the future.

instance Read List where
  readsPrec _ = readP_to_S parseList

parseList :: ReadP List
parseList = parseNil +++ parseCons

Start with parsing empty list:

parseNil :: ReadP List
parseNil = do
  string "[]"
  return Nil

Now parse cons:

parseInt :: ReadP Int
parseInt = do
  n <- many1 $ satisfy isDigit
  return $ read n

parseCons :: ReadP List
parseCons = do
  n <- parseInt
  char ':'
  ns <- parseList
  return $ ConsInt n ns

Okay, now let’s add sugared versions:

parseSweetList :: ReadP List
parseSweetList = do
  char '['
  ns <- parseInt `sepBy` (char ',')
  char ']'
  return $ foldr ConsInt Nil ns

Clean up parseSweetList with between:

parseSweetList = do
  ns <- between (char '[') (char ']') (sepBy parseInt (char ','))
  return $ foldr ConsInt Nil ns

This works on its own, but…

parseList = parseNil +++ parseCons +++ parseSweetList

>> read "[]" :: List
*** Exception: Prelude.read: ambiguous parse

Problem is parseSweetList might match [], but so does parseNil. So, use sepBy1 instead:

ns <- isDigit `sepBy1` (char ',')

But, in any case, these different parsers accept disjoint sets of strings, so no need to run all.

parseList = parseNil <++ parseCons <++ parseSweetList

Monadic vs. Applicative parsing

In our implementations of parseNil, parseInt, parseCons, and parseSweetList above, we used (>>=) (via do-notation) to stitch together multiple parsers. But none of those implementations actually require the full power of the Monad interface (i.e. the ability of one parser to depend on the result of the previous).

parseNil, parseInt, and parseSweetList can be written via the Functor interface:

parseNil = const Nil <$> string "[]"

parseInt = read <$> (many1 $ satisfy isDigit)

parseSweetList =
  foldr ConsInt Nil <$>
    between (char '[') (char ']') (sepBy parseInt (char ','))

parseCons cannot, because the pure function we want to apply, a wrapper around ConsList, takes one more than one argument. Enter Applicative:

parseCons =
  (\n _ ns -> ConsInt n ns)
    <$> parseInt
    <*> char ':'
    <*> parseList

Parser Pipelines

Having to create a lambda just to ignore some of the results is clunky. Here is where the following Applicative operator comes in handy.

(<*) :: Applicative f => f a -> f b -> f a

fa <* fb = (\a _ -> a) <$> fa <*> fb
(<*)     = liftA2 const

For example, rather than writing…

(\a1 _ a3 _ a5 -> e) <$> x1 <*> x2 <*> x3 <*> x4 <*> x5

… can write

(\a1 a3 a5 -> e) <$> x1 <* x2 <*> x3 <* x4 <*> x5

Notice that sprinkling in (<*>) “keeps” the result of the second parser whereas sprinkling in (<*) “ignores” the result.

May also choose to forgo (<$>) before the first argument so that there are only two operators to look at:

pure (\a1 a3 a5 -> e) <*> x1 <* x2 <*> x3 <* x4 <*> x5

Formatting this expression with newlines results in a nice “parser pipeline”, a pattern borrowed from an Elm library, albeit with different operators:

pure (\a1 a3 a5 -> e)
  <*> x1
  <*  x2
  <*> x3
  <*  x4
  <*> x5

Now, back to parseCons:

parseCons :: ReadP List
parseCons =
  pure ConsInt
    <*> parseInt
    <*  char ':'
    <*> parseList

And we could choose to write parser pipelines for all the rest, too, to create a similar feel (even though we don’t need Applicative for parseInt and parseSweetList):

parseInt :: ReadP Int
parseInt =
  pure read
    <*> (many1 $ satisfy isDigit)

parseCons :: ReadP List
parseCons =
  pure ConsInt
    <*> parseInt
    <*  char ':'
    <*> parseList

parseSweetList :: ReadP List
parseSweetList =
  pure (foldr ConsInt Nil)
    <*> between (char '[') (char ']') (sepBy parseInt (char ','))

Source Files

ParseList.hs