Last time, we built up a nice simple functional Parser
library. Though elegant, it turns out that our implementation — lists of potential results and backtracking — is not the most efficient.
The Text.ParserCombinators.ReadP
library is a much faster and more full-featured parser implementation.
ReadP a ~= Parser a ~= String -> [(a, String)]
> :info ReadP
newtype ReadP a = R (forall b. (a -> P b) -> P b)
The type involves forall b
… We’re not going to worry about this internal representation. ReadP
works like Parser
, but there are no (public) data constructors.
> :info ReadS -- type alias for String -> [(a,String)]
Small differences in names, functionality, types compared to our Parser
:
Parser a |
ReadP a |
---|---|
Parser |
readS_to_P (avoid using this!) |
runParser |
readP_to_S |
char |
char |
satisfy |
satisfy |
string |
string |
token |
|
(<$>) |
(<$>) |
(<*>) |
(<*>) |
(>>=) |
(>>=) |
(<|>) (Alternative) |
(<|>) , (+++) |
many (Alternative) |
many |
some (Alternative) |
many1 |
optional (Alternative) |
optional (works differently) |
option |
|
(<++) (left-biased sum) |
|
munch (greedy; similar to string ) |
|
sepBy |
|
between |
|
look (lookahead efficiently) |
Consider the difference between (many . satisfy)
and munch
:
> readP_to_S (R.many $ R.satisfy isAlpha) "asdf"
[("","asdf"),("a","sdf"),("as","df"),("asd","f"),("asdf","")]
> readP_to_S (R.many1 $ R.satisfy isAlpha) "asdf"
[("a","sdf"),("as","df"),("asd","f"),("asdf","")]
> readP_to_S (R.munch isAlpha) "asdf"
[("asdf","")]
Could implement munch for Parser
:
munch :: (Char -> Bool) -> Parser String
munch p = parser $ \s -> [span p s]
Let’s use ReadP
to write a simple parser for a simple type of lists.
data List
= Nil
| ConsInt Int List
instance Show List where
show Nil = "[]"
show (ConsInt n ns) = show n ++ ":" ++ show ns
Once we build our parser, plug it into Read
:
class Read a where
-- readsPrec :: Int -> ReadS a
readsPrec :: Int -> String -> [(a, String)]
The first argument is a precedence level. Our simple example parser will not track precedence; we will see examples of using precedence in the future.
instance Read List where
readsPrec _ = readP_to_S parseList
parseList :: ReadP List
parseList = parseNil +++ parseCons
Start with parsing empty list:
parseNil :: ReadP List
parseNil = do
string "[]"
return Nil
Now parse cons:
parseInt :: ReadP Int
parseInt = do
n <- many1 $ satisfy isDigit
return $ read n
parseCons :: ReadP List
parseCons = do
n <- parseInt
char ':'
ns <- parseList
return $ ConsInt n ns
Okay, now let’s add sugared versions:
parseSweetList :: ReadP List
parseSweetList = do
char '['
ns <- parseInt `sepBy` (char ',')
char ']'
return $ foldr ConsInt Nil ns
Clean up parseSweetList
with between
:
parseSweetList = do
ns <- between (char '[') (char ']') (sepBy parseInt (char ','))
return $ foldr ConsInt Nil ns
This works on its own, but…
parseList = parseNil +++ parseCons +++ parseSweetList
>> read "[]" :: List
*** Exception: Prelude.read: ambiguous parse
Problem is parseSweetList
might match []
, but so does parseNil
. So, use sepBy1
instead:
ns <- isDigit `sepBy1` (char ',')
But, in any case, these different parsers accept disjoint sets of strings, so no need to run all.
parseList = parseNil <++ parseCons <++ parseSweetList
In our implementations of parseNil
, parseInt
, parseCons
, and parseSweetList
above, we used (>>=)
(via do
-notation) to stitch together multiple parsers. But none of those implementations actually require the full power of the Monad
interface (i.e. the ability of one parser to depend on the result of the previous).
parseNil
, parseInt
, and parseSweetList
can be written via the Functor
interface:
parseNil = const Nil <$> string "[]"
parseInt = read <$> (many1 $ satisfy isDigit)
parseSweetList =
foldr ConsInt Nil <$>
between (char '[') (char ']') (sepBy parseInt (char ','))
parseCons
cannot, because the pure function we want to apply, a wrapper around ConsList
, takes one more than one argument. Enter Applicative
:
parseCons =
(\n _ ns -> ConsInt n ns)
<$> parseInt
<*> char ':'
<*> parseList
Having to create a lambda just to ignore some of the results is clunky. Here is where the following Applicative
operator comes in handy.
(<*) :: Applicative f => f a -> f b -> f a
fa <* fb = (\a _ -> a) <$> fa <*> fb
(<*) = liftA2 const
For example, rather than writing…
(\a1 _ a3 _ a5 -> e) <$> x1 <*> x2 <*> x3 <*> x4 <*> x5
… can write
(\a1 a3 a5 -> e) <$> x1 <* x2 <*> x3 <* x4 <*> x5
Notice that sprinkling in (<*>)
“keeps” the result of the second parser whereas sprinkling in (<*)
“ignores” the result.
May also choose to forgo (<$>)
before the first argument so that there are only two operators to look at:
pure (\a1 a3 a5 -> e) <*> x1 <* x2 <*> x3 <* x4 <*> x5
Formatting this expression with newlines results in a nice “parser pipeline”, a pattern borrowed from an Elm library, albeit with different operators:
pure (\a1 a3 a5 -> e)
<*> x1
<* x2
<*> x3
<* x4
<*> x5
Now, back to parseCons
:
parseCons :: ReadP List
parseCons =
pure ConsInt
<*> parseInt
<* char ':'
<*> parseList
And we could choose to write parser pipelines for all the rest, too, to create a similar feel (even though we don’t need Applicative
for parseInt
and parseSweetList
):
parseInt :: ReadP Int
parseInt =
pure read
<*> (many1 $ satisfy isDigit)
parseCons :: ReadP List
parseCons =
pure ConsInt
<*> parseInt
<* char ':'
<*> parseList
parseSweetList :: ReadP List
parseSweetList =
pure (foldr ConsInt Nil)
<*> between (char '[') (char ']') (sepBy parseInt (char ','))