Lecture 22: Propositional Logic: Parsing

Propositional logic is an important tool for reasoning about many things, including programs. But our immediate goal is to write programs that will help us in working with propositional formulae. Haskell is very well suited for this sort of work. We'll start by declaring the type of propositional formulae:


    module SimpleProposition where

    data Proposition
        = Var String
        | Boolean Bool
        | Not Proposition
        | And Proposition Proposition
        | Or Proposition Proposition
        | Implies Proposition Proposition
        deriving (Eq,Show)

Our major task for today is write a parser for propositional formulae. To that end, we'll first deal with the unfortunate fact that the notion system that we've been using isn't especially friendly for keyboard input, so we're going to replace the standard logical symbols for propositional connectives with keyboard friendly versions: conjunction/and (&), disjunction/or (|), implication/implies (->), and negation (!). We'll also use T for $\ltrue$, and F for $\lfalse$. But, being defensive programmers, we'll define variables to hold these values, which makes it easy to change our minds, later:


    impliesT,andT,orT,notT,trueT,falseT :: String
    impliesT = "->"
    andT = "&"
    orT = "|"
    notT= "!"
    trueT = "T"
    falseT = "F"

We'll start by writing a simple function token :: a -> String -> ReadP a, which associates a value with a String. We're going to want to ignore whitespace, and this functionality is going to be built into token as well:


    token :: a -> String -> ReadP a
    token value str = do
        skipSpaces
        string str
        return value

This can be simplified by noting first, that there are no bindings, so this is just sequencing:


    token value str = skipSpaces >> string str >> return value

And then, by moving from Monad to Applicative. To that end, we need the Applicative version of the (>>) operator, (*>):


    token value str = skipSpaces *> string str *> pure value

We'll use this first to define an atomic parser for boolean constants:


    parseBool =  Boolean <$> (token True trueT <++ token False falseT)

Next, we'll consider Variable. Here, our representational choices for true and false present us with a minor conundrum. We could do something like this:


    parseVar :: ReadP Proposition
    parseVar = do
        skipSpaces
        name <- munch1 isAlphaNum
        return $ Var name

but this creates a conflict with T and F. Are they boolean constants, or variables? It's going to depend on other coding choices we make. So we'll adopt a convention that variables have to begin with a lower-case letter:


    parseVar = do
        skipSpaces
        c <- satisfy isLower
        cs <- munch isAlphaNum
        return $ Var (c:cs)

Again, we can convert this to Applicative without much trouble, first by "factoring out" the skipSpaces:


    parseVar = skipSpaces *> do
        c <- satisfy isLower
        cs <- munch isAlphaNum
        return $ Var (c:cs)

Then the Var:


    parseVar = skipSpaces *> (Var <$> do
        c <- satisfy isLower
        cs <- munch isAlphaNum
        return $ c:cs)

Then finally:


    parseVar = skipSpaces
             *> (Var <$> ((:) <$> satisfy isLower <*> munch isAlphaNum))

It's possible to push this further, but it's hard to argue that doing so improves clarity or efficiency, so we'll stop here.

Next, it will be useful for having a parser that handles parenthesized expressions: parens :: ReadP a -> ReadP a. We can define this naïvely via a monadic construct:


    parens :: ReadP a -> ReadP a
    parens = do
        skipSpaces
        char '('
        result <- p
        skipSpaces
        char ')'
        return result

But this involves both monadic code, and some twitchiness around our conventions regarding calls to skipSpaces. We can deal with the latter by using our token function:


    parens :: ReadP a -> ReadP a
    parens p = do
        token () "("
        result <- p
        token () ")"
        return result

This can be expressed applicatively using both (*>) and its natural dual (<*):


    parens :: ReadP a -> ReadP a
    parens p = token () "(" *> p <* token () ")"

Alternatively, we can use Text.ParserCombinator.ReadP's between function, and write:


    parens = between (token () "(") (token () ")")

or even


    parens = between (skipSpaces *> char '(') (skipSpaces *> char ')')

At this point, there are two extremely useful parser combinators that come into play:


    chainr1 :: ReadP a -> ReadP (a -> a -> a) -> ReadP a
    chainl1 :: ReadP a -> ReadP (a -> a -> a) -> ReadP a

These are used to parse right- and left-associative operators, where we're guaranteed that at least one expression is present. This makes snapping together a precedence-based parser based on binary operations very easy. To that end, let's assume that implication is at precedence level 0, conjunction at precedence level 1, disjunction at precendence level 2, negation at precedence level 3, and the atomic formula at precedence level 4. We can use this to snap together a precedence-based parser with very little work (not here that we're punting on negation):


    parseProposition :: ReadP Proposition
    parseProposition = prec0 where
        prec0 = chainr1 prec1 (token Implies impliesT)
        prec1 = chainl1 prec2 (token Or orT)
        prec2 = chainl1 prec3 (token And andT)
        prec3 = prec4
        prec4 = parseVar <++ parens prec0 <++ parseBool

This actually works (at least, on the negation free part of Proposition).

To deal with negation requires only a simple prefix :: ReadP a -> ReadP (a -> a) -> ReadP a function, which isn't in Text.ParserCombinator.ReadP. Fortunately, we can write this, but it's a bit tricky:


    prefix :: ReadP a -> ReadP (a -> a) -> ReadP a
    prefix p op = p <++ (op <*> prefix p op)

In this, the repetition of prefix p op is undesirable (it result in unnecessary copying), and we can eliminate it with a recursive definition:


    prefix p op = result where
        result = p <++ (op <*> result)

For those who like golfing, we can go for the hole-in-one by adding the fixed point combinator fix :: (a -> a) -> a from Data.Function,


    prefix p op = fix $ \result -> p <++ (op <*> result)

But if we do this, the Code Fairy is going to want to eliminate the lambda via η-reduction, resulting in the incomprehensibly zen:


    prefix p op = fix $ (<++) p . (<*>) op

and that way lies madness. Or your instructors, who at this very moment are contemplating further η-reductions.

Anyway, at this point a bit of gentle packaging and code-reorganization results in


    instance Read Proposition where
        readsPrec _ = readP_to_S prec0 where
            prec0 = chainr1 prec1 (token Implies impliesT)
            prec1 = chainl1 prec2 (token Or orT)
            prec2 = chainl1 prec3 (token And andT)
            prec3 = prefix  prec4 (token Not notT)
            prec4 = parseVar <++ parens prec0 <++ parseBool
            parseVar = skipSpaces
                     *> (Var <$> ((:) <$> satisfy isLower <*> munch isAlphaNum))
            parseBool =  Boolean <$> (token True trueT <++ token False falseT)

which is shockingly to the point. It would not be unreasonable to be thinking of the calculator lab at this point, and certain simplifications that might be made to the code found therein.

*Exercise 22.1

Write a small desktop calculator, based the following data type:


    data Expression = DoubleValue Double
                    | Sum Expression Expression
                    | Difference Expression Expression
                    | Product Expression Expression
                    | Quotient Expression Expression

You should process input line-by-line, doing a proper precedence-based parse of each line, an evaluation of the resulting expression, and print the final result.

Don't worry about syntax errors, but try to make your calculator program as smooth as possible otherwise. Note also that you can define the parser directly, so there's no need for a Proposition-like extra type layer.

Note that generalizing the parsing strategy above requires dealing with multiple operators at the same precedence level, fortunately with the same associativity. This requires a more complicated "composing" parser than in the examples above, e.g.,


    prec0 = chainl1 prec1 (token Sum "+" <++ token Difference "-")

At this point, we can test our parser.

We can approach this naively:


    > read "a | b -> c -> d & e" :: Proposition
    Implies (Or (Var "a") (Var "b")) (Implies (Var "c") (And (Var "d") (Var "e")))


    > read "!!!a" :: Proposition
    Not (Not (Not (Var "a")))

This is great for reading propositions, but it's not so great for printing them. We can go with something simple:


instance Show Proposition where
    show = showProp where
        showProp (Var v) = v
        showProp (Boolean True) = "T"
        showProp (Boolean False) = "F"
        showProp (Not p) = "!" ++ show p
        showProp (And p q) = "(" ++ show p ++ " & " ++ show q ++ ")"
        showProp (Or p q)  = "(" ++ show p ++ " | " ++ show q ++ ")"
        showProp (Implies p q) = "(" ++ show p ++ " -> " ++ show q ++ ")"

But the results are unsatisfactory. E.g.,


    > read "a | b -> (a -> c) -> (b -> c) -> c" :: Proposition
    ((a | b) -> ((a -> c) -> ((b -> c) -> c)))

As you can see, too many parentheses detract from readability. To avoid this, we have to go to precedence based output. The idea here is that we have a precedence context. If we're in too deep (whatever that means), we need parentheses, but we get to reset our precedence context.

One nuance here is that we'll want to have direct access to the precedence level in deciding when to parenthesize, and so we'll make the precedence level an argument to a prec function (rather than having a bunch of precx functions):


    instance Show Proposition where
        show = prec 0 where
            prec _ (Boolean True) = trueT
            prec _ (Boolean False) = falseT
            prec _ (Var v) = v
            prec _ (Not p) = notT ++ prec 3 p
            prec i (And s t)     = paren 2 i $ unwords [ prec 2 s, andT, prec 2 t ]
            prec i (Or s t)      = paren 1 i $ unwords [ prec 1 s, orT, prec 1 t ]
            prec i (Implies s t) = paren 0 i $ unwords [ prec 1 s, impliesT, prec 0 t]
            paren cutoff prec str
                | prec > cutoff = "(" ++ str ++ ")"
                | otherwise     = str

Note here the particular trickiness around the right-associativity of implication.

We can use the parser/formatter pair to do something that is already a bit interesting. Consider norm.hs:


    module Main where
    
    import Control.Exception
    import System.Environment
    import System.IO
    import SimpleProposition
    
    showArg :: String -> IO ()
    showArg arg = do
        p <- try (readIO arg) :: IO (Either IOError Proposition)
        case p of
            Left _ -> putStrLn $ "Could not parse \'" ++ arg ++ "\'."
            Right r -> putStrLn $ show r
    
    main :: IO ()
    main = do
        args <- getArgs
        mapM_ showArg args

This simply reads and writes a proposition. There is a bit of trickiness in the try (readIO arg) line, but for now, treat this as an idiom that allows us to catch any exception thrown by read via an ordinary Either type without having to write our own exception handlers (not that that's so hard, but we've not covered it).

Why is this interesting? Because beginning students are often confused about precedence, and will use way more parentheses than are actually necessary, e.g.


    $ norm "(a&b) -> (a|b)"
    a & b -> a | b
    $

Exercise 22.2 Note that the it is not necessarily the case that


    (read (show p)) == p

for all propositions p. Why? Does this really matter, and if it doesn't how might we explain ourselves?

Hint: Consider "a | (b | c)".

CMSC-16100

Honors Introduction to Programming, I

Autumn Quarter, 2015

Lecture 22: Propositional Logic: Parsing

Code