Lab 6: Functional Parsing Party

Posted on November 18, 2019

Announcements

This week’s lab will be due Tuesday, November 26 by 11:59pm. However, this lab is not particularly more difficult than the prior labs.

Next week’s lab sections will serve as extended office hours.

Functional Parsing

In this weeks lab you will implement a parser for a more full featured infix calculator. This calculator supports:

Because you will be using the parser combinators from Text.ParserCombinators.ReadP, you will probably find that your final implementation looks considerably cleaner than your code for Lab 3.

Example

You will find starter code in your repository in the lab6 folder. You may want to install the haskeline package to get a nicer REPL (but if it causes problems see the instructions in Main.hs for disabling it).

$ cabal install haskeline

The evaluator and a simple REPL wrapper have been provided for you. Your job is to write the parser in the Lab6.hs file.

Your calculator should be able to correctly parse expressions like the following.

> 5
5
> 3 + 3
6
> 2+-3
-1
> let x = 5 in x*x
25
> let (x, y) = (3, -4) in -(x + y)*2
2
> 3 + 10 ^ 2 * 2
203
> -(1 + 3)
-4

You will find that the REPL outputs more information than shown above. This is fine–the extra information should assist you with debugging.

We have provided a series of example expressions in text_exps.txt. You can run them all to see if they parse:

$ ghc Main.hs && ./Main < test_exps.txt

And then you can compare them with the provided expectations file:

$ ghc Main.hs && ./Main < test_exps.txt | diff -y test_expectation.txt -

This is essentially how your lab will be graded.

Types

In the Lab6.hs file you can see the types of the kinds of values that your parser will need to produce.

First a couple of type synonyms, indicating that variable names are just strings and the numbers handled by our calculator are all integers. You do not need to handle numbers with decimal points.

type Name   = String  -- Variable names are strings.
type Number = Int     -- The kind of number in our language.

From the examples above, you can see there are two main forms of top level expressions:

  1. A bare math expression without a let; e.g. 4 + (2*5)
  2. A let binding followed by an expression to evaluate; e.g.
let x = 5 in x + 4
let (x1, y1, x2, y2) = (5,2+3,10,10) in (y2-y1)*(y2-y1) + (x2-x1)*(x2-x1)

The TopLevelExp data type represents these forms.

data TopLevelExp
  = MathTLE MathExp
  | LetTLE [Name] [MathExp] MathExp
  deriving (Eq, Show)

The MathTLE constructor simply wraps the bare MathExp.

The LetTLE constructor includes a list of variable names as well as a corresponding list of math expressions. E.g. the following:

let (do, re, mi) = (1, 2, 1+2) in do*re*mi

should be parsed as:

LetTLE ["do", "re", "mi"] [Number 1, Number 2, Plus (Number 1) (Number 2)] (Mult (Mult (Var "do") (Var "re")) (Var "mi"))

You do not have to enforce that the list of variable names and list of math expressions are the same length.

Note that TopLevelExp—either MathTLE or LetTLE—is a top-level expression. There is only one of these, at the top level, and it’s not recursive: a TopLevelExp will not appear inside another TopLevelExp, nor will a TopLevelExp appear inside a MathExp.

The main recursive type is the MathExp, which will look somewhat similar to Lab 3, but with a few additions:

data MathExp
  = Number Number
  | Var    Name
  | Neg    MathExp
  | Plus   MathExp MathExp
  | Minus  MathExp MathExp
  | Mult   MathExp MathExp
  | Div    MathExp MathExp
  | Pow    MathExp MathExp
  deriving (Eq, Show)

Remember that in Haskell a constructor can have the same name as a type (because constructors only appear in the normal expression language and types only appear in type signatures), thus Number Number is valid Haskell: the first Number is the constructor name and the second Number is the type synonym for Int.

You will parse a variable simply as the string of the variable name. A variable starts with a lowercase letter and after the first letter is any alphanumeric character a-z A-Z 0-9. You should not try to handle substitution. It’s the evaluator’s job to substitute the appropriate value for each variable during execution. Just parse a variable as Var "varName" and you’re done.

Negation (Neg) is unary operator. It will only precede a number, a variable, or a parenthetical expression. There may be spaces between the - sign and the expression to negate. (E.g. - 3)

Note the absence of parentheses in the MathExp data type. Your parser will handle parentheses simply by building the appropriate expression.

For example, (1 - 2) - 3 will become:

Minus (Minus (Number 1) (Number 2)) (Number 3)

but 1 - (2 - 3) will become:

Minus (Number 1) (Minus (Number 2) (Number 3))

Handling Operator Precedence

How should you parse:

1 - 2 - 3

A naive parser will produce both (1 - 2) - 3 as well as 1 - (2 - 3), which is ambiguous; you only want the first version.

You will want to use one or more of the chain combinators to get associativity and precedence to work correctly.

Looking at one of these combinators, the type of chainr is

chainr :: ReadP a -> ReadP (a -> a -> a) -> a -> ReadP a

And its documentation says:

chainr p op x parses zero or more occurrences of p, separated by op. Returns a value produced by a right associative application of all functions returned by op. If there are no occurrences of p, x is returned.

That is, you provide two parsers to chain:

  1. A parser that matches everything between your operators.
  2. A parser that matches only the operator; this parser returns a function that combines the expressions to the right and left of the operator.

What will you find between your operators? Well, you’ll find expressions of higher precedence. So you hand chain a parser for everything of higher precedence, as well as a parser for just the operator, something like…

chainr1 parseThingsOfHigherPrecedence parseMinus

What about parseMinus? We know from the type of chainr1:

chainr1 :: ReadP a -> ReadP (a -> a -> a) -> ReadP a

…that our parseMinus needs to have type ReadP (a -> a -> a). What in the world should this parseMinus return?

Pretend you are looking at a - operator: you’ve parsed everything to its left down to a MathExp, you’ve parsed everything to its right down to a MathExp. What would you do with these two MathExps? Probably return something like Minus left right.

What’s the type of Minus?

Minus :: MathExp -> MathExp -> MathExp

Aha! That looks like (a -> a -> a)! Our parseMinus just needs to return the unapplied Minus constructor, and chain will take care of applying it to the expressions on its right and left. So our parseMinus will look something like:

parseMinus :: ReadP (MathExp -> MathExp -> MathExp)
parseMinus =
  ...code here to match just a - sign...
  return Minus

So we now know how to parse subtraction. (Addition has the same precedence—how might you get them to parse together?) We now want to write a parser to parse what’s in between, the expressions of higher precedence.

chainr1 parseThingsOfHigherPrecedence parseMinus

What’s higher precedence than subtraction? Well, multiplication. Is there anything higher precedence than that? Yes, exponentiation. So your multiplication parser will also use chain. Your code will look something like:

chainr1 parseThingsOfHigherPrecedence parseMinus
where
  parseThingsOfHigherPrecedence =
    chainr1 parseThingsOfEvenHigherPrecedence parseMult

…and so on until you come to terminal parsers (e.g. parseNumber).

Stare at the source for one of these combinators to convince yourself that yes, indeed, it is parsing everything in between the operators, parsing the operators, and then combining the results using the function returned by the operator parser:

chainl1 :: ReadP a -> ReadP (a -> a -> a) -> ReadP a
chainl1 p op = p >>= rest
  where rest x = do f <- op
                    y <- p
                    rest (f x y)
                 +++ return x

It is up to you to figure out which of the four varieties of chain combinators you need to use.

Note the precedence for this lab should be, from lowest to highest (least binding to most tightly binding) is:

  1. Addition and subtraction.
  2. Multiplication and division.
  3. Exponentiation.

For exponentiation, 2^3^2 should be interpreted as 2^(3^2).

Note that negation will only appear in front of a number, a parenthetical expression, or a variable.

Negation should bind tightly, but remember that by convention -2^2 is -(2^2), that is, -4.

To Hand In

Your lab6/ directory should have your parser in Lab6.hs, as well as the un-modified starter files Eval.hs, Main.hs, and Unparse.hs.

Grading Breakdown

Be sure to write helper functions to reduce duplicate code. You will need to use it more than once, but littering skipSpaces everywhere is a code smell. Use skipSpaces less than ten times, and the helper you write shouldn’t just be skipBeforeParser parser = skipSpaces >> parser if you end up using that ten times—it’s not an improvement. Be more creative!

Extra Credit

The extra credit is a superset of the regular lab. Do not turn in separate files for the extra credit. Just modify the files that are there—grading will simply involve trying both expressions for the regular lab as well as extra expressions that won’t be parsed by the regular lab.

Commit before attempting the extra credit. You will need to modify the other files in the lab, so you want to make sure you can revert back in case you are unable to complete the extra credit by the deadline. (I would recommend working on a git branch, actually, but if you do you must remember to push master as your submission.)

The EC descriptions below are a little terse. When in doubt, do what Haskell does.

Let Expressions (10%)

Requiring a let ... in ... clause to only occur at the top level is a little restrictive. Remove TopLevelExp and allow let ... in ... to appear in any expression. You will need to require that let and in will not parse as variables. You will also need to modify the evaluator. Some example expressions:

5 + let x = 6 in x
let x = 2 in let y = x + x in 2 + y
let x = (let y = 2 in y+y) in x
let x = (let y = 2 in y+y) in y

Note the last expression above should produce an evaluation error: y is not in scope outside the parenthetical expression.

if-then-else (10%)

Add if exp then exp else exp clauses to the language. You will need to add the standard comparison operators >, <, >=, <=, ==, /=, as well as a not keyword. You will also need to create a new datatype for the kinds of values produced by the evaluator: either a Boolean or a Number.

Lamba Expressions (20%)

Extend the language and evaluator to accept lambda expressions as values (this will be a third kind of value alongside Boolean and Number) as well as perform function application. Only worry about lambda expressions with a single argument. You may require that lambda expressions are wrapped in parentheses.

(\x -> x) 5
(\x -> x 7) (\y -> y*y)
let f = (\x -> x*x) in f 6
let diff = (\x -> (\y -> y-x)) in diff 6 4

A parsed lambda expression is just a name for the argument along with its body expression. You will also need to add an expression for function application.

More tricky is the value of a lambda expression during execution. You need to store two things when you encounter a lambda during execution:

  1. The parsed expression (so you can know the argument name and the body expression).
  2. The bindings of all variables at that point in execution. (Why do we need to do this?)

For execution, use call-by-value semantics (evaluate all arguments before applying a function to them). I believe this is the most natural way to write the evaluator anyway.

If you complete this extra credit, you will have created a Turing-complete programming language. You can perform recursion by passing a function to itself as an extra argument.