Lab 3: Parsing Party - Simple Infix Calculator

Posted on October 25, 2016

Announcements

Due to the midterm, this week’s lab will be due by 11:59:59pm Saturday, October 29.

Building a Simple Haskell Calculator

Later in the course you will consider monadic parsing but in this lab we will focus on the simple parsing problem of constructing a calculator. The goal is to gain some experience with handling strings and input.

The calculator should work as follows:

> 1 + 1
2
> 2 * 6
12
> (1 + 3) * (5 + 1)
24
> (1 + 02 + 3 * 2 + 6)
15
> (1 + (2 + ((4 + 2) * 6)))
39
> 5 +-6    -- This should be parsed as 5 + (-6)
-1

where user-input is indicated by a prepended >.

In this lab you will:

Create a data type ArithExp to represent an expression.
Implement a function eval to evaluate an expression to an integer.
Implement a function tokenize to divide a string into a list of tokens.
Implement a function parse to convert a token list into an expression.
Use these functions to make a working calculator that works like the example above: it can read user input, parse the string, evaluate to an integer, and output the result.
As with last week, write tests to prove everything works.

Skeleton Program

import System.IO


main :: IO ()
main = do
     putStr "> "
     hFlush stdout
     line <- getLine
     putStrLn line
     main

This is the basis for the calculator program. Call this file Main.hs.

In order to import your code for testing, you will need to create a separate Lab3.hs file which you import into the main skeleton above and into the test file. Most of your code will go into this Lab3.hs.

`ArithExp`

Define the ArithExp Data Type. Recursion is the name of the game.

This data type should represent:

a constructor Number for positive or negative integers.
Addition, Multiplication, and Division operators represented by the constructors Plus, Mult and Div. These operators are most easily defined as a data constructor taking two arguments.

The abstract ArithExp form for (2+3)*(3+2) would be

(Mult (Plus (Number 2) (Number 3)) (Plus (Number 3) (Number 2)))

Evaluating an `ArithExp`

Again, use recursion. eval should be a function that takes ArithExp and outputs a number. This is the first thing to get working. For division, use the quot function. Check that your eval works by ensuring that you get the right outputs for:

(Number (-3))

(Mult (Plus (Number 2) (Number (-6))) (Plus (Number 3) (Number 2)))

(Div (Number 13) (Number 6))

(Plus (Mult (Number 2) (Number 3)) (Div (Number 13) (Number (-6))))

and some other expression. Generate your own simple ones and make sure that things work. Put those in your sample runs file Tests.hs. For the test file, use the same format as last lab (e.g. see Submit below).

Parsing a String to `ArithExp`

The input will be *, -, +, [0-9], (, ), and /. Also, any - will always precede a number [0-9]. You do not have to handle bad input.

Here’s everything you’ll have to do:

Ignore whitespace. You may want to use isSpace in the library Data.Char.
Tokenize. That is, write a tokenize function to break up an input string into a list of tokens: numbers (positive or negative), operators, or parenthetical expressions. The functions dropWhile, takeWhile, break, or span in Data.List might be helpful here. Also isDigit in Data.Char is useful. You may want to define a Token data type.
Turn the token list into an ArithExp. Write a parse function to convert a list of tokens into an ArithExp. Use read to get integers out (read x :: Int). Thinking about arithmetic, it makes sense to split the list at the operator of least precedence and then recurse on the elements.
For final output, remember show x turns an Integer to a String.

Put tokenize and parse tests in Tests.hs. Once you are convinced you can tokenize and parse all simple expressions (and more complicated ones), ensure that eval can handle all of the expressions.

Work incrementally. Begin by tokenizing simple expressions:

1+2
1 + 20
1 * -2

Then get order of operations working correctly:

1 + 2 * 3 + 4
1 * 2 + 3 + 4

Then parse parentheses enclosing an entire expression:

(1+2)
 (1 + 2 )
(1 * 2 + 3)

Finally, work out how to recursively handle nested parentheses:

(-10 + 2) * 5
5 * (-10 + (2 + 4) * 3)
5 * (-10 + (2 + 4) * 3) * (3 + 2)

Handling parenthetical expressions is probably the trickiest part of the lab. You may choose to do it as part of tokenize or as part of parse.

Debugging

Debug.Trace will help you debug. Start with traceShowId.

Submit

The git submission system is almost ready, but this week you’ll be emailing your three files to brianhempel@uchicago.edu again.

Make a lab3 folder with Main.hs, Lab3.hs, and Tests.hs. The Lab3.hs file will have most of your code and should export functions for use in Tests.hs and Main.hs. The Main.hs file is your simple I/O loop. Follow the syntax from the first lab to understand modules (or check the course notes). Your Tests.hs should have sample runs implemented as tests, e.g.:

test1 = eval (Plus (Number 1) (Number 2)) == 3

and a final tests list:

tests = [test1, test2, ..., testn]
testAll = all id tests

You are primarily evaluated on whether you correctly implement the calculator.

Grading breakdown:

10% Code compiles. (ghc Main.hs does not error.)
10% Tests compile. (ghci Tests.hs does not error.)
10% Clean code.
- Why? Well, read a few of these quotes. You spend much more time reading code than writing it.
- There will be some leniency, because there are exceptions to every rule (e.g. your instructor’s solution has one long line), but in general think about how to make your code easier to read. Keep the following in mind.
- Type definitions on all top-level functions.
- Understandable names. Naming is one of the two hard problems in computer science. Short names are fine for common uses: i for index, x:xs for generic array elements, n for generic number, c for counter, etc. Abbreviations are fine too, but the rule is I should know the meaning of the thing just by reading its name. I tend tend to err on the side of longer names.
- Avoid unnecessary code, e.g. pattern match cases that could be combined, or deconstructing a value and then just putting it back together.
- No commented-out code.
- No “spaghetti code” (long, hard-to-follow, or deeply nested functions that should be broken up). Take advantage of where and let.
- Comments explaining any hard-to-understand logic, e.g. what does this helper function do?
- Avoid long lines (>100 characters).
- Tasteful use of whitespace for vertical alignment.
- η-reduction is not necessary, but can be tasteful.
40% Do the provided tests pass? Are they sufficient to demonstrate the code’s correctness?
- 10% eval (and any helper functions)
- 15% tokenize (and any helper functions)
- 15% parse (and any helper functions)
30% I/O testing. Does the compiled program produce the correct output given various input strings?

Extra Credit

For extra credit, implement the additional files Lab3Rational.hs and TestLab3Rational.hs which work with rational numbers. The inputs should be the same expressions but the internal values and the output are now rational numbers: so that the output for .75 would be 3 / 4. TestLab3Rational.hs should also throw an error properly if there is a division by zero. Ensure that the rational numbers in the output are in lowest terms.

This extra credit is worth at most 10%.