Lecture 3: Abstract Data Types

$ \newcommand{\abs}[1]{\left\lvert#1\right\rvert} $

Haskell provides a rich collection of atomic types to the programmer. There are floating-point types Double and Float, integer types Integer (infinite precision) and Int (machine precision, these days, 64-bit), a character type Char (unicode) and Char8 (ASCII), etc.

Naturally, the type system is extensible—we can and often do introduce new types within our programs. One of the principle mechanisms for doing this is abstract data types (ADTs).

Today, we'll take a look at some (simplified versions) of predefined ADTs. You shouldn't try to redefine these. Bad things will happen. Very bad things.

Simple Algebraic Data Types

The simplest Algebraic Data Type (ADT) is


    data ()
        = ()
        deriving (Eq,Ord,Show)

This is the oddly opaque “unit” type. It is a type that has a single value, which happens to have the same name as it has (and an unusual name it is!). Despite seeming to offer nothing, () plays an important role in Haskell, and you'll see it leak out as an argument to a polymorphic type from time to time, e.g., main :: IO (). It may be useful to think of this as a 0-tuple. The deriving clause gives us default implementations of the Eq, Ord, and Show type classes, which work as expected.


    > :t ()
    () :: ()

Next up, we have the Bool type, for boolean values:


    data Bool
        = False
        | True
        deriving (Eq,Ord,Show)

This defines a data type that has two distinct values, True and False. It's just a bit! The Bool data type gets used a lot as predicates (condition tests) are naturally boolean valued, and so there are special operators and syntax devoted to Bool. The three standards boolean operators are defined via pattern matching definitions, much as we saw last lecture with lists.


    -- | Boolean negation.

    not :: Bool -> Bool
    not True  = False
    not False = True

    -- | Boolean conjunction, a.k.a., "and."

    (&&) :: Bool -> Bool -> Bool
    False && _ = False
    True  && y = y

    -- | Boolean disjunction, a.k.a., "or."

    (||) :: Bool -> Bool -> Bool
    True  || _ = True
    False || y = y

This brings us to an important rule of Haskell syntax. Each non-operator constructor (and type) begins with a capital letter, whereas variables (i.e., non-operator function names) start with a lower-case letter.

Boolean conjunction a && b will be True only if a and b are both True, and likewise boolean disjunction a || b will be False only if a and b are both False.

There is some special syntax associated with Bool. Consider the following predicate definitions:


    -- | A predicate for even Integers.

    even :: Integer -> Bool
    even n = mod n 2 == 0

    -- | A predicate for odd Integers.

    odd :: Integer -> Bool
    odd n = not (even n)

The even function will return True for even n, and False for odd n. Note that == is the equality predicate and is a function in the Eq type class, as distinct from = which is used in definitions. (Ordinary mathematical notation often conflates these two distinct meanings.) We can use predicates to give different defining equations for based on different values of its arguments. Note that this is different from pattern matching, which makes distinctions based on the structure of the arguments. Consider

$$\abs{x} = \begin{cases} x,& \text{if $x \geq 0$,}\\ -x,& \text{otherwise} \end{cases}$$

the familiar absolute value function for real numbers. We could define


    -- | The absolute value function

    abs x
        | x >= 0 = x
        | otherwise = -x

were it not for the fact that abs is defined in Prelude as a member of the Num type class. Let's ignore that nuance for now, and consider the definition via guarded equations. Here, rather than having a sequence of equations that involve patterns, we have a sequence of alternative definitions associated with guards, i.e., boolean predicates of the arguments. Evaluation occurs by considering each guarded equation in the order they appear in the definition, and using the first equation whose guard evaluates to True. Haskell also supports an if ... then ... else ... construct, so we could have defined


    abs x = if x >= 0
            then x
            else -x

Haskell syntax doesn't get along with unary operators, and so it's often necessary to write (-x), but we're ok with the bare unary negation here (because it is preceded by else, which is a keyword and not an ordinary variable). You might wonder why there are different syntaxes for what are basically the same thing. The guard syntax can be used as a way of modifying pattern matches, and yes, a simple variable is a pattern, albeit one that matches everything, whereas an if ... then ... else ... can appear anywhere in an expression. This is similar to the distinction we saw with which vs. let.

*Exercise 3.1 Consider the $\mathrm{collatz}$ function defined as follows:

$$ \mathrm{collatz}(n) = \begin{cases} 1, &\hbox{if $n=1$} \\ 1+\mathrm{collatz}(n/2), &\hbox{if $n$ is even} \\ 1+\mathrm{collatz}(3n+1), &\hbox{otherwise} \end{cases} $$

Give Haskell definitions of collatz using guards, and collatz' using if ... then ... else .... [We can use apostrophes in variable names!] Note that you should use div rather than (/) to divide integral values.

Another simple ADT is Ordering,


    data Ordering = LT | EQ | GT
        deriving (Eq,Ord,Show)

Which is used as the return type of the compare :: Ord a => a -> a -> Ordering function of the Ord type class.

A next step in complexity is polymorphic ADTs. These are ADTs that take one or more type variables, e.g., the pair type, which could be written as

  
    -- | The Pair data type

    data Pair a b = Pair a b

But since tuples arise frequently in programming practice, it is convenient to have a special, terse and familiar notation for them, specifically, (a,b), and Haskell supports this. Somewhat counter-intuitively, this leads to the following definition:


    -- | The 2-tuple type

    data (,) a b = (,) a b

This is notationally a bit opaque, but (,) is (by a mild abuse of notation) just the prefix form of pairing operator “,.” As a type, this is a simple abstraction of a value that contains values of two other types. This is often useful, e.g., when we want to return multiple results from a single function, e.g., the divMod returns a pair consisting of the quotient and remainder of a division.


    > :t divMod
    divMod :: Integral a => a -> a -> (a, a)
    > divMod 10 3
    (3,1)

There are 3-tuples, 4-tuples, all the way through 62-tuples. Note that tuples support Eq, Ord, and Show instances when their constituent types do, via deriving instances.

Exercise 3.2 The Prelude provides some simple functions for dealing with pairs, fst and snd for extracting components, and curry and uncurry for swizzling between functions that expect two arguments, either separately, or packaged together via a pair.

Unfortunately, analogous functions do not exist for 3-tuples, etc. Code a Haskell module Triple.hs which provides analogous functions fstOf3, sndOf3, thirdOf3, curry3, and uncurry3.

Polymorphic ADTs

Let's consider a simple example of a polymorphic type:


    data Maybe a
        = Nothing
        | Just a
        deriving (Eq,Ord,Show)

We can think of Maybe a as a type that contains 0 or 1 a values. “Maybe wrapped types” often come up in the context of error handling, e.g., we might use Just a to denote a computation that successfully completes with the value a, and Nothing to denote a computation that encountered some sort of error.

Let consider a fairly complex, but informative example. Consider the Num type class. Its instances have to provide (+), (-), and (*) (and a few other functions), but not (/). We might ask, why is (/) different? There are at least a couple of answers. One is that we expect division to work a bit differently with floating-point rather than integral types, and indeed Haskell provides three different functions, (/), div, and quo for division, the first for ordinary floating point division, the other two for integer division with different rounding behavior. But there's another reason, which is that (/) isn't total. We have to worry about division by zero.

By default, division by zero generates an exception, which is an ugly, heavy-weight control structure in Haskell, and beyond this class. But we can approach the problem differently using Maybe, using Nothing as a NaN, a value that is not a valid number. With some processor configurations, divisions by zero in floating point arithmetic result in NaN (not a number), but there is no native NaN for integral types.

As we'll see, we can use Maybe wrapped integral types to “compute” an entire expression, and then check the result for an error. If the result is Nothing, then some sort of error occurred, whereas if it is Just a, then the computation succeeded with the value a, and we can use ordinary code (e.g., a pattern match, rather than an exception handler) to distinguish between the two. To that end, let's consider some code that is much more complicated than we've seen before:


    -- | Wrapping a number in Maybe

    module MaybeNum where
    
    -- | Derived instance definition for Num (Maybe n) given Num n.
    
    instance Num n => Num (Maybe n) where
        Just a + Just b = Just $ a + b
        _ + _ = Nothing
    
        Just a - Just b = Just $ a - b
        _ - _ = Nothing
    
        Just a * Just b = Just $ a * b
        _ * _ = Nothing
    
        negate (Just a) = Just $ negate a
        negate _ = Nothing
    
        abs (Just a) = Just $ abs a
        abs _ = Nothing
    
        signum (Just a) = Just (signum a)
        signum _ = Nothing
    
        fromInteger i = Just $ fromInteger i

The instance construct will make a Maybe n an instance of the Num type class, so long as n is an instance of Num itself. The Num type class has quite a few functions we have to implement. The $ operators is just application, albeit right associative and of low precedence, as opposed to the usual implicit space, which is left associative and of highest precedence. Using $ as we do here spares us a few parentheses. This is a “deriving instance,” as it defines instances for a whole lot of types and not just one. This is something that you can't do with a Java interface.

This seems peculiar, but it works, e.g.,


    > 1 + 3 * 3 :: Maybe Integer
    Just 10

Why might we want such an instance declaration? What does it gain us? Patience.


    -- | safe division
    
    infixl 7 //
    
    (//) :: (Eq n, Integral n) => Maybe n -> Maybe n -> Maybe n
    Just a // Just b
        | b == 0 = Nothing
        | otherwise = Just $ div a b

First we define a new infix name, (//), which will have the same fixity and precedence as ordinary division. Then, we implement our new (//) operator as integer division, including a denominator check which returns Nothing if the denominator equals zero. The Integral type class adds div, and the Eq type class added (==).


    > 1 + 2 // 0
    Nothing

This is what is gained: We've now have modified types that allow for errors to propagate seamlessly through the usual process of arithmetic expression evaluation, and so don't require exception handling or other exotic flow control to deal with if they occur. If you've programmed in Swift, this may remind you of Swift's optional types, and the way they can propagate errors through call chains. Of course, Swift appropriated the concept from Haskell (somewhat unusually, with credit given), and not the other way around.

Exercise 3.3 We could have approached this example by creating a deriving instance Integral n => Integral (Maybe n), as div is part of the Integral type class. But this would involve implementing several other type classes. Explore the documentation, to determine what type classes are involved, and what functions they contain.

Recursive ADTs

You may have guessed that the list data type from last lecture is itself just an ADT provided by the Prelude. We can imagine that the list data type is defined as follows:


    infixr 5 :
    
    data [] a
        = []
        | a : [] a

The prefix use of [] as a type constructor is a bit unusual, but follows a pattern that we saw with tuples. We've seen the usual syntax Haskell uses for list types in the last lecture, i.e.,, the more familiar [a], but it's important to be aware of this more primitive form.

Part of what is significant about this definition is that it is recursive, i.e., the data-type is defined in terms of itself. This is a very useful facility, and as we've seen, one that can give rise to natural recursions in the functions that manipulate data of this type.

As with tuples, lists belong to the Eq, Ord and Show type classes when their underlying types do.

Lists are a widely used in Haskell to build a variety of types, e.g.,


    type String = [Char]

The type declaration, somewhat counter-intuitively, doesn't introduce a new type (that's what data is for), but instead a type alias, i.e., another name for a type. This particular representation choice for String means that we can use ordinary list-based functions for working with strings, e.g.,


    > length "foo"
    3

C programmers will immediately note a meaningful distinction: Unlike C's strlen function, length doesn't account for a terminating null character. It also means we can use the list (++) function to concatenate strings, e.g.,


    > "foo" ++ "bar"
    "foobar"

We haven't looked at the (++) function before, and this is a natural time to do so:


    -- | Append two lists.
    
    infixr 5 ++
    
    (++) :: [a] -> [a] -> [a]
    [] ++ bs = bs
    (a:as) ++ bs = a : (as ++ bs)

Note here that both (++) and (:) are infixr 5, and so the parentheses in this definition are not strictly speaking necessary, but this depends on the associativity of the operators, and not their precedence per se, and so is more fragile than usual. Adding the parentheses seems prudent: it doesn't detract from clarity, and clarifies the programmer's intent.

A particularly useful data structure, built out of tuples and lists, is an association list, [(a,b)]. Association lists are often used as a simple implementation of dictionaries, where a is the type of the key, and b is the type of the value, or gloss.

The following function is defined in the Prelude, and it shows how several of the language features we've been using work together:


    -- | Look up a value by key from an association list, returning a Maybe wrapped result.
    
    lookup :: Eq a => a -> [(a,b)] -> Maybe b
    lookup _ [] = Nothing
    lookup a ((k,v):ps)
        | a == k    = Just v
        | otherwise = lookup a ps

The design choice here, using Maybe to wrap a result so that we have a way to deal with the “key not present” problem isn't the only choice possible. Other possibilities involve returning an unwrapped value of type b, and deal with the possibility that the key isn't present in a different way:

If we're confident that the key will always be there, we could throw an exception if it's missing. This gives us the simplest code at the call site, but involves some programming risk.
We could provide a default value to be returned as an argument to a modified lookup function.
We could modify the representation of a dictionary, providing a default value.

Let's implement all three. For the sake of simplicity, we'll use our base lookup function in all three cases.


    -- | Look up a value by key from an association list, throwing an error if the key is missing.
    
    lookupWithError :: Eq a => a -> [(a,b)] -> b
    lookupWithError a dict = case lookup a ps of
        Nothing => error "key not found"
        Just v => v
    
    -- | Lookup a value by key from an association list, returning a default value on missing key.
    
    lookupWithDefault :: Eq a => a -> b -> [(a,b)] -> b
    lookupWithDefault a b dict = case lookup a dict of
        Nothing => b
        Just v => v
    
    -- | A dictionary with default value:
    
    data Dictionary a b = Dictionary b [(a,b)]
    
    -- | Lookup a value by key from a Dictionary, returning the Dictionary's default value
    --   on missing key.
    
    lookupInDictionary :: Eq a => a -> Dictionary a b -> b
    lookupInDictionary a (Dictionary default dict) = case lookup a dict of
        Nothing => default
        Just value => value

Note here that case is an expression that enables pattern matching. Although we don't use it here, guarded equations can be used with the patterns in a case statement just as they are used in definitions.

Exercise 3.4 Implement the three functions lookupWithError, lookupWithDefault, and lookupInDictionary by direct recursions, i.e., without calling lookup.

Exercise 3.5 A common data structure is a rose tree. This is a kind of tree in which each node holds a value of a particular type. The actual declarations are a bit different (they rely on Haskell's record syntax, which we'll see in due course), but they amount to:


    -- | A rose tree.
    
    data Tree a = Node a (Forest a)
    type Forest a = [Tree a]

Note that recursion can be mutual, and need not be direct.

A tree consists of a node, which has two constituents: the value of type a, and a list of children.

Rose trees are often used to represent semi-structured data, e.g., an outline, or an XML infoset.

Write a function preorder :: Tree a -> [a] which returns the values contained in a Tree as a list, based on a preorder traversal (i.e., the value at a node comes before the values at its children). It may be helpful to know about the function concat :: [[a]] -> [a], which flattens a list of lists into a simple list. (Note that the actual type of concat is just a bit more general than this.)

CMSC-16100

Honors Introduction to Programming, I

Autumn Quarter, 2017

Lecture 3: Abstract Data Types

Simple Algebraic Data Types

Polymorphic ADTs

Recursive ADTs