Lecture 3: Abstract Data Types
\( \newcommand{\abs}[1]{\left\lvert#1\right\rvert} \)Haskell provides a rich collection of atomic types to the programmer. There are floating-point types Double
and Float
, integer types Integer
(infinite precision) and Int
(machine precision, these days, 64-bit), a character type Char
(unicode) and Char8
(ASCII), etc.
Naturally, the type system is extensible—we can and often do introduce new types within our programs. One of the principle mechanisms for doing this is abstract data types (ADTs).
Today, we'll take a look at some (simplified versions) of predefined ADTs. You shouldn't try to redefine these. Bad things will happen. Very bad things.
Simple Algebraic Data Types
The simplest Algebraic Data Type (ADT) is
data ()
= ()
deriving (Eq,Ord,Show)
This is the oddly opaque “unit” type. It is a type that has a single value, which happens to have the same name as it has (and an unusual name it is!). Despite seeming to offer nothing, ()
plays an important role in Haskell, and you'll see it leak out as an argument to a polymorphic type from time to time, e.g., main :: IO ()
. It may be useful to think of this as a 0-tuple. The deriving
clause gives us default implementations of the Eq
, Ord
, and Show
type classes, which work as expected.
> :t ()
() :: ()
Next up, we have the Bool
type, for boolean values:
data Bool
= False
| True
deriving (Eq,Ord,Show)
This defines a data type that has two distinct values, True
and False
. It's just a bit! The Bool
data type gets used a lot as predicates (condition tests) are naturally boolean valued, and so there are special operators and syntax devoted to Bool
. The three standards boolean operators are defined via pattern matching definitions, much as we saw last lecture with lists.
-- | Boolean negation.
not :: Bool -> Bool
not True = False
not False = True
-- | Boolean conjunction, a.k.a., "and."
(&&) :: Bool -> Bool -> Bool
False && _ = False
True && y = y
-- | Boolean disjunction, a.k.a., "or."
(||) :: Bool -> Bool -> Bool
True || _ = True
False || y = y
This brings us to an important rule of Haskell syntax. Each non-operator constructor (and type) begins with a capital letter, whereas variables (i.e., non-operator function names) start with a lower-case letter.
Boolean conjunction a && b
will be True
only if a
and b
are both True
, and likewise boolean disjunction a || b
will be False
only if a
and b
are both False
.
There is some special syntax associated with Bool
. Consider the following predicate definitions:
-- | A predicate for even Integers.
even :: Integer -> Bool
even n = mod n 2 == 0
-- | A predicate for odd Integers.
odd :: Integer -> Bool
odd n = not (even n)
The even
function will return True
for even n
, and False
for odd n
. Note that ==
is the equality predicate and is a function in the Eq
type class, as distinct from =
which is used in definitions. (Ordinary mathematical notation often conflates these two distinct meanings.) We can use predicates to give different defining equations for based on different values of its arguments. Note that this is different from pattern matching, which makes distinctions based on the structure of the arguments. Consider
the familiar absolute value function for real numbers. We could define
-- | The absolute value function
abs x
| x >= 0 = x
| otherwise = -x
were it not for the fact that abs
is defined in Prelude
as a member of the Num
type class. Let's ignore that nuance for now, and consider the definition via guarded equations. Here, rather than having a sequence of equations that involve patterns, we have a sequence of alternative definitions associated with guards, i.e., boolean predicates of the arguments. Evaluation occurs by considering each guarded equation in the order they appear in the definition, and using the first equation whose guard evaluates to True
. Haskell also supports an if ... then ... else ...
construct, so we could have defined
abs x = if x >= 0
then x
else -x
Haskell syntax doesn't get along with unary operators, and so it's often necessary to write (-x)
, but we're ok with the bare unary negation here (because it is preceded by else
, which is a keyword and not an ordinary variable). You might wonder why there are different syntaxes for what are basically the same thing. The guard syntax can be used as a way of modifying pattern matches, and yes, a simple variable is a pattern, albeit one that matches everything, whereas an if ... then ... else ...
can appear anywhere in an expression. This is similar to the distinction we saw with which
vs. let
.
*Exercise 3.1 Consider the $\mathrm{collatz}$ function defined as follows:
$$ \mathrm{collatz}(n) = \begin{cases} 1, &\hbox{if $n=1$} \\ 1+\mathrm{collatz}(n/2), &\hbox{if $n$ is even} \\ 1+\mathrm{collatz}(3n+1), &\hbox{otherwise} \end{cases} $$Give Haskell definitions of collatz
using guards, and collatz'
using if ... then ... else ...
. [We can use apostrophes in variable names!] Note that you should use div
rather than (/)
to divide integral values.
Another simple ADT is Ordering
,
data Ordering = LT | EQ | GT
deriving (Eq,Ord,Show)
Which is used as the return type of the compare :: Ord a => a -> a -> Ordering
function of the Ord
type class.
A next step in complexity is polymorphic ADTs. These are ADTs that take one or more type variables, e.g., the pair type, which could be written as
-- | The Pair data type
data Pair a b = Pair a b
But since tuples arise frequently in programming practice, it is convenient to have a special, terse and familiar notation for them, specifically, (a,b)
, and Haskell supports this. Somewhat counter-intuitively, this leads to the following definition:
-- | The 2-tuple type
data (,) a b = (,) a b
This is notationally a bit opaque, but (,)
is (by a mild abuse of notation) just the prefix form of pairing operator “,
.” As a type, this is a simple abstraction of a value that contains values of two other types. This is often useful, e.g., when we want to return multiple results from a single function, e.g., the divMod
returns a pair consisting of the quotient and remainder of a division.
> :t divMod
divMod :: Integral a => a -> a -> (a, a)
> divMod 10 3
(3,1)
There are 3-tuples, 4-tuples, all the way through 62-tuples. Note that tuples support Eq
, Ord
, and Show
instances when their constituent types do, via deriving instances.
Exercise 3.2 The Prelude
provides some simple functions for dealing with pairs, fst
and snd
for extracting components, and curry
and uncurry
for swizzling between functions that expect two arguments, either separately, or packaged together via a pair.
Unfortunately, analogous functions do not exist for 3-tuples, etc. Code a Haskell module Triple.hs
which provides analogous functions fstOf3
, sndOf3
, thirdOf3
, curry3
, and uncurry3
.
Polymorphic ADTs
Let's consider a simple example of a polymorphic type:
data Maybe a
= Nothing
| Just a
deriving (Eq,Ord,Show)
We can think of Maybe a
as a type that contains 0 or 1 a
values. “Maybe
wrapped types” often come up in the context of error handling, e.g., we might use Just a
to denote a computation that successfully completes with the value a
, and Nothing
to denote a computation that encountered some sort of error.
Let consider a fairly complex, but informative example. Consider the Num
type class. Its instances have to provide (+)
, (-)
, and (*)
(and a few other functions), but not (/)
. We might ask, why is (/)
different? There are at least a couple of answers. One is that we expect division to work a bit differently with floating-point rather than integral types, and indeed Haskell provides three different functions, (/)
, div
, and quo
for division, the first for ordinary floating point division, the other two for integer division with different rounding behavior. But there's another reason, which is that (/)
isn't total. We have to worry about division by zero.
By default, division by zero generates an exception, which is an ugly, heavy-weight control structure in Haskell, and beyond this class. But we can approach the problem differently using Maybe
, using Nothing
as a NaN
, a value that is not a valid number. With some processor configurations, divisions by zero in floating point arithmetic result in NaN
(not a number), but there is no native NaN
for integral types.
As we'll see, we can use Maybe
wrapped integral types to “compute” an entire expression, and then check the result for an error. If the result is Nothing
, then some sort of error occurred, whereas if it is Just a
, then the computation succeeded with the value a
, and we can use ordinary code (e.g., a pattern match, rather than an exception handler) to distinguish between the two. To that end, let's consider some code that is much more complicated than we've seen before:
-- | Wrapping a number in Maybe
module MaybeNum where
-- | Derived instance definition for Num (Maybe n) given Num n.
instance Num n => Num (Maybe n) where
Just a + Just b = Just $ a + b
_ + _ = Nothing
Just a - Just b = Just $ a - b
_ - _ = Nothing
Just a * Just b = Just $ a * b
_ * _ = Nothing
negate (Just a) = Just $ negate a
negate _ = Nothing
abs (Just a) = Just $ abs a
abs _ = Nothing
signum (Just a) = Just (signum a)
signum _ = Nothing
fromInteger i = Just $ fromInteger i
The instance
construct will make a Maybe n
an instance of the Num
type class, so long as n
is an instance of Num
itself. The Num
type class has quite a few functions we have to implement. The $
operators is just application, albeit right associative and of low precedence, as opposed to the usual implicit space, which is left associative and of highest precedence. Using $
as we do here spares us a few parentheses. This is a “deriving instance,” as it defines instances for a whole lot of types and not just one. This is something that you can't do with a Java interface
.
This seems peculiar, but it works, e.g.,
> 1 + 3 * 3 :: Maybe Integer
Just 10
Why might we want such an instance declaration? What does it gain us? Patience.
-- | safe division
infixl 7 //
(//) :: (Eq n, Integral n) => Maybe n -> Maybe n -> Maybe n
Just a // Just b
| b == 0 = Nothing
| otherwise = Just $ div a b
First we define a new infix name, (//)
, which will have the same fixity and precedence as ordinary division. Then, we implement our new (//)
operator as integer division, including a denominator check which returns Nothing
if the denominator equals zero. The Integral
type class adds div
, and the Eq
type class added (==)
.
> 1 + 2 // 0
Nothing
This is what is gained: We've now have modified types that allow for errors to propagate seamlessly through the usual process of arithmetic expression evaluation, and so don't require exception handling or other exotic flow control to deal with if they occur. If you've programmed in Swift, this may remind you of Swift's optional types, and the way they can propagate errors through call chains. Of course, Swift appropriated the concept from Haskell (somewhat unusually, with credit given), and not the other way around.
Exercise 3.3 We could have approached this example by creating a deriving instance Integral n => Integral (Maybe n)
, as div
is part of the Integral
type class. But this would involve implementing several other type classes. Explore the documentation, to determine what type classes are involved, and what functions they contain.
Recursive ADTs
You may have guessed that the list data type from last lecture is itself just an ADT provided by the Prelude
. We can imagine that the list data type is defined as follows:
infixr 5 :
data [] a
= []
| a : [] a
The prefix use of []
as a type constructor is a bit unusual, but follows a pattern that we saw with tuples. We've seen the usual syntax Haskell uses for list types in the last lecture, i.e.,, the more familiar [a]
, but it's important to be aware of this more primitive form.
Part of what is significant about this definition is that it is recursive, i.e., the data-type is defined in terms of itself. This is a very useful facility, and as we've seen, one that can give rise to natural recursions in the functions that manipulate data of this type.
As with tuples, lists belong to the Eq
, Ord
and Show
type classes when their underlying types do.
Lists are a widely used in Haskell to build a variety of types, e.g.,
type String = [Char]
The type
declaration, somewhat counter-intuitively, doesn't introduce a new type (that's what data
is for), but instead a type alias, i.e., another name for a type. This particular representation choice for String
means that we can use ordinary list-based functions for working with strings, e.g.,
> length "foo"
3
C programmers will immediately note a meaningful distinction: Unlike C's strlen
function, length
doesn't account for a terminating null character. It also means we can use the list (++)
function to concatenate strings, e.g.,
> "foo" ++ "bar"
"foobar"
We haven't looked at the (++)
function before, and this is a natural time to do so:
-- | Append two lists.
infixr 5 ++
(++) :: [a] -> [a] -> [a]
[] ++ bs = bs
(a:as) ++ bs = a : (as ++ bs)
Note here that both (++)
and (:)
are infixr 5
, and so the parentheses in this definition are not strictly speaking necessary, but this depends on the associativity of the operators, and not their precedence per se, and so is more fragile than usual. Adding the parentheses seems prudent: it doesn't detract from clarity, and clarifies the programmer's intent.
A particularly useful data structure, built out of tuples and lists, is an association list, [(a,b)]
. Association lists are often used as a simple implementation of dictionaries, where a
is the type of the key, and b
is the type of the value, or gloss.
The following function is defined in the Prelude
, and it shows how several of the language features we've been using work together:
-- | Look up a value by key from an association list, returning a Maybe wrapped result.
lookup :: Eq a => a -> [(a,b)] -> Maybe b
lookup _ [] = Nothing
lookup a ((k,v):ps)
| a == k = Just v
| otherwise = lookup a ps
The design choice here, using Maybe
to wrap a result so that we have a way to deal with the “key not present” problem isn't the only choice possible. Other possibilities involve returning an unwrapped value of type b
, and deal with the possibility that the key isn't present in a different way:
- If we're confident that the key will always be there, we could throw an exception if it's missing. This gives us the simplest code at the call site, but involves some programming risk.
- We could provide a default value to be returned as an argument to a modified
lookup
function. - We could modify the representation of a dictionary, providing a default value.
Let's implement all three. For the sake of simplicity, we'll use our base lookup
function in all three cases.
-- | Look up a value by key from an association list, throwing an error if the key is missing.
lookupWithError :: Eq a => a -> [(a,b)] -> b
lookupWithError a dict = case lookup a ps of
Nothing => error "key not found"
Just v => v
-- | Lookup a value by key from an association list, returning a default value on missing key.
lookupWithDefault :: Eq a => a -> b -> [(a,b)] -> b
lookupWithDefault a b dict = case lookup a dict of
Nothing => b
Just v => v
-- | A dictionary with default value:
data Dictionary a b = Dictionary b [(a,b)]
-- | Lookup a value by key from a Dictionary, returning the Dictionary's default value
-- on missing key.
lookupInDictionary :: Eq a => a -> Dictionary a b -> b
lookupInDictionary a (Dictionary default dict) = case lookup a dict of
Nothing => default
Just value => value
Note here that case
is an expression that enables pattern matching. Although we don't use it here, guarded equations can be used with the patterns in a case
statement just as they are used in definitions.
Exercise 3.4 Implement the three functions lookupWithError
, lookupWithDefault
, and lookupInDictionary
by direct recursions, i.e., without calling lookup
.
Exercise 3.5 A common data structure is a rose tree. This is a kind of tree in which each node holds a value of a particular type. The actual declarations are a bit different (they rely on Haskell's record syntax, which we'll see in due course), but they amount to:
-- | A rose tree.
data Tree a = Node a (Forest a)
type Forest a = [Tree a]
Note that recursion can be mutual, and need not be direct.
A tree consists of a node, which has two constituents: the value of type a
, and a list of children.
Rose trees are often used to represent semi-structured data, e.g., an outline, or an XML infoset.
Write a function preorder :: Tree a -> [a]
which returns the values contained in a Tree
as a list, based on a preorder traversal (i.e., the value at a node comes before the values at its children). It may be helpful to know about the function concat :: [[a]] -> [a]
, which flattens a list of lists into a simple list. (Note that the actual type of concat
is just a bit more general than this.)