Lecture 5: Type Classes
Haskell, as a programming language, has a somewhat unusual history. Most programming language designs are driven by small groups, or even single individuals, at their formative stages, and driven by a fairly focused set of computational problems. Lisp was driven by John McCarthy's misunderstandings of Church's λ-calculus, and his desire for a programming language in which programs could naturally be represented as data objects in the language. FORTRAN was developed by a small team of programming language experts at IBM working under the direction of John Backus, and focussed on scientific computing. COBOL was largely the work of Admiral Grace Mary Hopper, and was driven by the data processing needs (at the time, mostly business/logistical needs) of the US Navy. SNOBOL was largely the work of Ralph Griswold, and was intended to facilitate quick-and-dirty manipulations of string data, much as Larry Wall's Perl does today. And so it goes.
There have been a few languages that were designed by committees, and even fewer that have been successful. Algol 60 and Haskell are exemplars. In the case of Haskell, there were a large number of programming researchers in the early 1980's all working on the problem of lazy evaluation, and they each tended to have their own language, some of which are still in use. A very real problem was that none of these languages was gaining the kind of wide-spread adoption necessary to gain much attention. And so the process that lead to Haskell was driven by a desire to create a shared language that would attract a critical mass of users, and moreover, which would allow a direct comparison of some of the implementation ideas then current. And so a Haskell committee was formed to create a common tongue for this work.
There is an excellent history of Haskell, A History of Haskell: Being Lazy With Class that was written by a few core committee members.
One of the goals was a fairly conservative design, which relied on well-tested ideas. Type classes, which are widely viewed as one of the most distinctive and interesting aspects of Haskell's design, represented an almost casual breach of this goal! The basic design came out of a conversation between Phil Wadler and Joe Fasel, as a way of dealing with the problem of numeric types and operator overloading. Wadler suggested the addition of type classes in an email message to the committee's email list, and adopted without much discussion or reflection. Amazing!
Defining and Instantiating Type Classes
Type classes are defined by the class
keyword, and consist of a sequence of declarations, associated with optional default implementations. E.g., the Eq
type class is defined as follows:
class Eq a where
(==), (/=) :: a -> a -> Bool
x /= y = not (x == y)
x == y = not (x /= y)
In this definition, the variable a
represents the type of an instance of the Eq
class. Instances of Eq
are required to provide both the (==)
and (/=)
functions. This type class definition also includes default definitions for the two operators, which have the effect that a programmer needs only provide an instance of (==)
or of (/=)
. If one of the definitions is omitted, the default definition will be used instead.
For example, let's consider a simple key-value pair,
data Entry = Entry String String -- key, value
as might be used in a dictionary. We can make Entry
an instance of Eq
by providing an instance definition, e.g.,
instance Eq Entry where
Entry key1 value1 == Entry key2 value2 = key1 == key2 && value1 == value2
In this case, the omitted definition will be provided by default.
Haskell also allows for deriving instances, which allow type class instances to propagate through type hierarchies, e.g.,
data Pair a b = Pair a b
instance (Eq a, Eq b) => Eq (Pair a b) where
Pair a1 b1 == Pair a2 b2 = a1 == a2 && b1 == b2
A similar idea allows Haskell to derive the Eq
type class for a new ADT whenever the constituent types are also instances of Eq
, and we can invoke this automatic machinery by the deriving
clause in the definition of an ADT.
Part of what makes type classes interesting in Haskell is both the analogy and dis-analogy with the classes of (probably more familiar) object-oriented language. In both cases, we're providing a common vocabulary and somewhat similar type checking behavior, and so the programming use cases are often remarkably similar. But there is an important difference: Typical object-oriented languages rely on run-time (dynamic) selection of code that corresponds to a particular member/message, typically through pointers, stored in the object itself, to type-specific dispatch tables. Haskell is able to use its static type analysis to select the appropriate instance at compile time, saving itself both time and space at runtime.
A Few Standard Type Classes
Eq
As we've just seen, Eq
is a type class which includes types with a meaningful notion of equality.
Type classes interact with the type system via constraints on the types of free variables in a type expression. For example, consider the Data.List
function elem
, which determines whether a value occurs in a list:
> 3 `elem` [1,2,3,2,3,4]
True
It's easy to implement elem
via a natural recursion:
elem _ [] = False
elem x (a:as)
| x == a = True
| otherwise = elem x as
or, more concisely
elem _ [] = False
elem x (a:as) = (x == a) || elem x as
relying on the left-biased definition of (||)
.
It should be clear that the definition of elem
relies on the list elements supporting an equality test, and this reflected in the type of elem
, via a type constraint:
elem :: Eq a => a -> [a] -> Bool
In recent versions of Haskell, elem
has an even more general type, which we'll get to later in the lecture.
Ord
Haskell's Prelude
defines an Ord
type class:
class (Eq a) => Ord a where
compare :: a -> a -> Ordering
(<), (<=), (>), (>=) :: a -> a -> Bool
max, min :: a -> a -> a
...
The documentation identifies compare
as a minimal complete definition, meaning that default implementations are set up in a way such that all other type class functions have default definitions that are ultimately grounded in compare
. Minimal complete definitions need not be unique, although setting up multiple minimal complete definitions is tricky.
This definition comes with a constraint: All instances of type class Ord
must also be instances of type class Eq
.
*Exercise 5.1 Read the documentation on Ord
, and then provide a parsimonious instance of Ord
for suitably type-constrained instances of Pair
.
Number Types
Numeric types are described by the Num
class, as we've seen before.
class Num a where
(+), (-), (*) :: a -> a -> a
negate :: a -> a
abs :: a -> a
signum :: a -> a
...
Num
is the basis of a hierarchy of various numeric types. For example, Int
and Integer
are instances of the Integral
subclass of Num
, and Float
and Double
are instances of the Fractional
subclass of Num
.
We will not delve much into this numerical tower, but you can learn more by reading the documentation, this tutorial, or by using the :info
command in ghci
to poke around:
> :info Num
...
> :info Fractional
...
> :info Int
...
Enum
Haskell provides concise syntax for enumerating values of enumerable types:
> [1..10]
[1,2,3,4,5,6,7,8,9,10]
> ['a'..'z']
"abcdefghijklmnopqrstuvwxyz"
> [2,4..10]
[2,4,6,8,10]
> ['a','e'..'u']
"aeimqu"
The Enum
type class describes types that can be enumerated
(i.e. indexed from 0
to n-1
):
class Enum a where
toEnum :: Int -> a
fromEnum :: a -> Int
...
Show
The Show
class describes types whose values can be
converted to String
s:
class Show a where
show :: a -> String
...
As we have seen, Haskell can auto-generate Show
instances:
data Pair x y = Pair x y
deriving (Eq, Show)
> Pair "Hello" 161
Pair "Hello" 161
Exercise 5.2 Instead of deriving the default Show
instance for Pair
, define one that produces the following:
> Pair "Hello" 161
( "Hello" , 161 )
Read
The Read
class is complementary to Show
:
class Read a where
...
read :: Read a => String -> a
The read
function parses a String
to produce a value of the given type. Together, read
ing
and show
ing values allow programs to
interact with the outside world (e.g. String
s from
user input, the file system, and network requests).
We will have much more to say about parsing and
the outside world later.
Foldable
The Foldable
type class is interesting, and it has an interesting history. Let's consider []
. We know that []
has two data constructors, []
and (:)
. We can define a list function by a recursion on this type, which typically involves a definition like the one we gave for enum
earlier in the lecture:
-- | A predicate for list membership
elem :: Eq a => a -> [a] -> Bool
elem _ [] = False
elem x (a:as)
| x == a = True
| otherwise = elem x as
Historically, one of the significant list-based functions was foldr
, which we can think of as a function for defining functions by recursion over lists:
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr combine base [] = base
foldr combine base (c:cs) = combine c (foldr combine base cs)
We can then reimplement many of our list based functions in terms of calls to foldr
, thinking of it as a function that takes two arguments and produces a function as a result, e.g.,
sum :: Num n => [n] -> n
sum = foldr (+) 0
product :: Num n => [n] -> n
product = foldr (*) 1
Or even
elem :: Eq a => a -> [a] -> Bool
elem a = foldr (\c cs -> a == c || cs) False
Having written this, the code fairy (we'll get to know the code fairy and her ways through the rest of this course), taps imperiously on our shoulder. Since I know her, I know that she's telling me that we can do better. Indeed, we can:
\c cs -> a == c || cs = \c cs -> (||) (a == c) cs
= \c -> (||) (a == c)
= \c -> (||) ((a ==) c)
= \c -> ((||) . (a ==)) c
= (||) . (a ==)
so
elem a = foldr ((||) . (a ==)) False
We'll pass on eliminating the a
.
The revelation comes from the observation that there are other data types we might want to fold into a single summary value. A good example is kind of binary tree that holds its values at its inner nodes:
data BinaryTree a
= EmptyTree
| Node a (BinaryTree a) (BinaryTree a) -- value leftChild rightChild
We can think of a binary tree as consisting of a sequence of values, ordered in infix, i.e., where all of the values in the left child of a node come before the value it carries, and these are followed by the values held by the right children. We can made this explicit via afunction, that produces a list with this order, i.e.,
visit :: BinaryTree a -> [a]
visit EmptyTree = []
visit (Node a left right) = visit left ++ [a] ++ visit right
Thus, for example if we convert the following
into a BinaryTree
test :: BinaryTree Integer
test = Node 3
(Node 1
(Node 0 EmptyTree EmptyTree)
(Node 2 EmptyTree EmptyTree))
(Node 5
(Node 4 EmptyTree EmptyTree)
(Node 6 EmptyTree EmptyTree))
We'd have
> visit test
[0,1,2,3,4,5,6]
as expected.
Interestingly enough, we can implement foldr
for BinaryTree
as well. There's a trick here, in that a list has a “right sentinel,” i.e., []
, and there is no corresponding right sentinel for a BinaryTree
. We'll get around this by a rather slick trick, which is to introduce a parameter that represents the value of processing stuff to the right of the tree, and introducing a helping function that uses this value:
foldr :: (a -> b -> b) -> b -> BinaryTree a -> b
foldr combiner base tree = foldMap combiner tree base
foldMap :: (a -> b -> b) -> BinaryTree a -> b -> b
foldMap _ EmptyTree rest = rest
foldMap combiner (Node a left right) rest
= (foldMap combiner left (combiner a (foldMap combiner right rest)))
This double recursion is more complicated than what we've seen before, but it's easy enough to understand. Moreover, looking at the last clause, we can see the setup for composition-based η-reduction, i.e.,
foldMap combiner (Node a left right) rest
= (foldMap combiner left . combiner a . foldMap combiner right) rest
whence, by η-reduction,
foldMap combiner (Node a left right)
= foldMap combiner left . combiner a . foldMap combiner right
A tactical problem is that when we define a function by pattern matching, as we're doing here with foldMap
, all of the clauses have to take the same number of pattern arguments. Thus, η-reducing the last clause forces us to η-reduce the first:
foldMap _ EmptyTree rest = rest
We can't just cancel rest
from the left- and right-hand sides of the equation, as this leaves an empty right-hand side, and that's not valid. Fortunately, the identity function comes to our rescue, as it often does:
foldMap _ EmptyTree rest = rest
foldMap _ EmptyTree rest = id rest
foldMap _ EmptyTree = id
In fact, there turn out to a number of types similar to BinaryTree
and []
, which we can think of as (implicitly) ordered containers of values of another type, which are suitable for folding a la foldr
. And so, foldr
grew up, and instead of being an ordinary function on lists, because the canonical instance function of new type class:
class Foldable f where
foldr :: (a -> b -> b) -> b -> f a -> b
...
Note that the type variable f
is now a complex object, in particular, it is now a type constructor, which it turns out is also a type. Thus, if we interrogate the type of foldr
, we'll get:
> :t foldr
foldr :: Foldable t => (a -> b -> b) -> b -> t a -> b
Moreover, the function foldr
is minimally complete for Foldable
, which brings a number of other useful functions along with it, so we can write
-- | A module for handling binary trees which store values at their internal nodes
module BinaryTree where
data BinaryTree a
= EmptyTree
| Node a (BinaryTree a) (BinaryTree a)
instance Foldable BinaryTree where
foldr combiner base tree = foldMap combiner tree base where
foldMap _ EmptyTree rest = rest
foldMap combiner (Node a left right) rest = (foldMap combiner left (combiner a (foldMap combiner right rest)))
test :: BinaryTree Integer
test = Node 3
(Node 1
(Node 0 EmptyTree EmptyTree)
(Node 2 EmptyTree EmptyTree))
(Node 5
(Node 4 EmptyTree EmptyTree)
(Node 6 EmptyTree EmptyTree))
This comes with a lot of nice consequences. The functions we re-wrote to take advantage of foldr
, back when we were thinking of it as a function for simplifying the definition of list-based functions, suddenly can take as arguments values of any type t
that belongs to the Foldable
type class, including BinaryTree
. Thus
> sum test
10
just works, because the Prelude
defines sum
via foldr
and not directly in terms of pattern matching on list data constructors. Indeed,
visit :: Foldable t => t a -> [a]
visit = foldr (:) []
Is easy to implement in this more general context, and moreover, the toList
function of the Foldable
type class has exactly this as a default definition, so we don't even need to write the function to be able to use it, all we have to do is include Data.Foldable
.
*Exercise 5.3 After writing
foldr combiner base tree = foldMap combiner tree base
I suddenly felt a tapping on my shoulder, and didn't need to turn around to imagine the expression on the face of the code fairy. It was that, “I can't believe that I need to call your attention to the code you just wrote.” You know what to do: η-reduce. A lot. Do it.