Lecture 11: Monoid and Foldable
Monoid
We have seen how the foldr
function provides a general way
to "crunch" a list of values down to a single result.
A few simple examples:
> foldr (+) 0 [1,2,3,4,5]
15
> foldr (*) 1 [1,2,3,4,5]
120
> foldr (:) [] [1,2,3,4,5]
[1,2,3,4,5]
> foldr (++) [] [[1,1],[2,2],[3,3],[4,4],[5,5]]
[1,1,2,2,3,3,4,4,5,5]
> foldr (&&) True [True,True,True,False]
False
> foldr (||) False [True,True,True,False]
True
> foldr (||) True [True,True,True,False]
True
We will now consider Haskell typeclasses that
(a) describe types, beyond just lists, that can be crunched down to
a single result, and
(b) describe values that can result from this crunching process.
We will start with the latter.
The Monoid
typeclass (defined in Data.Monoid
)
consists of types that have an associative binary operation with identity,
class Monoid m where
mempty :: m
mappend :: m -> m -> m
mconcat :: [m] -> m
mconcat = foldr mappend mempty
where mempty
is the identity, and mappend
is the
associative binary operator. An instance of Monoid
must define at
least mempty
and mappend
.
Notice that because this signature refers to the variable m
in places where values are required (and, furthermore, because only ground
types are inhabited by values), the kind of m
is *
.
That is, only ground types (as opposed to type operators) can be
Monoid
s.
The names mempty
and mappend
work well for the
List instance, which we will see below, but not as well
for many other Monoid
s whose identities and operators have
little to with "emptiness" or "appending."
Nevertheless, these are the names and we will learn to live with them.
Somewhat recently, the operator (<>)
was defined as a synonym
for mappend
, which helps mitigate the naming issue.
The monoid laws can be expressed as, for all values
a
, b
, and c
of a monoid m
,
a <> (b <> c) == (a <> b) <> c
(associativity),a <> mempty == a
(right identity), andmempty <> a == a
(left identity).
Exercise 11.1
Among the seven example calls to foldr
above,
which binary operators and identities (the first and second
arguments to foldr
, respectively) do not
constitute monoids, according to the definitions and laws just
discussed?
List
The List instance is straightforward and illustrates why the
Monoid
methods were so named:
instance Monoid [a] where
mempty = []
mappend = (++)
Notice how, based on the surrounding context, Haskell infers
what the types of mempty
and (<>)
should be and retrieves the implementations from the List
instance appropriately:
> [1,2] <> mempty <> [3,4,5] <> [6]
[1,2,3,4,5,6]
> foldr mappend mempty [[1,2],[],[3,4,5],[6]]
[1,2,3,4,5,6]
> mconcat [[1,2],[],[3,4,5],[6]]
[1,2,3,4,5,6]
Sum and Product
There are two useful ways of defining a monoid on numbers:
(+)
paired with the identity element 0
and
(*)
paired with the identity element 1
.
However, if we were to define an instance of the form
instance Num a => Monoid (Num a)
...
we could represent only one of these two monoids.
To get around this obstacle, Data.Monoid
defines two wrapper
types for numbers, called Sum
and Product
,
which capture the different monoids on numbers:
newtype Sum a = Sum { getSum :: a } deriving (...)
instance Num a => Monoid (Sum a) where
mempty = Sum 0
Sum x `mappend` Sum y = Sum (x + y)
newtype Product a = Product { getProduct :: a } deriving (...)
instance Num a => Monoid (Product a) where
mempty = Product 1
Product x `mappend` Product y = Product (x * y)
We can now choose between the two Monoid
s by explicitly
wrapping and unwrapping numbers (without any additional run-time overhead,
due to the use of newtype
in the type definitions):
> getSum . mconcat . map Sum $ [1..6]
21
> getProduct . mconcat . map Product $ [1..6]
720
Any and All
Similarly to numbers, there are two Monoid
s on booleans,
which are defined by way of two wrapper types:
newtype All = All { getAll :: Bool } deriving (...)
instance Monoid All where
mempty = All True
All x `mappend` All y = All (x && y)
newtype Any = Any { getAny :: Bool } deriving (...)
instance Monoid Any where
mempty = Any False
Any x `mappend` Any y = Any (x || y)
The following examples exhibit the same functionality as calling
the all id
and any id
functions from the
Prelude
:
> getAll . mconcat . map All $ []
True
> getAny . mconcat . map Any $ []
False
> getAll . mconcat . map All $ [True,True,True,False]
False
> getAny . mconcat . map Any $ [True,True,True,False]
True
Maybe, First, and Last
There are many other useful Monoid
s, besides the common ones
on lists, numbers, and booleans above. For example, the following
derived instance "lifts" all Monoid
s over a
to Monoid
s over Maybe a
:
instance Monoid a => Monoid (Maybe a) where
mempty = Nothing
Nothing `mappend` m = m
m `mappend` Nothing = m
Just m1 `mappend` Just m2 = Just (m1 `mappend` m2)
Furthermore, it is often useful to work with Monoid
s
over Maybe
types that return the first and last
non-Nothing
values, if any, in a list of Maybe
values. The First
and Last
wrapper types
are defined to provide each of these two choices, and can be used
as follows:
> getFirst . mconcat . map First $ [Just 1, Nothing, Just 3]
Just 1
> getLast . mconcat . map Last $ [Just 1, Nothing, Just 3]
Just 3
Exercise 11.2
Without first peeking at the implementations in Data.Monoid
,
fill in the definitions below.
newtype First a = First { getFirst :: Maybe a } deriving (Show)
newtype Last a = Last { getLast :: Maybe a } deriving (Show)
instance Monoid (First a) where
...
instance Monoid (Last a) where
...
Foldable
The fold-right function for crunching lists
foldr :: (a -> b -> b) -> b -> [a] -> b
is great, except that it only crunches lists.
(Note: the latest version of Haskell's standard libraries have
generalized the type of foldr
, but up until now we have
ignored it and pretended that it was specialized to lists.)
What about other data structures? Binary trees, for
example, can be crunched much in the same way as lists:
data BinaryTree a
= Leaf a
| Node (BinaryTree a) (BinaryTree a)
deriving Show
foldTree :: (a -> b -> b) -> b -> BinaryTree a -> b
foldTree f acc (Leaf x) = f x acc
foldTree f acc (Node l r) = foldTree f (foldTree f acc r) l
Alas, the only difference between functions that fold over lists, trees, and other such data structures is the structure that houses the data!
The Foldable
typeclass in Data.Foldable
abstracts over the common pattern of folding, but with a perhaps
unexpected twist. In the following (partial) definition of
Foldable
,
class Foldable t where
fold :: Monoid m => t m -> m
foldMap :: Monoid m => (a -> m) -> t a -> m
foldr :: (a -> b -> b) -> b -> t a -> b
...
notice that, in addition to foldr
as expected,
there are fold
and foldMap
methods that work with Monoid
s
specifically. The minimal complete definition for Foldable
is either foldMap
or foldr
, so, in fact, the
structure described by Monoid
is closely linked with the
structure described by Foldable
.
This class definition is quite informative, so we will pause
and think hard about it before moving on to its instances.
It is no surprise that every Foldable
type t
(more specifically, a type operator t
of kind
* -> *
) must come with a foldr
function of
type (a -> b -> b) -> b -> t a -> b
.
But how can methods like fold
and foldMap
,
which are parameterized over Monoid
s, be
implemented using foldr
?
Let us first consider foldMap
and its type, which
requires that its first parameter have type (a -> m)
,
where a
and m
are ground types (of kind
*
) and m
is a Monoid
.
It may seem strange at first to expect that a caller of
foldMap
could provide such a function that relates
two seemingly unrelated types a
and m
.
The key to remember is that the context of a call may instantiate
these two type variables to more specific types, and there are
many pairs of types a
and m
(where
m
is a Monoid
)
for which we can define functions of type
(a -> m)
, such as:
[Bool] -> [Bool]
Int -> Sum Int
Double -> Product Double
Bool -> Any
Bool -> All
Maybe String -> Maybe String
Maybe String -> First String
Maybe String -> Last String
Monoid
instances we have seen.
Exercise 11.3
Write functions that satisfy each of the types above.
So, the type of foldMap
indicates that it will
use a mapping function to convert a data structure of a
values into a data structure of m
values.
Next, let us fill in the following definition of foldMap
in terms of foldr
:
-- foldMap :: Monoid m => (a -> m) -> t a -> m
foldMap f t = foldr foo init t
where foo = ??? -- foo :: (a -> m -> m)
init = ??? -- init :: m
The comments we have written to the right of foo
and
init
indicate the types that these expressions must have
in order for the call to foldr
to be well-typed and
return a value of type m
, as required. Because we
know m
is a Monoid
, we can fill in
init = mempty
and
foo = \x acc -> f x <> acc
= \x acc -> (<>) (f x) acc
= \x -> (<>) $ f x
= \x -> ((<>) . f) x
= (<>) . f
Our implementation, after one last eta-reduction, is thus:
foldMap f = foldr ((<>) . f) mempty
Cool!
With foldMap
now under our belt,
it is should be easy to swallow that fold = foldMap id
.
We have now convinced ourselves that foldMap
and
fold
can be implemented using foldr
.
But foldr
must also be derivable in terms of
foldMap
, because the latter constitutes an
acceptable minimal complete definition.
The way this works is also quite nifty, making using of
the endofunctor monoid on function composition which we
have not discussed here. If you're curious, check out the
Endo
monoid and the default implementations in
Foldable
to see how this works.
Now that we have thought long and hard about how Foldable
works, we will now consider some of its instances.
The instance declaration for BinaryTree
is easy
to fill via foldr
import qualified Data.Foldable as F
instance F.Foldable BinaryTree where
foldr = foldTree
and is even easier (assuming, for the sake of argument, that the
implementation of foldTree
is inlined above) via
foldMap
:
import qualified Data.Foldable as F
instance F.Foldable BinaryTree where
foldMap f (Leaf x) = f x
foldMap f (Node l r) = F.foldMap f l <> F.foldMap f r
In the import
statements above, notice that
(a) the as F
clauses allows F
to stand as
shorthand for Data.Foldable
, and
(b) the qualified
keyword requires that all
definitions imported from Data.Foldable
be named with
explicit prefixes.
Because many Foldable
methods share names with functions
exported by Prelude
, the qualified
keyword
avoids introducing ambiguities.
(Note: the latest version of the standard libraries have fully embraced
Foldable
, so the qualified import is no longer necessary.
Try removing the qualification and the F.
prefix from
these code snippets.)
It is often the case that Foldable
types are also
Functor
s, in which case we can implement
foldMap f
as fold . fmap f
, half of which is
already done. Consider the following, assuming that a
Functor BinaryTree
instance has been provided:
instance F.Foldable BinaryTree where
foldMap f = F.fold . fmap f
fold (Leaf x) = x
fold (Node l r) = fold l <> fold r
It does not get much simpler than this, which is good, because having
to manually implement foldr
for every Foldable
type is more error prone. In fact, when we first wrote the
Foldable BinaryTree
instance, we got it wrong.
In particular, we wrote a buggy implementation of foldTree
which processed the left subtree first rather than the
right. Such a function implements foldl
rather
than foldr
, breaking the expected interface for
Foldable
s. In contrast, there is little room for
error in the implementation of foldMap
or fold
above.
*Exercise 11.4
Implement Foldable
instances for each of the following
types by providing implementations (only) for foldMap
:
[]
Maybe
Either a
(,) a
*Exercise 11.5
Define a new wrapper type for pairs called FstPair
using
newtype
, and define a Foldable
instance for
it that corresponds to folding over the first element
of pairs, rather than the second. Your solution should result in
the following behavior:
> foldMap (++ "!") ("hello", "greetings")
"greetings!"
> foldMap (++ "!") (FstPair ("hello", "greetings"))
"hello!"
Exercise 11.6 It's often very nice to have functions that can
convert from one data representation to another. For container types
(like our BinaryTree
), it's sometimes nice to be able repackage
the contained elements into a list. The Foldable
class includes
a member function (with a default implementation) called
toList :: Foldable t => t a -> [a]
that does exactly this. Since we've created an instance of Foldable
for BinaryTree
, this means that
> toList $ (Leaf 1 `Node` Leaf 2) `Node` (Leaf 3 `Node` Leaf 4)
[1,2,3,4]
Works as expected. Provide your own implementation of toList
,
and compare it to the implementation in Data.Foldable
.