Lecture 11: Monoid and Foldable

Monoid

We have seen how the foldr function provides a general way to "crunch" a list of values down to a single result. A few simple examples:


    > foldr (+) 0 [1,2,3,4,5]
    15
    > foldr (*) 1 [1,2,3,4,5]
    120
    > foldr (:) [] [1,2,3,4,5]
    [1,2,3,4,5]
    > foldr (++) [] [[1,1],[2,2],[3,3],[4,4],[5,5]]
    [1,1,2,2,3,3,4,4,5,5]
    > foldr (&&) True [True,True,True,False]
    False
    > foldr (||) False [True,True,True,False]
    True
    > foldr (||) True [True,True,True,False]
    True

We will now consider Haskell typeclasses that (a) describe types, beyond just lists, that can be crunched down to a single result, and (b) describe values that can result from this crunching process. We will start with the latter.

The Monoid typeclass (defined in Data.Monoid) consists of types that have an associative binary operation with identity,


    class Monoid m where
        mempty :: m
        mappend :: m -> m -> m
        
        mconcat :: [m] -> m
        mconcat = foldr mappend mempty

where mempty is the identity, and mappend is the associative binary operator. An instance of Monoid must define at least mempty and mappend. Notice that because this signature refers to the variable m in places where values are required (and, furthermore, because only ground types are inhabited by values), the kind of m is *. That is, only ground types (as opposed to type operators) can be Monoids.

The names mempty and mappend work well for the List instance, which we will see below, but not as well for many other Monoids whose identities and operators have little to with "emptiness" or "appending." Nevertheless, these are the names and we will learn to live with them. Somewhat recently, the operator (<>) was defined as a synonym for mappend, which helps mitigate the naming issue.

The monoid laws can be expressed as, for all values a, b, and c of a monoid m,

a <> (b <> c) == (a <> b) <> c (associativity),
a <> mempty == a (right identity), and
mempty <> a == a (left identity).

Exercise 11.1

Among the seven example calls to foldr above, which binary operators and identities (the first and second arguments to foldr, respectively) do not constitute monoids, according to the definitions and laws just discussed?

List

The List instance is straightforward and illustrates why the Monoid methods were so named:


    instance Monoid [a] where
        mempty  = []
        mappend = (++)

Notice how, based on the surrounding context, Haskell infers what the types of mempty and (<>) should be and retrieves the implementations from the List instance appropriately:


    > [1,2] <> mempty <> [3,4,5] <> [6]
    [1,2,3,4,5,6]
    > foldr mappend mempty [[1,2],[],[3,4,5],[6]]
    [1,2,3,4,5,6]
    > mconcat [[1,2],[],[3,4,5],[6]]
    [1,2,3,4,5,6]

Sum and Product

There are two useful ways of defining a monoid on numbers: (+) paired with the identity element 0 and (*) paired with the identity element 1. However, if we were to define an instance of the form


    instance Num a => Monoid (Num a)
        ...

we could represent only one of these two monoids. To get around this obstacle, Data.Monoid defines two wrapper types for numbers, called Sum and Product, which capture the different monoids on numbers:


    newtype Sum a = Sum { getSum :: a } deriving (...)
    
    instance Num a => Monoid (Sum a) where
        mempty                         = Sum 0
        Sum x `mappend` Sum y          = Sum (x + y)

    newtype Product a = Product { getProduct :: a } deriving (...)
    
    instance Num a => Monoid (Product a) where
        mempty                         = Product 1
        Product x `mappend` Product y  = Product (x * y)

We can now choose between the two Monoids by explicitly wrapping and unwrapping numbers (without any additional run-time overhead, due to the use of newtype in the type definitions):


    > getSum . mconcat . map Sum $ [1..6]
    21
    > getProduct . mconcat . map Product $ [1..6]
    720

Any and All

Similarly to numbers, there are two Monoids on booleans, which are defined by way of two wrapper types:


    newtype All = All { getAll :: Bool } deriving (...)
    
    instance Monoid All where
        mempty                 = All True
        All x `mappend` All y  = All (x && y)
    
    newtype Any = Any { getAny :: Bool } deriving (...)

    instance Monoid Any where
        mempty                 = Any False
        Any x `mappend` Any y  = Any (x || y)

The following examples exhibit the same functionality as calling the all id and any id functions from the Prelude:


    > getAll . mconcat . map All $ []
    True
    > getAny . mconcat . map Any $ []
    False
    > getAll . mconcat . map All $ [True,True,True,False]
    False
    > getAny . mconcat . map Any $ [True,True,True,False]
    True

Maybe, First, and Last

There are many other useful Monoids, besides the common ones on lists, numbers, and booleans above. For example, the following derived instance "lifts" all Monoids over a to Monoids over Maybe a:


    instance Monoid a => Monoid (Maybe a) where
        mempty                     =  Nothing
        Nothing `mappend` m        =  m
        m       `mappend` Nothing  =  m
        Just m1 `mappend` Just m2  =  Just (m1 `mappend` m2)

Furthermore, it is often useful to work with Monoids over Maybe types that return the first and last non-Nothing values, if any, in a list of Maybe values. The First and Last wrapper types are defined to provide each of these two choices, and can be used as follows:


    > getFirst . mconcat . map First $ [Just 1, Nothing, Just 3]
    Just 1
    > getLast . mconcat . map Last $ [Just 1, Nothing, Just 3]
    Just 3

Exercise 11.2

Without first peeking at the implementations in Data.Monoid, fill in the definitions below.


    newtype First a = First { getFirst :: Maybe a } deriving (Show)
    newtype Last a  = Last  { getLast  :: Maybe a } deriving (Show)
    
    instance Monoid (First a) where
        ...
    
    instance Monoid (Last a) where
        ...

Foldable

The fold-right function for crunching lists


    foldr :: (a -> b -> b) -> b -> [a] -> b

is great, except that it only crunches lists. (Note: the latest version of Haskell's standard libraries have generalized the type of foldr, but up until now we have ignored it and pretended that it was specialized to lists.) What about other data structures? Binary trees, for example, can be crunched much in the same way as lists:


    data BinaryTree a
        = Leaf a
        | Node (BinaryTree a) (BinaryTree a)
        deriving Show
    
    foldTree :: (a -> b -> b) -> b -> BinaryTree a -> b
    foldTree f acc (Leaf x) = f x acc
    foldTree f acc (Node l r) = foldTree f (foldTree f acc r) l

Alas, the only difference between functions that fold over lists, trees, and other such data structures is the structure that houses the data!

The Foldable typeclass in Data.Foldable abstracts over the common pattern of folding, but with a perhaps unexpected twist. In the following (partial) definition of Foldable,


    class Foldable t where
        fold    :: Monoid m => t m -> m
        foldMap :: Monoid m => (a -> m) -> t a -> m
        foldr   :: (a -> b -> b) -> b -> t a -> b
        ...

notice that, in addition to foldr as expected, there are fold and foldMap methods that work with Monoids specifically. The minimal complete definition for Foldable is either foldMap or foldr, so, in fact, the structure described by Monoid is closely linked with the structure described by Foldable.

This class definition is quite informative, so we will pause and think hard about it before moving on to its instances. It is no surprise that every Foldable type t (more specifically, a type operator t of kind * -> *) must come with a foldr function of type (a -> b -> b) -> b -> t a -> b. But how can methods like fold and foldMap, which are parameterized over Monoids, be implemented using foldr?

Let us first consider foldMap and its type, which requires that its first parameter have type (a -> m), where a and m are ground types (of kind *) and m is a Monoid. It may seem strange at first to expect that a caller of foldMap could provide such a function that relates two seemingly unrelated types a and m. The key to remember is that the context of a call may instantiate these two type variables to more specific types, and there are many pairs of types a and m (where m is a Monoid) for which we can define functions of type (a -> m), such as:

[Bool] -> [Bool]
Int -> Sum Int
Double -> Product Double
Bool -> Any
Bool -> All
Maybe String -> Maybe String
Maybe String -> First String
Maybe String -> Last String

by making use of the Monoid instances we have seen.

Exercise 11.3

Write functions that satisfy each of the types above.

So, the type of foldMap indicates that it will use a mapping function to convert a data structure of a values into a data structure of m values. Next, let us fill in the following definition of foldMap in terms of foldr:


    -- foldMap :: Monoid m => (a -> m) -> t a -> m
       foldMap f t = foldr foo init t
           where foo  = ???  -- foo  :: (a -> m -> m)
                 init = ???  -- init :: m

The comments we have written to the right of foo and init indicate the types that these expressions must have in order for the call to foldr to be well-typed and return a value of type m, as required. Because we know m is a Monoid, we can fill in


    init = mempty

and


    foo = \x acc -> f x <> acc
        = \x acc -> (<>) (f x) acc
        = \x -> (<>) $ f x
        = \x -> ((<>) . f) x
        = (<>) . f

Our implementation, after one last eta-reduction, is thus:


       foldMap f = foldr ((<>) . f) mempty

Cool! With foldMap now under our belt, it is should be easy to swallow that fold = foldMap id.

We have now convinced ourselves that foldMap and fold can be implemented using foldr. But foldr must also be derivable in terms of foldMap, because the latter constitutes an acceptable minimal complete definition. The way this works is also quite nifty, making using of the endofunctor monoid on function composition which we have not discussed here. If you're curious, check out the Endo monoid and the default implementations in Foldable to see how this works.

Now that we have thought long and hard about how Foldable works, we will now consider some of its instances. The instance declaration for BinaryTree is easy to fill via foldr


    import qualified Data.Foldable as F

    instance F.Foldable BinaryTree where
        foldr = foldTree

and is even easier (assuming, for the sake of argument, that the implementation of foldTree is inlined above) via foldMap:


    import qualified Data.Foldable as F

    instance F.Foldable BinaryTree where
        foldMap f (Leaf x)   = f x
        foldMap f (Node l r) = F.foldMap f l <> F.foldMap f r

In the import statements above, notice that (a) the as F clauses allows F to stand as shorthand for Data.Foldable, and (b) the qualified keyword requires that all definitions imported from Data.Foldable be named with explicit prefixes. Because many Foldable methods share names with functions exported by Prelude, the qualified keyword avoids introducing ambiguities. (Note: the latest version of the standard libraries have fully embraced Foldable, so the qualified import is no longer necessary. Try removing the qualification and the F. prefix from these code snippets.)

It is often the case that Foldable types are also Functors, in which case we can implement foldMap f as fold . fmap f, half of which is already done. Consider the following, assuming that a Functor BinaryTree instance has been provided:


    instance F.Foldable BinaryTree where
        foldMap f = F.fold . fmap f
        fold (Leaf x) = x
        fold (Node l r) = fold l <> fold r

It does not get much simpler than this, which is good, because having to manually implement foldr for every Foldable type is more error prone. In fact, when we first wrote the Foldable BinaryTree instance, we got it wrong. In particular, we wrote a buggy implementation of foldTree which processed the left subtree first rather than the right. Such a function implements foldl rather than foldr, breaking the expected interface for Foldables. In contrast, there is little room for error in the implementation of foldMap or fold above.

*Exercise 11.4

Implement Foldable instances for each of the following types by providing implementations (only) for foldMap:

[]
Maybe
Either a
(,) a

*Exercise 11.5

Define a new wrapper type for pairs called FstPair using newtype, and define a Foldable instance for it that corresponds to folding over the first element of pairs, rather than the second. Your solution should result in the following behavior:


    > foldMap (++ "!") ("hello", "greetings")
    "greetings!"
    > foldMap (++ "!") (FstPair ("hello", "greetings"))
    "hello!"

Exercise 11.6 It's often very nice to have functions that can convert from one data representation to another. For container types (like our BinaryTree), it's sometimes nice to be able repackage the contained elements into a list. The Foldable class includes a member function (with a default implementation) called


    toList :: Foldable t => t a -> [a]

that does exactly this. Since we've created an instance of Foldable for BinaryTree, this means that


    > toList $ (Leaf 1 `Node` Leaf 2) `Node` (Leaf 3 `Node` Leaf 4)
    [1,2,3,4]

Works as expected. Provide your own implementation of toList, and compare it to the implementation in Data.Foldable.

CMSC-16100

Honors Introduction to Programming, I