Lecture 11: Monads
Monads
Let's review the bidding. We've seen two category theory inspired type classes:
class Functor f where
  fmap :: (a -> b) -> f a -> f b
and
class Functor f => Applicative f where
  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b
These type classes reflect common patterns of type usage, and a language that we find helpful is that of plain types like a, vs. fancy types like f a. Thus, a Functor must implement a function fmap that enables us to transform a function on plain types into a function on fancy types. The Applicative function pure enables us to promote a plain value to a fancy value. Finally, (<*>) can be thought of as “apply for fancy things,” or, if we think of it as a unary function in the same way we think of fmap as a unary function, a means to distribute fanciness over (->), converting a fancified function into a plain function whose domain and range are fancy values.
Our next type pattern is Monad, the boogie-man of Haskell. But as we'll see, monads are just instances of a pattern of type use that occurs a lot in programming. We'll start with an abridged definition:
class Applicative m => Monad m where
  (>>=) :: m a -> (a -> m b) -> m b
This looks a bit odd, but we can see in it a kind of “application written backwards,” where we take a fancy value and a fancifying function that maps plain values to fancy values, and somehow “apply” the latter to the former, obtaining a fancy value. There are reasons why these things are the way they are.
These peculiar functions are sometimes called “actions,” and (>>=) serves as a sequencing mechanism. It is natural, in such contexts, to describe a sequence of actions (beginning with a start value) as a pipeline, in which the actions take place in the order they're written. The operation (>>=) is called bind, because in a common notation for describing sequenced actions in which the (>>=) operator is cleverly elided, its effect will be to establish the binding of a name to plain values extracted from fancy values.
There are other functions that belong to the Monad type class, but they have default definitions, and we can defer their consideration for the time being.
Identity
The Identity instance of bind is very simple: given a fancy value, we simply remove it from its virtual box, obtaining a plain value, and apply our fancifying function to the result.
instance Monad Identity where
  Identity a >>= f = f a
You can see that it's the fancifying function's responsibility to box its output.
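For instance, here is a quick check you can run in GHCi (a minimal sketch; Identity and its Monad instance are available from Data.Functor.Identity, and the name example is ours):

import Data.Functor.Identity

example :: Identity Int
example = Identity 3 >>= \x -> Identity (x + 1)
-- runIdentity example == 4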
Maybe
The Maybe instance of bind is also very simple: a Just value is handled like an Identity value, whereas there's no plain value to be extracted from a Nothing argument, and so we have a Nothing result:
instance Monad Maybe where
  Just a >>= f = f a
  Nothing >>= _ = Nothing
I think you'll agree, we haven't seen anything scary yet.
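To see the pipeline idea concretely, consider chaining two operations that can fail (a small sketch; safeDiv and safeHead are illustrative helpers of our own devising, not library functions):

safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv a b = Just (a `div` b)

safeHead :: [a] -> Maybe a
safeHead [] = Nothing
safeHead (x:_) = Just x

-- The pipeline stops at the first Nothing:
--   safeHead [10,20] >>= safeDiv 100  ==  Just 10
--   safeHead []      >>= safeDiv 100  ==  Nothing
--   safeHead [0]     >>= safeDiv 100  ==  Nothing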
A Bit of History
Monads solved a crucial problem in the design of pure functional languages like Haskell: how to deal with IO, which is intrinsically impure. Monads provide an interface for defining a kind of “isolation ward” in which impure actions can be defined and performed, apart from the pure core of the language, but in contact with it. This is part of why monads became the boogie-man of Haskell: a full accounting of something as simple as the standard “Hello, world!” program required a digression through category theory and a very abstract type class.
When the language was first developed, Monad was a stand-alone type class, i.e., it wasn't necessarily the case that instances m of the Monad type class belonged to Functor, let alone Applicative. In order to make Monad useful, other functions were included in the standard interface:
class Monad m where
  ...
  return :: a -> m a
  fail :: String -> m a
We recognize that return has the same type pattern as pure, and indeed Monad instances are required to satisfy the law return = pure; a default definition of return is provided so that this is so. This results in a bit of confusion over lexicon: which should you use? The sense of the community is evolving, but we believe it will converge on the simple solution of always preferring pure, and that's the style we'll teach, although we have a lot of code that we'll need to convert. In any event, if you ever encounter a return in the wild, read pure, and code on!
The inclusion of fail is more controversial. The monads of category theory do not have a fail function, and there are many types which would otherwise be monads for which no sensible fail function can be written. Indeed, the most common implementation of fail is fail = error, i.e., to raise an exception. This is just one of those cases where the motivation for including the monad abstraction (dealing with IO) resulted in a bad design decision. IO can fail, and there needs to be a way to deal with IO failures, but including fail in Monad was not the right choice.
The status quo is that a new type class has been defined,
class Monad m => MonadFail m where
  fail :: String -> m a
and that the fail function of Monad is now deprecated, and will be removed in the future. Caveat emptor!
A Bit of Law
There are several axioms that describe how pure and (>>=) interact, and Haskell makes optimizations based on the assumption that they hold:
- left identity: pure a >>= f is equivalent to f a
- right identity: ma >>= pure is equivalent to ma
- associativity: (ma >>= f) >>= g is equivalent to ma >>= (\x -> f x >>= g)
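As a quick sanity check, here is how left identity plays out for Maybe, whose instance we defined above (an informal equational sketch):

--   pure a >>= f
-- = Just a >>= f    -- pure is Just for Maybe
-- = f a             -- by the first clause of (>>=)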
These laws (especially associativity) seem a bit odd. There's a simple transformation that makes them more comprehensible. Category theorists describe functions of type a -> m b where m is a monad as Kleisli arrows (pronounced “KLAI-slee”). We will think of them as machines that take plain values as input and produce fancy values as output. A key notion is Kleisli composition, which is a means for composing this sort of machine. It's useful to directly compare the type patterns involved:
(.) :: (b -> c) -> (a -> b) -> a -> c
(<=<) :: Monad m => (b -> m c) -> (a -> m b) -> a -> m c
It is, by the way, quite easy for us to define (<=<), and worth considering in parallel with the definition of (.). We'll use (=<<) here, an infrequently used flipped bind, to accentuate the analogy between the definitions of (.) and (<=<):
f . g = \x -> f $ g x
f <=< g = \x -> f =<< g x
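Reusing the illustrative Maybe helpers safeDiv and safeHead from the earlier sketch (our own functions), Kleisli composition glues two failure-prone machines into one; (<=<) itself lives in Control.Monad:

import Control.Monad ((<=<))

headThenDiv :: [Int] -> Maybe Int
headThenDiv = safeDiv 100 <=< safeHead
-- headThenDiv [10,20] == Just 10
-- headThenDiv []      == Nothing
-- headThenDiv [0]     == Nothing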
In category theory, if we have a monad over a category, we can build the Kleisli category, in which the objects are the objects of the base category, the arrows from a to b are the Kleisli arrows of type a -> m b, the identity is pure, and composition is (<=<). Thus, the ordinary axioms of category theory for this category are:
- left identity: pure <=< f is equivalent to f
- right identity: f <=< pure is equivalent to f
- associativity: (f <=< g) <=< h is equivalent to f <=< (g <=< h)
These are equivalent to the laws above (although the careful reader will note that the chirality of the identity laws is flipped), but much more intuitive.
As with the other category theoretic type classes, Haskell compilers are allowed to assume that the monad laws are satisfied by any Monad, and to transform code accordingly. Naturally, all of the Monad instances defined in the Haskell libraries satisfy them.
Lists
The [] type is also an instance of Monad, but in a more interesting way than Identity or Maybe. It is useful to start with a brief digression. Folks who are familiar with category theory via mathematics may be a bit perplexed by our presentation of monads. In the standard approach, the critical function is join :: Monad m => m (m a) -> m a. The way we like to think about this is that the plain/fancy hierarchy really only has two levels: plain and fancy. If we try to create an “extra fancy” value, we can demote it to ordinary fanciness by simply applying join to it.
The join function is often fairly easy to understand. If we have a double-layered structure, we can often delayer it, e.g.,
join :: Identity (Identity a) -> Identity a
join (Identity (Identity a)) = Identity a

join :: Maybe (Maybe a) -> Maybe a
join (Just (Just a)) = Just a
join _ = Nothing
If we consider join on lists, we have:
join :: [[a]] -> [a]
There is already a very natural list function that has this type pattern:
concat :: [[a]] -> [a]
In this particular case, join essentially “flattens” our list structure, forgetting the boundaries between sublists. There are a lot of monads for which join works this way, “erasing” interior boundaries.
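For example (using the join from Control.Monad, which agrees with the instance-by-instance definitions above; the name flattened is ours):

import Control.Monad (join)

flattened :: [Int]
flattened = join [[1,2],[3],[4,5]]
-- [1,2,3,4,5], exactly what concat produces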
There is something unreasonable about what we've just done, taking the type of something we're looking for, and then reaching into our magic hat, and pulling out a more familiar object that just happens to have the same type, and then claiming that that's what we've been looking for. What is surprising, in both mathematics and in type-driven development, is that it's often the case that the pieces fit together in essentially only one way. This has the perfectly unreasonable consequence that if we're looking for something that has a particular type, and there is only one way to build that object, then getting the types right is all we need to do.
We can use this idea to “discover” the relationship between join and (>>=). Let's suppose we want to build (>>=) out of join. Our goal is to get to the type:
(>>=) :: (Monad m) => m a -> (a -> m b) -> m b
Recall fmap:
fmap :: (Functor f) => (a -> b) -> f a -> f b
If we supply fmap with an argument of type a -> m b where m is a monad, and we apply it to an argument of type m a, we'll have
fmap :: (Monad m) => (a -> m b) -> m a -> m (m b)
which is pretty close. The arguments are in the wrong order, and we have to somehow get from m (m b) to m b, but that's what join is for. Thus, we guess
ma >>= f = join (f <$> ma)
It's a good guess.
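We can at least spot-check the guess against the Maybe instance we already have (an informal calculation):

--   Just a >>= f
-- = join (f <$> Just a)    -- the guess
-- = join (Just (f a))      -- fmap on a Just
-- = f a                    -- join on Maybe: if f a is Just b, we get Just b;
--                          -- if f a is Nothing, join (Just Nothing) is Nothing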
Exercise 11.1 Show that (>>= id) :: Monad m => m (m a) -> m a, and therefore that join = (>>= id) is a plausible definition for join in a Monad. It is, in fact, the definition.
Putting all of this inspired, insane guesswork together, we have
instance Monad [] where
  xs >>= f = concat (f <$> xs)
The actual definition of (>>=) is equivalent, and perhaps even a bit more intuitive.
instance Monad [] where
  xs >>= f = [y | x <- xs, y <- f x]
Thus, (>>=) on lists is actually fairly simple. We consider the elements x of the argument list xs one at a time; for each such element we form f x, a list, which constitutes a part of the value we're constructing. We then combine these sublists by concatenation.
One caution in this: the join function is an ordinary function, and not part of the Monad type class. This is unfortunate, as it means that there is no default definition of (>>=) in terms of join, and that our approach of re-defining join will often entail explicit namespace specifications.
Making Monads Work for Us
We've invested some effort into monads, their definition, and a few of their instances. But we haven't yet used them to solve any programming problems. Let's take a look at a bit of code:
pure 1 >>= \x ->
pure (x + x) >>= \x ->
pure (x * 2) >>= \x ->
pure x
This can be evaluated (if you give the type checker enough information to know that you're working with Identity as the monad), and returns Identity 4.
This is a bit confusing, especially as to why anyone would want to do something so convoluted, but this has a lot to do with the fact that the glue code associated with the Identity monad is trivial. More complicated examples are coming, as are some meaningful syntactic simplifications. In interpreting this, it is important to understand that (>>=) has low precedence, so this is really:
pure 1 >>= (\x ->
pure (x + x) >>= (\x ->
pure (x * 2) >>= (\x ->
pure x)))
I.e., each lambda's body extends all the way to the bottom. We've laid this out so that each line represents a deeper level of lexical nesting. This syntax is certainly awkward, but keep in mind we're building up machinery for contexts that are more interesting than simple transparent boxes. Moreover, Haskell has a special syntax for working with monads, the semi-familiar do, which is nothing more than syntactic sugar for expressions built out of (>>=). Consider the following, which is just a do-sugared variant of the expression above:
do
  x <- pure $ 1
  x <- pure $ x + x
  x <- pure $ x * 2
  pure x
This makes our expression look a lot like the assignment-based procedural code that is so familiar to C and Java programmers, with just a bit of extra syntactic noise. And we can think about it that way, too, although that's not what is actually happening. We're not making a sequence of assignments to a single variable x; instead, we're establishing a “new” x on each line, whose scope extends until the next x shadows it, and we're binding a value to it.
Thus, it is possible to write code that looks imperative, but with a functional meaning and interpretation, in which sequencing is a metaphor for lexical nesting, and so makes it easy for us to use and reason about deeper levels of lexical nesting than we'd otherwise attempt.
The structure of do notation is fairly straightforward. Let's suppose that the value we're constructing has type m a for some monad m. Then every line of the do will be comprised of a value of type m b for some b, optionally including the binding of a name via the <- syntax. Consecutive lines are combined using (>>=) when there is an explicit binding, and (>>) when there is not, where
(>>) :: Monad m => m a -> m b -> m b
ma >> mb = ma >>= \_ -> mb
is just a bind that ignores the plain value extracted from its first argument.
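For instance, a small do block desugars mechanically as follows (a sketch; xs and ys stand for arbitrary monadic values):

--   do x <- xs
--      ys
--      pure x
--
-- desugars to
--
--   xs >>= \x -> ys >> pure x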
Let's use this with []. Consider the word “Mississippi.” Kids love this word, because of the repetition of letters, and the rhythm of spelling it out. We might even think of using this repetition as a way of compressing the spelling, e.g., representing it by a value of type [(Int,Char)] where each pair is a count together with a letter. Thus,
mississippiCompressed :: [(Int,Char)]
mississippiCompressed
  = [(1,'M'),(1,'i'),(2,'s'),(1,'i'),(2,'s'),(1,'i'),(2,'p'),(1,'i')]
Consider now the job of decompressing such a representation, of recovering the original string.
decompress :: [(Int,a)] -> [a]
decompress ps = do
  (i,x) <- ps
  a <- replicate i x
  pure a
This uses the replicate :: Int -> a -> [a] function from Data.List, which is also in the Prelude.
We could have approached this as a list comprehension problem, and there's a clear relationship between list comprehension and the list monad:
decompress ps = [a | (i,x) <- ps, a <- replicate i x]
But one advantage of the monadic approach is that we'll soon develop a number of skills for working with monadic definitions, and we can apply them even here. One such observation is that constructions like
do
  ...
  a <- expr
  pure a
can be rewritten as
do
  ...
  expr
This is just a consequence of the right identity law for monads, after an η-reduction of \a -> pure a. Thus,
decompress ps = do
  (i,x) <- ps
  replicate i x
Next, we can recognize that \(i,x) -> replicate i x is just uncurry replicate, which allows us to write:
decompress ps = ps >>= uncurry replicate
or even
decompress = (>>= uncurry replicate)
Next, let's consider an earlier exercise that we solved with list comprehension, generating all of the primitive Pythagorean triples up to a particular hypotenuse:
primitiveTriples n =
  [ (a,b,c)
  | a <- [1..n]
  , b <- [a+1..n]
  , gcd a b == 1
  , c <- [b+1..n]
  , a^2 + b^2 == c^2
  ]
To translate this to do notation, we simply transcribe each of the expressions to the right of the bar as a line in the do block, and move the expression to the left of the bar to the bottom:
primTriples n = do
  a <- [1..n]
  b <- [a+1..n]
  gcd a b == 1
  c <- [b+1..n]
  a^2 + b^2 == c^2
  pure (a,b,c)
This doesn't quite work, because the lines that correspond to tests don't have a monadic value, but rather a boolean value. We want to write a function guard :: Bool -> [()] that allows us to complete the translation as:
primTriples n = do
  a <- [1..n]
  b <- [a+1..n]
  guard $ gcd a b == 1
  c <- [b+1..n]
  guard $ a^2 + b^2 == c^2
  pure (a,b,c)
Writing guard will test our understanding of how the list monad actually works. There is no explicit binding on the guarded lines, and we know that we're returning a list of ()'s. Recalling the definition of (>>), the body below the guarded line will be run once for each element of that list. Thus, we want the guarded line to have the value [()] if the predicate we're testing is true, and [] otherwise:
guard :: Bool -> [()]
guard True = [()]
guard False = []
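To see the mechanism at work, recall the definition of (>>) (a quick check):

-- guard True  >> [1,2,3]  ==  [1,2,3]   -- one trip through the rest of the block
-- guard False >> [1,2,3]  ==  []        -- the pipeline is cut off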
There's a cute trick, known as the Fokker trick after the programmer who invented it, where
guard b = [() | b]
In this case, there are no generators, but the effect of the definition is to produce a single () if b is True, and none otherwise. The [] type belongs to many other type classes, including Alternative, which we'll meet in a future lecture. There is a function guard :: Alternative f => Bool -> f () of which our guard is just a type-restricted special case.
*Exercise 11.2 Consider the trivial two-element list [(),()]. Because this is an element of the list monad, we can include it on any line of a list-defining do expression. Consider the two expressions:
do
  [(),()]
  x <- [1,2,3]
  pure x
vs.
do
  x <- [1,2,3]
  [(),()]
  pure x
These produce very different values. Explain the difference. The first expression can be rewritten, using the techniques described above, into a particularly simple form that does not involve do. Do so.
Optionally: if you're feeling especially brave, the second form can be re-written in the same way, albeit not quite so simply. Do so.