Mutable State

When programming in other languages, assignment (a.k.a. mutation) can be used to achieve several programming goals, such as performance, convenience, and sloppiness. Consider how far we've come without the need to rely on mutation.

Haskell does, however, provide mechanisms for mutation, albeit in a manner distinct from most other languages. In a nutshell, Haskell tracks the "extent" or "scope" of mutable references, using the type system to ensure that they do not "escape" their "computational threads." As we will see, this allows mutation to be used locally without allowing them to leak outside of the kinds of nice, pure functional boundaries we have grown to know and love.

Example: Fibonacci

As a very simple example, implement Fibonacci naively:

fib_slow :: Integer -> Integer
fib_slow 1 = 1
fib_slow 2 = 1
fib_slow n = fib_slow (n-1) + fib_slow (n-2)

> map fib_slow [1..30]     -- slow
> map fib_slow [1..100]    -- hopeless

A much more efficient version keeps track of previous two answers:

fib_fast :: Integer -> Integer
fib_fast 1 = 1
fib_fast 2 = 1
fib_fast n = foo 1 1 (n-2) where
  foo i _ 0 = i
  foo i j m = foo (i+j) i (m-1)

> map fib_fast [1..100]
> map fib_fast [1..1000]

Even though we don't need to, for sloppiness, let's implement this using mutable variables (like we probably would in C or Java).

ST (State Threads)

There are two components to mutable references in Haskell:

  1. A "thread" of computation, denoted by Control.Monad.ST s a, where s is the (abstract) "id" of a thread and a is the type of value produced; and

  2. A reference cell of type Data.STRef s a points to a memory cell that stores a value of type a and s is the thread id.

Might help to remember more informative type variable names: ST id a and STRef id a.

Unlike the "stateful functions" Control.Monad.State s a ~= s -> (a, s) we studied before, this "state monad" really is stateful.

ST s is a Monad, so we can sequence ST s a actions.

fib_impure :: Integer -> Integer
fib_impure 1 = 1
fib_impure 2 = 1
fib_impure n = runST $ do
  x <- newSTRef 1        -- every cell initialized to some value
  y <- newSTRef 1

  replicateM_ (n-2) $ do   -- review Control.Monad!
     i <- readSTRef x
     j <- readSTRef y
     writeSTRef x j
     writeSTRef y (i+j)

  readSTRef y

> map fib_impure [1..30]
> map fib_impure [1..300]
> map fib_impure [1..300] == map fib_fast [1..300]

Okay, so how do these functions work? Notice that {new,read,write}STRef return values "in the ST monad."

newSTRef   :: a -> ST s (STRef s a)
readSTRef  :: STRef s a -> ST s a
writeSTRef :: STRef s a -> a -> ST s ()
runST      :: (forall s. ST s a) -> a

Whoa, explicit forall s. .... How is this type different than usual?

Rank-1 (Prenex) Polymorphism

All of the polymorphic functions we've seen so far are rank-1, or prenex, polymorphic: all of the forall quantifiers are (implicitly) written at the outermost position.

As an aside, there is an option that allows writing the quantifiers explicitly:

> :set -XExplicitForAll

> :t id :: forall a. a -> a
> :t id :: forall b. b -> b

By restricting to prenex polymorphism, the type checker is able to automatically infer polymorphic types and and type applications completely automatically for a (perhaps surprisingly-) large class of functions. But there are other functions where prenex polymorphism is not enough.

For example:

-- foo :: ([a] -> b) -> Bool -> b
-- foo :: forall a. forall b. ([a] -> b) -> Bool -> b
   foo :: forall a b. ([a] -> b) -> Bool -> b
   foo f True  = f "abcd"
   foo f False = f [1,2]

This does not type check: foo cannot be assigned the declared prenex polymorphic type below, because foo uses f at type [Char] -> b (in the first call) and at type Num n => [n] -> b (in the second call). f only has one type, and it is not either of these.

Higher-Rank Polymorphism

If we move the a quantifier inside, however, it typechecks! (Writing foralls anywhere but all the way to the left in a type requires the flag RankNTypes.).

foo :: forall b. (forall a. [a] -> b) -> Bool -> b

By default, all quantifiers are at "the top-level" of a type cannot appear to the left on arrow. "Higher-rank" types allow quantifiers to be nested inside, to the left of arrows

      all                            all
      / \                            /  \
     a   all                        b  ------>
        /   \                         /       \
       b  ------>                   all       ->
         /       \                  / \      /  \
       ->        ->                a  ->   Bool   b
      /  \      /  \                 /  \
     [a]  b   Bool  b               [a]  b

We can call foo with any functions of type (forall a. [a] -> b). For example:

> foo length True
> foo length False
> foo (const 0) True
> foo (const 0) False

A prenex forall allows (requires) the caller to choose the type arguments, so language/compiler cannot make any assumptions about choice.

But a forall to the left of an arrow prevents the caller from making any assumptions about the type variable. Used to encode encapsulation (e.g. objects). Allows compiler to make decisions about representation.

Back to ST

The higher-rank polymorphic type

runST :: (forall s. ST s a) -> a

prevents output from depending at all on the thread; it has to work for all thread "ids" s.

This approach allows us to "locally" use mutation. But only pure values at our function boundaries.

For example:

bad :: Bool
bad = runST $ do
  x <- newSTRef True
  pure $ runST $ do
    writeSTRef x False  -- inner thread can't access vars from outer
  readSTRef x

Even the following fails "because type variable 's' would escape its scope" (akin to something like "x :: exists s. STRef s Bool").

bad' :: ()
bad' =
  let x = runST $ newSTRef True in
  ()

Can define aliases:

aliasEx = runST $ do
  x <- newSTRef True
  let y = x
  writeSTRef y False
  readSTRef x

References prevented from "escaping" scope. So can use references internally (e.g. within a function), but must be pure at boundaries. Best of both worlds! Haskell can enforce this, whereas most type systems cannot either (a) cannot prevent references from crossing boundaries (b) require different types depending on implementation.

"Impure" features do exist in Haskell, but they are to be used with restraint and caution:

Source Files