Lecture 27: Arrays and Mutability

Immutable Arrays

Up until now, when we've needed an indexed container class, we've tended to use Data.Map, whereas most of the world tends to use arrays. Surprisingly, Haskell has arrays too, but it's a complicated story.

Let's start by considering a simple problem, drawn from Abelson & Sussman's wonderful “Structure and Interpretation of Computer Programs.” How many ways can you make change for a dollar, using only half-dollars, quarters, dimes, nickles, and pennies? Note that order isn't significant, only the number of each type of coin, thus, making change using a half-dollar and two quarters is the same as using two quarters and a half doller. In this case, it's useful to approach the problem in greater generality, writing the function

countChange :: [Int] -> Int -> Int countChange coins amount | amount == 0 = 1 | amount < 0 = 0 | otherwise = case coins of [] -> 0 c:cs -> countChange coins (amount-c) + countChange cs amount

Briefly, you can only make change for the amount of 0 in one way (with no coins!), and you can't make change for a negative amount. If you have at least one coin, and a positive amount of money to make change for, you can either use that coin in making change, nor not. Either way results in a simpler problem (either there's less change to make, or fewer coins with which to make it), so the recursive definition is well-founded. If you're out of coins, you can't make change for a positive amount.

And this works well enough for modest amounts, e.g., returning to the original problem,

> countChange [50,25,10,5,1] 100 292

But a problem with this code is that it scales badly: the amount of computation is $O(n^k)$, where $n$ is the amount of money to be changed, and $k$ is the number of coins. In the case of US coinage, this means and $O(n^5)$ algorithm if dollar coins are not allowed, and that's already bad. The $n=100$ case is tractable enough, but $n=1000$ takes a long time. Can we do better?

The core problem here is that this is a Fibonacci-like computation, e.g., the call tree contains a great deal of duplication. It's easy to see, though, that only $n \times k$ distinct calls can be made (at least for positive $n$), and $n \times k$ is a lot smaller than $n^k$. The obvious solution is memoization -- i.e., the first time we compute countChange for a particular pair of parameters, we'll somehow remember the resulting value, and if we're asked to compute it again, we return that remembered value. We could do this with a State monad and Map, but let's pursue a different path. In most programming languages, we'd use a 2-dimensional array to hold the values, and we'd cleverly chose an order of computation that guaranteed that we didn't try to read a value before writing it. This technique sometimes goes by the name “dynamic programming.”

There are a couple of array packages that are part of the Haskell Platform: Data.Array and Data.Vector. Data.Vector is often faster, but it's less flexible, and in a way that matters for this problem, so we'll use Data.Array. There are two possible interfaces, one through Data.Array, and another through Data.Array.IArray. The later has a more general interface, and its use is usually recommended. The "I" in IArray stands for “immutable.” This is not a limitation in the present case.

The function

array :: (IArray a e, Ix i) => (i, i) -> [(i, e)] -> a i e

constructs an array from a list of index-value pairs. The Ix (index) type class constrains the type of keys. Not surprisingly, Int is a member of Ix, but in general, if t1, t2, ..., tk are instances of Ix, then (t1,t2,...,tk) is also an instance of Ix via a deriving instance definition, which is how we get to multi-dimensional arrays.

This allows for the following implementation of countChange:

import Data.Array.IArray countChange :: [Int] -> Int -> Integer countChange denominations amount = table ! (nDenominations, amount) where nDenominations = length denominations table = array ((1,1),(nDenominations,amount)) [ ((ix,n), cc ix (n-v) + cc (ix-1) n) | (ix,v) <- zip [1..] (reverse denominations) , n <- [1..amount] ] :: Array (Int,Int) Integer cc ix n | n == 0 = 1 | n < 0 = 0 | ix < 1 = 0 | otherwise = table ! (ix,n) usCC = countChange [100,50,25,10,5,1]

There's a lot going on here, so let's work through it. We're building a 2D array (named table) such that the value stored at index (ix,n) is the number of ways to make change for n given the last ix many coins. We'll be able to compute much larger return values, hence the change from Int to Integer, e.g.,

> usCC 100000 13398445413854501

which would be unimaginable with the original function definition.

The local function cc is essentially a bound-checked array lookup. This is recognizably the same algorithm, but the introduction of the array adds complexity to the code.

One of the things that's surprising about this particular program is that we're using a “boxed” array here, which means that the array construction isn't strict in values, so it's not important that the elements of the array be given in any particular order. All that matters is that the evaluation graph we're building is well-founded, and this can sometimes be useful, but we won't go beyond mentioning that now. Instead, let's return to the issue of immutability.

One surprise in Data.Array.IArray is the lack of a function that updates the array at a single value. That's because, if we're working in a pure model, we can't change existing values, we can only create new ones. Thus, modifying an array isn't really a thing, but creating a new modified array is. As the time to create an array is (at least) proportional in its size, this makes “updating” an array at a single point impractical, it's $O(n)$ rather than $O(1)$. Bulk updates at least allow some amortization of the cost.

Mutable Arrays

Let's now consider a fairly simple problem, but one that comes up a lot in practice. Let's suppose we have an [Int] drawn from a reasonable compact range. Let's suppose we want to create a [(Int,Int)] consisting of key-count pairs. How can we do this? In a typical programming language, we'd have an array counts indexed from min to max, and use count[ix] to hold the number of times we've seen ix so far. We'd then process the list on an element-by-element basis, incrementing the corresponding value of the counts array. This is intrinsically impure.

We ordinarily think of Haskell as a pure programming language, but we've seen some impurity already in IO. Not surprisingly, Haskell has a way of dealing with impure programming, but it's via the ST monad. You had to know that there were monads involved somehow.

Here's our code:

import Control.Monad.ST import Data.Traversable import Data.Array.ST count :: Int -> Int -> [Int] -> [(Int,Int)] count min max items = runST $ do counts <- newArray (min,max) 0 :: ST s (STArray s Int Int) for items $ \t -> do old <- readArray counts t writeArray counts t (old+1) getAssocs counts

Let's walk through it. We have to include imports for the various modules we're using here. Nothing new about that. The count function has a very ordinary type, but the calculation will be done in the ST monad, and we'll extract the final results via the runST function.

The counts array is constructed using the newArray function (much as before), but with a somewhat mysterious type. We'll come back to that, but the important thing to note here is that the s is shared. We'll then traverse the items list, processing each item via an array read and write. These are both functions that take values in ST. Finally, the getAssocs function returns the resulting key-count pairs in ST. This is all remarkably familiar if you've worked in a ordinary impure language like Python, if perhaps a bit verbose.

But there is something going on here that's very interesting. Even though we used mutability to define count, there's no evidence of that in its type. Instead, the use of mutability was entirely encapsulated within the definition of the function. We're doing pure things by impure means, because it is more efficient to do so. And having done so, we can now use count in pure code, and still enjoy all of the (considerable) advantages purity give us, including referential transparency.

I'll note the existence of a very nice function, accumArray, which is part of Data.Array.IArray:

accumArray :: (IArray a e, Ix i) => (e -> e' -> e) -> e -> (i,i) -> [(i,e')] -> a i e

This is a kind of a mashup of (immutable) array creation with a foldr happening at each index. We can use accumArray to give a simpler definition of count:

import Data.Array.IArray count :: Int -> Int -> [Int] -> [(Int,Int)] count min max items = assocs countArray where pairs = map (\ix -> (ix,1)) items countArray = accumArray (+) 0 (min,max) pairs :: Array Int Int

This could be done in a simpler way, were it not for the need to clue-in the type checker about the underlying types in the array.

Let's return to one final item: the shared s in the type of count in the explicit solution above. Haskell wants to ensure that if you use mutability in the context of a pure type, that no impurity can leak out. Such leakage would be possible if you could take an STArray defined within one ST monad, export its value, and then subsequently modify it in another ST monad. This can't happen, because that shared parameter is used to guarantee an STArray can only be used within the monadic value in which it was defined. There's some mind-bending complexity here (we're touching on Existential Types), which doesn't need to be understood to be used. Mostly this just gets out of your way, but it will stop you from doing things that would break Haskell's world.

Perhaps it goes without saying that you can create simple mutable variables (not just mutable arrays) within the ST monad. There's not much surprising about this, only that we got through 26 lectures without real mutability, and probably didn't even miss it much after the first few lectures. But we have it, and we have it under strict type control. It is occasionally useful, if far less necessary than imperative programmers tend to believe.