Lecture 27: Arrays and Mutability
Immutable Arrays
Up until now, when we've needed an indexed container class, we've tended to use Data.Map, whereas most of the world tends to use arrays. Surprisingly, Haskell has arrays too, but it's a complicated story.
Let's start by considering a simple problem, drawn from Abelson & Sussman's wonderful “Structure and Interpretation of Computer Programs.” How many ways can you make change for a dollar, using only half-dollars, quarters, dimes, nickels, and pennies? Note that order isn't significant, only the number of each type of coin; thus, making change using a half-dollar and two quarters is the same as using two quarters and a half-dollar. In this case, it's useful to approach the problem in greater generality, writing the function
countChange :: [Int] -> Int -> Int
countChange coins amount
    | amount == 0 = 1
    | amount < 0  = 0
    | otherwise   = case coins of
        []   -> 0
        c:cs -> countChange coins (amount-c) + countChange cs amount
Briefly, you can only make change for the amount of 0 in one way (with no coins!), and you can't make change for a negative amount. If you have at least one coin, and a positive amount of money to make change for, you can either use that coin in making change, or not. Either way results in a simpler problem (either there's less change to make, or fewer coins with which to make it), so the recursive definition is well-founded. If you're out of coins, you can't make change for a positive amount.
And this works well enough for modest amounts, e.g., returning to the original problem,
> countChange [50,25,10,5,1] 100
292
But a problem with this code is that it scales badly: the amount of computation is $O(n^k)$, where $n$ is the amount of money to be changed, and $k$ is the number of coins. In the case of US coinage, this means an $O(n^5)$ algorithm if dollar coins are not allowed, and that's already bad. The $n=100$ case is tractable enough, but $n=1000$ takes a long time. Can we do better?
The core problem here is that this is a Fibonacci-like computation, i.e., the call tree contains a great deal of duplication. It's easy to see, though, that only $n \times k$ distinct calls can be made (at least for positive $n$), and $n \times k$ is a lot smaller than $n^k$. The obvious solution is memoization -- i.e., the first time we compute countChange for a particular pair of parameters, we'll somehow remember the resulting value, and if we're asked to compute it again, we return that remembered value. We could do this with a State monad and a Map, but let's pursue a different path. In most programming languages, we'd use a 2-dimensional array to hold the values, and we'd cleverly choose an order of computation that guaranteed that we didn't try to read a value before writing it. This technique sometimes goes by the name “dynamic programming.”
There are a couple of array packages that are part of the Haskell Platform: Data.Array and Data.Vector. Data.Vector is often faster, but it's less flexible, and in a way that matters for this problem, so we'll use Data.Array. There are two possible interfaces, one through Data.Array, and another through Data.Array.IArray. The latter has a more general interface, and its use is usually recommended. The "I" in IArray stands for “immutable.” This is not a limitation in the present case.
The function
array :: (IArray a e, Ix i)
      => (i, i)
      -> [(i, e)]
      -> a i e
constructs an array from a list of index-value pairs. The Ix (index) type class constrains the type of keys. Not surprisingly, Int is a member of Ix, but in general, if t1, t2, ..., tk are instances of Ix, then (t1,t2,...,tk) is also an instance of Ix via a derived instance definition, which is how we get to multi-dimensional arrays.
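For a concrete feel for how the tuple instance works, here's a small illustration using Data.Ix directly (indices2D is just a name for this example):

```haskell
import Data.Ix

-- The Ix instance for pairs enumerates a 2D index range in row-major order.
indices2D :: [(Int,Int)]
indices2D = range ((1,1),(2,3))
-- == [(1,1),(1,2),(1,3),(2,1),(2,2),(2,3)]
```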
This allows for the following implementation of countChange:
import Data.Array.IArray

countChange :: [Int] -> Int -> Integer
countChange denominations amount = table ! (nDenominations, amount) where
    nDenominations = length denominations
    table = array ((1,1),(nDenominations,amount))
                  [ ((ix,n), cc ix (n-v) + cc (ix-1) n)
                  | (ix,v) <- zip [1..] (reverse denominations)
                  , n <- [1..amount]
                  ] :: Array (Int,Int) Integer
    cc ix n
        | n == 0 = 1
        | n < 0 = 0
        | ix < 1 = 0
        | otherwise = table ! (ix,n)

usCC = countChange [100,50,25,10,5,1]
There's a lot going on here, so let's work through it. We're building a 2D array (named table) such that the value stored at index (ix,n) is the number of ways to make change for n given the last ix many coins. We'll be able to compute much larger return values, hence the change from Int to Integer, e.g.,
> usCC 100000
13398445413854501
which would be unimaginable with the original function definition.
The local function cc is essentially a bounds-checked array lookup. This is recognizably the same algorithm, but the introduction of the array adds complexity to the code.
One of the things that's surprising about this particular program is that we're using a “boxed” array, which means that the array construction isn't strict in its values, so the elements of the array needn't be given in any particular order. All that matters is that the evaluation graph we're building is well-founded. This can sometimes be useful, but we won't go beyond mentioning it now. Instead, let's return to the issue of immutability.
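Before moving on, here's a small illustration of that laziness (fibs is an example name, not code from the lecture): because the array is boxed, an entry can refer to other entries of the same array, and evaluation order sorts itself out on demand.

```haskell
import Data.Array.IArray

-- A self-referential boxed array: each element past the first two is a
-- thunk that reads earlier elements of the same array. The definition is
-- well-founded, so demanding any element terminates.
fibs :: Array Int Integer
fibs = listArray (0,20) [f i | i <- [0..20]] where
    f 0 = 0
    f 1 = 1
    f i = fibs ! (i-1) + fibs ! (i-2)
-- fibs ! 20 == 6765
```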
One surprise in Data.Array.IArray is the lack of a function that updates the array at a single index. That's because, if we're working in a pure model, we can't change existing values, we can only create new ones. Thus, modifying an array isn't really a thing, but creating a new, modified array is. As the time to create an array is (at least) proportional to its size, this makes “updating” an array at a single point impractical: it's $O(n)$ rather than $O(1)$. Bulk updates at least allow some amortization of the cost.
Mutable Arrays
Let's now consider a fairly simple problem, but one that comes up a lot in practice. Suppose we have an [Int] drawn from a reasonably compact range, and we want to create a [(Int,Int)] consisting of key-count pairs. How can we do this? In a typical programming language, we'd have an array counts indexed from min to max, and use counts[ix] to hold the number of times we've seen ix so far. We'd then process the list on an element-by-element basis, incrementing the corresponding value of the counts array. This is intrinsically impure.
We ordinarily think of Haskell as a pure programming language, but we've seen some impurity already in IO. Not surprisingly, Haskell has a way of dealing with impure programming, but it's via the ST monad. You had to know that there were monads involved somehow.
Here's our code:
import Control.Monad.ST
import Data.Traversable
import Data.Array.ST

count :: Int -> Int -> [Int] -> [(Int,Int)]
count min max items = runST $ do
    counts <- newArray (min,max) 0 :: ST s (STArray s Int Int)
    for items $ \t -> do
        old <- readArray counts t
        writeArray counts t (old+1)
    getAssocs counts
Let's walk through it. We have to include imports for the various modules we're using here. Nothing new about that. The count function has a very ordinary type, but the calculation will be done in the ST monad, and we'll extract the final result via the runST function.
The counts array is constructed using the newArray function (much as before), but with a somewhat mysterious type. We'll come back to that, but the important thing to note here is that the s is shared. We then traverse the items list, processing each item via an array read and write. These are both functions that take values in ST. Finally, the getAssocs function returns the resulting key-count pairs in ST. This is all remarkably familiar if you've worked in an ordinary impure language like Python, if perhaps a bit verbose.
But there is something going on here that's very interesting. Even though we used mutability to define count, there's no evidence of that in its type. Instead, the use of mutability was entirely encapsulated within the definition of the function. We're doing pure things by impure means, because it is more efficient to do so. And having done so, we can now use count in pure code, and still enjoy all of the (considerable) advantages purity gives us, including referential transparency.
I'll note the existence of a very nice function, accumArray, which is part of Data.Array.IArray:
accumArray :: (IArray a e, Ix i)
           => (e -> e' -> e)
           -> e
           -> (i,i)
           -> [(i,e')]
           -> a i e
This is a kind of mashup of (immutable) array creation with a foldr happening at each index. We can use accumArray to give a simpler definition of count:
import Data.Array.IArray

count :: Int -> Int -> [Int] -> [(Int,Int)]
count min max items = assocs countArray where
    pairs = map (\ix -> (ix,1)) items
    countArray = accumArray (+) 0 (min,max) pairs :: Array Int Int
This could be done in a simpler way, were it not for the need to clue-in the type checker about the underlying types in the array.
Let's return to one final item: the shared s in the type of counts in the explicit solution above. Haskell wants to ensure that if you use mutability in the context of a pure type, no impurity can leak out. Such leakage would be possible if you could take an STArray defined within one ST computation, export its value, and then subsequently modify it in another ST computation. This can't happen, because that shared type parameter guarantees an STArray can only be used within the monadic value in which it was defined. There's some mind-bending complexity here (we're touching on existential types), but it doesn't need to be understood to be used. Mostly this just gets out of your way, but it will stop you from doing things that would break Haskell's world.
Perhaps it goes without saying that you can create simple mutable variables (not just mutable arrays) within the ST monad. There's not much surprising about this, only that we got through 26 lectures without real mutability, and probably didn't even miss it much after the first few lectures. But we have it, and we have it under strict type control. It is occasionally useful, if far less necessary than imperative programmers tend to believe.
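As a final illustrative sketch (sumST is a name invented here), such a mutable variable comes from Data.STRef, and is sealed inside runST just like the arrays were:

```haskell
import Control.Monad.ST
import Data.STRef

-- A sum computed with a genuinely mutable accumulator; from the outside,
-- sumST is an ordinary pure function.
sumST :: [Int] -> Int
sumST xs = runST $ do
    acc <- newSTRef 0
    mapM_ (\x -> modifySTRef' acc (+x)) xs   -- strict modify, no thunk buildup
    readSTRef acc
-- sumST [1..10] == 55
```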