Lecture 18

The State Monad, II

HTML

Our next example is a good deal more practical. We're going to use the State monad to write a small library for producing HTML. First things first: there's no need for us to write our own State monad—there is one defined for us in Control.Monad.State. Note that the State monad defined in Control.Monad.State has a more complicated definition than ours, and this underlying complexity may be exposed through inferred types and/or error messages. Don't be alarmed, we'll deal with the additional complexity in due course.

import Control.Monad.State type HTML = State String

The idea here is that our state is going to be a String that contains HTML, and our various operations are going to act on it. In the simplest case, we'll simply append some text onto the state:

string :: String -> HTML () string t = modify (++t)

Which, after the usual transformations, we can write as

string = modify . flip (++)

Next, we need a bit of code (analogous to perform in the calculator example) to render an HTML monadic object as a string:

render :: HTML a -> String render mHTML = execState mHTML ""

Note that this code is already borderline useful:

> render (string "foo" >> string "bar") "foobar"

Thus, we can use monads to do simple string concatenation, with an alternative notion.

But the heart of HTML is its use of tags. We'll define tag to be a monadic function, which takes a tag name and a monadic argument, and writes a start tag, then performs the action of the argument monad (appending its output to the state), and then concludes by writing the end tag.

tag :: String -> HTML a -> HTML () tag t mHTML = do string $ "<" ++ t ++ ">" mHTML string $ "</" ++ t ++ ">"

And if we want attributes:

attrTag :: String -> [(String,String)] -> HTML a -> HTML () attrTag t attrs mHTML = do let format (k,v) = k ++ "=" ++ show v string $ "<" ++ (unwords $ t : map format attrs) ++ ">" mHTML string $ "</" ++ t ++ ">"

We can then define a number of tagging functions:

html = tag "html" head = tag "head" title = tag "title" body = tag "body" p = tag "p" i = tag "i" b = tag "b" h1 = tag "h1" h2 = tag "h2" h3 = tag "h3" h4 = tag "h4" ol = tag "ol" ul = tag "ul" table = tag "table" tr = tag "tr" th = tag "th" td = tag "td"

These functions can be used to give a nice structural definition of an HTML page in Haskell, e.g.,

doc = html $ do head $ do title $ string "Hello, world!" body $ do h1 $ string "Greetings" p $ string "Hello, world!"

We can render this, and write it as a file:

> writeFile "hello.html" $ render doc >

This is all pretty enough, but how is it useful?

Let's consider a fairly typical minor problem in dealing with a web server—determining exactly what environmental variables are set. What we're going to do is write a little CGI (common gateway interface) program, which obtains the environmental bindings, and converts them to HTML. There is surprisingly little code required:

module Main where import Prelude hiding (head) import Data.List (sort) import System.Posix.Env import HTML doc :: [(String,String)] -> HTML () doc env = html $ do head $ title $ string "Environment variables" body $ do h1 $ string "Environment variables:" ul $ mapM_ (\(a,b) -> li $ string $ a ++ "=" ++ b) $ sort env main :: IO () main = do env <- getEnvironment putStr "Content-type: text/html\n\n" putStr $ render $ doc env

The heavy lifting here is done by the call to mapM_, which turns a list of binding pairs into an HTML () object that sequences an appropriately formatted li element for each binding pair. Now, most people don't write CGI programs in Haskell, but I sometimes do, generally speaking using the functionality found in Text.Blaze and Network.cgi, but this small example shows how we can roll our own functionality.

*Exercise 18.1

When generating output in formats such as HTML, it is often desirable to pretty print the results, that is, making judicious use of indentation, newlines, and other formatting choices to make the file more readable. In this problem, you will write a pretty-printing version of the HTML generator above. For example:

> :load PrettyHTML.hs > putStr $ render doc' <html> <head> <title> Hello, world! </title> </head> <body> <h1> Greetings </h1> <p> Hello, world! </p> </body> </html>

Note here that we call putStr to force the newline characters (e.g. '\n') in the string to be rendered as line breaks in the output displayed to stdout.

To implement this functionality, start with the new state representation

type HTML = State (Int, String)

where the integer value represents the "depth" of the current tag in the HTML tree. When generating string output at depth k, the string should be indented k "tabs" to the right. One tab should be your favorite small number of spaces, such as " " or " ".

The file PrettyHTML.hs provides a template for your solution, where undefined is used as a placeholder for the following definitions that you must complete:

render :: HTML a -> String adjustDepth :: (Int -> Int) -> HTML () tab :: HTML () newline :: HTML () indentedString :: String -> HTML () tag :: String -> HTML a -> HTML ()

The tab and newline functions should append tabs and newlines, respectively, to the current string state. The function adjustDepth is used to update the current depth level, useful when processing tags. The indentedString function should render its string argument on a new line indented according to the current depth of the tree. You may define helper functions if you wish to.

Randomness

There is an important class of computer programs that use randomness (or more properly, as we'll see, pseudo-randomness), often to generate a "typical instance" based on a probabilistic model of some type. Our next program will do just that, through a re-implementation of Emacs's ludic “disassociated-press” command.

The idea behind disassociated-press is simple: The input is used to create a model of English prose, based on the frequency with which one word follows another, and then a random instance of that model is created. The result is best described as “English-like,” often non-sensical, but sometimes disconcertingly sensical. It will be noted in passing that we had fewer sources of entertainment back in the day.

Our model is as follows:

import Control.Monad.State import Data.Map (Map,(!)) import qualified Data.Map as Map type Model = (String,Map String [Maybe String])

Our model keeps track of the first word (which is used to kick-off generation), and a map which associates with each word of the text a list of following words. Now, this later is not exactly right, as we're going to use the map to deal with both word succession and termination -- so the values are [Maybe String], where a Just w element represents a succeeding word w, and Nothing represents the end of the text.

Building the model is something we do in pure code.

mkModel :: [String] -> Model mkModel xs@(x:_) = (x,mkMap xs) where mkMap = foldr combine Map.empty . mkPairs combine (k,v) = Map.alter insertf k where insertf Nothing = Just [v] insertf (Just vs) = Just (v:vs) mkPairs (x:ys@(y:_)) = (x,Just y) : mkPairs ys mkPairs [x] = [(x,Nothing)]

There are a few things to notice here. The first is recall @ variables, which are used to match a sub-pattern of a larger pattern, e.g., the pattern xs@(x:_) will match [1,2,3], with xs bound to the entire [1,2,3], and x bound to 1. The second is our use of the function

Map.alter :: Ord k => (Maybe a -> Maybe a) -> k -> Map k a -> Map k a

Which is used to update an existing map given a key and an entry updating function. The updating function for Map.alter uses Nothing, both as an argument and as a result, to indicate an unbound key. Thus, Map.alter can insert or remove keys as necessary, as well as adjusting individual entries. One final nuance is that we'll do our word breaking using the Prelude.words function, which will include punctuation. (Remember that so far as words is concerned, words are sequences of characters separated by spaces, not contiguous runs of letters.)

Note that it is possible to write combine in a much simpler way, if we remember that Maybe is not only a Monad, it's also a MonadPlus:

combine (k,v) = Map.alter (\vs -> fmap (v:) vs `mplus` return [v]) k

This is a trick worth remembering, because we often find ourselves writing a function to pass to Map.adjust that can be thought of as updating an existing entry if it exists (which is what the fmap does), or providing a default entry otherwise.

Randomness enters into the program in generating an example text from the model. The central problem for us is to select a random element from a list, and herein enters the central problem of writing pure functional code that uses randomness. Most programming languages provide a function

rangen :: () -> Int

The idea here is that each call to rangen () will produce a new, random result. But pure languages don't work that way: functions alway produce equal results on equal arguments. Haskell deals with this by defining

class RandomGen g where next :: g -> (Int, g) ...

The idea here is that a random number generator will produce both a random integer, and a new random number generator. Code that uses randomness then chains these random number generators through the various calls, and this can be a pain to keep straight. So we use the State monad to "hide" the random number generators.

import System.Random import System.IO type RandState = State StdGen

We can now write

select :: [a] -> RandState a select as = do ix <- state $ randomR (0,length as - 1) return $ as !! ix

Which we can express more succinctly as

select as = fmap (as !!) . state . randomR $ (0,length as - 1)

The function randomR (a,b) will produce a random element in the range from a to b inclusive, which we'll use as an index into the list. Note that randomR:: RandomGen g => (a, a) -> g -> (a, g), so we're going to use the state function to lift a pure function of type RandomGen g => g -> (a,g) into the RandState monad.

This brings us to the actual generation of the list of words from the model. This starts with the first word, and we use each successive word to look up possible continuations.

runModel :: Model -> RandState [String] runModel (start,wordmap) = iter start where iter word = do let successors = wordmap ! word succ <- select successors case succ of Nothing -> return [word] Just w -> do ws <- iter w return (word:ws)

Exercise 18.2 Show how the code for runModel can be tightened up to the following:

runModel :: Model -> RandState [String] runModel (start,wordmap) = iter start where iter word = do succ <- select $ wordmap ! word case succ of Nothing -> return [word] Just w -> fmap (word:) $ iter w

Of course, a list of words doesn't lend itself to nice output, so we'll write a little line-breaking function:

linefill :: Int -> [String] -> String linefill _ [] = "\n" linefill n (x:xs) = iter x xs where iter x (y:ys) | length x + length y + 1 > n = x ++ "\n" ++ linefill n (y:ys) | otherwise = iter (x ++ " " ++ y) ys iter x [] = x ++ "\n"

This leaves us with main:

main :: IO () main = do input <- getContents let model = mkModel (words input) gen <- getStdGen let ws = evalState (runModel model) gen putStr $ linefill 72 ws

All that remains is a good chunk of prose to test this on. We'll consider the Gettysburg address, and produce the following Gettysburg address-like word salad:

$ ./disassociated-press < gettysburg.txt Four score and proper that nation might live. It is for us to the last full measure of that we can not dedicate, we can never forget what we can not consecrate, we can never forget what we can never forget what we can long endure. We are created equal. Now we can not hallow this ground. The world will little note, nor long remember what they who here gave the unfinished work which they gave their lives that government of that these honored dead we here to the unfinished work which they who fought here gave the last full measure of that field, as a portion of that war. We have thus far so nobly advanced. It is altogether fitting and dead, who here have a final resting place for which they did here. It is rather for the people, by the proposition that government of that government of freedom—and that field, as a new birth of that we can long remember what we can not perish from these honored dead we can not hallow this continent a great task remaining before us—that from these honored dead shall not have died in vain—that this continent a final resting place for which they did here. It is for which they gave the earth.

*Exercise 18.3 A problem with simple probabilistic text generators like the one above is that they can generate very large amounts of text. How great is the danger in this case? Rework the program to run the model 1,000 times (without printing!), and compute the largest and smallest string printed. (To be clear here, we're measuring length in characters, not words.) Hint: replicateM is really useful at running a monad a bunch of times.

Use the module disassociated-press.hs, and gettysburg.txt.