Lecture 16: Concrete Monads: State, II

HTML

Our next example is a good deal more practical. We're going to use the State monad to write a small library for producing HTML. Note that we're moving to the “official” MTL implementation:

import Control.Monad.State type Document = State String

The idea here is that our state is going to be a String that contains HTML, and our various operations are going to act on it. In the simplest case, we'll simply append some a String onto the state:

string :: String -> Document () string t = modify (\s -> s ++ t)

Which, after the usual transformations, we can write as

string = modify . flip (++)

Next, we need a bit of code (analogous to perform in the calculator example) to render an HTML value as a string:

render :: Document a -> String render doc = execState doc ""

Note that this code is already borderline useful:

> render (string "foo" >> string "bar") "foobar"

Thus, we can do simple string concatenation, with an alternative notion.

But the heart of HTML is its use of tags. We'll define tag to be a monadic function, which takes a tag name and a monadic argument, and writes a start tag, then performs the action of the argument monad (appending its output to the state), and then concludes by writing the end tag.

type HTML = Document () tag :: String -> HTML -> HTML tag t html = do string $ "<" ++ t ++ ">" html string $ "</"++ t ++ ">"

We can then define a number of tagging functions:

html = tag "html" head = tag "head" title = tag "title" body = tag "body" p = tag "p" i = tag "i" b = tag "b" h1 = tag "h1" h2 = tag "h2" h3 = tag "h3" h4 = tag "h4" ol = tag "ol" ul = tag "ul" table = tag "table" tr = tag "tr" th = tag "th" td = tag "td"

These functions can be used to give a nice structural definition of an HTML page in Haskell, e.g.,

doc :: HTML doc = html $ do head $ do title $ string "Hello, world!" body $ do h1 $ string "Greetings" p $ string "Hello, world!"

We can render this, and write it as a file:

> writeFile "hello.html" $ render doc >

This is all pretty enough, but how is it useful?

Let's consider a fairly typical minor problem in dealing with a web server: determining exactly what environmental variables are set. What we're going to do is write a little CGI (common gateway interface) program, which obtains the environmental bindings, and converts them to HTML. There is surprisingly little code required:

{- A program for rendering a CGI's environment as HTML -} module Main where import HTML import Data.List (sort) import System.Posix.Env {- Create an HTML document based on a key/value list -} makeDoc :: [(String,String)] -> HTML makeDoc env = html $ do HTML.head . title . string $ "Environment variables" body $ do h1 . string $ "Environment variables:" ul . mapM_ makeEntry . sort $ env where makeEntry (key,value) = li . string $ key ++ " = " ++ encode value encode = concatMap encodeChar where encodeChar '<' = "&lt;" encodeChar '&' = "&amp;" encodeChar '>' = "&gt;" encodeChar c = [c] {- The main act -} main :: IO () main = do env <- getEnvironment putStr "Content-type: text/html\n\n" {- the minimal required HTTP header -} putStr . render . makeDoc $ env

The heavy lifting here is done by the call to mapM_, which turns a list of binding pairs into an HTML value that sequences an appropriately formatted li element for each binding pair.

Most people don't write CGI programs in Haskell, I'm not most people, so I sometimes do, albeit usually using the functionality found in Text.Blaze and Network.cgi, but this small example shows how we can roll our own functionality.

*Exercise 16.1

When generating output in formats such as HTML, it is often desirable to pretty print the results, especially while debugging. The idea is to make judicious use of indentation, newlines, and other formatting choices to make the file more readable. In this problem, you will write a pretty-printing version of the HTML generator above. For example:

> :load PrettyHTML.hs > putStr $ render doc <html> <head> <title>Hello, world!</title> </head> <body> <h1>Greetings</h1> <p>Hello, world!</p> </body> </html>

To implement this functionality, start with the new state representation

type HTML = State (Int, String)

where the integer value represents the "depth" of the current tag in the HTML tree. When generating string output at depth k, the string should typically be indented k "tabs" to the right. One tab should be your favorite small number of spaces, such as " " or " ".

The file PrettyHTML.hs provides a template for your solution, where undefined is used as a placeholder for the following definitions that you must complete:

string :: String -> Document () newline :: Document () render :: Document a -> String indent :: Document () exdent :: Document ()

The newline function should append an indentation-aware newline (i.e., a newline followed by an appropriate number of tabs) to the current string state. The indent and exdent functions are used to increment (respectively decrement) the indentation level by one.

Randomness

There is an important class of computer programs that use randomness (or more properly, as we'll see, pseudo-randomness), often to generate a "typical instance" based on a probabilistic model of some type. Our next program will do just that, through a re-implementation of Emacs's ludic “disassociated-press” command.

The idea behind disassociated-press is simple: The input is used to create a model of English prose, based on the frequency with which one word follows another, and then a random instance of that model is created. The result is best described as “English-like,” often non-sensical, but sometimes disconcertingly sensical. It will be noted in passing that we had fewer sources of entertainment back in the day.

Our model is as follows:

type Model = (String,Map String [Maybe String])

A Map is simply a higher efficiency version of an association list.

Our model keeps track of the first word (which is used to kick-off generation), and a map which associates with each word of the text a list of following words. Now, this later is not exactly right, as we're going to use the map to deal with both word succession and termination -- so the values are [Maybe String], where a Just w element represents a succeeding word w, and Nothing represents the end of the text.

Building the model is something we do in pure code.

buildModel :: [String] -> Model buildModel xs@(x:_) = (x,unionsWith (++) . transitions $ xs) where transitions (y:ys@(y':_)) = singleton y [Just y'] : transitions ys transitions [y] = [singleton y [Nothing]] transitions [] = error "Impossible error" buildModel [] = error "Empty model"

The Map data type has a lot of existing functionality, including functionality for mutation, but it is generally more convenient to build maps out of simpler maps, as we've done here, providing an appropriate combining function.

Randomness enters into the program in generating an example text from the model. The central problem for us is to select a random element from a list, and herein enters the central problem of writing pure functional code that uses randomness. Most programming languages provide a function

rangen :: () -> Int

The idea here is that each call to rangen () will produce a new, random result. But pure languages don't work that way: functions alway produce equal results on equal arguments. Haskell deals with this by defining

class RandomGen g where next :: g -> (Int, g) ...

which should look like a familiar state transition function, because that's what it is.

The idea here is that a random number generator will produce both a random integer, and a new random number generator. Code that uses randomness then chains these random number generators through the various calls, and this can be a pain to keep straight. So we use the State monad to "hide" the random number generators.

import System.Random type RandState = State StdGen

We can now write

roll :: Int -> RandState Int roll n = state $ randomR (1,n)

Which rolls an n sided dice, and

select :: [a] -> RandState a select as = do i <- roll . length $ as pure $ as !! (i-1)

Which we can express more succinctly as

select as = (as !!) . (subtract 1) <$> roll (length as)

The function randomR (a,b) will produce a random element in the range from a to b inclusive, which we'll use as an index into the list. Note that randomR:: RandomGen g => (a, a) -> g -> (a, g), so we're going to use the state function to lift a pure function of type RandomGen g => g -> (a,g) into RandState as before.

This brings us to the actual generation of the list of words from the model. This starts with the first word, and we use each successive word to look up possible continuations.

runModel :: Model -> RandState [String] runModel (start,wordmap) = iter start where iter word = do let successors = wordmap ! word succ <- select successors case succ of Just w -> do ws <- iter w pure (word:ws) Nothing -> pure [word]

Exercise 16.2 Show how the code for runModel can be tightened up to the following:

runModel :: Model -> RandState [String] runModel (start,wordmap) = iter start where iter word = (word:) <$> do maybeNext <- select $ wordmap ! word case maybeNext of Just nextWord -> iter nextWord Nothing -> pure []

Of course, a list of words doesn't lend itself to nice output, so we'll write a little line-breaking function:

linefill :: Int -> [String] -> String linefill _ [] = "\n" linefill n (x:xs) = iter x xs where iter current (nextWord:ys) | length current + length nextWord + 1 > n = current ++ "\n" ++ linefill n (nextWord:ys) | otherwise = iter (current ++ " " ++ nextWord) ys iter current [] = current ++ "\n"

This leaves us with main:

main :: IO () main = do input <- getContents gen <- getStdGen let model = buildModel (words input) disassociatedPress = evalState (runModel model) gen putStr . linefill 72 $ disassociatedPress

All that remains is a good chunk of prose to test this on. We'll consider the Gettysburg address, and produce the following Gettysburg address-like word salad:

$ ./disassociated-press < gettysburg.txt Four score and proper that nation might live. It is for us to the last full measure of that we can not dedicate, we can never forget what we can not consecrate, we can never forget what we can never forget what we can long endure. We are created equal. Now we can not hallow this ground. The world will little note, nor long remember what they who here gave the unfinished work which they gave their lives that government of that these honored dead we here to the unfinished work which they who fought here gave the last full measure of that field, as a portion of that war. We have thus far so nobly advanced. It is altogether fitting and dead, who here have a final resting place for which they did here. It is rather for the people, by the proposition that government of that government of freedom—and that field, as a new birth of that we can long remember what we can not perish from these honored dead we can not hallow this continent a great task remaining before us—that from these honored dead shall not have died in vain—that this continent a final resting place for which they did here. It is for which they gave the earth.

*Exercise 16.3 A problem with simple probabilistic text generators like the one above is that they can generate very large amounts of text. How great is the danger in this case? Rework the program to run the model 1,000 times (without printing!), and compute the largest and smallest string printed. (To be clear here, we're measuring length in characters, not words.) Hint: replicateM is really useful at running a monad a bunch of times.

Use the module disassociated-press.hs, and gettysburg.txt.