Lecture 16: Concrete Monads: State, II

HTML

Our next example is a good deal more practical. We're going to use the State monad to write a small library for producing HTML. Note that we're moving to the “official” MTL implementation:


    module HTML where

    import Prelude hiding (head)  -- this module defines its own head tag
    import Control.Monad.State

    type Document = State String

The idea here is that our state is going to be a String that contains HTML, and our various operations are going to act on it. In the simplest case, we'll just append a String onto the state:


    string :: String -> Document ()
    string t = modify (\s -> s ++ t)

Which, after the usual transformations, we can write as


    string = modify . flip (++)

Next, we need a bit of code (analogous to perform in the calculator example) to render a Document as a String:


    render :: Document a -> String
    render doc = execState doc ""

Note that this code is already borderline useful:


    > render (string "foo" >> string "bar")
    "foobar"

Thus, we can do simple string concatenation, with an alternative notation.
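As a quick sanity check, the following self-contained sketch defines both versions of `string` side by side and confirms that they append in the same way. The names `stringLambda`, `stringPointFree`, and `concatDoc` are ours, introduced only for this comparison; `concatDoc` shows that `mapM_` over a list of strings simply sequences the appends from left to right.

```haskell
import Control.Monad.State

type Document = State String

-- The lambda version: append t to the end of the current state.
stringLambda :: String -> Document ()
stringLambda t = modify (\s -> s ++ t)

-- The point-free version: flip (++) t s == s ++ t.
stringPointFree :: String -> Document ()
stringPointFree = modify . flip (++)

-- Run a document starting from the empty string; keep only the final state.
render :: Document a -> String
render doc = execState doc ""

-- mapM_ sequences one append per element, left to right.
concatDoc :: [String] -> Document ()
concatDoc = mapM_ stringPointFree
```

Since every action here returns (), `render` (via `execState`) keeps only the accumulated string and discards the result value.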

But the heart of HTML is its use of tags. We'll define tag as a monadic function that takes a tag name and a monadic argument, writes a start tag, performs the action of the argument (appending its output to the state), and then concludes by writing the end tag.


    type HTML = Document ()

    tag :: String -> HTML -> HTML
    tag t html = do
        string $ "<" ++ t ++ ">"
        html
        string $ "</" ++ t ++ ">"
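Because the nested action runs between the two `string` calls, its output lands between the start and end tags, so nesting in the Haskell code becomes nesting in the markup. A small sketch, repeating the definitions above so it runs on its own (the example value `nested` is hypothetical):

```haskell
import Control.Monad.State

type Document = State String
type HTML = Document ()

string :: String -> Document ()
string = modify . flip (++)

render :: Document a -> String
render doc = execState doc ""

-- Start tag, then whatever the nested action appends, then the end tag.
tag :: String -> HTML -> HTML
tag t html = do
    string $ "<" ++ t ++ ">"
    html
    string $ "</" ++ t ++ ">"

-- A paragraph containing some text followed by a bold span.
nested :: HTML
nested = tag "p" $ do
    string "plain "
    tag "b" (string "bold")
```

Here `render nested` yields `"<p>plain <b>bold</b></p>"`. Note that neither `tag` nor `string` escapes anything: text passed to `string` is copied into the output verbatim.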

We can then define a number of tagging functions:


    html  = tag "html"
    head  = tag "head"
    title = tag "title"
    body  = tag "body"
    p     = tag "p"
    i     = tag "i"
    b     = tag "b"
    h1    = tag "h1"
    h2    = tag "h2"
    h3    = tag "h3"
    h4    = tag "h4"
    ol    = tag "ol"
    ul    = tag "ul"
    li    = tag "li"
    table = tag "table"
    tr    = tag "tr"
    th    = tag "th"
    td    = tag "td"

These functions can be used to give a nice structural definition of an HTML page in Haskell, e.g.,


    doc :: HTML
    doc =
        html $ do
            head $ do
                title $ string "Hello, world!"
            body $ do
                h1 $ string "Greetings"
                p $ string "Hello, world!"

We can render this and write it to a file:


    > writeFile "hello.html" $ render doc
    >
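For reference, here is what `render doc` evaluates to, in a self-contained sketch that repeats just the definitions `doc` needs. One wrinkle worth flagging: `head` clashes with `Prelude.head`, so a module that defines and uses its own `head` tag needs `import Prelude hiding (head)`, while a module that imports the library alongside the Prelude must qualify it, as the CGI program below does with `HTML.head`.

```haskell
import Prelude hiding (head)   -- we define our own head tag below
import Control.Monad.State

type Document = State String
type HTML = Document ()

string :: String -> Document ()
string = modify . flip (++)

render :: Document a -> String
render d = execState d ""

tag :: String -> HTML -> HTML
tag t inner = do
    string $ "<" ++ t ++ ">"
    inner
    string $ "</" ++ t ++ ">"

html, head, title, body, h1, p :: HTML -> HTML
html  = tag "html"
head  = tag "head"
title = tag "title"
body  = tag "body"
h1    = tag "h1"
p     = tag "p"

doc :: HTML
doc =
    html $ do
        head $ do
            title $ string "Hello, world!"
        body $ do
            h1 $ string "Greetings"
            p $ string "Hello, world!"
```

The result is a single line, `<html><head><title>Hello, world!</title></head><body><h1>Greetings</h1><p>Hello, world!</p></body></html>`, with no whitespace between tags, which is what Exercise 16.1 sets out to improve.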

This is all pretty enough, but how is it useful?

Let's consider a fairly typical minor problem in dealing with a web server: determining exactly which environment variables are set. What we're going to do is write a little CGI (Common Gateway Interface) program, which obtains the environment bindings and converts them to HTML. There is surprisingly little code required:


    {- A program for rendering a CGI's environment as HTML -}
    
    module Main where
    
    import HTML
    import Data.List (sort)
    import System.Posix.Env
    
    {- Create an HTML document based on a key/value list -}
    
    makeDoc :: [(String,String)] -> HTML
    makeDoc env =
        html $ do
            HTML.head . title . string $ "Environment variables"
            body $ do
                h1 . string $ "Environment variables:"
                ul . mapM_ makeEntry . sort $ env
        where

        makeEntry (key,value) = li . string $ key ++ " = " ++ encode value
    
        encode = concatMap encodeChar where
            encodeChar '<' = "&lt;"
            encodeChar '&' = "&amp;"
            encodeChar '>' = "&gt;"
            encodeChar c = [c]
    
    {- The main act -}
    
    main :: IO ()
    main = do
        env <- getEnvironment
        putStr "Content-type: text/html\n\n"  {- the minimal required HTTP header -}
        putStr . render . makeDoc $ env

The heavy lifting here is done by the call to mapM_, which turns a list of binding pairs into an HTML value that sequences an appropriately formatted li element for each binding pair.
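To see what this pipeline produces without a web server, the sketch below feeds a made-up key/value list (standing in for the result of `getEnvironment`) through the same `sort`, `mapM_`, and `encode` steps. The name `bindings` is hypothetical; it corresponds to the `ul` line of `makeDoc`.

```haskell
import Control.Monad.State
import Data.List (sort)

type Document = State String
type HTML = Document ()

string :: String -> Document ()
string = modify . flip (++)

render :: Document a -> String
render d = execState d ""

tag :: String -> HTML -> HTML
tag t inner = string ("<" ++ t ++ ">") >> inner >> string ("</" ++ t ++ ">")

ul, li :: HTML -> HTML
ul = tag "ul"
li = tag "li"

-- Escape the three characters that would otherwise be read as markup.
encode :: String -> String
encode = concatMap encodeChar where
    encodeChar '<' = "&lt;"
    encodeChar '&' = "&amp;"
    encodeChar '>' = "&gt;"
    encodeChar c   = [c]

-- One <li> per binding, sorted by key (then value), all inside a <ul>.
bindings :: [(String, String)] -> HTML
bindings = ul . mapM_ makeEntry . sort where
    makeEntry (key, value) = li . string $ key ++ " = " ++ encode value
```

For instance, `render (bindings [("SHELL", "/bin/sh"), ("A", "x<y")])` gives `"<ul><li>A = x&lt;y</li><li>SHELL = /bin/sh</li></ul>"`: the pairs come out sorted, and the `<` in the value is escaped.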

Most people don't write CGI programs in Haskell. I'm not most people, so I sometimes do, albeit usually with the functionality found in Text.Blaze and Network.CGI. Still, this small example shows how we can roll our own.

*Exercise 16.1

When generating output in formats such as HTML, it is often desirable to pretty print the results, especially while debugging. The idea is to make judicious use of indentation, newlines, and other formatting choices to make the file more readable. In this problem, you will write a pretty-printing version of the HTML generator above. For example:


    > :load PrettyHTML.hs
    > putStr $ render doc
    
    <html>
      <head>
        <title>Hello, world!</title>
      </head>
      <body>
        <h1>Greetings</h1>
        <p>Hello, world!</p>
      </body>
    </html>

To implement this functionality, start with the new state representation


    type Document = State (Int, String)

where the integer value represents the "depth" of the current tag in the HTML tree. When generating string output at depth k, the string should typically be indented k "tabs" to the right. One tab should be your favorite small number of spaces, such as two or four.

The file PrettyHTML.hs provides a template for your solution, where undefined is used as a placeholder for the following definitions that you must complete:


    string :: String -> Document ()
    newline :: Document ()
    render :: Document a -> String
    indent :: Document ()
    exdent :: Document ()

The newline function should append an indentation-aware newline (i.e., a newline followed by an appropriate number of tabs) to the current string state. The indent and exdent functions are used to increment (respectively decrement) the indentation level by one.

Randomness

There is an important class of computer programs that use randomness (or more properly, as we'll see, pseudo-randomness), often to generate a "typical instance" based on a probabilistic model of some type. Our next program will do just that, through a re-implementation of Emacs's ludic “disassociated-press” command.

The idea behind disassociated-press is simple: the input is used to create a model of English prose, based on the frequency with which one word follows another, and then a random instance of that model is created. The result is best described as “English-like,” often nonsensical, but sometimes disconcertingly sensical. It will be noted in passing that we had fewer sources of entertainment back in the day.

Our model is as follows:


    type Model = (String,Map String [Maybe String])

A Map (from Data.Map) is simply a more efficient version of an association list.

Our model keeps track of the first word (which is used to kick off generation) and a map that associates each word of the text with a list of following words. This latter description is not exactly right, as we're going to use the map to deal with both word succession and termination: the values are [Maybe String], where a Just w element represents a succeeding word w, and Nothing represents the end of the text.

Building the model is something we do in pure code.


    buildModel :: [String] -> Model
    buildModel xs@(x:_) = (x,unionsWith (++) . transitions $ xs) where
        transitions (y:ys@(y':_)) = singleton y [Just y'] : transitions ys
        transitions [y] = [singleton y [Nothing]]
        transitions [] = error "Impossible error"
    buildModel [] = error "Empty model"

The Map data type comes with a lot of existing functionality, including functions for inserting and updating entries (each of which returns a new map rather than mutating the old one), but it is generally more convenient to build maps out of simpler maps, as we've done here, providing an appropriate combining function.
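A tiny worked case makes the model's shape concrete. In the five-word text below, "the" occurs twice, so its two singleton maps are merged by `unionsWith (++)`, which keeps the successors in text order. The import list is our assumption, since the lecture does not show one for `buildModel`.

```haskell
import Data.Map (Map, singleton, unionsWith, toList)

type Model = (String, Map String [Maybe String])

-- Same definition as above: one singleton map per adjacent word pair,
-- plus a Nothing entry for the last word, merged with (++).
buildModel :: [String] -> Model
buildModel xs@(x:_) = (x, unionsWith (++) . transitions $ xs) where
    transitions (y:ys@(y':_)) = singleton y [Just y'] : transitions ys
    transitions [y] = [singleton y [Nothing]]
    transitions [] = error "Impossible error"
buildModel [] = error "Empty model"
```

For `words "the cat saw the dog"`, the start word is "the" and `toList` of the map is `[("cat",[Just "saw"]),("dog",[Nothing]),("saw",[Just "the"]),("the",[Just "cat",Just "dog"])]`. Because repeated word pairs are kept rather than deduplicated (two occurrences of "the cat" would contribute Just "cat" twice), a uniform choice from a successor list automatically favors the more frequent continuations.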

Randomness enters into the program when we generate an example text from the model. Our main task is to select a random element from a list, and herein lies the central problem of writing pure functional code that uses randomness. Most programming languages provide a function


    rangen :: () -> Int

The idea here is that each call to rangen () will produce a new, random result. But pure languages don't work that way: functions always produce equal results on equal arguments. Haskell deals with this by defining


    class RandomGen g where
       next     :: g -> (Int, g)
       ...

which should look like a familiar state transition function, because that's what it is.

The idea here is that a random number generator will produce both a random integer, and a new random number generator. Code that uses randomness then chains these random number generators through the various calls, and this can be a pain to keep straight. So we use the State monad to "hide" the random number generators.


    import System.Random
    
    type RandState = State StdGen
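To see what the State monad is hiding, here is a sketch that avoids the random package altogether. `ToyGen` and `toyNext` are hypothetical: a bare linear congruential step using the classic constants 1103515245 and 12345 modulo 2^31, chosen only because it has the same `g -> (Int, g)` shape as `next`. It is not `StdGen` and is not a good source of randomness. The sketch draws three numbers twice, once threading the generators by hand and once letting `State` do it.

```haskell
import Control.Monad.State

-- Hypothetical toy generator (NOT System.Random's StdGen).
newtype ToyGen = ToyGen Int deriving (Eq, Show)

-- Same shape as RandomGen's next: g -> (Int, g).
toyNext :: ToyGen -> (Int, ToyGen)
toyNext (ToyGen s) = (s', ToyGen s')
  where
    s' = (1103515245 * s + 12345) `mod` 2147483648

-- Threading by hand: each call must be given the previous call's generator.
threeByHand :: ToyGen -> ([Int], ToyGen)
threeByHand g0 =
    let (a, g1) = toyNext g0
        (b, g2) = toyNext g1
        (c, g3) = toyNext g2
    in ([a, b, c], g3)

-- The same computation in State: the generator is passed along implicitly.
threeInState :: State ToyGen [Int]
threeInState = do
    a <- state toyNext
    b <- state toyNext
    c <- state toyNext
    pure [a, b, c]
```

Both versions return the same numbers and the same final generator; the State version just removes the opportunity to pass `g1` where `g2` was meant.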

We can now write


    roll :: Int -> RandState Int
    roll n = state $ randomR (1,n)

which rolls an n-sided die, and


    select :: [a] -> RandState a
    select as = do
        i <- roll . length $ as
        pure $ as !! (i-1)

Which we can express more succinctly as


    select as = (as !!) . (subtract 1) <$> roll (length as)

The function randomR (a,b) will produce a random element in the range from a to b inclusive, which we'll use as an index into the list. Since roll returns a value between 1 and the length of the list, we subtract 1 to get the 0-based index that !! expects. Note that randomR :: (Random a, RandomGen g) => (a, a) -> g -> (a, g), so we're going to use the state function to lift a pure function of type RandomGen g => g -> (a,g) into RandState as before.
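The sketch below runs `roll` and both versions of `select` without the random package. `toyRandomR` is a hypothetical stand-in for `randomR` specialized to `Int` (a toy linear congruential step followed by reduction into the requested range), and `tenRolls` is an illustrative helper. What it illustrates is that any function of shape `g -> (a, g)` can be wrapped by `state`, and that the two definitions of `select` agree draw for draw.

```haskell
import Control.Monad (replicateM)
import Control.Monad.State

-- Hypothetical toy generator (NOT System.Random's StdGen).
newtype ToyGen = ToyGen Int deriving (Eq, Show)

-- Hypothetical stand-in for randomR on Int: one LCG step, then use
-- the higher bits of the new seed to pick a value in [lo, hi].
toyRandomR :: (Int, Int) -> ToyGen -> (Int, ToyGen)
toyRandomR (lo, hi) (ToyGen s) = (lo + (s' `div` 65536) `mod` (hi - lo + 1), ToyGen s')
  where
    s' = (1103515245 * s + 12345) `mod` 2147483648

type RandState = State ToyGen

-- `state` turns a g -> (a, g) function into a RandState action.
roll :: Int -> RandState Int
roll n = state $ toyRandomR (1, n)

-- Roll a die with one face per element, then shift to a 0-based index.
select :: [a] -> RandState a
select xs = do
    i <- roll . length $ xs
    pure $ xs !! (i - 1)

-- The succinct version.
select' :: [a] -> RandState a
select' xs = (xs !!) . subtract 1 <$> roll (length xs)

-- Ten rolls of a six-sided die from a fixed seed.
tenRolls :: [Int]
tenRolls = evalState (replicateM 10 (roll 6)) (ToyGen 2017)
```

Because the seed is fixed, `tenRolls` is the same on every run; in the real program, `getStdGen` supplies a different starting generator.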

This brings us to the actual generation of the list of words from the model. This starts with the first word, and we use each successive word to look up possible continuations.


    runModel :: Model -> RandState [String]
    runModel (start,wordmap) = iter start where
        iter word = do
            let successors = wordmap ! word
            succ <- select successors
            case succ of
                Just w -> do
                    ws <- iter w
                    pure (word:ws)
                Nothing -> pure [word]
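One property makes `runModel` easy to check: if no word repeats in the input, every successor list has exactly one element, so whatever the generator does, the output is the input text again. The sketch below bundles `buildModel`, `select`, and `runModel` with a hypothetical toy generator (`ToyGen`, `toyRandomR`) in place of `StdGen`, plus an illustrative `generate` helper that runs the model from a fixed seed.

```haskell
import Control.Monad.State
import Data.Map (Map, singleton, unionsWith, (!))

-- Hypothetical toy generator (NOT System.Random's StdGen).
newtype ToyGen = ToyGen Int

-- Hypothetical stand-in for randomR on Int.
toyRandomR :: (Int, Int) -> ToyGen -> (Int, ToyGen)
toyRandomR (lo, hi) (ToyGen s) = (lo + (s' `div` 65536) `mod` (hi - lo + 1), ToyGen s')
  where
    s' = (1103515245 * s + 12345) `mod` 2147483648

type RandState = State ToyGen
type Model = (String, Map String [Maybe String])

select :: [a] -> RandState a
select xs = (xs !!) . subtract 1 <$> state (toyRandomR (1, length xs))

buildModel :: [String] -> Model
buildModel xs@(x:_) = (x, unionsWith (++) . transitions $ xs) where
    transitions (y:ys@(y':_)) = singleton y [Just y'] : transitions ys
    transitions [y] = [singleton y [Nothing]]
    transitions [] = error "Impossible error"
buildModel [] = error "Empty model"

runModel :: Model -> RandState [String]
runModel (start, wordmap) = iter start where
    iter word = do
        let successors = wordmap ! word
        choice <- select successors
        case choice of
            Just w -> do
                ws <- iter w
                pure (word : ws)
            Nothing -> pure [word]

-- Build a model from the words and run it once from the given seed.
generate :: Int -> [String] -> [String]
generate seed ws = evalState (runModel (buildModel ws)) (ToyGen seed)
```

With repeated words (say "a b a b c"), the output varies with the seed, but it always starts with the first word and ends with "c", the only word whose successor list contains Nothing.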

Exercise 16.2 Show how the code for runModel can be tightened up to the following:


    runModel :: Model -> RandState [String]
    runModel (start,wordmap) = iter start where
        iter word = (word:) <$> do
            maybeNext <- select $ wordmap ! word
            case maybeNext of
                Just nextWord -> iter nextWord
                Nothing -> pure []

Of course, a list of words doesn't lend itself to nice output, so we'll write a little line-breaking function:


    linefill :: Int -> [String] -> String
    linefill _ [] = "\n"
    linefill n (x:xs) = iter x xs where
        iter current (nextWord:ys)
            | length current + length nextWord + 1 > n = current ++ "\n" ++ linefill n (nextWord:ys)
            | otherwise                   = iter (current ++ " " ++ nextWord) ys
        iter current [] = current ++ "\n"
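A few small cases show the greedy behavior of `linefill`: a word joins the current line if the line, one space, and the word together fit in `n` characters; otherwise the line is emitted and a new one begins. A word longer than `n` is never split, so it sits alone on an over-long line. The definition is repeated verbatim so the sketch runs on its own.

```haskell
-- Greedy line filling: pack words onto a line while it stays within n characters.
linefill :: Int -> [String] -> String
linefill _ [] = "\n"
linefill n (x:xs) = iter x xs where
    iter current (nextWord:ys)
        | length current + length nextWord + 1 > n = current ++ "\n" ++ linefill n (nextWord:ys)
        | otherwise                   = iter (current ++ " " ++ nextWord) ys
    iter current [] = current ++ "\n"
```

For example, `linefill 10 (words "the quick brown fox jumps")` gives `"the quick\nbrown fox\njumps\n"`, and `linefill 4 ["abcdefg", "hi"]` gives `"abcdefg\nhi\n"`, with the seven-letter word overflowing its four-character line.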

This leaves us with main:


    main :: IO ()
    main = do
        input <- getContents
        gen <- getStdGen
        let model = buildModel (words input)
            disassociatedPress = evalState (runModel model) gen
        putStr . linefill 72 $ disassociatedPress

All that remains is a good chunk of prose to test this on. We'll consider the Gettysburg Address, and produce the following Gettysburg Address-like word salad:


    $ ./disassociated-press < gettysburg.txt
    Four score and proper that nation might live. It is for us to the last
    full measure of that we can not dedicate, we can never forget what we
    can not consecrate, we can never forget what we can never forget what we
    can long endure. We are created equal. Now we can not hallow this
    ground. The world will little note, nor long remember what they who here
    gave the unfinished work which they gave their lives that government of
    that these honored dead we here to the unfinished work which they who
    fought here gave the last full measure of that field, as a portion of
    that war. We have thus far so nobly advanced. It is altogether fitting
    and dead, who here have a final resting place for which they did here.
    It is rather for the people, by the proposition that government of that
    government of freedom—and that field, as a new birth of that we can long
    remember what we can not perish from these honored dead we can not
    hallow this continent a great task remaining before us—that from these
    honored dead shall not have died in vain—that this continent a final
    resting place for which they did here. It is for which they gave the
    earth.

*Exercise 16.3 A problem with simple probabilistic text generators like the one above is that they can generate very large amounts of text. How great is the danger in this case? Rework the program to run the model 1,000 times (without printing!), and compute the lengths of the longest and shortest texts it would have printed. (To be clear, we're measuring length in characters, not words.) Hint: replicateM is really useful for running a monadic action many times.

Use the module disassociated-press.hs and the text file gettysburg.txt.

CMSC-16100

Honors Introduction to Programming, I

Autumn Quarter, 2017
