Lecture 17: The State Monad, II
HTML
Our next example is a good deal more practical. We're going to use the State
monad to write a small library for producing HTML. First things first: there's no need for us to write our own State
monad—there is one defined for us in Control.Monad.State
. Note that the State
monad defined in Control.Monad.State
has a more complicated definition than ours, and this underlying complexity may be exposed through inferred types and/or error messages. Don't be alarmed, we'll deal with the additional complexity in due course.
import Control.Monad.State
type HTML = State String
The idea here is that our state is going to be a String
that contains HTML, and our various operations are going to act on it. In the simplest case, we'll simply append some text onto the state:
string :: String -> HTML ()
string t = modify (++t)
Which, after the usual transformations, we can write as
string = modify . flip (++)
Next, we need a bit of code (analogous to perform
in the calculator example) to render an HTML monadic object as a string:
render :: HTML a -> String
render mHTML = execState mHTML ""
Note that this code is already borderline useful:
> render (string "foo" >> string "bar")
"foobar"
Thus, we can use monads to do simple string concatenation, with an alternative notion.
But the heart of HTML is its use of tags. We'll define tag
to be a monadic function, which takes a tag name and a monadic argument, and writes a start tag, then performs the action of the argument monad (appending its output to the state), and then concludes by writing the end tag.
tag :: String -> HTML a -> HTML ()
tag t mHTML = do
string $ "<" ++ t ++ ">"
mHTML
string $ "</" ++ t ++ ">"
And if we want attributes:
attrTag :: String -> [(String,String)] -> HTML a -> HTML ()
attrTag t attrs mHTML = do
let format (k,v) = k ++ "=" ++ show v
string $ "<" ++ (unwords $ t : map format attrs) ++ ">"
mHTML
string $ "</" ++ t ++ ">"
We can then define a number of tagging functions:
html = tag "html"
head = tag "head"
title = tag "title"
body = tag "body"
p = tag "p"
i = tag "i"
b = tag "b"
h1 = tag "h1"
h2 = tag "h2"
h3 = tag "h3"
h4 = tag "h4"
ol = tag "ol"
ul = tag "ul"
table = tag "table"
tr = tag "tr"
th = tag "th"
td = tag "td"
These functions can be used to give a nice structural definition of an HTML page in Haskell, e.g.,
doc =
html $ do
head $ do
title $ string "Hello, world!"
body $ do
h1 $ string "Greetings"
p $ string "Hello, world!"
We can render this, and write it as a file:
> writeFile "hello.html" $ render doc
>
This is all pretty enough, but how is it useful?
Let's consider a fairly typical minor problem in dealing with a web server—determining exactly what environmental variables are set. What we're going to do is write a little CGI (common gateway interface) program, which obtains the environmental bindings, and converts them to HTML. There is surprisingly little code required:
module Main where
import Prelude hiding (head)
import Data.List (sort)
import System.Posix.Env
import HTML
doc :: [(String,String)] -> HTML ()
doc env = html $ do
head $ title $ string "Environment variables"
body $ do
h1 $ string "Environment variables:"
ul $ mapM_ (\(a,b) -> li $ string $ a ++ "=" ++ b) $ sort env
main :: IO ()
main = do
env <- getEnvironment
putStr "Content-type: text/html\n\n"
putStr $ render $ doc env
The heavy lifting here is done by the call to mapM_
, which turns a list of binding pairs into an HTML ()
object that sequences an appropriately formatted li
element for each binding pair. Now, most people don't write CGI programs in Haskell, but I sometimes do, generally speaking using the functionality found in Text.Blaze
and Network.cgi
, but this small example shows how we can roll our own functionality.
*Exercise 17.1
When generating output in formats such as HTML, it is often desirable to pretty print the results, that is, making judicious use of indentation, newlines, and other formatting choices to make the file more readable. In this problem, you will write a pretty-printing version of the HTML generator above. For example:
> :load PrettyHTML.hs
> putStr $ render doc
<html>
<head>
<title>
Hello, world!
</title>
</head>
<body>
<h1>
Greetings
</h1>
<p>
Hello, world!
</p>
</body>
</html>
Note here that we call putStr
to force the newline characters
(e.g. '\n'
) in the string to be rendered as line breaks in the
output displayed to stdout
.
To implement this functionality, start with the new state representation
type HTML = State (Int, String)
where the integer value represents the "depth" of the current tag
in the HTML tree. When generating string output at depth k
,
the string should be indented k
"tabs" to the right.
One tab should be your favorite small number of spaces,
such as " "
or " "
.
The file PrettyHTML.hs
provides a template for your solution, where undefined
is
used as a placeholder for the following definitions that you must complete:
render :: HTML a -> String
adjustDepth :: (Int -> Int) -> HTML ()
tab :: HTML ()
newline :: HTML ()
indentedString :: String -> HTML ()
tag :: String -> HTML a -> HTML ()
The tab
and newline
functions should append tabs
and newlines, respectively, to the current string state. The function
adjustDepth
is used to update the current depth level, useful
when processing tags. The indentedString
function should render
its string argument on a new line indented according to the current depth of
the tree. You may define helper functions if you wish to.
Randomness
There is an important class of computer programs that use randomness (or more properly, as we'll see, pseudo-randomness), often to generate a "typical instance" based on a probabilistic model of some type. Our next program will do just that, through a re-implementation of Emacs's ludic “disassociated-press” command.
The idea behind disassociated-press is simple: The input is used to create a model of English prose, based on the frequency with which one word follows another, and then a random instance of that model is created. The result is best described as “English-like,” often non-sensical, but sometimes disconcertingly sensical. It will be noted in passing that we had fewer sources of entertainment back in the day.
Our model is as follows:
import Control.Monad.State
import Data.Map (Map,(!))
import qualified Data.Map as Map
type Model = (String,Map String [Maybe String])
Our model keeps track of the first word (which is used to kick-off generation), and a map which associates with each word of the text a list of following words. Now, this later is not exactly right, as we're going to use the map to deal with both word succession and termination -- so the values are [Maybe String]
, where a Just w
element represents a succeeding word w
, and Nothing
represents the end of the text.
Building the model is something we do in pure code.
mkModel :: [String] -> Model
mkModel xs@(x:_) = (x,mkMap xs) where
mkMap = foldr combine Map.empty . mkPairs
combine (k,v) = Map.alter insertf k where
insertf Nothing = Just [v]
insertf (Just vs) = Just (v:vs)
mkPairs (x:ys@(y:_)) = (x,Just y) : mkPairs ys
mkPairs [x] = [(x,Nothing)]
There are a few things to notice here. The first is recall @
variables, which are used to match a sub-pattern of a larger pattern, e.g., the pattern xs@(x:_)
will match [1,2,3]
, with xs
bound to the entire [1,2,3]
, and x
bound to 1
. The second is our use of the function
Map.alter :: Ord k => (Maybe a -> Maybe a) -> k -> Map k a -> Map k a
Which is used to update an existing map given a key and an entry updating function. The updating function for Map.alter
uses Nothing
, both as an argument and as a result, to indicate an unbound key. Thus, Map.alter
can insert or remove keys as necessary, as well as adjusting individual entries. One final nuance is that we'll do our word breaking using the Prelude.words
function, which will include punctuation. (Remember that so far as words
is concerned, words are sequences of characters separated by spaces, not contiguous runs of letters.)
Note that it is possible to write combine
in a much simpler way, if we remember that Maybe
is not only a Monad
, it's also a MonadPlus
:
combine (k,v) = Map.alter (\vs -> fmap (v:) vs `mplus` return [v]) k
This is a trick worth remembering, because we often find ourselves writing a function to pass to Map.adjust
that can be thought of as updating an existing entry if it exists (which is what the fmap
does), or providing a default entry otherwise.
Randomness enters into the program in generating an example text from the model. The central problem for us is to select a random element from a list, and herein enters the central problem of writing pure functional code that uses randomness. Most programming languages provide a function
rangen :: () -> Int
The idea here is that each call to rangen ()
will produce a new, random result. But pure languages don't work that way: functions alway produce equal results on equal arguments. Haskell deals with this by defining
class RandomGen g where
next :: g -> (Int, g)
...
The idea here is that a random number generator will produce both a random integer, and a new random number generator. Code that uses randomness then chains these random number generators through the various calls, and this can be a pain to keep straight. So we use the State
monad to "hide" the random number generators.
import System.Random
import System.IO
type RandState = State StdGen
We can now write
select :: [a] -> RandState a
select as = do
ix <- state $ randomR (0,length as - 1)
return $ as !! ix
Which we can express more succinctly as
select as = fmap (as !!) . state . randomR $ (0,length as - 1)
The function randomR (a,b)
will produce a random element in the range from a
to b
inclusive, which we'll use as an index into the list. Note that randomR:: RandomGen g => (a, a) -> g -> (a, g)
, so we're going to use the state
function to lift a pure function of type RandomGen g => g -> (a,g)
into the RandState
monad.
This brings us to the actual generation of the list of words from the model. This starts with the first word, and we use each successive word to look up possible continuations.
runModel :: Model -> RandState [String]
runModel (start,wordmap) = iter start where
iter word = do
let successors = wordmap ! word
succ <- select successors
case succ of
Nothing -> return [word]
Just w -> do
ws <- iter w
return (word:ws)
Exercise 17.2 Show how the code for runModel
can be tightened up to the following:
runModel :: Model -> RandState [String]
runModel (start,wordmap) = iter start where
iter word = do
succ <- select $ wordmap ! word
case succ of
Nothing -> return [word]
Just w -> fmap (word:) $ iter w
Of course, a list of words doesn't lend itself to nice output, so we'll write a little line-breaking function:
linefill :: Int -> [String] -> String
linefill _ [] = "\n"
linefill n (x:xs) = iter x xs where
iter x (y:ys)
| length x + length y + 1 > n = x ++ "\n" ++ linefill n (y:ys)
| otherwise = iter (x ++ " " ++ y) ys
iter x [] = x ++ "\n"
This leaves us with main
:
main :: IO ()
main = do
input <- getContents
let model = mkModel (words input)
gen <- getStdGen
let ws = evalState (runModel model) gen
putStr $ linefill 72 ws
All that remains is a good chunk of prose to test this on. We'll consider the Gettysburg address, and produce the following Gettysburg address-like word salad:
$ ./disassociated-press < gettysburg.txt
Four score and proper that nation might live. It is for us to the last
full measure of that we can not dedicate, we can never forget what we
can not consecrate, we can never forget what we can never forget what we
can long endure. We are created equal. Now we can not hallow this
ground. The world will little note, nor long remember what they who here
gave the unfinished work which they gave their lives that government of
that these honored dead we here to the unfinished work which they who
fought here gave the last full measure of that field, as a portion of
that war. We have thus far so nobly advanced. It is altogether fitting
and dead, who here have a final resting place for which they did here.
It is rather for the people, by the proposition that government of that
government of freedom—and that field, as a new birth of that we can long
remember what we can not perish from these honored dead we can not
hallow this continent a great task remaining before us—that from these
honored dead shall not have died in vain—that this continent a final
resting place for which they did here. It is for which they gave the
earth.
*Exercise 17.3 A problem with simple probabilistic text generators like the one above is that they can generate
very large amounts of text. How great is the danger in this case? Rework the program to run the model 1,000 times (without printing!), and compute the largest and smallest string printed. (To be clear here, we're measuring length in characters, not words.) Hint: replicateM
is really useful at running a monad a bunch of times.
Use the module disassociated-press.hs, and gettysburg.txt.