Lecture 16: Concrete Monads: State, II
HTML
Our next example is a good deal more practical. We're going to use the State
monad to write a small library for producing HTML. Note that we're moving to the “official” MTL implementation:
import Control.Monad.State
type Document = State String
The idea here is that our state is going to be a String
that contains HTML, and our various operations are going to act on it. In the simplest case, we'll simply append some a String
onto the state:
string :: String -> Document ()
string t = modify (\s -> s ++ t)
Which, after the usual transformations, we can write as
string = modify . flip (++)
Next, we need a bit of code (analogous to perform
in the calculator example) to render an HTML value as a string:
render :: Document a -> String
render doc = execState doc ""
Note that this code is already borderline useful:
> render (string "foo" >> string "bar")
"foobar"
Thus, we can do simple string concatenation, with an alternative notion.
But the heart of HTML is its use of tags. We'll define tag
to be a monadic function, which takes a tag name and a monadic argument, and writes a start tag, then performs the action of the argument monad (appending its output to the state), and then concludes by writing the end tag.
type HTML = Document ()
tag :: String -> HTML -> HTML
tag t html = do
string $ "<" ++ t ++ ">"
html
string $ "</"++ t ++ ">"
We can then define a number of tagging functions:
html = tag "html"
head = tag "head"
title = tag "title"
body = tag "body"
p = tag "p"
i = tag "i"
b = tag "b"
h1 = tag "h1"
h2 = tag "h2"
h3 = tag "h3"
h4 = tag "h4"
ol = tag "ol"
ul = tag "ul"
table = tag "table"
tr = tag "tr"
th = tag "th"
td = tag "td"
These functions can be used to give a nice structural definition of an HTML page in Haskell, e.g.,
doc :: HTML
doc =
html $ do
head $ do
title $ string "Hello, world!"
body $ do
h1 $ string "Greetings"
p $ string "Hello, world!"
We can render this, and write it as a file:
> writeFile "hello.html" $ render doc
>
This is all pretty enough, but how is it useful?
Let's consider a fairly typical minor problem in dealing with a web server: determining exactly what environmental variables are set. What we're going to do is write a little CGI (common gateway interface) program, which obtains the environmental bindings, and converts them to HTML. There is surprisingly little code required:
{- A program for rendering a CGI's environment as HTML -}
module Main where
import HTML
import Data.List (sort)
import System.Posix.Env
{- Create an HTML document based on a key/value list -}
makeDoc :: [(String,String)] -> HTML
makeDoc env =
html $ do
HTML.head . title . string $ "Environment variables"
body $ do
h1 . string $ "Environment variables:"
ul . mapM_ makeEntry . sort $ env
where
makeEntry (key,value) = li . string $ key ++ " = " ++ encode value
encode = concatMap encodeChar where
encodeChar '<' = "<"
encodeChar '&' = "&"
encodeChar '>' = ">"
encodeChar c = [c]
{- The main act -}
main :: IO ()
main = do
env <- getEnvironment
putStr "Content-type: text/html\n\n" {- the minimal required HTTP header -}
putStr . render . makeDoc $ env
The heavy lifting here is done by the call to mapM_
, which turns a list of binding pairs into an HTML
value that sequences an appropriately formatted li
element for each binding pair.
Most people don't write CGI programs in Haskell, I'm not most people, so I sometimes do, albeit usually using the functionality found in Text.Blaze
and Network.cgi
, but this small example shows how we can roll our own functionality.
*Exercise 16.1
When generating output in formats such as HTML, it is often desirable to pretty print the results, especially while debugging. The idea is to make judicious use of indentation, newlines, and other formatting choices to make the file more readable. In this problem, you will write a pretty-printing version of the HTML generator above. For example:
> :load PrettyHTML.hs
> putStr $ render doc
<html>
<head>
<title>Hello, world!</title>
</head>
<body>
<h1>Greetings</h1>
<p>Hello, world!</p>
</body>
</html>
To implement this functionality, start with the new state representation
type HTML = State (Int, String)
where the integer value represents the "depth" of the current tag
in the HTML tree. When generating string output at depth k
,
the string should typically be indented k
"tabs" to the right.
One tab should be your favorite small number of spaces,
such as " "
or " "
.
The file PrettyHTML.hs
provides a template for your solution, where undefined
is
used as a placeholder for the following definitions that you must complete:
string :: String -> Document ()
newline :: Document ()
render :: Document a -> String
indent :: Document ()
exdent :: Document ()
The newline
function should append an indentation-aware newline (i.e., a newline followed by an appropriate number of tabs) to the current string state. The indent
and exdent
functions are used to increment (respectively decrement) the indentation level by one.
Randomness
There is an important class of computer programs that use randomness (or more properly, as we'll see, pseudo-randomness), often to generate a "typical instance" based on a probabilistic model of some type. Our next program will do just that, through a re-implementation of Emacs's ludic “disassociated-press” command.
The idea behind disassociated-press is simple: The input is used to create a model of English prose, based on the frequency with which one word follows another, and then a random instance of that model is created. The result is best described as “English-like,” often non-sensical, but sometimes disconcertingly sensical. It will be noted in passing that we had fewer sources of entertainment back in the day.
Our model is as follows:
type Model = (String,Map String [Maybe String])
A Map
is simply a higher efficiency version of an association list.
Our model keeps track of the first word (which is used to kick-off generation), and a map which associates with each word of the text a list of following words. Now, this later is not exactly right, as we're going to use the map to deal with both word succession and termination -- so the values are [Maybe String]
, where a Just w
element represents a succeeding word w
, and Nothing
represents the end of the text.
Building the model is something we do in pure code.
buildModel :: [String] -> Model
buildModel xs@(x:_) = (x,unionsWith (++) . transitions $ xs) where
transitions (y:ys@(y':_)) = singleton y [Just y'] : transitions ys
transitions [y] = [singleton y [Nothing]]
transitions [] = error "Impossible error"
buildModel [] = error "Empty model"
The Map
data type has a lot of existing functionality, including functionality for mutation, but it is generally more convenient to build maps out of simpler maps, as we've done here, providing an appropriate combining function.
Randomness enters into the program in generating an example text from the model. The central problem for us is to select a random element from a list, and herein enters the central problem of writing pure functional code that uses randomness. Most programming languages provide a function
rangen :: () -> Int
The idea here is that each call to rangen ()
will produce a new, random result. But pure languages don't work that way: functions alway produce equal results on equal arguments. Haskell deals with this by defining
class RandomGen g where
next :: g -> (Int, g)
...
which should look like a familiar state transition function, because that's what it is.
The idea here is that a random number generator will produce both a random integer, and a new random number generator. Code that uses randomness then chains these random number generators through the various calls, and this can be a pain to keep straight. So we use the State
monad to "hide" the random number generators.
import System.Random
type RandState = State StdGen
We can now write
roll :: Int -> RandState Int
roll n = state $ randomR (1,n)
Which rolls an n
sided dice, and
select :: [a] -> RandState a
select as = do
i <- roll . length $ as
pure $ as !! (i-1)
Which we can express more succinctly as
select as = (as !!) . (subtract 1) <$> roll (length as)
The function randomR (a,b)
will produce a random element in the range from a
to b
inclusive, which we'll use as an index into the list. Note that randomR:: RandomGen g => (a, a) -> g -> (a, g)
, so we're going to use the state
function to lift a pure function of type RandomGen g => g -> (a,g)
into RandState
as before.
This brings us to the actual generation of the list of words from the model. This starts with the first word, and we use each successive word to look up possible continuations.
runModel :: Model -> RandState [String]
runModel (start,wordmap) = iter start where
iter word = do
let successors = wordmap ! word
succ <- select successors
case succ of
Just w -> do
ws <- iter w
pure (word:ws)
Nothing -> pure [word]
Exercise 16.2 Show how the code for runModel
can be tightened up to the following:
runModel :: Model -> RandState [String]
runModel (start,wordmap) = iter start where
iter word = (word:) <$> do
maybeNext <- select $ wordmap ! word
case maybeNext of
Just nextWord -> iter nextWord
Nothing -> pure []
Of course, a list of words doesn't lend itself to nice output, so we'll write a little line-breaking function:
linefill :: Int -> [String] -> String
linefill _ [] = "\n"
linefill n (x:xs) = iter x xs where
iter current (nextWord:ys)
| length current + length nextWord + 1 > n = current ++ "\n" ++ linefill n (nextWord:ys)
| otherwise = iter (current ++ " " ++ nextWord) ys
iter current [] = current ++ "\n"
This leaves us with main
:
main :: IO ()
main = do
input <- getContents
gen <- getStdGen
let model = buildModel (words input)
disassociatedPress = evalState (runModel model) gen
putStr . linefill 72 $ disassociatedPress
All that remains is a good chunk of prose to test this on. We'll consider the Gettysburg address, and produce the following Gettysburg address-like word salad:
$ ./disassociated-press < gettysburg.txt
Four score and proper that nation might live. It is for us to the last
full measure of that we can not dedicate, we can never forget what we
can not consecrate, we can never forget what we can never forget what we
can long endure. We are created equal. Now we can not hallow this
ground. The world will little note, nor long remember what they who here
gave the unfinished work which they gave their lives that government of
that these honored dead we here to the unfinished work which they who
fought here gave the last full measure of that field, as a portion of
that war. We have thus far so nobly advanced. It is altogether fitting
and dead, who here have a final resting place for which they did here.
It is rather for the people, by the proposition that government of that
government of freedom—and that field, as a new birth of that we can long
remember what we can not perish from these honored dead we can not
hallow this continent a great task remaining before us—that from these
honored dead shall not have died in vain—that this continent a final
resting place for which they did here. It is for which they gave the
earth.
*Exercise 16.3 A problem with simple probabilistic text generators like the one above is that they can generate
very large amounts of text. How great is the danger in this case? Rework the program to run the model 1,000 times (without printing!), and compute the largest and smallest string printed. (To be clear here, we're measuring length in characters, not words.) Hint: replicateM
is really useful at running a monad a bunch of times.
Use the module disassociated-press.hs, and gettysburg.txt.