Lecture 12: Concrete Monads: IO
Warmup
Let's start with a simple IO program. This is a Haskell version of a program one of Professor Kurtz's roommates encountered very late at night while working on a CDC 6700 in 1978... Recall that a value of type IO a is an IO-action, which, when performed, results in the production of a value of type a, and this value can be captured by the (>>=) operator (which appears as <- in the commonly used, syntactically sugared do notation).
-- | The annoying "frog" program.
module Main where
import System.IO
-- | Produce a String containing n lines, each consisting of "frog".
manyFrogs :: Int -> String
manyFrogs i = unlines $ replicate i "frog"
-- | The main act...
main :: IO ()
main = do
    putStr "How many frogs? "
    hFlush stdout
    nstr <- getLine
    let n = read nstr
    putStr $ manyFrogs n
    main
You could imagine doing this with the “ninety-nine bottles of beer on the wall” song, but that would have been really annoying!
This is a pretty typical first-pass at a program like this. We've tried to factor out the pure part of the code, cf., manyFrogs
, while almost every line of main
involves an IO action.
Note the use of binding to extract information from getLine :: IO String, and the use of let to bind a variable based on a pure computation. One minor surprise is the hFlush stdout line, which deals with a buffering issue. For efficiency reasons, compiled Haskell code line-buffers output, which means that output is sent to stdout only when a newline is added to the buffer. This is a problem if we want to read the answer to a question on the same line as the prompt. The solution, naturally enough, is to flush our output buffer. Finally, note the recursive call to main as the last line of main. This has the effect of creating an infinite loop. It's a common idiom, but there is a better way.
Of course, this code works correctly the first time we write it, but as Haskell programmers, we're not satisfied until we've worked on the code a bit.
The code fairy insists that we η-reduce manyFrogs
, so we do:
manyFrogs = unlines . (`replicate` "frog")
We quickly realize that the sequence of actions that consist of writing out a prompt string, flushing it, and reading a response, is something that we're likely to want to do again. So we create a new IO action that captures this common interaction:
-- | Prompt for a line of input
prompt :: String -> IO String
prompt msg = do
    putStr msg
    hFlush stdout
    answer <- getLine
    pure answer
Of course, we recognize the opportunity to eliminate a binding followed by pure
, resulting in
-- | Prompt for input
prompt :: String -> IO String
prompt msg = do
    putStr msg
    hFlush stdout
    getLine
Optionally, we might decide this is simple enough to desugar into a one-liner:
prompt msg = putStr msg >> hFlush stdout >> getLine
Note that (>>)
doesn't actually use the value from the left-hand argument, and so it more naturally lives in the land of Applicative
, where it is found as (*>)
. So a purist might take this to
prompt msg = putStr msg *> hFlush stdout *> getLine
on the theory that reducing our lexicon by eliminating (>>)
is a good thing. We're purists.
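The claim that (>>) and (*>) agree can be checked directly. The following is a minimal sketch that uses Maybe in place of IO so the results can be printed and compared; the names viaMonad and viaApplicative are hypothetical.

```haskell
module Main where

-- Sequence two actions monadically, discarding the first result.
viaMonad :: Maybe Int
viaMonad = Just 1 >> Just 2

-- The same sequencing using only the Applicative interface.
viaApplicative :: Maybe Int
viaApplicative = Just 1 *> Just 2

main :: IO ()
main = do
  print viaMonad
  print viaApplicative
  -- A failing left-hand side short-circuits both operators alike.
  print ((Nothing :: Maybe Int) >> Just 2)
  print ((Nothing :: Maybe Int) *> Just 2)
```

For any lawful Monad the two operators are expected to coincide, which is why swapping one for the other in prompt does not change its behavior.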
Factoring out prompt
enables us to simplify main
:
main = do
    nstr <- prompt "How many frogs? "
    let n = read nstr
    putStr (manyFrogs n)
    main
Next, we realize that we're saving our input String
in the variable nstr
, only to immediately convert it via read
into n
. This makes us hopeful that we can eliminate nstr
altogether, and eliminating ephemeral variables often improves functional code. A first thought is that we can fold this into prompt
, producing a specialized version of prompt that results in binding an Int
. But there's a better way. IO
is not just a Monad
, it's also a Functor
, and we can use (<$>)
to “adjust” the result of an IO-action. Thus,
main = do
    n <- read <$> prompt "How many frogs? "
    putStr (manyFrogs n)
    main
Finally, there's a nice function forever :: Applicative f => f a -> f b in Control.Monad that repeats an action of type f a forever, so (after importing Control.Monad) we can make this:
main = forever $ do
    n <- read <$> prompt "How many frogs? "
    putStr (manyFrogs n)
At this point, we could call it done, and probably should, but what the heck... As earlier, we might look at that n
with some skepticism. It is nothing more than an ephemeral data carrier, and so it should be possible to eliminate it. A simple intermediate step is to try to simplify the call to putStr
so that it consists of a single variable. We can do this by moving manyFrogs
up to the preceding line, in effect making it part of the input processing.
main = forever $ do
    frogs <- manyFrogs . read <$> prompt "How many frogs? "
    putStr frogs
Which can be immediately transformed to
main = forever $
    manyFrogs . read <$> prompt "How many frogs? " >>= putStr
This works, but it's hard to read because the ($) and (.) want to be read from right-to-left, while the (>>=) wants to be read from left-to-right. There are several approaches to dealing with this, e.g., we could use (=<<), the backward version of (>>=), but it's more natural to read processing pipelines from left-to-right, and so we'll pursue that approach.
First, let's tackle <$>. We want an operator version of flip (<$>), and after a bit of searching, we find (<&>) in Edward Kmett's lens package. We could install lens, and import Control.Lens, but the lens package is huge. Professor Kurtz's wife would describe using lens to get at (<&>) as “Killing flies with howitzers.” Instead, we'll just steal the definition, which includes exactly the right fixity declaration:
infixl 1 <&>
(<&>) :: Functor f => f a -> (a -> b) -> f b
(<&>) = flip (<$>)
This is just right because $
has fixity infixr 0
, and so operates at a lower precedence than our pipeline operators, and (>>=)
has fixity infixl 1
, the same as <&>
, which means they mix and match naturally. This gets us to
main = forever $
    prompt "How many frogs? " <&> manyFrogs . read >>= putStr
Which is better, but there's still that reversal of order via (.). This is not a perfect world. Ideally, there'd be a flipped composition that had fixity greater than 1 lying around somewhere in the Haskell libraries, but this does not seem to be the case. Instead, there's a flipped composition (>>>) in Control.Category, but it has fixity infixr 1, which is a disaster from our point of view, as it's a syntax error to write an expression like
prompt "How many frogs? " <&> read >>> manyFrogs >>= putStr
which includes operators of the same fixity (1) that associate in different directions (left, for <&>
and (>>=)
; right for (>>>)
). We can solve this with parentheses
prompt "How many frogs? " <&> (read >>> manyFrogs) >>= putStr
or, we could remember our functor laws, and rewrite this as
prompt "How many frogs? " <&> read <&> manyFrogs >>= putStr
which has a nice elegance.
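To see how (<&>) and (>>=) mix at the same precedence, here is a minimal sketch that runs the same style of pipeline in the Maybe monad rather than IO, so the result can be inspected without user input. The helper halve and the name pipeline are hypothetical, and (<&>) is defined locally exactly as above.

```haskell
module Main where

infixl 1 <&>
(<&>) :: Functor f => f a -> (a -> b) -> f b
(<&>) = flip (<$>)

-- Hypothetical monadic step: halve even numbers, fail on odd ones.
halve :: Int -> Maybe Int
halve n
  | even n    = Just (n `div` 2)
  | otherwise = Nothing

-- Parsed as ((m <&> (+ 1)) >>= halve) <&> show, because every
-- operator here is infixl 1.
pipeline :: Maybe Int -> Maybe String
pipeline m = m <&> (+ 1) >>= halve <&> show

main :: IO ()
main = do
  print (pipeline (Just 5))  -- Just "3"
  print (pipeline (Just 4))  -- Nothing
```

Because all of the operators associate to the left at the same level, the pipeline reads strictly left-to-right with no parentheses.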
The lens package also defines (&)
as simple backward application, again with fixity infixl 1
, so it plays nicely in processing pipelines built using (>>=)
and <&>
. We can summarize this in a very simple way. If you're building a pipeline around values in a monad m
, and you want to bolt another machine onto the pipeline, select the correct left-to-right compositional operator based on the type of that machine:
Type | Operator |
---|---|
m a -> m b | (&) |
a -> b | (<&>) |
a -> m b | (>>=) |
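Understand, with older versions of base you'll need to copy the fixity declarations and definitions of (&) and (<&>) to do so, but it's a small price. Recent versions of base export (&) from Data.Function and (<&>) from Data.Functor, with the same infixl 1 fixity.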
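The table can be exercised with one machine of each type. This is a minimal sketch in the list monad, assuming a base library recent enough to export (&) from Data.Function and (<&>) from Data.Functor; the names twice and pipeline are hypothetical.

```haskell
module Main where

import Data.Function ((&))   -- in base 4.8 and later
import Data.Functor ((<&>))  -- in base 4.11 and later

-- A machine of type a -> m b, where m is the list monad:
-- emit each value twice.
twice :: Int -> [Int]
twice x = [x, x]

pipeline :: [Int] -> [Int]
pipeline xs =
  xs & reverse      -- m a -> m b : bolt on with (&)
     <&> (* 2)      -- a -> b     : bolt on with (<&>)
     >>= twice      -- a -> m b   : bolt on with (>>=)

main :: IO ()
main = print (pipeline [1, 2, 3])  -- prints [6,6,4,4,2,2]
```

With older versions of base, the two imports would be replaced by local definitions and fixity declarations.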
*Exercise 12.1 Write a Haskell program enumerate
which processes standard input, adding line numbers. E.g., if you have a file numbers.txt
containing:
one
two
three
four
five
six
seven
eight
nine
ten
then $ enumerate < numbers.txt
produces:
1. one
2. two
3. three
4. four
5. five
6. six
7. seven
8. eight
9. nine
10. ten
Hint: look at the lines
and getContents
functions in the Prelude
.
For extra credit, add the minimum number of spaces before each number so that the decimal points line up, i.e.,
 1. one
 2. two
 3. three
 4. four
 5. five
 6. six
 7. seven
 8. eight
 9. nine
10. ten
Standard IO
We start by considering a simple programming task: reading a file, and converting it to upper case. We call this AOL-ification, in honor of the old pre-internet AOL community, whose internal email system was UPPERCASE ONLY. When the internet was opened up to the public, AOL became an ISP, i.e., an internet service provider. This enabled AOL users to send email to a much larger community, albeit in UPPERCASE ONLY. Annoying, but it does provide us a simple programming task.
The UNIX operating system introduced a simplified mechanism for dealing with IO, the notion of standard IO. This consisted of a predefined standard input, standard output, and standard error output, which were available to any command-line program. These standard inputs and outputs could be associated with files by redirection, and programs that processed standard input, producing standard output, were known as filters, and could be composed at the command-line level by pipes. These ideas have been borrowed by almost all subsequent operating systems.
Haskell provides a number of functions for reading from standard input, e.g., getChar
, getLine
, and getContents
, which read progressively larger chunks of standard input, and putChar
and putStr
for output. We can write a standard IO based AOLify program very simply:
-- | AOLify -- read stdin, capitalize, and write to stdout
module Main where
import Data.Char (toUpper)
main :: IO ()
main = do
    input <- getContents
    let output = map toUpper input
    putStr output
This code seems almost too simple to be worth simplifying, but a few moments reflection suggests that we should be able to eliminate input
and output
, arriving at something like this:
main = getContents <&> map toUpper >>= putStr
which certainly reduces the task to its essentials. But there's an even better way. The task of writing a UNIX-style filter is common, and so the Prelude
defines a function interact :: (String -> String) -> IO ()
that reduces the problem of writing a UNIX-style filter to the problem of defining a (pure) function that maps input strings to output strings. Using this, we can simply write
main = interact $ map toUpper
and be done with it.
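One benefit of interact is that the interesting part of a filter becomes a pure function that can be tested without any IO. The following is a minimal sketch of a line-oriented filter in that style; shoutFirst is a hypothetical name, and the filter is only an illustration, not part of the lecture's AOLify program.

```haskell
module Main where

import Data.Char (toUpper)

-- Pure part: upper-case only the first character of each line.
shoutFirst :: String -> String
shoutFirst = unlines . map capitalize . lines
  where
    capitalize []       = []
    capitalize (c : cs) = toUpper c : cs

-- Impure part: hook the pure function up to stdin and stdout.
main :: IO ()
main = interact shoutFirst
```

The composition lines, then map, then unlines is a common shape for filters that work one line at a time.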
Simple IO
Of course, there's more to IO than user interaction. There are also files on disk, network connections, etc. Haskell's Prelude
has a number of functions for dealing with common file interactions, specifically
type FilePath = String
readFile :: FilePath -> IO String
writeFile :: FilePath -> String -> IO ()
appendFile :: FilePath -> String -> IO ()
The use of FilePath
doesn't provide any type safety, but it does help us understand our intentions.
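As a quick illustration of how the three functions fit together, here is a minimal sketch that writes a file, appends to it, and reads it back. The file name frogs-demo.txt is a hypothetical scratch file created in the current directory, and roundTrip is a hypothetical name.

```haskell
module Main where

-- Hypothetical scratch file in the current directory.
demoPath :: FilePath
demoPath = "frogs-demo.txt"

-- Write one line, append another, then read the whole file back.
roundTrip :: FilePath -> IO String
roundTrip path = do
  writeFile path "frog\n"    -- create or truncate the file
  appendFile path "frog\n"   -- add a second line at the end
  readFile path

main :: IO ()
main = roundTrip demoPath >>= putStr
```

Note that the sketch leaves the scratch file behind; removing it would need a function from outside the Prelude.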
As a practical illustration, we'll do a simple version of UNIX's cat
utility. Our goal is to write a program so that a command like
$ ./cat foo.txt bar.txt
will result in the contents of foo.txt and bar.txt being written on standard output. In this, we will only use readFile, but the other functions are used similarly. Here's a first, naïve, solution:
-- | A simple version of the UNIX "cat" program
module Main where
import System.Environment
-- | Process a list of files, writing the contents of each to standard output.
outputFiles :: [FilePath] -> IO ()
outputFiles [] = pure ()
outputFiles (f:fs) = do
    content <- readFile f
    putStr content
    outputFiles fs
main :: IO ()
main = do
    args <- getArgs
    outputFiles args
We use getArgs
to extract the argument list, and then the processing of that list is done by the (recursive) outputFiles
function. This works exactly as expected.
We'll play with this a bit, as is our practice. We can eliminate the ephemeral values content
and args
by introducing bindings, and then η-reducing. This gives us:
outputFiles (f:fs) = do
    readFile f >>= putStr
    outputFiles fs
main = getArgs >>= outputFiles
which is pretty simple.
But at some level, we're asking outputFiles
to do two different things: one is to output a file given its name, the other is to process a list. It would be nice to factor this, so that
outputFile :: FilePath -> IO ()
outputFile path = readFile path >>= putStr
and then to rely on standard functions to process the list. The standard map function almost works, except that map outputFile :: [FilePath] -> [IO ()] produces a list of actions rather than a single action, which isn't the type we're looking for. There's a nice function mapM :: Monad m => (a -> m b) -> [a] -> m [b] (the actual type is slightly more general) that can be thought of as a monadic version of map. The following doesn't quite work:
main = do
    args <- getArgs
    mapM outputFile args
But only because the last line has type IO [()] rather than IO (). We can add pure (), or use mapM_ :: Monad m => (a -> m b) -> [a] -> m (), which does the job for us. Finally, we can do our usual trick for eliminating the ephemeral variable args, resulting in
main = getArgs >>= mapM_ outputFile
Which is remarkably terse.
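The difference between mapM and mapM_ is easiest to see in a monad whose results can be printed. This is a minimal sketch using Maybe instead of IO; the function positive is a hypothetical check introduced only for the illustration.

```haskell
module Main where

-- A hypothetical check that succeeds only for positive numbers.
positive :: Int -> Maybe Int
positive n
  | n > 0     = Just n
  | otherwise = Nothing

main :: IO ()
main = do
  print (mapM positive [1, 2, 3])    -- Just [1,2,3]
  print (mapM_ positive [1, 2, 3])   -- Just ()
  print (mapM positive [1, 0, 3])    -- Nothing
```

mapM collects the results into a list inside the monad, while mapM_ runs the same effects and discards the results, which is what main needs.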
Handle Based IO
The preponderance of your IO needs can be handled with the high-level IO functions and concepts we've seen so far. This is quite different from other languages, in which IO is typically done via lower-level interfaces. Such interfaces exist for Haskell, and you should be aware of them, as they are sometimes essential.
A basic concept is that of a Handle, which is a Haskell type that represents a file, or file-like object (e.g., one of the Standard IO streams). There are a large number of functions for handle-based IO. Some of these are simply handle-based versions of functions we've seen before, e.g.,
hGetContents, hGetLine :: Handle -> IO String
hPutStr, hPutStrLn :: Handle -> String -> IO ()
In addition to these functions, there are constants that represent the three Standard IO streams:
stdin, stdout, stderr :: Handle
We can create a Handle
associated with a file and a particular IO mode by using:
data IOMode
    = ReadMode
    | WriteMode
    | AppendMode
    | ReadWriteMode
openFile :: FilePath -> IOMode -> IO Handle
Once we're done with a handle, it should be closed using
hClose :: Handle -> IO ()
Closing a file will write out the buffer, and release the kernel resources associated with the file. There are typically finite limits on how many files can be opened at any one time, both on a per-process and per-machine basis. These limits used to be small, but they're now quite large. Even so, it's better to develop the right programming disciplines from the beginning. One of the nice things about Haskell, and its IO monad, is that it's possible to write functions like:
withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
This is a function that opens a file, obtaining a handle, performs the result of applying the action function to the handle, closes the handle, and then returns the result of the action. We don't need to implement withFile, because it's already done in System.IO, but it would be easy enough to write ourselves. We'll see a similar function a bit later in the lecture.
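To make "easy enough to write ourselves" concrete, here is one possible sketch built on bracket from Control.Exception, which runs the release step even if the action throws. It is not claimed to be the library's actual definition; the primed name withFile' avoids a clash with System.IO.withFile, and withfile-demo.txt is a hypothetical scratch file.

```haskell
module Main where

import Control.Exception (bracket)
import System.IO

-- Open the file, run the action on the handle, and always close the handle.
withFile' :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
withFile' path mode = bracket (openFile path mode) hClose

main :: IO ()
main = do
  let path = "withfile-demo.txt"  -- hypothetical scratch file
  withFile' path WriteMode (\h -> hPutStrLn h "frog")
  firstLine <- withFile' path ReadMode hGetLine
  putStrLn firstLine
```

The read side uses hGetLine rather than hGetContents so that the line is fully read before the handle is closed.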
One reason to use handle-based IO is that there are no special-purpose functions for dealing with stderr
, although it's easy enough to just write them yourself:
putErrStr :: String -> IO ()
putErrStr = hPutStr stderr
Another reason is that you want control over the buffering choices that are being made. We've already seen hFlush :: Handle -> IO ()
used to force out buffered output. But we'll also often want finer control of stdin
. The System.IO
module defines:
data BufferMode
    = NoBuffering
    | LineBuffering
    | BlockBuffering (Maybe Int)
In NoBuffering mode, input is made available immediately, and output is written immediately. Thus, for example, you might want NoBuffering if you're writing an interactive game, and want access to player input in real time. Alternatively, LineBuffering waits until a full line of input is available, and so makes the usual line-editing functionality available. On output, LineBuffering buffers until a line-feed character, whereas BlockBuffering allows for larger buffers, and so offers greater performance, but isn't suitable for interactive use.
Buffering is manipulated using the functions:
hGetBuffering :: Handle -> IO BufferMode
hSetBuffering :: Handle -> BufferMode -> IO ()
It is often convenient to have "stack-oriented" functions to manage resources, and we can think of the buffering mode as a kind of resource that we want to acquire and release. To that end,
withBuffering :: Handle -> BufferMode -> IO a -> IO a
withBuffering handle mode action = do
    savedMode <- hGetBuffering handle
    hSetBuffering handle mode
    result <- action
    hSetBuffering handle savedMode
    pure result
is often very useful. This can be especially important when using the interpreter to debug your code. It's very frustrating to exit the program you're running, only to find that the interpreter is in an unexpected buffering mode.
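The version above does not restore the saved mode if the action throws an exception. One possible variation, sketched here under the assumption that bracket from Control.Exception is acceptable, restores the mode on both normal and exceptional exit; the primed name withBuffering' is hypothetical.

```haskell
module Main where

import Control.Exception (bracket)
import System.IO

withBuffering' :: Handle -> BufferMode -> IO a -> IO a
withBuffering' handle mode action =
  bracket
    (hGetBuffering handle <* hSetBuffering handle mode)  -- save old mode, then switch
    (hSetBuffering handle)                                -- restore the saved mode
    (const action)                                        -- run the action itself

main :: IO ()
main = do
  before <- hGetBuffering stdout
  inside <- withBuffering' stdout NoBuffering (hGetBuffering stdout)
  after  <- hGetBuffering stdout
  -- Report whether the mode was switched inside and restored afterwards.
  print (inside == NoBuffering)
  print (before == after)
```

The acquire step uses (<*) so that the saved mode, not the result of hSetBuffering, is what gets passed to the release step.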
Another reason for dealing with handle-based IO is to deal with a feature of Haskell that is sometimes a bug 😟. Some of Haskell's IO functions, notably getContents, hGetContents, and readFile, are implicitly lazy, i.e., they return immediately with a String, but the String is built and IO performed on an as-used basis. This can be really great: often very naïve programs will be able to process huge files without using much memory. But it can be a problem in that IO is not as atomic as might be wished. This might be an issue, e.g., if you read a preferences file, and write it back out. If the writing is happening concurrently with the reading, the file can easily get corrupted, resulting in hard to diagnose crashes. Lower-level routines (even if that only means using getLine and friends) can avoid this problem.
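For the preferences-file scenario, one common workaround is to force the whole String before writing anything back. The following is a minimal sketch of that idea, not the only remedy; readFileStrict is a hypothetical helper, and prefs-demo.txt and its contents are made up for the illustration.

```haskell
module Main where

import Control.Exception (evaluate)

-- Read a whole file eagerly: forcing the length consumes the lazy String,
-- so the read has reached end-of-file before this action returns.
readFileStrict :: FilePath -> IO String
readFileStrict path = do
  contents <- readFile path
  _ <- evaluate (length contents)
  pure contents

main :: IO ()
main = do
  let path = "prefs-demo.txt"  -- hypothetical preferences file
  writeFile path "volume=3\n"
  prefs <- readFileStrict path
  -- The earlier read has fully completed, so overwriting is safe here.
  writeFile path (prefs ++ "frogs=many\n")
  readFileStrict path >>= putStr
```

With plain readFile in place of readFileStrict, the second writeFile would typically be attempted while the file is still open for reading, which is exactly the interleaving the paragraph above warns about.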
A Supplemental Lecture
In our lectures, we often focus on programming ideas and techniques, but not on programs per se. It's sometimes nice to see the programming ideas applied in more substantial examples than we can work out in class. In a supplemental lecture, we develop a complete program for the Animal Game.