Lecture 15: Monads: IO II
Files and Handles
Of course, there's more to IO than user interaction. There are also files on disk, network connections, etc.
A basic abstraction is the notion of a Handle
—this is a value that that you can either read from, or write to. You typically obtain handles with calls to openFile
, and then use “h” versions of the standard input functions, with a handle argument, to do IO. Here's a simple example.
module Main where
import Data.Char
import System.Environment
import System.Exit
import System.IO
mUntil :: Monad m => m Bool -> m b -> m ()
mUntil control action = do
terminate <- control
if terminate
then return ()
else do
action
mUntil control action
usage :: IO ()
usage = do
progname <- getProgName
hPutStrLn stderr $ "usage: " ++ progname ++ " infile outfile"
exitWith $ ExitFailure 255
main :: IO ()
main = do
args <- getArgs
case args of
[infile,outfile] -> do
input <- openFile infile ReadMode
output <- openFile outfile WriteMode
mUntil (hIsEOF input) $ do
inChar <- hGetChar input
let outChar = toUpper inChar
hPutChar output outChar
hClose output
hClose input
_ -> usage
This is the AOLify program as it might be written by someone who learned to program in C, and is still thinking in C. (There's an old programming aphorism that "You can write Fortran in any language," and this code illustrates the point.) We're doing character by character IO, and using the unlifted toUpper
function to map input characters to output characters. Obviously, we're going to be working on this code, introducing code improvements, and maybe even yielding a bit to the temptation to make it more useful (as if AOLification can ever be useful!) along the way.
We introduce IO functions:
openFile :: FilePath -> IOMode -> IO Handle
, which obtains aHandle
from aFilePath
. We use aHandle
as an argument to other IO functions for reading or writing to that file. Note that under certain circumstances (e.g., we try to open an non-existent file for reading) this can raise an exception. There is also anopenBinaryFile
used to open binary files. Windows handles binary files differently from text files—the rest of the world doesn't. We could wish that Windows didn’t have this quirk, and we could wish that somehow Haskell swizzled the distinction in the call toopenFile
, but they don’t. But we can save ourselves from trying to figure out what sort of a system we're on by usingopenBinaryFile
whenever we’re opening a binary file, which will do the right thing irrespective of host OS.hGetChar :: Handle -> IO Char
andhPutChar :: Handle -> Char -> IO ()
, which reads/writes a character to/from a stream (a generalized file) via aHandle
.hClose :: Handle -> IO ()
, which closes an openHandle
. We do this because there are per-process, per-system, and sometimes per-user limits on the number of open files. A good practice, when possible, is a stack-oriented strategy for resource use, wherein resources are freed in the same block in which they're allocated, in reverse order. The system will close any open files associated with a process when it closes, but it’s still good style to close the files ourselves.
Note too that, to improve the correspondence with natural C code, I've written a function mUntil
, which takes a “test” monad of type m Bool
to control iteration of an effectful action. I’ve just introduced ordinary, C-style iteration to Haskell, with what is essentially a while
loop with an inverted test for tactical convenience. It’s a bit disconcerting that I have to write mUntil
myself, but revealing of the power of Haskell’s abstractions that I can. Remember that in C, you’re constrained to the control structures that Kernighan and Richie thought you needed, and you don’t as a mere programmer have the power to write your own.
Let’s begin our work of code cleaning by tackling mUntil
. I find the construct
do
terminate <- control
if terminate
...
to be a bit annoying. What we’re doing conceptually is very simple: extracting a value from a monad, and dispatching on it. The same pattern often happens with case
, e.g., in the body of main
:
do
args <- getArgs
case args of
...
It would be nice to have alternative syntaxes, mif
and mcase
to deal with this, eliminating the need for the intermediate variable, and collapsing three lines of syntax to two, and often just to one in the common case where these are the only two statements in the do
body. Well, we don’t have mif
or mcase
, and we can’t get rid of the intermediate variable, but we can do the syntactic shortening by de-sugaring and simply doing a bit more on a line:
mUntil :: Monad m => m Bool -> m b -> m ()
mUntil control action = control >>= \terminate -> if terminate
then return ()
else action >> mUntil control action
And then, since we expect that these loops may iterate a large number of times, we can save a bit of overhead in marshaling arguments by abstracting out the call itself:
mUntil :: Monad m => m Bool -> m b -> m ()
mUntil control action = loop where
loop = control >>= \terminate -> if terminate
then return ()
else action >> loop
Even better would be to use Control.Monad.unless
:
mUntil control action = loop where
loop = control >>= \terminate -> unless terminate (action >> loop)
At this point, if we don't give into temptation, and η-reduce away the λ, we'll have to live with it, so...
mUntil control action = loop where
loop = control >>= flip until (action >> loop)
I'll (Stuart) will note here that this process can be driven farther, getting the definition down to a one-liner in which both control
and action
have been η-reduced away too, and Ravi suggested that I included it as a "Halloween prank" to scare you guys, but I'm not quite that mean.
*Exercise 15.1 The natural C code has one amusing quirk, that's worth noting and reflecting on. The natural C program's IO loop would be written like this:
int ch;
while ((ch = fgetc(input)) != EOF) {
fputc(toupper(ch),output);
}
rather than this:
while (!feof(input)) {
int ch = fgetc(input);
fputc(toupper(ch),output);
}
There's something kind of sneaky going on here, which is revealed by the fact that the idiomatic code uses an int
, a four-byte integer, to hold a char
, a one byte quantity. And the sneakiness is that fgetc
will return the non-character value EOF
if it's at the end of file, and an ordinary character otherwise. Note here that there's no call to feof
, we're relying on fgetc
to indicate that we're at the end-of-file by returning the non-character EOF
. But haven't we seen this before? Isn't this just a C idiom for creating the type Maybe Char
? I think it is.
So let's see if we take push this a bit further.
- Write a Haskell function
hMaybeGetChar :: Handle -> IO (Maybe Char)
that returnsNothing
at end of file, andJust c
if it readsc
. - Write a Haskell function
mWhileJust :: Monad m => m (Maybe a) -> (a -> m b) -> m ()
which uses its first argument as a test to control the iteration of its second argument.
If you do this right, you'll be able to replace the mUntil
code with:
mWhileJust (hMaybeGetChar input) $ \c -> hPutChar output (toUpper c)
or even
mWhileJust (hMaybeGetChar input) $ toUpper >>> hPutChar output
Next up, let's take a look at the crucial do
block:
do
input <- openFile infile ReadMode
output <- openFile outfile WriteMode
mUntil (hIsEOF input) $ do
inChar <- hGetChar input
let outChar = toUpper inChar
hPutChar output outChar
hClose output
hClose input
A first order of business is to clean up the do
block that is controlled by mUntil
. Using the techniques we introduced in the frog program, this can be re-written:
mUntil (hIsEOF input) $ hGetChar input >>= (toUpper >>> hPutChar output)
eliminating two local variables and collapsing four lines of code into one, although the result is long enough that we may need to break it at the ($)
. What's left is code that's dominated by the acquisition and release of file handles. Can this be made cleaner? Not so much in C, but Python programmers might recognize an opportunity to use a with
block, which automates stack-oriented release of allocated resources, exactly what we're trying to accomplish here. Not surprisingly, we can do the same in Haskell through the use of a predefined function withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
, and re-write the main block as:
withFile infile ReadMode $ \input ->
withFile outfile WriteMode $ \output ->
mUntil (hIsEOF input) $
hGetChar input >>= (toUpper >>> hPutChar output)
Exercise 15.2 Do your own implementation of withFile
. Think about how Haskell facilitates writing your own control structures, as we've now done with mUntil
, mWithJust
, and withFile
, and how monadic values can function as “freeze-dried” computations, much as λ-terms do. Then take a look at System.IO
's implementation of withFile
.
Next up, having played with this code as much as we have, we might decide that we'd like to make it more useful. Yeah, I know, this is a suspend disbelief moment, but stick with me. If we have a program for aolifying files, it might be nice to generalize it to one that handles streams, so that we can use it in shell pipelines. There are two natural approaches, both of which get used in real-world UNIX programs. The first is to recode our program as a pure filter, but this makes life way too easy:
module Main where
import Data.Char
main :: IO ()
main = interact $ map toUpper
The second is to allow "-"
as a command line argument, signifying either the standard input stream stdin
, or the standard output stream stdout
. The question is how best to trap these cases in our code, and how to avoid doing something we might regret later, like closing stdout
. A natural place to attack this is at the calls to withFile
, since these calls have both the information and the ability to do what's need. So we'll patch withFile
to add the new functionality, resulting in withDefaultFile
:
withDefaultFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
withDefaultFile "-" ReadMode actionf = actionf stdin
withDefaultFile "-" WriteMode actionf = actionf stdout
withDefaultFile file mode actionf = withFile file mode actionf
This implementation illustrates a general theme in patching system calls, irrespective of language, and that is that it's often best to take a parsimonious attitude, relying on the unpatched call in cases that it handles to our satisfaction. In programming as in life, you need to take on responsibility to accomplish the things you need/want too, but taking on more responsibility than that can become an unnecessary burden. One more tweak is to η-reduce, which is obvious in the last alternative, and less so in the first two:
withDefaultFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
withDefaultFile "-" ReadMode = ($ stdin)
withDefaultFile "-" WriteMode = ($ stdout)
withDefaultFile file mode = withFile file mode
Note the sneaky use of a section with ($)
. This is a moderately common idiom, and should be understood.
But it's worth noting that for all the good work we've done, this is still a C program that's been translated into Haskell. Let's start losing the accent. The first place to begin is to note that no self-respecting Haskell programming is going to approach IO character-by-character except under extreme duress. There are IO functions hGetContents :: Handle -> IO String
that will return the entire contents of a handle into a String
, and hPutStr :: Handle -> String -> IO ()
for writing a string via a handle. Using them simplifies our copy code to:
withDefaultFile infile ReadMode $ \input ->
withDefaultFile outfile WriteMode $ \output ->
hGetContents input >>= (map toUpper >>> hPutStr output)
eliminating the need for mUntil
and its definition. Moreover, there are standard functions readFile :: FilePath -> IO String
and writeFile :: FilePath -> String -> IO ()
that handle opening, reading, and closing a file in a very simple interface. If we're willing to forego our special handling of "-"
as a command line argument, this can be simplified all the way to
readFile input >>= (map toUpper >>> writeFile output)
Exercise 15.3 Write versions of readFile
and writeFile
, called readDefaultFile
and writeDefaultFile
respectively, that preserve the special handling of "-"
.
There is something that's both clever and a bit tricky going on here, and it's worth understanding. For the most part, IO actions in Haskell are fully completed when they are performed. For the most part. But there are a few exceptions, notably getContents
, hGetContents
, and readFile
which are actually performed lazily, i.e., performing the action doesn't result immediately in IO, but instead in a promise to do IO later as the String
produced gets consumed. These IO promises also promise to close the input stream, if appropriate, once all of the data associated with that stream has been produced. This is done because it permits many simple Haskell programs to run in constant space that otherwise wouldn't, this program included.
The trickiness comes if you manually close such a stream, which will cut off input at whatever's already been read, and almost certainly isn't what you intended.