Lecture 15: Monads: IO II

Files and Handles

Of course, there's more to IO than user interaction. There are also files on disk, network connections, etc.

A basic abstraction is the notion of a Handle—this is a value that that you can either read from, or write to. You typically obtain handles with calls to openFile, and then use “h” versions of the standard input functions, with a handle argument, to do IO. Here's a simple example.


    module Main where
    
    import Data.Char
    import System.Environment
    import System.Exit
    import System.IO
    
    mUntil :: Monad m => m Bool -> m b -> m ()
    mUntil control action = do
        terminate <- control
        if terminate
        then return ()
        else do
            action
            mUntil control action
    
    usage :: IO ()
    usage = do
        progname <- getProgName
        hPutStrLn stderr $ "usage: " ++ progname ++ " infile outfile"
        exitWith $ ExitFailure 255
    
    main :: IO ()
    main = do
        args <- getArgs
        case args of
            [infile,outfile] -> do
                input <- openFile infile ReadMode
                output <- openFile outfile WriteMode
                mUntil (hIsEOF input) $ do
                    inChar <- hGetChar input
                    let outChar = toUpper inChar
                    hPutChar output outChar
                hClose output
                hClose input
            _ -> usage

This is the AOLify program as it might be written by someone who learned to program in C, and is still thinking in C. (There's an old programming aphorism that "You can write Fortran in any language," and this code illustrates the point.) We're doing character by character IO, and using the unlifted toUpper function to map input characters to output characters. Obviously, we're going to be working on this code, introducing code improvements, and maybe even yielding a bit to the temptation to make it more useful (as if AOLification can ever be useful!) along the way.

We introduce IO functions:

openFile :: FilePath -> IOMode -> IO Handle, which obtains a Handle from a FilePath. We use a Handle as an argument to other IO functions for reading or writing to that file. Note that under certain circumstances (e.g., we try to open an non-existent file for reading) this can raise an exception. There is also an openBinaryFile used to open binary files. Windows handles binary files differently from text files—the rest of the world doesn't. We could wish that Windows didn’t have this quirk, and we could wish that somehow Haskell swizzled the distinction in the call to openFile, but they don’t. But we can save ourselves from trying to figure out what sort of a system we're on by using openBinaryFile whenever we’re opening a binary file, which will do the right thing irrespective of host OS.
hGetChar :: Handle -> IO Char and hPutChar :: Handle -> Char -> IO (), which reads/writes a character to/from a stream (a generalized file) via a Handle.
hClose :: Handle -> IO (), which closes an open Handle. We do this because there are per-process, per-system, and sometimes per-user limits on the number of open files. A good practice, when possible, is a stack-oriented strategy for resource use, wherein resources are freed in the same block in which they're allocated, in reverse order. The system will close any open files associated with a process when it closes, but it’s still good style to close the files ourselves.

Note too that, to improve the correspondence with natural C code, I've written a function mUntil, which takes a “test” monad of type m Bool to control iteration of an effectful action. I’ve just introduced ordinary, C-style iteration to Haskell, with what is essentially a while loop with an inverted test for tactical convenience. It’s a bit disconcerting that I have to write mUntil myself, but revealing of the power of Haskell’s abstractions that I can. Remember that in C, you’re constrained to the control structures that Kernighan and Richie thought you needed, and you don’t as a mere programmer have the power to write your own.

Let’s begin our work of code cleaning by tackling mUntil. I find the construct


    do
        terminate <- control
        if terminate
            ...

to be a bit annoying. What we’re doing conceptually is very simple: extracting a value from a monad, and dispatching on it. The same pattern often happens with case, e.g., in the body of main:


    do
        args <- getArgs
        case args of
            ...

It would be nice to have alternative syntaxes, mif and mcase to deal with this, eliminating the need for the intermediate variable, and collapsing three lines of syntax to two, and often just to one in the common case where these are the only two statements in the do body. Well, we don’t have mif or mcase, and we can’t get rid of the intermediate variable, but we can do the syntactic shortening by de-sugaring and simply doing a bit more on a line:


    mUntil :: Monad m => m Bool -> m b -> m ()
    mUntil control action = control >>= \terminate -> if terminate
        then return ()
        else action >> mUntil control action

And then, since we expect that these loops may iterate a large number of times, we can save a bit of overhead in marshaling arguments by abstracting out the call itself:


    mUntil :: Monad m => m Bool -> m b -> m ()
    mUntil control action = loop where
        loop = control >>= \terminate -> if terminate
            then return ()
            else action >> loop

Even better would be to use Control.Monad.unless:


    mUntil control action = loop where
        loop = control >>= \terminate -> unless terminate (action >> loop)

At this point, if we don't give into temptation, and η-reduce away the λ, we'll have to live with it, so...


    mUntil control action = loop where
        loop = control >>= flip until (action >> loop)

I'll (Stuart) will note here that this process can be driven farther, getting the definition down to a one-liner in which both control and action have been η-reduced away too, and Ravi suggested that I included it as a "Halloween prank" to scare you guys, but I'm not quite that mean.

*Exercise 15.1 The natural C code has one amusing quirk, that's worth noting and reflecting on. The natural C program's IO loop would be written like this:


    int ch;
    while ((ch = fgetc(input)) != EOF) {
        fputc(toupper(ch),output);
    }

rather than this:


    while (!feof(input)) {
        int ch = fgetc(input);
        fputc(toupper(ch),output);
    }

There's something kind of sneaky going on here, which is revealed by the fact that the idiomatic code uses an int, a four-byte integer, to hold a char, a one byte quantity. And the sneakiness is that fgetc will return the non-character value EOF if it's at the end of file, and an ordinary character otherwise. Note here that there's no call to feof, we're relying on fgetc to indicate that we're at the end-of-file by returning the non-character EOF. But haven't we seen this before? Isn't this just a C idiom for creating the type Maybe Char? I think it is.

So let's see if we take push this a bit further.

Write a Haskell function hMaybeGetChar :: Handle -> IO (Maybe Char) that returns Nothing at end of file, and Just c if it reads c.
Write a Haskell function mWhileJust :: Monad m => m (Maybe a) -> (a -> m b) -> m () which uses its first argument as a test to control the iteration of its second argument.

If you do this right, you'll be able to replace the mUntil code with:


    mWhileJust (hMaybeGetChar input) $ \c -> hPutChar output (toUpper c)

or even


    mWhileJust (hMaybeGetChar input) $ toUpper >>> hPutChar output

Next up, let's take a look at the crucial do block:


    do
        input <- openFile infile ReadMode
        output <- openFile outfile WriteMode
        mUntil (hIsEOF input) $ do
            inChar <- hGetChar input
            let outChar = toUpper inChar
            hPutChar output outChar
        hClose output
        hClose input

A first order of business is to clean up the do block that is controlled by mUntil. Using the techniques we introduced in the frog program, this can be re-written:


    mUntil (hIsEOF input) $ hGetChar input >>= (toUpper >>> hPutChar output)

eliminating two local variables and collapsing four lines of code into one, although the result is long enough that we may need to break it at the ($). What's left is code that's dominated by the acquisition and release of file handles. Can this be made cleaner? Not so much in C, but Python programmers might recognize an opportunity to use a with block, which automates stack-oriented release of allocated resources, exactly what we're trying to accomplish here. Not surprisingly, we can do the same in Haskell through the use of a predefined function withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r, and re-write the main block as:


    withFile infile ReadMode $ \input ->
        withFile outfile WriteMode $ \output ->
            mUntil (hIsEOF input) $
                hGetChar input >>= (toUpper >>> hPutChar output)

Exercise 15.2 Do your own implementation of withFile. Think about how Haskell facilitates writing your own control structures, as we've now done with mUntil, mWithJust, and withFile, and how monadic values can function as “freeze-dried” computations, much as λ-terms do. Then take a look at System.IO's implementation of withFile.

Next up, having played with this code as much as we have, we might decide that we'd like to make it more useful. Yeah, I know, this is a suspend disbelief moment, but stick with me. If we have a program for aolifying files, it might be nice to generalize it to one that handles streams, so that we can use it in shell pipelines. There are two natural approaches, both of which get used in real-world UNIX programs. The first is to recode our program as a pure filter, but this makes life way too easy:


    module Main where
    
    import Data.Char

    main :: IO ()
    main = interact $ map toUpper

The second is to allow "-" as a command line argument, signifying either the standard input stream stdin, or the standard output stream stdout. The question is how best to trap these cases in our code, and how to avoid doing something we might regret later, like closing stdout. A natural place to attack this is at the calls to withFile, since these calls have both the information and the ability to do what's need. So we'll patch withFile to add the new functionality, resulting in withDefaultFile:


    withDefaultFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
    withDefaultFile "-" ReadMode actionf = actionf stdin
    withDefaultFile "-" WriteMode actionf = actionf stdout
    withDefaultFile file mode actionf = withFile file mode actionf

This implementation illustrates a general theme in patching system calls, irrespective of language, and that is that it's often best to take a parsimonious attitude, relying on the unpatched call in cases that it handles to our satisfaction. In programming as in life, you need to take on responsibility to accomplish the things you need/want too, but taking on more responsibility than that can become an unnecessary burden. One more tweak is to η-reduce, which is obvious in the last alternative, and less so in the first two:


    withDefaultFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
    withDefaultFile "-" ReadMode = ($ stdin)
    withDefaultFile "-" WriteMode = ($ stdout)
    withDefaultFile file mode = withFile file mode

Note the sneaky use of a section with ($). This is a moderately common idiom, and should be understood.

But it's worth noting that for all the good work we've done, this is still a C program that's been translated into Haskell. Let's start losing the accent. The first place to begin is to note that no self-respecting Haskell programming is going to approach IO character-by-character except under extreme duress. There are IO functions hGetContents :: Handle -> IO String that will return the entire contents of a handle into a String, and hPutStr :: Handle -> String -> IO () for writing a string via a handle. Using them simplifies our copy code to:


    withDefaultFile infile ReadMode $ \input ->
        withDefaultFile outfile WriteMode $ \output ->
            hGetContents input >>= (map toUpper >>> hPutStr output)

eliminating the need for mUntil and its definition. Moreover, there are standard functions readFile :: FilePath -> IO String and writeFile :: FilePath -> String -> IO () that handle opening, reading, and closing a file in a very simple interface. If we're willing to forego our special handling of "-" as a command line argument, this can be simplified all the way to


    readFile input >>= (map toUpper >>> writeFile output)

Exercise 15.3 Write versions of readFile and writeFile, called readDefaultFile and writeDefaultFile respectively, that preserve the special handling of "-".

There is something that's both clever and a bit tricky going on here, and it's worth understanding. For the most part, IO actions in Haskell are fully completed when they are performed. For the most part. But there are a few exceptions, notably getContents, hGetContents, and readFile which are actually performed lazily, i.e., performing the action doesn't result immediately in IO, but instead in a promise to do IO later as the String produced gets consumed. These IO promises also promise to close the input stream, if appropriate, once all of the data associated with that stream has been produced. This is done because it permits many simple Haskell programs to run in constant space that otherwise wouldn't, this program included.

The trickiness comes if you manually close such a stream, which will cut off input at whatever's already been read, and almost certainly isn't what you intended.

CMSC-16100

Honors Introduction to Programming, I

Autumn Quarter, 2015

Lecture 15: Monads: IO II

Files and Handles