Lecture 8: A Brief Introduction to Haskell I/O

So far, we've lived in the interpreter and built little snippets of code. This is useful, but it limits the mode of interaction. Programs intended for end users (often ourselves) usually need more control over how they interact with the user. Moreover, such programs present themselves as complete: we don't want Grandma to have to install ghc and master Haskell to enjoy the fruits of our labors this quarter.

This will, of present necessity, be a very incomplete introduction. A thorough understanding of Haskell I/O will come later. Mimicry and practice, though, can form a foundation for later understanding, so I'm asking you to suspend disbelief for a bit. It's time to get on the bike, to start pedaling, and to believe that when Dad lets go, you'll keep going.

Output

So let's start with the old "Hello, world!" chestnut:

module Main where

main :: IO ()
main = do
    putStrLn "Hello, world!"

We'll ignore the actual content of the file for just a bit. Let's suppose we put this in a file called hello.hs. We can produce an executable (binary) file by compiling this using ghc (not ghci):

$ ghc hello.hs
[1 of 1] Compiling Main ( hello.hs, hello.o )
Linking hello ...
$ ./hello
Hello, world!
$

If we're clever enough to have a ~/bin directory, and to have it on our PATH, we can simplify this further:

$ cp hello ~/bin/hello
$ hello
Hello, world!
$

This is good enough for simple programs, but more complicated situations (including where there's non-trivial configuration and/or testing involved) are better handled through cabal (or stack), as you're learning in the lab.

There's a fair bit to explain here, and actually a fair bit that isn't necessary for this program, but will be essential soon enough.

We'll start at the top. Haskell programs are typically divided into modules. A module is a related collection of declarations and definitions. Modules have simple alphanumeric names, and may be structured hierarchically, using the period (.) symbol as a separator. The declaration

module Main where

indicates that the code in this file will be in the Main module. Execution of a compiled program is driven by performing the IO action main from the Main module.

Next, we have the type declaration main :: IO (). This looks odd, a bit of advanced technology indistinguishable from magic. It will seem less magical in time. Types of the form IO a are the types of IO actions, which when performed return a value of type a. The type () is Haskell's Unit type, which contains a single defined value, also denoted by (). We use () as a simple, concrete placeholder in situations where Haskell requires a type, but we aren't going to do anything subsequently that discriminates between values of that type.

The definition of main consists of a do construct, which is used to combine a sequence of IO actions into a single IO action. In this case, there is only one action, so we could get by without the do-wrapping, i.e.

main = putStrLn "Hello, world!"

but that doesn't generalize to the more complicated examples we're going to see soon, and it's often useful to use a do to sequence a single IO action, when we expect that we may be adding other IO actions to the sequence later.

Finally, putStrLn :: String -> IO () is a function that takes a String as an argument, and produces as a value an IO action, which when performed prints its argument to the standard output. Note here that we've been carefully using separate words: an expression may have a value in IO a, but evaluation doesn't cause an IO action to be performed. Only performing it does. The following code may make the distinction a bit clearer:

module Main where

naNaNaNa :: IO ()
naNaNaNa = putStrLn "Na, na, na, na"

heyHeyGoodbye :: IO ()
heyHeyGoodbye = putStrLn "Hey, hey, Goodbye!"

main :: IO ()
main = do
    naNaNaNa
    naNaNaNa
    heyHeyGoodbye

We define several IO actions here, notably naNaNaNa and heyHeyGoodbye. Defining these values doesn't cause the IO actions they describe to be performed, but when main itself is performed, they in turn are performed. This seems clear enough when these IO actions are defined globally, but the same holds true if they are defined locally, e.g.,

module Main where

main :: IO ()
main = do
    let naNaNaNa = putStrLn "Na, na, na, na"
        heyHeyGoodbye = putStrLn "Hey, hey, Goodbye!"
    naNaNaNa
    naNaNaNa
    heyHeyGoodbye

The output is as before:

$ ./goodbye
Na, na, na, na
Na, na, na, na
Hey, hey, Goodbye!
$

Defining is not performing. Performing is performing. Note here that top-level let bindings within a do have a scope that consists of the binding itself (allowing mutually recursive definitions) and the rest of the do body, so the keyword in is not used, and a level of indentation is saved. Note (as here) that a single let may be used to define multiple names. This may ring a bell: we don't use the in keyword when binding values in the interpreter. This isn't different: for all practical purposes, the interpreter's read loop is the body of a do block in the IO context.
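The slogan holds even when IO actions are stored in a data structure. The following is a minimal sketch, not part of the original program: the name chorus is hypothetical, and sequence_ is a Prelude function that combines a list of actions into a single action. Building the list, and even computing its length, performs nothing; the lyrics appear only once the actions are sequenced into main.

```haskell
module Main where

-- A list of IO actions (hypothetical name). Defining the list
-- performs none of the actions it contains.
chorus :: [IO ()]
chorus =
    [ putStrLn "Na, na, na, na"
    , putStrLn "Na, na, na, na"
    , putStrLn "Hey, hey, Goodbye!"
    ]

main :: IO ()
main = do
    -- Evaluating a pure function over the list prints no lyrics.
    putStrLn ("The chorus has " ++ show (length chorus) ++ " actions.")
    -- Only here, as part of main, are the stored actions performed.
    sequence_ chorus
```

Running this should print the count line first, followed by the three lyric lines in order.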

Input

We've learned how to use putStrLn to produce output, and you'll not be surprised to learn that there are many more output-oriented functions in Haskell, or that we'll encounter some of them later, but putStrLn is enough to get us started on output. But what about input?

The complement to putStrLn is getLine :: IO String, an IO action that, when performed, reads a line of text from standard input (for now, the terminal), up to the next newline or the EOF (end-of-file), and returns a String value (conveniently omitting the newline).

module Main where

main :: IO ()
main = do
    putStrLn "Hello. I am a HAL 9000 series computer."
    putStrLn "Who are you?"
    name <- getLine
    putStrLn ("Good morning, " ++ name ++ ".")

Note the binding syntax here, in which an IO action is performed, and the value it returns is bound to a variable (in this case, name). People learning Haskell often struggle at first with the distinction between let and <-, in that both bind names, and so seem to do similar things. The difference is that with a let, the defining expression is evaluated, and the name is bound to the resulting value; whereas with <-, the expression's value is an IO action, that action is performed, and the name is bound to the result it returns.
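A small sketch may help keep let and <- apart. The names ask and greet are hypothetical, introduced only for illustration. With let, ask becomes another name for the action getLine, and nothing is read; with <-, the action is performed and name is bound to the String it returns.

```haskell
module Main where

-- A pure helper (hypothetical name): no IO involved.
greet :: String -> String
greet name = "Good morning, " ++ name ++ "."

main :: IO ()
main = do
    putStrLn "Who are you?"
    -- let: 'ask' names the action itself. Nothing is read here.
    let ask = getLine              -- ask :: IO String
    -- <-: the action is performed, and 'name' is bound to its result.
    name <- ask                    -- name :: String
    -- let again: an ordinary pure value, computed from 'name'.
    let greeting = greet name      -- greeting :: String
    putStrLn greeting
```

Writing name <- ask twice would read two lines, whereas writing let ask = getLine twice would still read none.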

One bit of trickiness is in the final line, where we concatenate several strings together using the (++) operator, and apply putStrLn to the result. It's good Haskell style to omit unnecessary parentheses, and this often tempts beginners into dropping them from the last line,

putStrLn "Good morning, " ++ name ++ "."

This unfortunately doesn't work, because function application binds more tightly than any infix operator, so the syntax says to apply the function putStrLn to the string "Good morning, ", and then to use (++) to combine the result (of type IO ()) with name, which has type String. The resulting error message says this, but it's a bit intimidating at first. Experienced Haskell programmers will often use the ($) operator here, which has the lowest precedence and associates to the right, so that everything to its right becomes the argument of putStrLn:

putStrLn $ "Good morning, " ++ name ++ "."

This program runs pretty much as any Space Odyssey aficionado would anticipate:

$ ./hal
Hello. I am a HAL 9000 series computer.
Who are you?
Dave
Good morning, Dave.
$

Of course, input is available from places other than standard input, e.g., the command line, files, network sockets, and the environment. The last of these is a simple key-value list associated with each process, and is typically used for communication between processes. One of the environment variables is USER, which is initialized by the log-in process to contain the account name, often the user's personal name. We can easily rewrite this program so that it uses the USER environment variable, rather than interrogating the user:

module Main where

import System.Environment

main :: IO ()
main = do
    putStrLn "Hello. I am a HAL 9000 series computer."
    name <- getEnv "USER"
    putStrLn $ "Good morning, " ++ name ++ "."

Here, we've used the function getEnv :: String -> IO String, which takes a key as its argument and produces an IO action that, when performed, returns the corresponding value. An additional bit of complexity comes from the fact that getEnv isn't exported by the standard Haskell Prelude, but instead is exported by the System.Environment module, which we import here.

$ ./hal
Hello. I am a HAL 9000 series computer.
Good morning, stuart.
$

OK, this is hitting a little too close to home.

*Exercise 8.1 Modify the second (getEnv-based) hal program so that it capitalizes the user's name in greeting them. Compile and run your program, and provide a sample interaction. You may find the function Data.Char.toUpper to be helpful.

Before moving on, note that Windows, as is so often the case, gratuitously varies from the standards it co-opts, and uses USERNAME instead. Caveat emptor. We could easily re-write this example so that it worked with Windows, but we might like to write a portable version that can deal with either convention. This turns out to be more difficult than we might expect, and we'll need to develop some new tools first.

The Read and Show Type Classes

So far, we've dealt with simple problems of IO, in particular, sending strings to standard output, and retrieving them from standard input. But Haskell is a strictly typed language, and this raises the question of how we get other kinds of information in and out of our programs. To that end, Haskell provides two very useful and flexible type classes: instances s of Show support a function show :: s -> String, while instances r of Read support read :: String -> r. Most of the types exported by the Prelude have both Read and Show instances, and Haskell supports derived instances of both, whose textual format follows the syntax of Haskell source code.
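As a quick illustration of show, read, and derived instances, here is a minimal sketch. The type Color is hypothetical, invented for this example; the deriving clause asks the compiler to generate its Show and Read instances. Each use of read carries a type annotation so that the intended instance is unambiguous.

```haskell
module Main where

-- A hypothetical type with derived Show, Read, and Eq instances.
data Color = Red | Green | Blue
    deriving (Show, Read, Eq)

main :: IO ()
main = do
    putStrLn (show (42 :: Int))                 -- prints 42
    putStrLn (show [Red, Blue])                 -- prints [Red,Blue]
    -- Derived Read accepts the same text that derived Show produces.
    putStrLn (show (read "Green" :: Color))     -- prints Green
    putStrLn (show (read "[1,2,3]" :: [Int]))   -- prints [1,2,3]
```

Note that the strings accepted by the derived Read instance look just like the constructors as they would appear in Haskell source.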

Consider a simple program that generates binomial coefficients:

module Main where

import System.Environment

binomial :: Int -> [Integer]
binomial n
    | n > 0 =
        let bs = binomial (n-1)
        in zipWith (+) ([0]++bs) (bs++[0])
    | n == 0 = [1]
    | otherwise = error "domain error: negative argument to binomial"

main :: IO ()
main = do
    [nstr] <- getArgs
    putStrLn . unwords . map show . binomial . read $ nstr

The particular algorithm whereby we compute lists of binomial coefficients isn't important here—but if you're familiar with Pascal's triangle, you should be able to see it here. Our current concern is the content of main, which includes IO actions, read, and show.

We start with

[nstr] <- getArgs

which reads the command line arguments (not including the program name), and assuming that there is just one, binds it to the variable nstr :: String. The composition

putStrLn . unwords . map show . binomial . read

uses read to convert a String to an Int, then uses the binomial function to produce an [Integer]. We map show across this, resulting in a [String]. These strings are concatenated together, with separating spaces, using the Prelude function unwords :: [String] -> String, and finally, the resulting string is the argument to putStrLn, creating an IO (), which when performed, produces the desired output:

$ ./binomial 6
1 6 15 20 15 6 1
$
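The pure part of the composition can also be unpacked into named stages, which makes the intermediate types visible. This is only a sketch of an equivalent formulation: the name render and the local names are hypothetical, and the argument is hard-wired to "6" so that the program runs without command line arguments.

```haskell
module Main where

binomial :: Int -> [Integer]
binomial n
    | n > 0 =
        let bs = binomial (n-1)
        in zipWith (+) ([0]++bs) (bs++[0])
    | n == 0 = [1]
    | otherwise = error "domain error: negative argument to binomial"

-- The same pipeline as unwords . map show . binomial . read,
-- written one stage at a time (hypothetical name).
render :: String -> String
render nstr =
    let n     = read nstr        -- String    -> Int
        row   = binomial n       -- Int       -> [Integer]
        strs  = map show row     -- [Integer] -> [String]
    in unwords strs              -- [String]  -> String

main :: IO ()
main = putStrLn (render "6")     -- prints 1 6 15 20 15 6 1
```

The point-free composition and the staged version compute the same String; which reads better is largely a matter of taste.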

Let's focus first on the wonderful and seemingly mysterious read :: Read a => String -> a. Students familiar with other programming languages and their conventions might be a bit perplexed here. How does the compiler know what type read is supposed to return, if it always takes the same type of argument? But this involves projecting an understanding of overloading and type resolution (as is done in more traditional programming languages like C++ and Java) onto Haskell, and that's a mistake. Haskell's type inference can consider the return type, as well as the argument types, and since we're composing read :: Read s => String -> s with binomial :: Int -> [Integer], the fully grounded type of read in this context must be String -> Int, and this allows Haskell to select the correct instance of Read in binding read. Trust me, you'll come to rely on this, and you'll miss it in other languages.
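To see the return type doing the selecting, consider this small sketch, in which the same string is read at two different types. The names asInt and asDouble are hypothetical; here the type signatures play the role that the composition with binomial plays in the program above.

```haskell
module Main where

-- The declared result type picks the Read instance:
-- the argument is the same String in both cases.
asInt :: Int
asInt = read "6"

asDouble :: Double
asDouble = read "6"

main :: IO ()
main = do
    putStrLn (show asInt)       -- prints 6
    putStrLn (show asDouble)    -- prints 6.0
```

If neither a signature nor the surrounding context determines the result type, the compiler typically rejects the use of read as ambiguous rather than guessing.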

Note here that our error-handling strategy is naïve, and there are lots of things that can go wrong at runtime: there might be too many command line arguments, or too few. The argument may not have the format of an Int, or it might have the format of a negative integer. We'll learn how to avoid and/or handle such errors later, but for now, they'll raise an uncaught exception, and terminate the program with an error. Still, it's better to crash and burn visibly than to silently produce nonsense.

Best Coding Practice and Complex Contexts

The binomial program introduces an important idea. IO code is different from ordinary, “pure” code: it is more difficult to reason about, and more difficult to test. Therefore, we want to structure our code so as to move complexity out of IO code. Introducing binomial as a separate function, outside of the IO context, renders the hard core of the program in pure code, simplifying reasoning, testing, and debugging.
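One payoff of keeping binomial pure is that it can be checked without touching IO at all. The sketch below is one possible way to do so, not a prescribed testing method: the names rowSumOk and symmetricOk are hypothetical, and the properties they check (row n sums to 2^n, and each row reads the same in both directions) are standard facts about Pascal's triangle.

```haskell
module Main where

binomial :: Int -> [Integer]
binomial n
    | n > 0 =
        let bs = binomial (n-1)
        in zipWith (+) ([0]++bs) (bs++[0])
    | n == 0 = [1]
    | otherwise = error "domain error: negative argument to binomial"

-- Row n of Pascal's triangle sums to 2^n (hypothetical check).
rowSumOk :: Int -> Bool
rowSumOk n = sum (binomial n) == 2 ^ n

-- Each row is a palindrome (hypothetical check).
symmetricOk :: Int -> Bool
symmetricOk n = binomial n == reverse (binomial n)

-- The only IO is reporting the verdict.
main :: IO ()
main = putStrLn $
    if all rowSumOk [0..20] && all symmetricOk [0..20]
        then "all checks passed"
        else "a check failed"
```

The checks themselves are ordinary Bool-valued expressions, so they can equally well be evaluated one at a time in the interpreter.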

Part of the distinctive character of programming in Haskell comes from this split, and it really is a good-news/bad-news deal for the programmer. The good news is that you can really put your algebraic thinking hat on, and perform some nifty transformations that increase readability, concision, and efficiency, with confidence, when dealing with pure code. The bad news is that there are times when a real-world programmer wants to sprinkle some IO in the middle of what is otherwise pure code, e.g., to facilitate "wolf-fence" debugging. The division between pure and impure code in Haskell makes it impossible to do this without dragging the code in question into the impure world, conceding the very advantages we worked so hard to obtain.

As is so often the case in Haskell, IO contexts are a special case of a much more general phenomenon. We can imagine the existence of other kinds of contexts m, and the general problem of writing functions of the form g :: m a -> m b. In such circumstances, it is often possible to write a function f :: a -> b, and then to somehow lift f into m's context. Lest this seem hopelessly abstract, review our binomial program again, where the function binomial is pure, and is subsequently employed at the core of main, which lives in the IO context. This exemplifies a simple, yet profound, engineering reality: the easiest and most elegant way to build reliable systems is by composing reliable components.
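One concrete way to do such lifting, offered here as a sketch rather than as the only mechanism, is the Prelude function fmap :: Functor m => (a -> b) -> m a -> m b. The example assumes familiarity with the Maybe type as a second context alongside IO; the names wordCount, countMaybe, and countIO are hypothetical.

```haskell
module Main where

-- A pure function f :: a -> b, with a = String and b = Int.
wordCount :: String -> Int
wordCount = length . words

-- f lifted into the Maybe context: g :: Maybe a -> Maybe b.
countMaybe :: Maybe String -> Maybe Int
countMaybe = fmap wordCount

-- The same f lifted into the IO context: g :: IO a -> IO b.
countIO :: IO String -> IO Int
countIO = fmap wordCount

main :: IO ()
main = do
    putStrLn "Type a line:"
    n <- countIO getLine
    putStrLn ("That line has " ++ show n ++ " words.")
```

All of the logic lives in wordCount, which can be reasoned about and tested as pure code; the two lifted versions add nothing but the context.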

Haskell encourages this kind of factoring, and Haskell programmers look for opportunities to refactor their code as much as possible along a pure/lift axis. Look for it. Expect it. Do it.