Lecture 9: Rot-13

Our next program is a simple filter, not in the Haskell sense, but in the Unix sense of a program that reads from standard input, and writes to standard output.

In the days before Facebook and blogs, there was a distributed netnews system that consisted of a large collection of newsgroups. The most popular of these newsgroups, was “rec.humor.funny.” This was a moderated news group, which sent out a few jokes every day, most of which were indeed funny. But humor is a tricky thing, and some humor can cause offense. So the moderator adopted a simple strategy. Jokes that were potentially offensive were rot-13'd, i.e., the characters were remapped so that 'a' -> 'n', 'b' -> 'o', etc. People who wanted to read possibly offensive jokes would then rot-13 it again (since there are 26 letters, rotating twice by 13 takes you back to where you started). What made this work was that people were clearly warned that rot-13'ing was at your own risk. The social contract was that if you rot-13'ed, you surrendered the right to complain.

Let's start by thinking about this as a problem in pure code. How do we rot-13 a character? For the sake of simplicity, we'll assume that the underlying text is drawn from the ASCII subset of Unicode.

import Data.Char rotChar :: Char -> Char rotChar c | isLower c = chr (ord 'a' + (ord c - ord 'a' + 13) `mod` 26) | isUpper c = chr (ord 'A' + (ord c - ord 'A' + 13) `mod` 26) | otherwise = c

This code uses the functions chr :: Int -> Char and ord :: Char -> Int from Data.Char, which translate between Unicode characters and their associated Int code points, and relies on the fact that the each case of letters is associated with an ascending, sequential set of code points. This is true of ASCII, but is not true of all character encodings. One minor mystery in the code has to do with the relative precedences of + and `mod`. Intuitively, `mod` is related to division, and so might be assumed to have the same precedence as *, /, etc. This is indeed so, and can be checked using the :i (info) command in ghci:

Prelude> :i mod class (Real a, Enum a) => Integral a where ... mod :: a -> a -> a ... -- Defined in GHC.Real infixl 7 mod Prelude> :i (*) class Num a where ... (*) :: a -> a -> a ... -- Defined in `GHC.Num' infixl 7 * Prelude>

Here we see that mod, when used as an infix operator, associates to the left and at precedence level 7, which is exactly the same as * and its friends.

There are programmers (and sometimes I'm one) who would complain about the duplication of code in rotChar. We can easily eliminate the duplication by introducing a helper function, which is built out of the duplicated code, and abstracts out the part that varies:

rotChar :: Char -> Char rotChar c | isLower c = rotCase 'a' c | isUpper c = rotCase 'A' c | otherwise = c rotCase :: Char -> Char -> Char rotCase base char = chr (ord base + (ord char - ord base + 13) `mod` 26)

But our problem wasn't to rotate individual characters, it was to rotate the entire input String. We need to lift the function

rotChar :: Char -> Char

to the function

rot :: String -> String

This is ridiculously easy, remembering that String is a synonym for [Char].

rot :: String -> String rot = map rotChar

Indeed, we'll soon enough come to understand map in exactly these terms—as a particular instance of a generalized "lifting" function.

At this point, it makes sense to revisit our use of local definitions. The function that we care about is rot—the functions rotChar and rotCase are simply there to help us define rot. It makes sense to tidy our namespace up a bit, and encapsulate the definitions of these helper functions within the definition of rot:

rot :: String -> String rot = map rotChar where rotChar c | isLower c = rotCase 'a' c | isUpper c = rotCase 'A' c | otherwise = c rotCase base char = chr (ord base + (ord char - ord base + 13) `mod` 26)

Our full program is now

module Main where import Data.Char rot :: String -> String rot = map rotChar where rotChar c | isLower c = rotCase 'a' c | isUpper c = rotCase 'A' c | otherwise = c rotCase base c = chr (ord base + (ord c - ord base + 13) `mod` 26) main :: IO () main = do input <- getContents putStr $ rot input

The function getContents :: IO String is an IO action that packages the program's standard input stream as a Haskell String. The function putStr writes a String to standard output, but without a terminating newline. In this case, the desired terminating newline would have been present in the input stream, and would have survived our mapping, so there's no need to add another.

We can now test this, and what better input source than our source?!

$ ./rot < rot.hs zbqhyr Znva jurer vzcbeg Qngn.Pune ebg :: Fgevat -> Fgevat ebg = znc ebgPune jurer ebgPune p | vfYbjre p = ebgPnfr 'n' p | vfHccre p = ebgPnfr 'N' p | bgurejvfr = p ebgPnfr onfr p = pue (beq onfr + (beq p - beq onfr + 13) `zbq` 26) znva :: VB () znva = qb vachg <- trgPbagragf chgFge $ ebg vachg

Hopefully, this makes less sense to you than the cleartext version. Anyway, if we rot twice, we get back to where we started.

Let's play with this just a bit more. Suppose we want to implement other ciphers. For example, the Caesar cipher is rot 3, and would be decoded by rot 23 (or rot -3!).

To do this in a uniform way, we'll design our program so that if it is called without any command-line arguments, it does a standard rot-13, but if a single command-line argument is provided, it will be interpreted as an integer that gives the desired rotation. This exemplifies the design pattern of making the common case (rot-13) simple.

To do this, we'll use a case to pattern match on the result of performing getArgs:

module Main where import ... rot :: Int -> String -> String rot n = ... main :: IO () main = do args <- getArgs case args of [] -> ... [x] -> ... _ -> ...

If there are no arguments, we want to write read standard input as a string, rotate it by the default of 13, and write the rotated string to standard output:

[] -> do input <- getContents let output = rot 13 input putStr output

Let's step through this. Since we're doing IO, we know that the expressions in the do construct must be IO actions, so the alternatives of the case must also be IO actions. We'll usually need to combine several distinct IO actions (reading from standard input, writing to standard output) into a single IO action, hence the inner do's. We read standard input by the IO action getContents :: IO String, which when performed, returns the contents of standard input as a String. Next, we see the use of a let to bind the result of a pure computation. Finally, we write the resulting output string to standard out using putStr :: String -> IO ().

On to the second case... . A first cut at this might look like this:

[x] -> do input <- getContents let output = rot (read x) input putStr output

There's one thing to be grumpy about here, and one really big thing to worry about.

We're grumpy, of course, about the code duplication between these two cases. Let's identify a common abstraction and eliminate the code duplication:

rotStdin :: Int -> IO () rotStdin n = do input <- getContents let output = rot n input putStr output main :: IO () main = do args <- getArgs case args of [] -> rotStdin 13 [x] -> rotStdin (read x) _ -> ...

The thing to worry about is the wonderful world of user error. The code as written makes a call to read on unchecked user input. What if the user (maybe ourselves, in a few months), supplies an invalid argument? Let's test...

$ ./rot foo < rot.hs rot: Prelude.read: no parse $

Grandma is not going to be pleased. What's happened here is that read wasn't able to make sense of "foo" as the representation of an Int, so it gave up, and threw an exception. That exception wasn't caught by our code, but instead by the runtime system, which simply printed a user error, and terminated the program. In a sense, we're going to have to do much the same, but maybe we can print a more informative error. The question is how. We have two plausible approaches: (1) catch the exception that read throws ourselves, or (2) preflight the argument, making sure that it's in a form that read can correctly handle. It turns out that catching the exception is a bit complicated (we don't actually cover exceptions in this course...), so we'll preflight. An initial attempt might be:

import System.Exit ... [x] | all isDigit x -> rotStdin (read x) | otherwise -> do progname <- getProgName hPutStrLn stderr $ "usage: " ++ progname ++ " [n]" exitWith $ ExitFailure 255

Let's step through this, before we rip it apart. The all :: (a -> Bool) -> [a] -> Bool function is defined in the Prelude, and it simply makes sure that every element of its argument list satisfies the argument predicate. So we're checking to make sure we have a sequence of digits. If the preflight passes, we'll do the read, otherwise, we'll print an error message and terminate program execution in the Unix standard way: printing a usage message that includes our program name, and terminating with an error code.

Looking ahead a bit, we'll see that the last case is going to look a lot like the first, so let's abstract out the usage action, and finish up main:

usage :: IO () usage = do progname <- getProgName hPutStrLn stderr $ "usage: " ++ progname ++ " [n]" exitWith $ ExitFailure 255 main :: IO () main = do args <- getArgs case args of [] -> rotStdin 13 [x] | all isDigit x -> rotStdin (read x) | otherwise -> usage _ -> usage

At this point, we're pretty close to being done, but there are a couple of issues, both related to the preflighting of read.

The first is that our preflighting isn't quite strong enough, although the problematic case is unlikely to arise by accident:

$ ./rot "" < rot.hs rot: Prelude.read: no parse $

Maybe you didn't see that one coming: an empty command line argument, as distinct from an omitted argument. We can deal with that by tightening up the test:

| x /= "" && all isDigit x -> rotStdin (read x)

The next is to notice that it would be really nice to be able to accept a leading minus sign, because then we could decode rot 4 with rot -4. We can do this naïvely, e.g.,

validateInt :: String -> Bool validateInt "" = False validateInt "-" = False validateInt (c:cs) = (isDigit c || c == '-') && all isDigit cs ... | validateInt x -> rotStdin (read x)

or, we make use of one of Haskell's regular expression libraries:

import Text.Regex.Posix ... main :: IO () main = do args <- getArgs case args of [] -> rotStdin 13 [x] | x =~ "^-?[0-9]+$" -> rotStdin (read x) | otherwise -> usage _ -> usage

Note that the =~ operator has a lot of potential return types, and here we're using it in a context where its return type is Bool, in which case, s =~ pat will be True when the String s matches the regular expression pat, i.e., contains a matching substring. In this case, the regular expression matches strings that consist of the beginning of string anchor (^), a optional minus sign (-?), followed by one or more digits ([0-9]+), followed by the end of string anchor ($). The presence of the anchors forces the match of the entire string, and not merely a substring thereof.

Finally, it might occur to us that writing IO actions that filter stdin via some function to stdout is a pretty common case, and perhaps there is an easier way. And of course, there is, interact :: (String -> String) -> IO ().

rotStdin :: Int -> IO () rotStdin n = interact (rot n)

whence

rotStdin = interact . rot

*Exercise 9.1 Complete the implementation of rot, compile, and run ./rot 3 < rot.hs. While you're at it, deal in a principled way with input texts that don't consist solely of ASCII characters by exiting with an error. Turn in your program, both in cleartext and in cyphertext.

The Vigenère cipher

The simple rot ciphers are good enough to hide messages from people who don't want to read them—which was kind of the point to rot-13—but they're trivial to crack as there are only 25 viable keys. A simple modification to the Caesar cipher is the Vigenère cipher, which uses as password to describe a sequence of rotations, and then enciphers a message by rotating through the sequence.

For example, if we encrypt the sentence "Haskell is fun!" using the password "stuart" we'd get "Ztmkved cs ymg!" Ymg indeed!

*Exercise 9.2 Extend the functionality of your rot program by implementing the Vigenère cipher. You should use the Vigenère cipher if your program is passed an argument that consists of a password, i.e., a non-empty sequence of lower-case letters. You should enable decryption through the use of an optional minus prefix on the password. Aim for elegance, which includes avoiding code duplication.