Lecture 9

Rot-13

Our next program is a simple filter, not in the Haskell sense, but instead in the Unix sense of a program that reads from standard input, and writes to standard output.

In the days before Facebook and blogs, there was a distributed netnews system that consisted of a large collection of newsgroups. One of these newsgroups, perhaps the most popular, was “rec.humor.funny.” This was a moderated news group, which sent out a few jokes every day, most of which were indeed funny. But humor is a tricky thing, and some humor can cause offense. So the moderator adopted a simple strategy. Jokes that were potentially offensive were rot-13'd, i.e., the characters were remapped so that 'a' -> 'n', 'b' -> 'o', etc. People who wanted to read possibly offensive jokes would then rot-13 it again (since there are 26 letters, rotating twice by 13 takes you back to where you started). What made this work was that people were clearly warned that rot-13'ing was at your own risk. The social contract was that if you rot-13'ed, you surrendered the right to complain.

Let's start by thinking about this as a problem in pure code. How do we rot-13 a character?


    import Data.Char

    rotChar :: Char -> Char
    rotChar c
        | isLower c = chr (ord 'a' + (ord c - ord 'a' + 13) `mod` 26)
        | isUpper c = chr (ord 'A' + (ord c - ord 'A' + 13) `mod` 26)
        | otherwise = c

This code uses the functions chr :: Int -> Char and ord :: Char -> Int from Data.Char, which translate between Unicode characters and their associated Int code points, and relies on the fact that the each case of letters is associated with an ascending, sequential set of code points. This is true of Unicode, but is not true of all character encodings. One minor mystery in the code has to do with the relative precedences of + and `mod`. Intuitively, `mod` is related to division, and so might be assumed to have the same precedence as *, /, etc. This is indeed so, and can be checked using the :i (info) command in ghci:


    Prelude> :i mod
    class (Real a, Enum a) => Integral a where
      ...
      mod :: a -> a -> a
      ...
        -- Defined in GHC.Real
    infixl 7 mod
    Prelude> :i (*)
    class Num a where
      ...
      (*) :: a -> a -> a
      ...
        -- Defined in `GHC.Num'
    infixl 7 *
    Prelude>

Here we see that mod, when used as an infix operator, associates to the left and at precedence level 7, which is exactly the same as * and its friends.

There are programmers (and sometimes I'm one) who would complain about the duplication of code in rotChar. We can easily eliminate the duplication by introducing a helper function:


    rotChar :: Char -> Char
    rotChar c
        | isLower c = rotCase 'a' c
        | isUpper c = rotCase 'A' c
        | otherwise = c

    rotCase :: Char -> Char -> Char
    rotCase cas char = chr (ord cas + (ord char - ord cas + 13) `mod` 26)

Note that we can't actually use case as a variable name, because it's a keyword in the language, and you can't use keywords as variables.

But our problem wasn't to rotate individual characters, it was to rotate the entire input String. We need to “lift” the function


    rotChar :: Char -> Char

to the function


    rot :: String -> String

This is ridiculously easy, remembering that String is a synonym for [Char].


    rot :: String -> String
    rot = map rotChar

Indeed, we'll soon enough come to understand map in exactly these terms—as a function that "lifts" another function from a base type a to a derived type (in this case, [a]), and as such exemplifies a more general facility in the special case of lists.

At this point, it makes sense to revisit our use of local definitions. The function that we care about is rot—the functions rotChar and rotCase are simply there to help us define rot. It makes sense to tidy things up a bit, and encapsulate the definitions of these helper functions inside the definition of rot:


    rot :: String -> String
    rot = map rotChar where
        rotChar c
            | isLower c = rotCase 'a' c
            | isUpper c = rotCase 'A' c
            | otherwise = c
        rotCase cas char = chr (ord cas + (ord char - ord cas + 13) `mod` 26)

Our full program is now


    module Main where
    
    import Data.Char
    
    rot :: String -> String
    rot = map rotChar where
        rotChar c
            | isLower c = rotCase 'a' c
            | isUpper c = rotCase 'A' c
            | otherwise = c
        rotCase cas c = chr (ord cas + (ord c - ord cas + 13)  `mod` 26)
    
    main :: IO ()
    main = do
        input <- getContents
        putStr $ rot input

The function getContents :: IO String is an IO action that packages the program's standard input stream as a Haskell String. The function putStr writes a String to standard output, but without a terminating newline. In this case, the desired terminating newline would have been present in the input stream, and would have survived our mapping, so there's no need to add another.

We can now test this, and what better input source than our source?!


    $ ./rot < rot.hs
    zbqhyr Znva jurer
    
    vzcbeg Qngn.Pune
    
    ebg :: Fgevat -> Fgevat
    ebg = znc ebgPune jurer
        ebgPune p
            | vfYbjre p = ebgPnfr 'n' p
            | vfHccre p = ebgPnfr 'N' p
            | bgurejvfr = p
        ebgPnfr pnf p = pue (beq pnf + (beq p - beq pnf + 13)  `erz` 26)
    
    znva :: VB ()
    znva = qb
        vachg <- trgPbagragf
        chgFge $ ebg vachg
    $

Hopefully, this makes less sense to you than the cleartext version. Anyway, if we rot twice, we get back to where we started...


    $ cat .rot | ./rot | ./rot
    module Main where
    
    import Data.Char
    
    rot :: String -> String
    rot = map rotChar where
        rotChar c
            | isLower c = rotCase 'a' c
            | isUpper c = rotCase 'A' c
            | otherwise = c
        rotCase cas c = chr (ord cas + (ord c - ord cas + 13)  `mod` 26)
    
    main :: IO ()
    main = do
        input <- getContents
        putStr $ rot input
    $

Let's play with this just a bit more. Suppose we want to implement other ciphers. For example, the Caesar cipher is rot 3, and would be decoded by rot 23 (or rot -3!).

This involves two separate puzzles. The first is to redefine the rot function so that it takes an additional parameter, specifically, the amount of rotation to perform. The second is to capture that parameter value from the command line. Let's tackle the second first.

There is a function getArgs :: IO [String] in System.Environment that will return the list of command line arguments (unlike C's argv, not including the program name). Our code expects there to be either zero arguments (in which case rot 13 will be performed) or one argument (which is assumed to be a string representing a small integer in decimal format).

To distinguish between these alternatives, we'll use a case, which allows us pattern match within an expression:


    module Main where
    
    import ...
    
    rot :: Int -> String -> String
    rot n = ...
    
    main :: IO ()
    main = do
        args <- getArgs
        case args of
            [] -> ...
            [x] -> ...
            _ -> ...

If there are no arguments, we want to write read standard input as a string, rotate it by the default of 13, and write the rotated string to standard output:


            [] -> do
                input <- getContents
                let output = rot 13 input
                putStr output

Let's step through this. Since we're in the IO monad, we know that the expressions in the do construct must be IO actions, so the alternatives of the case must also be IO actions. We'll usually need to combine several distinct IO actions (reading from standard input, writing to standard output) into a single IO action, hence the inner do. We read standard input by the IO action getContents :: IO String, which when performed, does some IO, and returns the contents of standard input as a String. Next, we see the use of a let binding within a do construct. Note that there is no in keyword—the scope of the let is the remainder of the do body. Haskell beginners often stumble over when to use <- vs. let to bind a result, but it really is simple: to bind the result of performing a monadic action, use <-; to bind the result of a pure computation, use let. Finally, we write the resulting output string to standard out using putStr :: String -> IO ().

On to the second case... . A first cut at this might look like this:


            [x] -> do
                input <- getContents
                let output = rot (read x) input
                putStr output

Let's focus first on the wonderful and seemly mysterious read :: Read a => String -> a. For people who are used to traditional programming languages, read seems like a magic. But there's an important difference between Haskell's use of types and that of other languages: and that is that Haskell's type checker is a more advanced technology which can use the return type of a function to select an implementation from a family of functions defined via a typeclass. The point here is that the call to read occurs in the context of rot (read x) input, and rot :: Int -> String -> String, and therefore the implementation of read is chosen from the Int instance of Read.

There's one thing to be grumpy about here, and one really big thing to worry about.

We're grumpy, of course, about the code duplication between these two cases. Let's identify a common abstraction and eliminate the code duplication:


    rotStdin :: Int ->  IO ()
    rotStdin n = do
        input <- getContents
        let output = rot n input
        putStr output
    
    main :: IO ()
    main = do
        args <- getArgs
        case args of
            []  -> rotStdin 13
            [x] -> rotStdin (read x)
            _   -> ...

The thing to worry about is the wonderful world of user error. The code as written makes a call to read on unchecked user input. What if the user (maybe ourselves, in a few months), supplies an invalid argument? Let's test...


    $ ./rot foo < rot.hs
    rot: Prelude.read: no parse
    $

Grandma is not going to be pleased. What's happened here is that read wasn't able to make sense of "foo" as the representation of an Int, so it gave up, and threw an exception. That exception wasn't caught by our code, but instead by the runtime system, which simply printed a user error, and terminated the program. In a sense, we're going to have to do much the same, but maybe we can print a more informative error. The question is how. We have two plausible approaches: (1) catch the exception that read throws ourselves, or (2) preflight the argument, making sure that it's in a form that read can correctly handle. It turns out that catching the exception is a bit complicated (we don't actually cover exceptions in this course...), so we'll preflight. An initial attempt might be:


    import System.Exit
    ...
    
            [x] 
                | all isDigit x -> rotStdin (read x)
                | otherwise     -> do
                        progname <- getProgName
                        hPutStrLn stderr $ "usage: " ++ progname ++ " [n]"
                        exitWith $ ExitFailure 255

Let's step through this, before we rip it apart. The all :: (a -> Bool) -> [a] -> Bool function is defined in the Prelude, and it simply makes sure that every element of its argument list satisfies the argument predicate. So we're checking to make sure we have a sequence of digits. If the preflight passes, we'll do the read, otherwise, we'll print an error message and terminate program execution in the Unix standard way: printing a usage message that includes our program name, and terminating with an error code.

Looking ahead a bit, we'll see that the last case is going to look a lot like the first, so let's abstract out the usage action, and finish up main:


    usage :: IO ()
    usage = do
        progname <- getProgName
        hPutStrLn stderr $ "usage: " ++ progname ++ " [n]"
        exitWith $ ExitFailure 255
    
    main :: IO ()
    main = do
        args <- getArgs
        case args of
            []  -> rotStdin 13
            [x] 
                | all isDigit x -> rotStdin (read x)
                | otherwise     -> usage
            _   -> usage

At this point, we're pretty close to being done, but there are a couple of issues, both related to the preflighting of read.

The first is that our preflighting isn't quite strong enough, although the problematic case is unlikely to arise by accident:


    $ ./rot "" < rot.hs
    rot: Prelude.read: no parse
    $

Maybe you didn't see that one coming: an empty command line argument, as distinct from an omitted argument. We can deal with that by tightening up the test:


               | x /= "" && all isDigit x -> rotStdin (read x)

The next is to notice that it would be really nice to be able to accept a leading minus sign, because then we could decode rot 4 with rot -4. We can do this naïvely, e.g.,


    validateInt :: String -> Bool
    validateInt "" = False
    validateInt "-" = False
    validateInt (c:cs) = (isDigit c || c == '-') && all isDigit cs

    ...
    
                | validateInt x -> rotStdin (read x)

or, we make use of one of Haskell's regular expression libraries:


    import Text.Regex.Posix

    ...
    
    main :: IO ()
    main = do
        args <- getArgs
        case args of
            []  -> rotStdin 13
            [x]
                | x =~ "^-?[0-9]+$" -> rotStdin (read x)
                | otherwise         -> usage
            _   -> usage

Note that the =~ operator has a lot of potential return types, and here we're using it in a context where its return type is Bool, in which case, s =~ pat will be True when the String s matches the regular expression pat, i.e., contains a matching substring. In this case, the regular expression matches strings that consist of the beginning of string anchor (^), a optional minus sign (-?), followed by one or more digits ([0-9]+), followed by the end of string anchor ($). The presence of the anchors forces the match of the entire string, and not merely a substring thereof.

Finally, it might occur to us that writing IO actions that filter stdin via some function to stdout is a pretty common case, and perhaps there is an easier way. And of course, there is, interact :: (String -> String) -> IO ().


    rotStdin :: Int -> IO ()
    rotStdin n = interact (rot n)

whence


    rotStdin = interact . rot

*Exercise 9.1 Complete the implementation of rot, compile, and run ./rot 3 < rot.hs. Turn in your program, both in cleartext and in cyphertext.

The Vigenère cipher

The simple rot ciphers are good enough to hide messages from people who don't want to read them—which was kind of the point to rot-13—but they're trivial to crack as there are only 25 viable keys. A simple modification to the Caesar cipher is the Vigenère cipher, which uses as password to describe a sequence of rotations, and then enciphers a message by rotating through the sequence.

For example, if we encrypt the sentence "Haskell is fun!" using the password "stuart" we'd get "Ztmkved cs ymg!" Ymg indeed!

*Exercise 9.2 Extend the functionality of your rot program by implementing the Vigenère cipher. You should use the Vigenère cipher if your program is passed an argument that consists of a password, i.e., a non-empty sequence of lower-case letters. You should enable decryption through the use of an optional minus prefix on the password. Aim for elegance, which includes avoiding code duplication.

CMSC-16100

Honors Introduction to Programming, I

Autumn Quarter, 2014

Lecture 9

Rot-13

The Vigenère cipher