Note: Rot-13
Our next program is a simple filter, not in the Haskell sense, but in the Unix sense of a program that reads from standard input, and writes to standard output.
In the days before Facebook and blogs, there was a distributed netnews system that consisted of a large collection of newsgroups. The most popular of these newsgroups, was “rec.humor.funny.” This was a moderated news group, which sent out a few jokes every day, most of which were indeed funny. But humor is a tricky thing, and some humor can cause offense. So the moderator adopted a simple strategy. Jokes that were potentially offensive were rot-13'd, i.e., the characters were remapped so that 'a' -> 'n', 'b' -> 'o', etc. People who wanted to read possibly offensive jokes would then rot-13 it again (since there are 26 letters, rotating twice by 13 takes you back to where you started). What made this work was that people were clearly warned that rot-13'ing was at your own risk. The social contract was that if you rot-13'ed, you surrendered the right to complain.
Let's start by thinking about this as a problem in pure code. How do we rot-13 a character? For the sake of simplicity, we'll assume that the underlying text is drawn from the ASCII subset of Unicode.
import Data.Char
rotChar :: Char -> Char
rotChar c
| isLower c = chr (ord 'a' + (ord c - ord 'a' + 13) `mod` 26)
| isUpper c = chr (ord 'A' + (ord c - ord 'A' + 13) `mod` 26)
| otherwise = c
This code uses the functions chr :: Int -> Char
and ord :: Char -> Int
from Data.Char
, which translate between Unicode characters and their associated Int
code points, and relies on the fact that the each case of letters is associated with an ascending, sequential set of code points. This is true of ASCII, but is not true of all character encodings. One minor mystery in the code has to do with the relative precedences of +
and `mod`
. Intuitively, `mod`
is related to division, and so might be assumed to have the same precedence as *
, /
, etc. This is indeed so, and can be checked using the :i
(info) command in ghci
:
Prelude> :i mod
class (Real a, Enum a) => Integral a where
...
mod :: a -> a -> a
...
-- Defined in GHC.Real
infixl 7 mod
Prelude> :i (*)
class Num a where
...
(*) :: a -> a -> a
...
-- Defined in `GHC.Num'
infixl 7 *
Prelude>
Here we see that mod
, when used as an infix operator, associates to the left and at precedence level 7, which is exactly the same as *
and its friends.
There are programmers (and sometimes I'm one) who would complain about the duplication of code in rotChar
. We can easily eliminate the duplication by introducing a helper function, which is built out of the duplicated code, and abstracts out the part that varies:
rotChar :: Char -> Char
rotChar c
| isLower c = rotCase 'a' c
| isUpper c = rotCase 'A' c
| otherwise = c
rotCase :: Char -> Char -> Char
rotCase base char = chr (ord base + (ord char - ord base + 13) `mod` 26)
But our problem wasn't to rotate individual characters, it was to rotate the entire input String. We need to lift the function
rotChar :: Char -> Char
to the function
rot :: String -> String
This is ridiculously easy, remembering that String
is a synonym for [Char]
.
rot :: String -> String
rot = map rotChar
Indeed, we'll soon enough come to understand map
in exactly these terms—as a particular instance of a generalized "lifting" function.
At this point, it makes sense to revisit our use of local definitions. The function that we care about is rot
—the functions rotChar
and rotCase
are simply there to help us define rot
. It makes sense to tidy our namespace up a bit, and encapsulate the definitions of these helper functions within the definition of rot
:
rot :: String -> String
rot = map rotChar where
rotChar c
| isLower c = rotCase 'a' c
| isUpper c = rotCase 'A' c
| otherwise = c
rotCase base char = chr (ord base + (ord char - ord base + 13) `mod` 26)
Our full program is now
module Main where
import Data.Char
rot :: String -> String
rot = map rotChar where
rotChar c
| isLower c = rotCase 'a' c
| isUpper c = rotCase 'A' c
| otherwise = c
rotCase base c = chr (ord base + (ord c - ord base + 13) `mod` 26)
main :: IO ()
main = do
input <- getContents
putStr $ rot input
The function getContents :: IO String
is an IO action that packages the program's standard input stream as a Haskell String
. The function putStr
writes a String
to standard output, but without a terminating newline. In this case, the desired terminating newline would have been present in the input stream, and would have survived our mapping, so there's no need to add another.
We can now test this, and what better input source than our source?!
$ ./rot < rot.hs
zbqhyr Znva jurer
vzcbeg Qngn.Pune
ebg :: Fgevat -> Fgevat
ebg = znc ebgPune jurer
ebgPune p
| vfYbjre p = ebgPnfr 'n' p
| vfHccre p = ebgPnfr 'N' p
| bgurejvfr = p
ebgPnfr onfr p = pue (beq onfr + (beq p - beq onfr + 13) `zbq` 26)
znva :: VB ()
znva = qb
vachg <- trgPbagragf
chgFge $ ebg vachg
Hopefully, this makes less sense to you than the cleartext version. Anyway, if we rot twice, we get back to where we started.
Let's play with this just a bit more. Suppose we want to implement other ciphers. For example, the Caesar cipher is rot 3, and would be decoded by rot 23 (or rot -3!).
To do this in a uniform way, we'll design our program so that if it is called without any command-line arguments, it does a standard rot-13, but if a single command-line argument is provided, it will be interpreted as an integer that gives the desired rotation. This exemplifies the design pattern of making the common case (rot-13) simple.
To do this, we'll use a case
to pattern match on the result of performing getArgs
:
module Main where
import ...
rot :: Int -> String -> String
rot n = ...
main :: IO ()
main = do
args <- getArgs
case args of
[] -> ...
[x] -> ...
_ -> ...
If there are no arguments, we want to write read standard input as a string, rotate it by the default of 13, and write the rotated string to standard output:
[] -> do
input <- getContents
let output = rot 13 input
putStr output
Let's step through this. Since we're doing IO, we know that the expressions in the do
construct must be IO actions, so the alternatives of the case
must also be IO actions. We'll usually need to combine several distinct IO actions (reading from standard input, writing to standard output) into a single IO action, hence the inner do
's. We read standard input by the IO action getContents :: IO String
, which when performed, returns the contents of standard input as a String
. Next, we see the use of a let
to bind the result of a pure computation. Finally, we write the resulting output string to standard out using putStr :: String -> IO ()
.
On to the second case... . A first cut at this might look like this:
[x] -> do
input <- getContents
let output = rot (read x) input
putStr output
There's one thing to be grumpy about here, and one really big thing to worry about.
We're grumpy, of course, about the code duplication between these two cases. Let's identify a common abstraction and eliminate the code duplication:
rotStdin :: Int -> IO ()
rotStdin n = do
input <- getContents
let output = rot n input
putStr output
main :: IO ()
main = do
args <- getArgs
case args of
[] -> rotStdin 13
[x] -> rotStdin (read x)
_ -> ...
The thing to worry about is the wonderful world of user error. The code as written makes a call to read
on unchecked user input. What if the user (maybe ourselves, in a few months), supplies an invalid argument? Let's test...
$ ./rot foo < rot.hs
rot: Prelude.read: no parse
$
Grandma is not going to be pleased. What's happened here is that read
wasn't able to make sense of "foo"
as the representation of an Int
, so it gave up, and threw an exception. That exception wasn't caught by our code, but instead by the runtime system, which simply printed a user error, and terminated the program. In a sense, we're going to have to do much the same, but maybe we can print a more informative error. The question is how. We have two plausible approaches: (1) catch the exception that read
throws ourselves, or (2) preflight the argument, making sure that it's in a form that read
can correctly handle. It turns out that catching the exception is a bit complicated (we don't actually cover exceptions in this course...), so we'll preflight. An initial attempt might be:
import System.Exit
...
[x]
| all isDigit x -> rotStdin (read x)
| otherwise -> do
progname <- getProgName
hPutStrLn stderr $ "usage: " ++ progname ++ " [n]"
exitWith $ ExitFailure 255
Let's step through this, before we rip it apart. The all :: (a -> Bool) -> [a] -> Bool
function is defined in the Prelude
, and it simply makes sure that every element of its argument list satisfies the argument predicate. So we're checking to make sure we have a sequence of digits. If the preflight passes, we'll do the read
, otherwise, we'll print an error message and terminate program execution in the Unix standard way: printing a usage message that includes our program name, and terminating with an error code.
Looking ahead a bit, we'll see that the last case is going to look a lot like the first, so let's abstract out the usage
action, and finish up main
:
usage :: IO ()
usage = do
progname <- getProgName
hPutStrLn stderr $ "usage: " ++ progname ++ " [n]"
exitWith $ ExitFailure 255
main :: IO ()
main = do
args <- getArgs
case args of
[] -> rotStdin 13
[x]
| all isDigit x -> rotStdin (read x)
| otherwise -> usage
_ -> usage
At this point, we're pretty close to being done, but there are a couple of issues, both related to the preflighting of read
.
The first is that our preflighting isn't quite strong enough, although the problematic case is unlikely to arise by accident:
$ ./rot "" < rot.hs
rot: Prelude.read: no parse
$
Maybe you didn't see that one coming: an empty command line argument, as distinct from an omitted argument. We can deal with that by tightening up the test:
| x /= "" && all isDigit x -> rotStdin (read x)
The next is to notice that it would be really nice to be able to accept a leading minus sign, because then we could decode rot 4
with rot -4
. We can do this naïvely, e.g.,
validateInt :: String -> Bool
validateInt "" = False
validateInt "-" = False
validateInt (c:cs) = (isDigit c || c == '-') && all isDigit cs
...
| validateInt x -> rotStdin (read x)
or, we make use of one of Haskell's regular expression libraries:
import Text.Regex.Posix
...
main :: IO ()
main = do
args <- getArgs
case args of
[] -> rotStdin 13
[x]
| x =~ "^-?[0-9]+$" -> rotStdin (read x)
| otherwise -> usage
_ -> usage
Note that the =~
operator has a lot of potential return types, and here we're using it in a context where its return type is Bool
, in which case, s =~ pat
will be True
when the String s
matches the regular expression pat
, i.e., contains a matching substring. In this case, the regular expression matches strings that consist of the beginning of string anchor (^
), a optional minus sign (-?
), followed by one or more digits ([0-9]+
), followed by the end of string anchor ($
). The presence of the anchors forces the match of the entire string, and not merely a substring thereof.
Finally, it might occur to us that writing IO actions that filter stdin via some function to stdout is a pretty common case, and perhaps there is an easier way. And of course, there is, interact :: (String -> String) -> IO ()
.
rotStdin :: Int -> IO ()
rotStdin n = interact (rot n)
whence
rotStdin = interact . rot
Exercise 1 Complete the implementation of rot
, compile, and run ./rot 3 < rot.hs
. While you're at it, deal in a principled way with input texts that don't consist solely of ASCII characters by exiting with an error. Turn in your program, both in cleartext and in cyphertext.
The Vigenère cipher
The simple rot ciphers are good enough to hide messages from people who don't want to read them—which was kind of the point to rot-13—but they're trivial to crack as there are only 25 viable keys. A simple modification to the Caesar cipher is the Vigenère cipher, which uses as password to describe a sequence of rotations, and then enciphers a message by rotating through the sequence.
For example, if we encrypt the sentence "Haskell is fun!" using the password "stuart" we'd get "Ztmkved cs ymg!" Ymg indeed!
Exercise 2 Extend the functionality of your rot
program by implementing the Vigenère cipher. You should use the Vigenère cipher if your program is passed an argument that consists of a password, i.e., a non-empty sequence of lower-case letters. You should enable decryption through the use of an optional minus prefix on the password. Aim for elegance, which includes avoiding code duplication.