Lecture 5: A Bit of Tartness

As we have seen, Haskell allows the list syntax [1, 2, 3] to stand for the more verbose 1 : (2 : (3 : [])). That is, the former is syntactic sugar for the latter, more tart primitive version.

Exercise 5.1 What other kinds of syntactic sugar have we seen?

The standard syntax that we've been using for top-level function definitions also includes a considerable amount of syntactic sugar, which we've been exploiting without remark. Desugaring these definitions exposes a couple of additional features of the language that are very useful in other contexts.

Definition vs. Naming

Consider the definition of a simple function that is defined at the top-level of a Haskell program, e.g.,

square x = x * x

This seems simple enough, but there is sugar here. We're actually doing two separate things here: (a) we're defining a function, and (b) we're giving the function we've defined a name. We can separate them by introducing a lambda-expression into our definition:

square :: Num n => n -> n square = \x -> x * x

A lambda-expression is syntax for abstraction, which turns a base expression (here x * x) into a unary function by abstracting out one of the variables (here x) via the lambda (here \x -> ). Note that Haskell uses a backslash rather than the Greek letter λ (lambda), which itself was used as a convenient typographic alternative to a glyph that Alonzo Church invented for the purpose when he first laid down λ-calculus in the 1930s as a mathematical foundation for reasoning about higher-order functions.

One of the common uses of a lambda-expression is to create an anonymous function, e.g.,

> map (\x -> x * x) [1..4] [1,4,9,16]

We don't have to do this as often in Haskell as we do in other functional languages, because Haskell provides convenient facilities for the point-free definition of functions (e.g., sections, partially-evaluated functions, compositions), but it's good to have lambda-expressions for quick hitters. In particular, abstractions that contain multiple occurrences of the bound variable are often difficult to express without introducing a lambda, e.g., consider,

> map (\x -> x * 2^x) [1..4] [2,8,24,64]

The scope of the arrow -> in the lambda notation extends as far as possible, and associates to the right, which is convenient because it allows us to deal with nested lambdas without having to add lots of parentheses, e.g.,

hypotenuse = \x -> \y -> sqrt(x^2 + y^2)

Haskell has a nice bit of syntactic sugar, which allows us to collapse the nested lambda into a single form:

hypotenuse = \x y -> sqrt $ x^2 + y^2

The mathematical foundations of functional languages (like Lisp) rest on the ideas of application and abstraction. Typed functional languages (like ML and Haskell) rest equally on the notion of type, as we've already seen. Application in Haskell is denoted through juxtaposition, which is about the lightest syntactic mechanism possible. Abstraction has a somewhat more cumbersome notation, i.e., the lambda-forms we've just looked at.

While we're here, let's note that the λ-calculus's η-equivalence rule is (\x -> M x) ≣ M whenever x has no free occurrences in M. The "cancellation" approach to η-reduction that we've used thus far is a special case of the η-equivalence rule, but it's worthwhile to understand the general form as it permits simplification of anonymous functions (λ-forms), often to the point of eliminating the explicit λ in favor of compositions, sections, etc.

Cases

It is often useful to define a function by cases, i.e., have different defining equations that apply in different circumstances. We've seen two different kinds of circumstance so far, and two different syntactic devices for dealing with them.

The first circumstance represents arguments with different structures, e.g.,

length [] = 0 length (_:as) = 1 + length as

In this case, the length function has different defining equations, according to whether its argument is an empty list, or a cons-cell. In this style of definition, when we evaluate the function, we consider each equation in turn. If the actual arguments to the function match the patterns of an equation, we use the corresponding definition, if not, we move on to the next equation.

Haskell's case construct enables us to pattern match for code at the expression level, e.g.,

length = \l -> case l of [] -> 0 (_:as) -> 1 + length as

The tokens case, of, and -> are the keywords of this construct. the ->, much like its use an lambda, separates a pattern from the value the of the case statement if the value of the expression it is called on matches the corresponding pattern. Any number of pattern-expression pairs can be present.

One practical programming concern associated with case constructs is that of coverage: does every possible value of the case expression actually match one of the patterns? If a value is encountered during run time that doesn't match any of the patterns, an exception will be thrown.

Another kind of definition by cases arises because different values might have different defining equations, e.g.,

abs x | x >= 0 = x | otherwise = -x

In the most common binary case like this, we can use an if...then...else... construct:

abs = \x -> if x >= 0 then x else -x

although this has a different, heavier, syntactic "feel," which becomes a bit problematic if we have more than two alternatives, cf., the signum function (from the Num type class we will see later), which might be implemented like this:

signum x | x < 0 = -1 | x == 0 = 0 | x > 0 = 1

Rendering this using if...then...else... requires nesting the if...then...else. Note here that Haskell '10 uses a slightly different syntax from Haskell '98, in that then and else can occur at the same level of indentation as the corresponding if.

signum x = if x < 0 then -1 else if x == 0 then 0 else 1

Note that we've lost something here, in that the final test (x > 0) is necessarily elided, as we can't have an if...then expression in Haskell without a corresponding else, and in that the traffic in keywords seems to dominate actual content. GHC's MultiWayIf extension allows us an extension of the syntax for if that (nearly) parallels definition using guards:

{-# LANGUAGE MultiWayIf #-} module Stuff where signum x = if | x < 0 -> -1 | x == 0 -> 0 | x > 0 -> 1

GHC offers a lot of extensions, but as a general rule, we'd caution in favor of adopting them conservatively, in that each extension takes you farther away from the Haskell standards, and so makes your code less portable, and a bit more opaque to people encountering it for the first time. Invoking a language extension can be done in a couple of different ways, either by command-line flags (e.g., -XMultiWayIf), flags in a cabal file, or source-code annotations, like that above. We prefer the latter.

One surprise here is that the MultiWayIf syntax uses -> rather than = as the separator between guard and value. This is because guards are actually a part of the case syntax, i.e., we can define the value associated with a pattern-match via guard-value pairs, much as in top level definitions.

By way of illustration, let's consider a bit of code associated with a bank that has two kinds of checking accounts. Basic accounts provide bare-bones checking, with a \$10.00/month service fee, and fancy accounts provide free checks, and the monthly fee depends on the account balance, \$20.00 for balances under \$10,000, and free otherwise.

To that end, we'll devise a simple Account type:

data AccountType = Basic | Fancy type Money = Integer type AccountId = String data Account = Account AccountType Money

There are a couple things to notice here. The first is our use of type declarations to clarify the roles of various values. The second is our choice of Integer as the underlying type of Money, representing an amount in pennies, as we've seen before.

Our simple banking program will represent its accounts via an association list, i.e., a list of key-value pairs:

type Accounts = [(AccountId, Account)]

Association lists have a long history in functional programming as light-weight implementation of the dictionary abstraction. There are other implementations that perform better, especially on large dictionaries, but simple association lists are often good enough. There are a number of functions in the Prelude that simplify working with association lists, notably:

lookup :: AccountId -> Accounts -> MaybeAccount

where

data MaybeAccount = Nothing | Just Account

is a simple type which is used to package either zero or one Account value, and so is useful for dealing with situations where we want to return "no result." (If you look up the type of lookup, you will see that is actually rather more general than the one above. Stay tuned.)

Let's write a function monthlyFee to calculate the monthly fee, given an AccountId and an Accounts association list:

monthlyFee :: AccountId -> Accounts -> Integer

Our error handling strategy will be to raise an exception (by calling error) if there is no account with the given AccountID.

monthlyFee :: AccountId -> Accounts -> Integer monthlyFee accId accounts = case lookup accId accounts of Nothing -> error $ "unknown account: " ++ accId Just (Account Basic _) -> 10 * 100 Just (Account Fancy bal) | bal < 10000 * 100 -> 20 * 100 | otherwise -> 0

Notice here that patterns can be complex, and can specify more than top-level structure. Indeed, our use here goes three levels deep: Just, Account, and Fancy! The only real constraint on patterns is that any variables bound in pattern matching can only occur once in the pattern to be matched.

A continual concern with this sort of structure-based case analysis is coverage: have we accounted for all possible structures (at least once!) in our definitions? GHC can analyze our patterns, and will let us know if they're incomplete, but it doesn't do so at default levels. Using the command-line flag -Wall (warn: all) is an easy habit to get into, and you should.

*Exercise 5.2 The choice between structure-based and value-based case discrimination isn't fixed in stone. There's often more than one way to accomplish a given task. A common example is definition by trichotomous cases, as occurs in signum above. Here, there's a possible efficiency concern, in that we're doing three different ordering based tests. The compare function (defined in the Ord type class, which we will discuss later) enables us to determine the ordering relationship (LT, EQ, or GT) between two values in a single call, and we can use it to give an alternative definition of signum by doing a structure-based case on the result of a call to compare. Do so.

Local Definitions

Haskell's where clauses allow local bindings to be defined in a "top-down" style:

minutesPerDay = minutesPerHour * hoursPerDay where minutesPerHour = 60 hoursPerDay = 24

Alternatively, one can define local bindings in a "bottom-up" style:

minutesPerDay = let minutesPerHour = 60 hoursPerDay = 24 in minutesPerHour * hoursPerDay

The expression let x = e in e' binds e to x within e'. That is, the scope of x is e'; free occurrences of x within e' refer to the local binding.

The more general form let {x1 = e1; ...; xn = en} in e' allows multiple (mutually recursive) local bindings to be defined at once. Because the local bindings within minutesPerDay are not mutally recursive, we could instead write the following (arguably less elegant) nested local definitions:

minutesPerDay = let minutesPerHour = 60 in let hoursPerDay = 24 in minutesPerHour * hoursPerDay

Check out "Let vs. Where" to read more about the tradeoffs between local let bindings and where clauses.