Lecture 5: A Bit of Tartness
As we have seen, Haskell allows the list syntax [1, 2, 3]
to stand for
the more verbose 1 : (2 : (3 : []))
. That is, the former is
syntactic sugar for the latter, more tart primitive version.
Exercise 5.1 What other kinds of syntactic sugar have we seen?
The standard syntax that we've been using for top-level function definitions also includes a considerable amount of syntactic sugar, which we've been exploiting without remark. Desugaring these definitions exposes a couple of additional features of the language that are very useful in other contexts.
Definition vs. Naming
Consider the definition of a simple function that is defined at the top-level of a Haskell program, e.g.,
square x = x * x
This seems simple enough, but there is sugar here. We're actually doing two separate things here: (a) we're defining a function, and (b) we're giving the function we've defined a name. We can separate them by introducing a lambda-expression into our definition:
square :: Num n => n -> n
square = \x -> x * x
A lambda-expression is syntax for abstraction, which turns a base expression (here x * x
) into a unary function by abstracting out one of the variables (here x
) via the lambda (here \x ->
). Note that Haskell uses a backslash rather than the Greek letter λ (lambda), which itself was used as a convenient typographic alternative to a glyph that Alonzo Church invented for the purpose when he first laid down λ-calculus in the 1930s as a mathematical foundation for reasoning about higher-order functions.
One of the common uses of a lambda-expression is to create an anonymous function, e.g.,
> map (\x -> x * x) [1..4]
[1,4,9,16]
We don't have to do this as often in Haskell as we do in other functional languages, because Haskell provides convenient facilities for the point-free definition of functions (e.g., sections, partially-evaluated functions, compositions), but it's good to have lambda-expressions for quick hitters. In particular, abstractions that contain multiple occurrences of the bound variable are often difficult to express without introducing a lambda, e.g., consider,
> map (\x -> x * 2^x) [1..4]
[2,8,24,64]
The scope of the arrow ->
in the lambda notation
extends as far as possible, and associates to the right, which is convenient because it allows us to deal with nested lambdas without having to add lots of parentheses, e.g.,
hypotenuse = \x -> \y -> sqrt(x^2 + y^2)
Haskell has a nice bit of syntactic sugar, which allows us to collapse the nested lambda into a single form:
hypotenuse = \x y -> sqrt $ x^2 + y^2
The mathematical foundations of functional languages (like Lisp) rest on the ideas of application and abstraction. Typed functional languages (like ML and Haskell) rest equally on the notion of type, as we've already seen. Application in Haskell is denoted through juxtaposition, which is about the lightest syntactic mechanism possible. Abstraction has a somewhat more cumbersome notation, i.e., the lambda-forms we've just looked at.
While we're here, let's note that the λ-calculus's η-equivalence rule is (\x -> M x) ≣ M
whenever x
has no free occurrences in M
. The "cancellation" approach to η-reduction that we've used thus far is a special case of the η-equivalence rule, but it's worthwhile to understand the general form as it permits simplification of anonymous functions (λ-forms), often to the point of eliminating the explicit λ in favor of compositions, sections, etc.
Cases
It is often useful to define a function by cases, i.e., have different defining equations that apply in different circumstances. We've seen two different kinds of circumstance so far, and two different syntactic devices for dealing with them.
The first circumstance represents arguments with different structures, e.g.,
length [] = 0
length (_:as) = 1 + length as
In this case, the length
function has different defining equations, according to whether its argument is an empty list, or a cons-cell. In this style of definition, when we evaluate the function, we consider each equation in turn. If the actual arguments to the function match the patterns of an equation, we use the corresponding definition, if not, we move on to the next equation.
Haskell's case
construct enables us to pattern match for code at the expression level, e.g.,
length = \l -> case l of
[] -> 0
(_:as) -> 1 + length as
The tokens case
, of
, and ->
are the keywords of this construct. the ->
, much like its use an lambda, separates a pattern from the value the of the case statement if the value of the expression it is called on matches the corresponding pattern. Any number of pattern-expression pairs can be present.
One practical programming concern associated with case
constructs is that of coverage: does every possible value of the case expression actually match one of the patterns? If a value is encountered during run time that doesn't match any of the patterns, an exception will be thrown.
Another kind of definition by cases arises because different values might have different defining equations, e.g.,
abs x
| x >= 0 = x
| otherwise = -x
In the most common binary case like this, we can use an if...then...else...
construct:
abs = \x ->
if x >= 0 then x
else -x
although this has a different, heavier, syntactic "feel," which becomes a bit problematic if we have more than two alternatives, cf., the signum
function (from the Num
type class we will see later), which might be implemented like this:
signum x
| x < 0 = -1
| x == 0 = 0
| x > 0 = 1
Rendering this using if...then...else...
requires nesting the if...then...else
. Note here that Haskell '10 uses a slightly different syntax from Haskell '98, in that then
and else
can occur at the same level of indentation as the corresponding if
.
signum x = if x < 0 then -1
else if x == 0 then 0
else 1
Note that we've lost something here, in that the final test (x > 0
) is necessarily elided, as we can't have an if...then
expression in Haskell without a corresponding else
, and in that the traffic in keywords seems to dominate actual content. GHC's MultiWayIf
extension allows us an extension of the syntax for if
that (nearly) parallels definition using guards:
{-# LANGUAGE MultiWayIf #-}
module Stuff where
signum x = if | x < 0 -> -1
| x == 0 -> 0
| x > 0 -> 1
GHC offers a lot of extensions, but as a general rule, we'd caution in favor of adopting them conservatively, in that each extension takes you farther away from the Haskell standards, and so makes your code less portable, and a bit more opaque to people encountering it for the first time. Invoking a language extension can be done in a couple of different ways, either by command-line flags (e.g., -XMultiWayIf
), flags in a cabal
file, or source-code annotations, like that above. We prefer the latter.
One surprise here is that the MultiWayIf
syntax uses ->
rather than =
as the separator between guard and value. This is because guards are actually a part of the case
syntax, i.e., we can define the value associated with a pattern-match via guard-value pairs, much as in top level definitions.
By way of illustration, let's consider a bit of code associated with a bank that has two kinds of checking accounts. Basic accounts provide bare-bones checking, with a \$10.00/month service fee, and fancy accounts provide free checks, and the monthly fee depends on the account balance, \$20.00 for balances under \$10,000, and free otherwise.
To that end, we'll devise a simple Account
type:
data AccountType = Basic | Fancy
type Money = Integer
type AccountId = String
data Account = Account AccountType Money
There are a couple things to notice here. The first is our use of type
declarations to clarify the roles of various values. The second is our choice of Integer
as the underlying type of Money
, representing an amount in pennies, as we've seen before.
Our simple banking program will represent its accounts via an association list, i.e., a list of key-value pairs:
type Accounts = [(AccountId, Account)]
Association lists have a long history in functional programming as light-weight implementation of the dictionary abstraction. There are other implementations that perform better, especially on large dictionaries, but simple association lists are often good enough. There are a number of functions in the Prelude
that simplify working with association lists, notably:
lookup :: AccountId -> Accounts -> MaybeAccount
where
data MaybeAccount = Nothing | Just Account
is a simple type which is used to package either zero or one Account
value,
and so is useful for dealing with situations where we want to return "no result." (If you look up
the type of lookup
, you will see that is actually rather more general than the
one above. Stay tuned.)
Let's write a function monthlyFee
to calculate the monthly fee, given an AccountId
and an Accounts
association list:
monthlyFee :: AccountId -> Accounts -> Integer
Our error handling strategy will be to raise an exception (by calling error
) if there is no account with the given AccountID
.
monthlyFee :: AccountId -> Accounts -> Integer
monthlyFee accId accounts = case lookup accId accounts of
Nothing -> error $ "unknown account: " ++ accId
Just (Account Basic _) -> 10 * 100
Just (Account Fancy bal)
| bal < 10000 * 100 -> 20 * 100
| otherwise -> 0
Notice here that patterns can be complex, and can specify more than top-level structure. Indeed, our use here goes three levels deep: Just
, Account
, and Fancy
! The only real constraint on patterns is that any variables bound in pattern matching can only occur once in the pattern to be matched.
A continual concern with this sort of structure-based case analysis is coverage: have we accounted for all possible structures (at least once!) in our definitions? GHC can analyze our patterns, and will let us know if they're incomplete, but it doesn't do so at default levels. Using the command-line flag -Wall
(warn: all) is an easy habit to get into, and you should.
*Exercise 5.2 The choice between structure-based and value-based case discrimination isn't fixed in stone. There's often more than one way to accomplish a given task. A common example is definition by trichotomous cases, as occurs in signum
above. Here, there's a possible efficiency concern, in that we're doing three different ordering based tests. The compare
function (defined in the Ord
type class, which we will discuss later) enables us to determine the ordering relationship (LT
, EQ
, or GT
) between two values in a single call, and we can use it to give an alternative definition of signum
by doing a structure-based case on the result of a call to compare
. Do so.
Local Definitions
Haskell's where
clauses allow local bindings to be
defined in a "top-down" style:
minutesPerDay = minutesPerHour * hoursPerDay
where minutesPerHour = 60
hoursPerDay = 24
Alternatively, one can define local bindings in a "bottom-up" style:
minutesPerDay =
let
minutesPerHour = 60
hoursPerDay = 24
in
minutesPerHour * hoursPerDay
The expression let x = e in e'
binds e
to
x
within e'
. That is, the scope of
x
is e'
; free occurrences of x
within e'
refer to the local binding.
The more general form let {x1 = e1; ...; xn = en} in e'
allows multiple (mutually recursive) local bindings to be defined at once.
Because the local bindings within minutesPerDay
are not mutally
recursive, we could instead write the following (arguably less elegant) nested
local definitions:
minutesPerDay =
let minutesPerHour = 60 in
let hoursPerDay = 24 in
minutesPerHour * hoursPerDay
Check out
"Let vs. Where"
to read more about the tradeoffs between local let
bindings
and where
clauses.