Lecture 10: Records and List Comprehensions
To wrap up "Part 1" of the course, we will take a look at two more features in Haskell that often come in handy: records and list comprehensions.
Records
Data constructors (and tuples) can have many component values. Consider the following:
data Person
= Student String String String String Int [(String, (Int, Int))]
| Teacher String String String [(Int, Int)]
The intention of Student
is to carry a first name, last
name, identification string, major (i.e. home department), College
year, and list of enrolled courses. A course comprises a
department, course number, and section. The intention of
Teacher
is to carry a first name, last name, home
department, and list of courses (within the department) currently
being taught.
A couple aspects of this datatype definition are not ideal.
First, it is easy to confuse which components of the data values are
meant to represent what. Second, we may have to write tedious pattern matching
functions to extract components from Person
values, such
as the following:
lastName :: Person -> String
lastName (Student _ last _ _ _ _) = last
lastName (Teacher _ last _ _) = last
What are we to do? For the former concern, one option is to write comments to declare our intentions (we should be doing this anyway!). Another is to introduce type aliases to emphasize our intent, as follows:
type FirstName = String
type LastName = String
type Id = String
type Department = String
type Year = Int
type CourseNum = (Int, Int)
data Person
= Student FirstName LastName Id Department Year [(Department, CourseNum)]
| Teacher FirstName LastName Department [CourseNum]
But this doesn't actually prevent us from mixing up, for example,
FirstName
s and LastName
s because they are
just synonyms for String
.
Alternatively, we could introduce wrapper types so
that the type system would prevent us from mixing up different types:
data FirstName = FirstName String
data LastName = LastName String
data Id = Id String
data Department = Department String
data Year = Year Int
data CourseNum = CourseNum (Int, Int)
Although this mitigates the first concern, it exacerbates the second because now even more patterns must be written to get at the data. This may not scalable in large programs with many datatypes and data representation needs to balance.
Fortunately, Haskell provides a feature that can be used to simultaneously
address the two concerns above, namely, records.
Consider the following type definition, where
the components, or fields, of each Course
value
are named department
, num
, and
section
, respectively:
data Course = Course { department :: String, num :: Int, section :: Int }
deriving (Eq, Show)
Course
values can be constructed with record syntax,
where field names alleviate the need to remember the intended purpose
of each positional component. Notice how the order of fields does not matter.
> let c1 = Course { department = "CMSC", num = 161, section = 1 }
> let c2 = Course { num = 161, department = "CMSC", section = 1 }
> c1 == c2
True
Course
values can also be used as ordinary (non-record)
data constructors.
> :t Course
Course :: String -> Int -> Int -> Course
> let c3 = Course "CMSC" 161 1
> c1 == c2 && c2 == c3
True
Based on the record declaration, Haskell automatically generates
functions to project, or unwrap, the fields of
Course
values:
> :t department
department :: Course -> String
> :t num
num :: Course -> Int
> :t section
section :: Course -> Int
> department c1
"CMSC"
> (num c1, section c1)
(161, 1)
In addition, one can use ordinary data constructor patterns, where components are matched by position:
numAndSection (Course _ num section) = (num, section)
Alternatively, record patterns allow field names to be used:
numAndSection (Course { department = _, num = i, section = j }) = (i, j)
As with record construction, the order of fields in record patterns does not matter. Furthermore, fields can be omitted if their values are not needed within the scope of the pattern:
numAndSection (Course { section = j, num = i }) = (i, j)
Records can be defined (or not) for multiple data constructors of a type. For example:
data ABCD
= A { foo :: String, bar :: Int }
| B { foo :: String, baz :: () }
| C Int
| D
This definition exhibits several noteworthy features. First, a field name can be used for components of different data constructors within a type. Second, not all data constructors need to be defined with record syntax.
Exercise 10.1
Based on the definition of ABCD
,
what are the types and behaviors of the functions foo
,
bar
, and baz
?
Think about it, and then test them out.
Exercise 10.2 Should the following definition be acceptable? Think about it, and then try it out.
data Data = One { data :: Int } | Two { data :: Bool }
Now, using records, we can define Person
as follows:
data Person
= Student
{ firstName :: String
, lastName :: String
, id :: String
, major :: String
, year :: Int
, courses_enrolled :: [(String, (Int, Int))]
}
| Teacher
{ firstName :: String
, lastName :: String
, dept :: String
, courses_teaching :: [(Int, Int)]
}
newtype
Data constructors tag, or label, the values they carry in order to distinguish them from values created with different data constructors for the same type. As we have seen, it is sometimes useful to define a new datatype even with only one data constructor. In such cases, tagging and untagging (or constructing and destructing, or boxing and unboxing) values is useful for enforcing invariants while programming, but these operations add unnecessary run-time overhead: there is only one kind of value, so they ought to be free of labels.
Haskell allows datatypes with exactly one constructor to be defined
with the keyword newtype
in place of data
,
such as
newtype Box a = Box a
or, if we were using a record,
newtype Box a = Box { unbox :: a }
For the purposes of programming, using newtype
is
almost exactly the same as using data
. But it tells
the compiler to optimize the generated code by not including explicit
Box
labels at run-time.
We will get into the habit of using newtype
whenever we
define a datatype with one constructor, without delving into the
subtle differences
between using newtype
and data
.
As we meet more of the type classes that are central to Haskell's
design, we will often create wrapper types (i.e. with one data constructor).
Hence, we will use newtype
. Furthermore, we will often write
expressions of the form
Box . doSomething . unbox
to unwrap (unbox
), transform (doSomething
), and
rewrap (Box
) values. Hence, we will use records so that
unwrapping functions are generated automatically.
List Comprehensions
Mathematicians have a concise notation for describing a set, which typically involves describing an initial set, a predicate, and a form for elements of that set, e.g., $\{ x^2 \mid \hbox{$x \in \omega$ and $x$ is even}\}$, the squares of all even natural numbers. These are called set comprehensions. Haskell provides a similar notation for lists:
> [x^2 | x <- [1..10], even x]
[4,16,36,64,100]
It is possible in list comprehensions to have multiple generators, let
bindings, and to interleave generations, tests, and bindings. As a simple example, let's generate all of the pythagorean triples that where the hypotenuse is less than a given number:
pythagoreanTriples n =
[ (a,b,c)
| a <- [1..n]
, b <- [a+1..n]
, c <- [b+1..n]
, a^2 + b^2 == c^2
]
> pythagoreanTriples 20
[(3,4,5),(5,12,13),(6,8,10),(8,15,17),(9,12,15),(12,16,20)]
Note that the results are sorted by a
, because the generation runs like this, for each a
from 1
to n
, generate b
from a+1
to n
, and for each such a
and b
, generate c
from b+1
to n
...
But notice that this list contains a few non-primitive triples, e.g., (6,8,10)
, (9,12,15)
, and (12,16,20)
, which are all multiples of (3,4,5)
. Suppose we wanted to restrict ourselves to primitive triples, i.e., tuples that are relatively prime to one onother. How might we do that? A simple approach would be to filter out those triples where a
and b
have a non-trivial common divisor, i.e.,
primitiveTriples n =
[ (a,b,c)
| a <- [1..n]
, b <- [a+1..n]
, gcd a b == 1
, c <- [b+1..n]
, a^2 + b^2 == c^2
]
It is more efficient to do the gcd
test before the generation of c
, because otherwise we'd have to repeat the same test (on a
and b
alone) for each c
; but it should be noted too that our basic computational plan emphases clarity over efficiency, and there are much more efficient ways to generate lists of pythagorean triples.
*Exercise 10.3
The file Records.hs
defines two Teacher
s, professorChugh
and
professorKurtz
, and a "database" of Student
s,
allStudents
.
Implement the function
studentsOfTeacher_ :: [Person] -> Person -> [((Int, Int), [(String, String)])]
studentsOfTeacher_ students teacher = undefined
to return those students
, identified by
lastname-firstname pairs, enrolled in each of the
teacher
's courses. Your implementation can assume that
students
is a list of Student
values and that
teacher
is a Teacher
value.
For example:
> studentsOfTeacher professorChugh
[((16100,1),[("Student","B"),("Student","STEAM")])]
> studentsOfTeacher professorKurtz
[((16100,2),[("Student","C")]),((28000,1),[("Student","D"),("Student","E")])]
Once you are done, consider how one might eliminate the assumption above (but don't submit this).