Lecture 10: Records and List Comprehensions

To wrap up "Part 1" of the course, we will take a look at two more features in Haskell that often come in handy: records and list comprehensions.

Records

Data constructors (and tuples) can have many component values. Consider the following:

data Person = Student String String String String Int [(String, (Int, Int))] | Teacher String String String [(Int, Int)]

The intention of Student is to carry a first name, last name, identification string, major (i.e. home department), College year, and list of enrolled courses. A course comprises a department, course number, and section. The intention of Teacher is to carry a first name, last name, home department, and list of courses (within the department) currently being taught.

A couple aspects of this datatype definition are not ideal. First, it is easy to confuse which components of the data values are meant to represent what. Second, we may have to write tedious pattern matching functions to extract components from Person values, such as the following:

lastName :: Person -> String lastName (Student _ last _ _ _ _) = last lastName (Teacher _ last _ _) = last

What are we to do? For the former concern, one option is to write comments to declare our intentions (we should be doing this anyway!). Another is to introduce type aliases to emphasize our intent, as follows:

type FirstName = String type LastName = String type Id = String type Department = String type Year = Int type CourseNum = (Int, Int) data Person = Student FirstName LastName Id Department Year [(Department, CourseNum)] | Teacher FirstName LastName Department [CourseNum]

But this doesn't actually prevent us from mixing up, for example, FirstNames and LastNames because they are just synonyms for String. Alternatively, we could introduce wrapper types so that the type system would prevent us from mixing up different types:

data FirstName = FirstName String data LastName = LastName String data Id = Id String data Department = Department String data Year = Year Int data CourseNum = CourseNum (Int, Int)

Although this mitigates the first concern, it exacerbates the second because now even more patterns must be written to get at the data. This may not scalable in large programs with many datatypes and data representation needs to balance.

Fortunately, Haskell provides a feature that can be used to simultaneously address the two concerns above, namely, records. Consider the following type definition, where the components, or fields, of each Course value are named department, num, and section, respectively:

data Course = Course { department :: String, num :: Int, section :: Int } deriving (Eq, Show)

Course values can be constructed with record syntax, where field names alleviate the need to remember the intended purpose of each positional component. Notice how the order of fields does not matter.

> let c1 = Course { department = "CMSC", num = 161, section = 1 } > let c2 = Course { num = 161, department = "CMSC", section = 1 } > c1 == c2 True

Course values can also be used as ordinary (non-record) data constructors.

> :t Course Course :: String -> Int -> Int -> Course > let c3 = Course "CMSC" 161 1 > c1 == c2 && c2 == c3 True

Based on the record declaration, Haskell automatically generates functions to project, or unwrap, the fields of Course values:

> :t department department :: Course -> String > :t num num :: Course -> Int > :t section section :: Course -> Int > department c1 "CMSC" > (num c1, section c1) (161, 1)

In addition, one can use ordinary data constructor patterns, where components are matched by position:

numAndSection (Course _ num section) = (num, section)

Alternatively, record patterns allow field names to be used:

numAndSection (Course { department = _, num = i, section = j }) = (i, j)

As with record construction, the order of fields in record patterns does not matter. Furthermore, fields can be omitted if their values are not needed within the scope of the pattern:

numAndSection (Course { section = j, num = i }) = (i, j)

Records can be defined (or not) for multiple data constructors of a type. For example:

data ABCD = A { foo :: String, bar :: Int } | B { foo :: String, baz :: () } | C Int | D

This definition exhibits several noteworthy features. First, a field name can be used for components of different data constructors within a type. Second, not all data constructors need to be defined with record syntax.

Exercise 10.1 Based on the definition of ABCD, what are the types and behaviors of the functions foo, bar, and baz? Think about it, and then test them out.

Exercise 10.2 Should the following definition be acceptable? Think about it, and then try it out.

data Data = One { data :: Int } | Two { data :: Bool }

Now, using records, we can define Person as follows:

data Person = Student { firstName :: String , lastName :: String , id :: String , major :: String , year :: Int , courses_enrolled :: [(String, (Int, Int))] } | Teacher { firstName :: String , lastName :: String , dept :: String , courses_teaching :: [(Int, Int)] }

newtype

Data constructors tag, or label, the values they carry in order to distinguish them from values created with different data constructors for the same type. As we have seen, it is sometimes useful to define a new datatype even with only one data constructor. In such cases, tagging and untagging (or constructing and destructing, or boxing and unboxing) values is useful for enforcing invariants while programming, but these operations add unnecessary run-time overhead: there is only one kind of value, so they ought to be free of labels.

Haskell allows datatypes with exactly one constructor to be defined with the keyword newtype in place of data, such as

newtype Box a = Box a

or, if we were using a record,

newtype Box a = Box { unbox :: a }

For the purposes of programming, using newtype is almost exactly the same as using data. But it tells the compiler to optimize the generated code by not including explicit Box labels at run-time. We will get into the habit of using newtype whenever we define a datatype with one constructor, without delving into the subtle differences between using newtype and data.

As we meet more of the type classes that are central to Haskell's design, we will often create wrapper types (i.e. with one data constructor). Hence, we will use newtype. Furthermore, we will often write expressions of the form

Box . doSomething . unbox

to unwrap (unbox), transform (doSomething), and rewrap (Box) values. Hence, we will use records so that unwrapping functions are generated automatically.


List Comprehensions

Mathematicians have a concise notation for describing a set, which typically involves describing an initial set, a predicate, and a form for elements of that set, e.g., $\{ x^2 \mid \hbox{$x \in \omega$ and $x$ is even}\}$, the squares of all even natural numbers. These are called set comprehensions. Haskell provides a similar notation for lists:

> [x^2 | x <- [1..10], even x] [4,16,36,64,100]

It is possible in list comprehensions to have multiple generators, let bindings, and to interleave generations, tests, and bindings. As a simple example, let's generate all of the pythagorean triples that where the hypotenuse is less than a given number:

pythagoreanTriples n = [ (a,b,c) | a <- [1..n] , b <- [a+1..n] , c <- [b+1..n] , a^2 + b^2 == c^2 ] > pythagoreanTriples 20 [(3,4,5),(5,12,13),(6,8,10),(8,15,17),(9,12,15),(12,16,20)]

Note that the results are sorted by a, because the generation runs like this, for each a from 1 to n, generate b from a+1 to n, and for each such a and b, generate c from b+1 to n...

But notice that this list contains a few non-primitive triples, e.g., (6,8,10), (9,12,15), and (12,16,20), which are all multiples of (3,4,5). Suppose we wanted to restrict ourselves to primitive triples, i.e., tuples that are relatively prime to one onother. How might we do that? A simple approach would be to filter out those triples where a and b have a non-trivial common divisor, i.e.,

primitiveTriples n = [ (a,b,c) | a <- [1..n] , b <- [a+1..n] , gcd a b == 1 , c <- [b+1..n] , a^2 + b^2 == c^2 ]

It is more efficient to do the gcd test before the generation of c, because otherwise we'd have to repeat the same test (on a and b alone) for each c; but it should be noted too that our basic computational plan emphases clarity over efficiency, and there are much more efficient ways to generate lists of pythagorean triples.

*Exercise 10.3 The file Records.hs defines two Teachers, professorChugh and professorKurtz, and a "database" of Students, allStudents.

Implement the function

studentsOfTeacher_ :: [Person] -> Person -> [((Int, Int), [(String, String)])] studentsOfTeacher_ students teacher = undefined

to return those students, identified by lastname-firstname pairs, enrolled in each of the teacher's courses. Your implementation can assume that students is a list of Student values and that teacher is a Teacher value.

For example:

> studentsOfTeacher professorChugh [((16100,1),[("Student","B"),("Student","STEAM")])] > studentsOfTeacher professorKurtz [((16100,2),[("Student","C")]),((28000,1),[("Student","D"),("Student","E")])]

Once you are done, consider how one might eliminate the assumption above (but don't submit this).