Lecture 7: Records and List Comprehensions

To wrap up "Part 1" of the course, we will take a look at three more features in Haskell that often come in handy: records, newtype, and list comprehensions.

Records

Data constructors (and tuples) can have many component values. Consider the following:


    data Person
      = Student String String String String Int [(String, (Int, Int))]
      | Teacher String String String [(Int, Int)]

The intention of Student is to carry a first name, last name, identification string, major (i.e. home department), College year, and list of enrolled courses. A course comprises a department, course number, and section. The intention of Teacher is to carry a first name, last name, home department, and list of courses (within the department) currently being taught.

A couple aspects of this datatype definition are not ideal. First, it is easy to confuse which components of the data values are meant to represent what. Second, we may have to write tedious pattern matching functions to extract components from Person values, such as the following:


    lastName :: Person -> String
    lastName (Student _ last _ _ _ _) = last
    lastName (Teacher _ last _ _)     = last

What are we to do? For the former concern, one option is to write comments to declare our intentions (we should be doing this anyway!). Another is to introduce type aliases to emphasize our intent, as follows:


    type FirstName = String
    type LastName = String
    type Id = String
    type Department = String
    type Year = Int
    type CourseNum = (Int, Int)

    data Person
      = Student FirstName LastName Id Department Year [(Department, CourseNum)]
      | Teacher FirstName LastName Department [CourseNum]

But this doesn't actually prevent us from mixing up, for example, FirstNames and LastNames because they are just synonyms for String. Alternatively, we could introduce wrapper types so that the type system would prevent us from mixing up different types:


    data FirstName = FirstName String
    data LastName = LastName String
    data Id = Id String
    data Department = Department String
    data Year = Year Int
    data CourseNum = CourseNum (Int, Int)

Although this mitigates the first concern, it exacerbates the second because now even more patterns must be written to get at the data. This may not scalable in large programs with many datatypes and data representation needs to balance.

Fortunately, Haskell provides a feature that can be used to simultaneously address the two concerns above, namely, records. Consider the following type definition, where the components, or fields, of each Course value are named department, num, and section, respectively:


    data Course = Course { department :: String, num :: Int, section :: Int }
      deriving (Eq, Show)

Course values can be constructed with record syntax, where field names alleviate the need to remember the intended purpose of each positional component. Notice how the order of fields does not matter.

> let c1 = Course { department = "CMSC", num = 161, section = 1 } > let c2 = Course { num = 161, department = "CMSC", section = 1 } > c1 == c2 True

Course can also be used as an ordinary (non-record) data constructor.


    > :t Course
    Course :: String -> Int -> Int -> Course

    > let c3 = Course "CMSC" 161 1
    > c1 == c2 && c2 == c3
    True

Based on the record declaration, Haskell automatically generates functions to project, or unwrap, the fields of Course values:


    > :t department
    department :: Course -> String

    > :t num
    num :: Course -> Int

    > :t section
    section :: Course -> Int

    > department c1
    "CMSC"

    > (num c1, section c1)
    (161, 1)

In addition, one can use ordinary data constructor patterns, where components are matched by position:


    numAndSection (Course _ num section) = (num, section)

Alternatively, record patterns allow field names to be used:


    numAndSection (Course { department = _, num = i, section = j }) = (i, j)

As with record construction, the order of fields in record patterns does not matter. Furthermore, fields can be omitted if their values are not needed within the scope of the pattern:


    numAndSection (Course { section = j, num = i }) = (i, j)

Records can be defined (or not) for multiple data constructors of a type. For example:


    data ABCD
      = A { foo :: String, bar :: Int }
      | B { foo :: String, baz :: () }
      | C Int
      | D

This definition exhibits several noteworthy features. First, a field name can be used for components of different data constructors within a type. Second, not all data constructors need to be defined with record syntax.

Exercise 7.1 Based on the definition of ABCD, what are the types and behaviors of the functions foo, bar, and baz? Think about it, and then test them out.

Exercise 7.2 Should the following definition be acceptable? Think about it, and then try it out.


    data Data = One { data :: Int } | Two { data :: Bool }

Now, using records, we can define Person as follows:


    data Person
      = Student
          { firstName :: String
          , lastName :: String
          , id :: String
          , major :: String
          , year :: Int
          , courses_enrolled :: [(String, (Int, Int))]
          }
      | Teacher
          { firstName :: String
          , lastName :: String
          , dept :: String
          , courses_teaching :: [(Int, Int)]
          }

`newtype`

Data constructors tag, or label, the values they carry in order to distinguish them from values created with different data constructors for the same type. As we have seen, it is sometimes useful to define a new datatype even with only one data constructor. In such cases, tagging and untagging (or constructing and destructing, or boxing and unboxing) values is useful for enforcing invariants while programming, but these operations add unnecessary run-time overhead: there is only one kind of value, so they ought to be free of labels.

Haskell allows datatypes with exactly one unary constructor to be defined with the keyword newtype in place of data, such as


    newtype Identity a = Identity a

or, if we were using a record,


    newtype Identity a = Identity { runIdentity :: a }

The choice of Identity may see a bit odd, but it will make more sense later.

For the purposes of programming, using newtype is almost exactly the same as using data. But it tells the compiler to optimize the generated code by not including explicit Identity labels at run-time. We will get into the habit of using newtype whenever we define a datatype with one unary constructor, without delving into the subtle differences between using newtype and data.

As we meet more of the type classes that are central to Haskell's design, we will often create wrapper types (i.e. with one data constructor). Hence, we will use newtype. Furthermore, we will often write expressions of the form


    Identity . doSomething . runIdentity

to unwrap (runIdentity), transform (doSomething), and rewrap (Identity) values. Hence, we will use records so that unwrapping functions are generated automatically. At least, until we find a better way 😀.

One of the common uses of newtype is to provide alternative implementations of various type classes. E.g., if for some program-specific reason, we wanted to order pairs based on second-element first, we might consider


    module BackwardsPair where
    
    import Data.Ord
    
    newtype BackwardsPair a b = BackwardsPair (a,b)
        deriving Show
    
    instance (Eq a,Eq b) => Eq (BackwardsPair a b) where
        BackwardsPair p1 == BackwardsPair p2 = p1 == p2
    
    instance (Ord a, Ord b) => Ord (BackwardsPair a b) where
        compare = comparing (\(BackwardsPair (a,b)) -> (b,a))

The function comparing :: Ord a => (b -> a) -> b -> b -> Ordering is very useful for defining one ordering in terms of another. This is a slightly silly example, but we'll see more substantial examples soon.

List Comprehensions

Mathematicians have a concise notation for describing a set, which typically involves describing an initial set, a predicate, and a form for elements of that set, e.g., $\{ x^2 \mid \hbox{$x \in \omega$ and $x$ is even}\}$, the squares of all even natural numbers. These are called set comprehensions. Haskell provides a similar notation for lists:


    > [x^2 | x <- [1..10], even x]
    [4,16,36,64,100]

It is possible in list comprehensions to have multiple generators, let bindings, and to interleave generations, tests, and bindings. As a simple example, let's generate all of the pythagorean triples that where the hypotenuse is less than a given number:


    pythagoreanTriples n = 
        [ (a,b,c) 
        | a <- [1..n]
        , b <- [a+1..n] 
        , c <- [b+1..n]
        , a^2 + b^2 == c^2
        ]


    > pythagoreanTriples 20
    [(3,4,5),(5,12,13),(6,8,10),(8,15,17),(9,12,15),(12,16,20)]

Note that the results are sorted by a, because the generation runs like this, for each a from 1 to n, generate b from a+1 to n, and for each such a and b, generate c from b+1 to n...

But notice that this list contains a few non-primitive triples, e.g., (6,8,10), (9,12,15), and (12,16,20), which are all multiples of (3,4,5). Suppose we wanted to restrict ourselves to primitive triples, i.e., tuples that are relatively prime to one onother. How might we do that? A simple approach would be to filter out those triples where a and b have a non-trivial common divisor, i.e.,


    primitiveTriples n =
        [ (a,b,c)
        | a <- [1..n]
        , b <- [a+1..n]
        , gcd a b == 1
        , c <- [b+1..n]
        , a^2 + b^2 == c^2
        ]

It is more efficient to do the gcd test before the generation of c, because otherwise we'd have to repeat the same test (on a and b alone) for each c; but it should be noted too that our basic computational plan emphases clarity over efficiency, and there are much more efficient ways to generate lists of pythagorean triples.

*Exercise 7.3 The file Records.hs defines two Teachers, professorChugh and professorKurtz, and a "database" of Students, allStudents.

Implement the function


    studentsOfTeacher_ :: [Person] -> Person -> [((Int, Int), [(String, String)])]
    studentsOfTeacher_ students teacher = undefined

to return those students, identified by lastname-firstname pairs, enrolled in each of the teacher's courses. Your implementation can assume that students is a list of Student values and that teacher is a Teacher value.

For example:


    > studentsOfTeacher professorChugh
    [((16100,1),[("Student","B"),("Student","STEAM")])]

    > studentsOfTeacher professorKurtz
    [((16100,2),[("Student","C")]),((28000,1),[("Student","D"),("Student","E")])]

Once you are done, consider how one might eliminate the assumption above (but don't submit this).

CMSC-16100

Honors Introduction to Programming, I

Autumn Quarter, 2019

Lecture 7: Records and List Comprehensions

Records

`newtype`

List Comprehensions