# User-Defined Types

We've seen how lists are built with `[]` and `(:)`, and deconstructing with patterns using `[]` and `(:)`. Now we'll see a general way for users to define new types that have one or more kinds of values.

As an example, currency converter. For simplicity, only two kinds of currency.

``````data Currency
= USD Float  -- US Dollars
| JPY Float  -- Japanese Yen
deriving (Eq, Show)``````

The `data` keyword defines a new algebraic data type (ADT), or simply datatype, in this case called `Currency`. Values of type `Currency` are either the data constructor called `USD` and a single `Float` or the data constructor `JPY` and a single `Float`. The smallest denomination of US currency, one cent, is one-hundredth of a dollar; the smallest denomination of Japanese currency is one yen. We choose to track `JPY` with `Float`s to provide finer-grained precision; these will be rounded to `Integer` values later.

The `deriving (Eq, Show)` clause instructs Haskell to automatically implement a function called `show :: Currency -> String` which is described by the `Show` type class, and an equality function `(==) :: Currency -> Currency -> Bool` which described by the `Eq` type class. We'll talk more about these later. Try the interactions below with and without the `deriving` clause.

``````> USD 11.99
> JPY 1353
> USD 11.99 == USD 11.99
> USD 11.99 == JPY 1353``````

To extract data wrapped by data constructors, we use patterns involving the data constructors and variables. For example, the following function allows adding two values in either currency.

``````yenPerDollar  = 112.86         -- as of 9/27/17
dollarsPerYen = 1 / yenPerDollar

add                    :: Currency -> Currency -> Currency
add (USD d1) (USD d2)  =  USD (d1 + d2)
add (JPY y1) (JPY y2)  =  JPY (y1 + y2)
add (USD d)  (JPY y)   =  USD (d + y * dollarsPerYen)
add (JPY y)  (USD d)   =  JPY (y + d * yenPerDollar)``````

Note that any amount has two valid representations; the last two equations choose to retain the base currency of the first argument.

### Data Constructors

In general, a datatype definition contains one or more data constructors, beginning with capital letters (`A`, `B`, `C`, `D`) below. The name of the new type (`T`) below must also start with a capital letter.

``````data T
= A
| B Int
| C Bool String
| D (Bool, String)
deriving (Eq, Show)``````

Each data constructor wraps zero or more values:

• `A` does not carry any additional data,
• `B` carries a single `Int`,
• `C` carries one `Bool` and one `String`, and
• `D` carries one pair containing a `Bool` and an `Int`.

Data constructors are functions that, when applied to arguments of the specified types, produce values of the datatype (in this case, `T`).

``````> :t A
> :t B
> :t C
> :t D``````

### Pattern Matching

Three kinds of patterns can be used to match and destruct values of a datatype:

1. variables (starting with lowercase letters);
2. the wildcard pattern `_`; and
3. the data constructors of a datatype, along with an appropriate number of (sub)patterns for the values it carries.

For example:

``````foo             :: T -> Int
foo A           =  0
foo (B i)       =  1
foo (C b s)     =  2
foo (D (b, s))  =  3``````

At run-time, a variable pattern `x` (of some type `S`) successfully matches any value `v` (of type `S`); `v` is then bound to `x` in the right-hand side of the equation.

The wildcard pattern `_`, like a variable, matches any value but does not introduce a name for it.

A data constructor pattern `DataCon p1 ... pn` (of some type `S`) matches only those values `v` (of type `S`) that are constructed with `DataCon` and whose n values match the patterns `p1` through `pn`, respectively. If `DataCon` carries no data, then there are no subsequent patterns (i.e. `n`= 0).

Even though a datatype constructor may take multiple arguments (e.g. `C` in our example) and, thus, may be partially applied like other functions, all data constructors must be "fully applied" to enough patterns according to its definition.

### Case Expressions

The definition of `foo` above defines multiple equations — one for each data constructor — to handle all possible argument values of type `T`. Another way to destruct values of type `T` is to use a `case` expression.

``````foo :: T -> Int
foo t =
case t of
A        -> 0
B i      -> 1
C b s    -> 2
D (b, s) -> 3``````

Each branch of `case` expression must return the same type of value (in this case, `Int`). The equations in the original version of `foo` above are syntactic sugar for this version using `case`.

A `data` definition can refer to refer to one or more (lowercase) type variables (`a`, `b`, etc. below).

``data T a b c ... = ...``

The type variables can be referred to in the types of the data constructors.

#### Handling Errors

As an example to motivate polymorphic ADTs, recall our `head` and `tail` functions from before.

``````head :: [a] -> a

tail :: [a] -> [a]
tail (x:xs) = xs``````

By omitting equations for the empty list, these functions implicitly fail: they crash with inexhaustive pattern match errors at run-time.

Versions with more explicit failure:

``````head :: [a] -> a

tail :: [a] -> [a]
tail [] = undefined
tail (x:xs) = xs

> :t undefined``````

Whoa! `undefined` has every type? This is only okay because `undefined` throws an exception at run-time, halting program execution at that point.

This isn't much better than before. Using `undefined` is, however, a cool trick in the type system that allows exceptions to be raised anywhere in the code, while still accurately typechecking other expressions. Using `undefined` is often helpful during program development, allowing you to put "placeholders" for expressions that you haven't finished yet while still allowing the rest of the program to type check and run.

Now, a better way than crashing to represent failure.

``````data Maybe a
= Nothing
| Just a
deriving (Eq, Show)``````

Notice that `Maybe` is polymorphic; it takes one type variable `a`. A value of type `Maybe a` is either `Nothing` with no additional data, or `Just` with a single value of type `a`.

``````> :t Just True
> :t Just 1
> :t Just (+)

> :t Nothing
> :t Nothing :: Maybe Bool
> :t Nothing :: Maybe (Maybe (Maybe Bool))``````

We can use `Maybe` values to deal with "errors" much more effectively.

``````maybeHead :: [a] -> Maybe a

maybeTail :: [a] -> Maybe [a]
maybeTail []     = Nothing
maybeTail (x:xs) = Just xs``````

This allows (requires) callers of these functions to handle each of the two cases explicitly.

The `Maybe` type is extremely useful and is defined in the standard `Prelude`. Its cousin `Either` is also quite useful.

Datatypes can also be recursive.

``````data List a
= Nil
| Cons a (List a)
deriving (Eq, Show)``````

What does this type remind you of?

### Lists

The built-in lists are just like the `List` type above, but with special syntax:

• `[]` and `(:)`, rather than names starting with capital latters, are data constructors,
• the expression `[1,2,3,4]` is syntactic sugar for nested calls to `(:)`,
• the type is called `[]`, rather than a name starting with a capital letter, and
• the type `[a]` is syntactic sugar for `[] a`.

For example:

``````> :t []
> :t [] :: [a]
> :t [] :: [] a

> :t "abc"
> :t "abc" :: [Char]
> :t "abc" :: [] Char

> let (x:xs) = [1, 2, 3]
> let ((:) x xs) = [1, 2, 3]``````

### Tuples

``````data Pair a b = Pair a b deriving (Eq, Show)
data Triple a b c = Triple a b c deriving (Eq, Show)
data Tuple4 a b c d = Tuple4 a b c d deriving (Eq, Show)``````

Built-in pairs are just like these datatypes, again with special syntactic support.

``````> :t ('a', 2 :: Int)
> :t (,) 'a' 2 :: (,) Char Int
> :t (,)
> :t (,,)
> :t (,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,)

> let (a, b, c) = (1, 2, 3)
> let ((,,) a b c) = (1, 2, 3)``````

### Unit

Sometimes it is useful to have a dummy value.

``data Unit = Unit deriving (Eq, Show)``

The built-in unit value and type are just like this datatype, again with special syntactic support.

``````> :t ()
> :t () :: ()``````

### Booleans

Even booleans can be (and are) defined as a datatype.

``````data Bool
= False
| True
deriving (Eq, Show)``````

If-expressions, multi-way if-expressions, and guards are syntactic sugar for pattern matching on `Bool`s.

``````absoluteValue n
| n >= 0    = n
| otherwise = -n

-- source file                {-# LANGUAGE MultiWayIf #-}
-- ghc command-line option    -XMultiWayIf
-- ghci                       :set -XMultiWayIf
absoluteValue n =
if | n >= 0    -> n
| otherwise -> -n

absoluteValue n =
if n >= 0 then n
else if otherwise then -n
else undefined

absoluteValue n =
case n >= 0 of
True  -> n
False ->
case otherwise of
True  -> -n
False -> undefined``````

## Type Aliases vs. Wrapper Types

Recall the currency example above. We were careful to use `d`s for dollar `Float`s and `y`s for yen `Float`s, but it's easy to accidentally, for example, add these two kinds of `Float`s, leading to a non-sensical, and probably hard-to-detect-and-debug, result.

One approach might be to define type aliases.

``````type Dollars = Float
type Yen = Float``````

But remember, type aliases do not create new types, just synonyms. So, `Dollars` and `Yen`, which are synonymous with `Float` and with each other, can still be misused.

Another option is to define wrapper (or "box") types with single data constructors, for the purpose of forcing the programmer to construct and destruct the different `Float` types, with the usual support from the typechecker to prevent errors.

``````data Currency
= USD Dollars -- US Dollars
| JPY Yen     -- Japanese Yen
deriving (Eq, Show)

data Dollars = DollarsFloat Float deriving (Eq, Show)
data Yen     = YenFloat Float deriving (Eq, Show)``````

Note, it is common to see definitions like the following:

``````data Dollars = Dollars Float deriving (Eq, Show)
data Yen     = Yen Float deriving (Eq, Show)``````

Notice how the type and data constructor names are the same; this is not a problem, because types and expressions "live" in different places in the language syntax, so there's never any ambiguity.

Now, we redefine `add` as follows, in terms of several helper functions that significantly reduce the chances that we mistreat the actual `Float` values.

``````add                    :: Currency -> Currency -> Currency

addDollars :: Dollars -> Dollars -> Dollars
addDollars (Dollars d1) (Dollars d2) = Dollars (d1 + d2)

addYen :: Yen -> Yen -> Yen
addYen (Yen y1) (Yen y2) = Yen (y1 + y2)

convertYenToDollars :: Yen -> Dollars
convertYenToDollars (Yen y) = Dollars (y * dollarsPerYen)

convertDollarsToYen :: Dollars -> Yen
convertDollarsToYen (Dollars d) = Yen (d * yenPerDollar)``````

### newtype

Data constructors tag, or label, the values they carry in order to distinguish them from values created with different data constructors for the same type. As we have seen, it is sometimes useful to define a new datatype even with only one data constructor. In such cases, tagging and untagging (or constructing and destructing, or boxing and unboxing) values is useful for enforcing invariants while programming, but these operations add unnecessary run-time overhead: there is only one kind of value, so they ought to be free of labels.

Haskell allows datatypes with exactly one constructor (such as `Dollars` and `Yen`), and which carries exactly one value, to be defined with the keyword `newtype` in place of `data`, such as

``newtype Box a = Box a``

For the purposes of programming, `newtype` is almost exactly the same as `data`. But it tells the compiler to optimize the generated code by not including explicit `Box` labels at run-time. We will get into the habit of using `newtype` whenever we define a datatype with one constructor, without delving into the subtle differences between using `newtype` and `data`.

## Record Types

When data constructors carry multiple values, it can be hard to remember which components are meant to represent what (especially when multiple components have the same types). Furthermore, it can be tedious to write accessor functions that retrieve particular components carried by a data constructor.

Haskell offers an embellishment of data constructors that addresses these two issues. Consider the following definition where the data constructors `A` and `B` use record syntax to give names to its data values.

``````data ABCD
= A { foo :: String, bar :: Int }
| B { foo :: String, baz :: () }
| C Int
| D``````

All data constructors can be used to construct values as before.

``````> :t A
> :t B
> :t C
> :t D``````

But now, record construction can also be used to create `A` and `B` values.

``````> A { foo = "hello", bar = 17 }
> B { foo = "hello", baz = () }
> B { baz = (), foo = "hello" }``````

In addition, accessor functions have been automatically generated.

``````> :t foo
> :t bar
> :t baz``````

Furthermore, there is a way to create a new record by copying all of its fields except for some.

``````> let a = A { foo = "hello"; bar 17 }
> a { foo = "goodbye" }``````

Should the following be an acceptable `data` definition?

``data Data = One { data :: Int } | Two { data :: Bool }``

### Wrapper Types (Redux)

Soon we will encounter many situations where we need to define wrapper types to plug into other features of Haskell. Hence, we will use `newtype`. Furthermore, we will often write expressions of the form

``Box . doSomething . unbox``

to unwrap (`unbox`), transform (`doSomething`), and rewrap (`Box`) values. Hence, we will use records so that unwrapping functions are generated automatically.

``newtype Box a = Box { unbox :: a }``