Data Types =========== What is a type? Type = set of values + set of operations on these values that possess certain properties Topics to be covered: Data types supported in modern programming languages simple and compound types Type declaration Type inference and type checking Type equivalence, type compatibility, type conversion and coercion Strongly-typed, Weakly-typed and untyped languages Static Vs Dynamic type checking SIMPLE TYPES ------------- Simple Types Predefined: int, float, double, etc in C int, bool, real, etc. in SML All other types are constructed, starting from predefined (aka primitive) types Enumerated : "enum colors {red, green, blue};" in C "datatype colors = red | green | blue" in SML Note: datatype is a keyword used in SML to introduce new types COMPOUND TYPES -------------- Types contruted from other types using TYPE CONSTRUCTORS 1. Cartesian Product: --------------------- Let I represent the integer type (equivalently, the set of integers) and R represent reals. Then the cross product I x R, defined in the usual manner of product of sets, represents the cartesian product type obtained from I and R. i.e., I x R = {(i, r) | i in I, r in R} Products types correspond to "tuples" in SML. They are not supported in typical imperative languages, except with labels (see below). The above product type is denoted int * real in SML. Example (SML): - val v = (2,3.0); val v = (2,3.0) : int * real - type mytype = int * real Note: type is a keyword used to introduce new names (abbreviations) for types that are already known to SML. Datatype is for introducing new types unknown to SML. Note: Just as the cartesian product operator on sets is nonassociative, the `*' type constructor is not associative. Example: - val t = (2,3,4.0); val t = (2,3,4.0) : int * int * real - val s = ((2,3), 4.0); val s = ((2,3),4.0) : (int * int) * real - val u = (2, (3,4.0)); val u = (2,(3,4.0)) : int * (int * real) - t = s; stdIn:21.1-21.6 Error: operator and operand don't agree [equality type required] operator domain: ''Z * ''Z operand: (int * int * real) * ((int * int) * real) in expression: t = s NOTE: compiler complains that the types of arguments to equality operator must be the same, but it is not so in this case. You will get the same message if you try to compare s = u or t = u. NOTE: The equality operator has the type t * t -> bool for any type t. Since we say "for all values of t," t denotes a variable in the mathematical sense. It is called a type variable. SML uses a naming convention by which type variable names start with "'" or "''" and consist of the usual charaters used in identifiers. Elements of a tuple can be extracted using the # operator: - #2(u); val it = (3,4.0) : int * real - #3(t); val it = 4.0 : real - #3(u); stdIn:24.1-24.6 Error: operator and operand don't agree [record labels] operator domain: {3:'Y; 'Z} operand: int * (int * real) in expression: (fn {3=3,...} => 3) u NOTE: the error message says that u does not have a third component. - #2(#2(u)); /* second component of the second component of u */ val it = 4.0 : real Labelled products: ------------------ In pure cartesian products, the components of a tupe do not have names. Instead, they are identified by numbers. In LABELED PRODUCTs, each component of a tuple is given a name. Labeled products are also called RECORDS, which is a language-neutral term. "Struct" is a term that is specific to C and C++. "struct t { int a; float b; char * c;};" in C "type t = { a:int, b:real, c:string};" in SML In SML, components of a labled tuple value v can be accessed using the syntax #(v), just as we accessed the component of an unlabeled tuple using #(v) - val v = {a=3, b=1.0, c="six"}; val v = {a=3,b=1.0,c="six"} : {a:int, b:real, c:string} - #a(v); val it = 3 : int 2. Function Types: ------------------- "->" is the function type constructor T1 -> T2 is a function type; it corresponds to a function that takes arguments of type T1 and returns a value of type T2 Standard ML supports functions as first class values. They can be created and manipulated by other functions. In imperative languages such as C/C++, we can pass pointers to functions, but this does not offer the same level of flexibility. For example, there is no way for a C-function to dynamically create and return a pointer to a function; rather, it can return a pointer to an EXISTING function. For this reason, our discussion of function types will be focussed mainly on SML. Examples in SML: ---------------- - fun f x = x*x; val f = fn : int -> int - fun g x y = x*(y:real); val g = fn : real -> real -> real NOTE: g is different from h given below. g takes two arguments, which can be supplied one at a time (see example two lines below). h takes only one argument, which is itself a tuple consisting of two components. - fun h (x, y) = x * (y:real); val h = fn : real * real -> real - val v = g 3.0; val v = fn : real -> real NOTE: when g is given one argument, it returns a new function value. The type of g is real -> real -> real. Since the '->' operator is right-associative, we read the type as real -> (real -> real). In other words, g, when given an argument of type real, returns a value of type (real -> real). - val u = v 2.0; val u = 6.0 : real NOTE: when a real argument is given to v, it consumes it and produces an output value of type real. v is called a "closure" -- it represents a function for which some of the arguments have been provided, but its evaluation cannot proceed unless additional arguments are provided. The closure "remembers" the arguments supplied so far, so that when the remaining arguements are provided at a later time, evaluation can proceed. 3. Union types -------------- Union types correspond to set unions, just like product types corresponded to cartesian products. Unions can be tagged or untagged. C/C++ support only untagged unions: union v { int ival; float fval; char cval; }; In untagged unions, there is no way to ensure that the component of the right type is always accessed. For instance, an integer value may be stored in the above union. Due to a programming error, the fval field may be accessed at a later time. Clearly, the fval field does not contain a valid value at this point, so you will get some garbage. With tagged unions, the compiler can perform checks at runtime to ensure that the right components are accessed. Tagged unions are NOT supported in C/C++. Other languages such as Pascal support tagged unions using VARIANT RECORDs RECORD CASE b: BOOLEAN OF TRUE: i: INTEGER; | FALSE: r: REAL END END Tagged union is also called a "discriminated union," while untagged union is called an "undiscriminated union." Tagged unions are supported in SML using datatype declarations. - datatype tt = realval of real | intval of int ; datatype tt = intval of int | realval of real - val v = realval (2.0) ; val v = realval 2.0 : tt - val u = intval (3) ; val u = intval 3 : tt - fun add (realval(x), realval(y)) = realval (x+y) | add (intval(x), intval(y)) = intval (x+y) ;= stdIn:42.1-43.52 Warning: match nonexhaustive (realval x,realval y) => ... (intval x,intval y) => ... val add = fn : tt * tt -> tt - add (u, v); uncaught exception nonexhaustive match failure raised at: stdIn:43.40 - val w = intval(3); val w = intval 3 : tt - add(u,w); val it = intval 6 : tt NOTE: we can redefine add as follows so as to permit addition of reals and ints. We add two new rules, and in each rule, convert the integer type to real type explicitly. Note that the warning about "nonexhaustive match" is now gone. - fun add (realval(x), realval(y)) = realval (x+y) | add (intval(x), intval(y)) = intval (x+y)= = | add (intval(x), realval(y)) = realval(real(x)+y) = | add (realval(x), intval(y)) = realval(x+real(y)); val add = fn : tt * tt -> tt 4. Array types --------------- Array construction is denoted by array(, ). Thus, a C-declaration int a[5]; defines a variable a of type "array(0-4, int)" A declaration union tt b[6][7]; declares a variable b of type "array(0-4, array(0-6, union tt))" Sometimes we may not consider the range as part of an array variable's type. This corresponds to treating "array" as type constructor that takes just a single argument, namely, the element type. 5. Pointer types ------------------ A pointer type will be denoted using the syntax ptr() where denote the types of the object pointed by a pointer type. The C-declaration char *s; defines a variable s of type ptr(char), whereas a declaration int (*f)(int s, float v) defines a (function) pointer of type ptr(int*float -> int)