Type Systems
This section deals with more theoretical aspects of types.
A type system is a set of rules used by a language to structure and organize its collection of types.
We use the term object (or data object) to denote both the storage and the
stored value.
The operations defined for a type are the only way to manipulate its
instantiated objects.
A type error is any attempt to manipulate objects with illegal operations.
A program is type safe (or type secure) if it is guaranteed to have no type errors.
Static Versus Dynamic Program Checking (for type errors)
Let's consider errors in general before looking at type errors specifically.
- What kinds of errors may occur in programs?
- When may errors be checked for?
- How does error-checking time affect the quality of the resulting
programs?
There are two categories of errors:
- Language errors - syntactic and semantic errors in the use of the PL
- Application errors - deviations in program behavior relative to the program's specifications
  - these have to do with software design issues, which are outside the scope of this book
There are two broad categories of error checking, based on when the errors are checked for:
- Dynamic error checking - requires the program to be executed on sample input data
- Static error checking - does not require program execution
  - Preferable to dynamic checking, because:
    - with dynamic checking, a potential error is detected at run time only if the input data we provide triggers it
    - dynamic checking slows down program execution
  - Often called compile-time checking or translation-time checking
    - a misnomer, since with separate compilation some checks must be made at link time
  - Does not uncover all errors; some only manifest at run time
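The difference between the two kinds of checking can be sketched in C++ (chosen here only for illustration; the function names are hypothetical). The first function is checked statically: its index is a compile-time constant, so an out-of-range index is rejected before the program ever runs. The second is checked dynamically: the error surfaces only if the supplied input happens to trigger it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Static check: the index 2 is part of the program text, so the compiler
// verifies it against the array size. std::get<3>(a) would not compile.
int last_of_three(const std::array<int, 3>& a) {
    return std::get<2>(a);
}

// Dynamic check: at() tests the index at run time and throws
// std::out_of_range when the index is invalid.
bool index_is_rejected(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

For a three-element vector, index 1 passes the dynamic check and index 5 is rejected; a test run that never supplies index 5 would not reveal the problem.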
Strong Typing and Type Checking
The goal of a type system is to prevent the writing of type-unsafe programs as much as possible.
- A type system is said to be strong if it guarantees type safety.
- A type system is said to be weak if it is not strong.
- A strongly typed language is one with a strong type system.
- A weakly typed language is one with a weak type system.
In general, there are different ways for a PL to achieve a strong type system.
- A statically typed language is a strongly typed language
- Example: A static type system can be achieved by requiring that:
  - Only built-in types can be used
  - All variables are declared with an associated type
  - All operations are specified by stating the types of the required operands and the type of the result
- In some languages the binding between a variable and its type cannot be determined at compile time, yet their type systems still guarantee type safety
How should a designer choose a type system when designing a new PL?
- Two conflicting design specifications
- The size of the set of legal programs
- The efficiency of the type checking procedure in the compiler
- The type system restricts the set of programs that can be written
- A smaller set of programs ⇒ simpler type checking
Type Compatibility
Consider a PL with an operation, OP, which expects an operand of type T.
A strict type system might require that OP may only be legally invoked with a parameter of type T.
On the other hand, the PL might define conditions under which an operand of type Q is also acceptable without violating type safety.
In this case we say that "in the context of operation OP, type Q is compatible with type T".
Type compatibility is sometimes called type conformance or type equivalence.
When this compatibility is precisely defined, the PL can still have a strong type system.
Example: The program fragment in figure 3.8 is written in a hypothetical PL.
What are the effects of different sorts of type compatibility rules?
Name Compatibility - A strict conformance rule in which a type name is compatible only with itself
Under this rule:
- (2) is type correct
- (1), (3) and (4) contain type errors
Structural Compatibility - Two types are compatible if they have the same structure
Under this rule:
- (1), (2) and (3) are type correct
- (4) contains a type error
A few issues arise with structural compatibility as defined here:
- What happens with the field names of Cartesian products?
  - Are they ignored? Required to coincide? Required to occur in the same order?
Name Compatibility is:
- easier to implement than structural compatibility
- much stronger than structural compatibility
Name compatibility is often preferable because it prevents two types from being considered compatible just because their representations happen to be identical.
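As a hedged illustration (in C++ rather than the hypothetical PL of figure 3.8; every type and function name below is made up for this sketch), C++ applies name compatibility to struct types. Two structs with identical fields are still distinct types, while a type alias only introduces a second name for the same type.

```cpp
#include <cassert>
#include <type_traits>

struct Celsius    { double degrees; };
struct Fahrenheit { double degrees; };  // same structure, different name
using Temperature = Celsius;            // alias: a second name for Celsius

// Identical representation does not make the two structs compatible.
static_assert(!std::is_convertible<Fahrenheit, Celsius>::value,
              "distinct names are incompatible");
// An alias denotes the very same type, so it is trivially compatible.
static_assert(std::is_same<Temperature, Celsius>::value,
              "an alias is the same type");

// Moving between the two types requires an explicit, meaningful conversion.
Celsius to_celsius(Fahrenheit f) {
    return Celsius{(f.degrees - 32.0) * 5.0 / 9.0};
}
```

Under a structural rule, a Fahrenheit value could be passed where a Celsius value is expected without any conversion, which is exactly the accident that name compatibility rules out.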
Practical Issue:
Some PLs adopt the idea of type compatibility, but either poorly define the rules or leave them entirely up to the implementer.
This results in programs accepted by one compiler and rejected by another.
Type Conversions
- Automatic conversions, called coercions
- Explicit conversions, called casts
Example of coercion
x = x + z; (in C)
Any coercions which may occur depend on context.
If z is float and x is int:
- x is coerced to float to evaluate the addition, which is a real (floating-point) addition
- the result is coerced to int for the assignment
Explicit Type Conversion
An explicit conversion can be used in some PLs to avoid an undesirable coercion.
For example, C has a cast construct which can force a type conversion that otherwise might not occur.
Assuming the same variable types as above, a programmer could write:
x = x + (int) z;
- z is explicitly converted to type int
  - semantically, z is assigned to an unnamed variable of type int using the normal coercion rules
- the unnamed variable is used to evaluate the addition, which is an integer addition
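The two statements can be compared in a small C++ sketch (the function names are hypothetical; the arithmetic conversion rules used here are the ones C++ shares with C). The static assertions record which addition the compiler selects, and the chosen inputs show that the two forms can produce different results.

```cpp
#include <cassert>
#include <type_traits>

// x = x + z: x is coerced to float, the addition is a float addition,
// and the float result is coerced back to int by the assignment.
int add_with_coercion(int x, float z) {
    static_assert(std::is_same<decltype(x + z), float>::value,
                  "mixed int/float addition is a float addition");
    x = x + z;
    return x;
}

// x = x + (int) z: z is converted to int first, so the addition is an
// int addition. static_cast<int>(z) is the C++ spelling of (int) z.
int add_with_cast(int x, float z) {
    static_assert(std::is_same<decltype(x + static_cast<int>(z)), int>::value,
                  "after the cast the addition is an int addition");
    x = x + static_cast<int>(z);
    return x;
}
```

With x = -1 and z = 0.5, the coerced version computes -0.5 and truncates it to 0, while the cast version truncates z to 0 first and yields -1.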
Ada provides only explicit conversions, subject to rules defining allowed conversions.
- If X is declared as FLOAT and I as INTEGER:
  I := INTEGER(X);
The conversion function INTEGER(), provided by Ada, is applied to X to give the nearest INTEGER.
Advantages of allowing coercions
- Desirable conversions are automatically done
Disadvantages of allowing coercions
- These implicit conversions happen 'behind the scenes'
  ⇒ the PL gets complicated
  ⇒ programs may become obscure
- Coercions weaken the usefulness of type checking by overriding the declared types of objects
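One way the 'behind the scenes' problem shows up can be sketched in C++ (a hypothetical example, not taken from the source). Both functions are accepted by the compiler, but in the first one the coercion to double happens only after an integer division has already discarded the fraction.

```cpp
#include <cassert>

// Looks like a real-valued average, but total / count is an int division;
// only its (already truncated) result is coerced to double on return.
double average_obscure(int total, int count) {
    return total / count;
}

// Making one conversion explicit forces a floating-point division;
// count is then coerced to double to match.
double average_explicit(int total, int count) {
    return static_cast<double>(total) / count;
}
```

Nothing in the declared types warns the reader that average_obscure(7, 2) is 3.0 rather than 3.5.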
The interaction between coercions and overloading of operators and routines makes programs difficult to understand.
Types and Subtypes
Assume a type T is defined as a set of values with an associated set of operations.
A subtype ST of T can be defined to be a subset of those values (and, for simplicity, the same operations).
*Note - the discussion here is in the context of conventional PLs. We ignore the ability to specify user-defined operations for subtypes.
If ST is a subtype of T, T is also called ST's supertype (or parent type).
If a PL supports subtypes, it must define:
- A way to define subsets of a given type
- Compatibility rules between a subtype and its supertype
Example - Pascal
- Introduced the concept of a subtype as a subrange of any discrete ordinal type
  type natural = 0..maxint;
       digit = 0..9;
       small = -9..9;
- A Pascal program may only define a subtype as a subset of contiguous values
  - e.g. a subtype of all even integers would not be allowed
- Different subtypes of a given type are compatible among themselves and with the supertype, but type-safe operations may cause run-time errors.
  - e.g. providing a small value to an expression requiring a digit may cause a run-time error
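C++ has no built-in subranges, so the following is only a rough emulation of the digit and small subtypes above, using a hypothetical Subrange class template. It shows why the compatibility rule is accepted by the type checker yet can still fail while the program runs.

```cpp
#include <cassert>
#include <stdexcept>

// A value of the supertype int restricted to the contiguous range Lo..Hi.
template <int Lo, int Hi>
class Subrange {
public:
    Subrange(int v) : value_(check(v)) {}

    // Any other subrange of int is accepted by the compiler,
    // but the range test can only be carried out at run time.
    template <int Lo2, int Hi2>
    Subrange(const Subrange<Lo2, Hi2>& other) : value_(check(other.get())) {}

    int get() const { return value_; }

private:
    static int check(int v) {
        if (v < Lo || v > Hi) throw std::out_of_range("value outside subrange");
        return v;
    }
    int value_;
};

using Digit = Subrange<0, 9>;
using Small = Subrange<-9, 9>;

// Type correct for every Small, yet it fails at run time for negative ones.
bool fits_digit(Small s) {
    try {
        Digit d = s;
        (void)d;
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

Here fits_digit(Small(7)) succeeds, while fits_digit(Small(-3)) hits the run-time range error even though both calls are type correct.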
Generic Types
Consider a generic abstract data type for a stack of elements of parameter type T, with operations having the following signatures:
push: stack(T) × T → stack(T)
pop: stack(T) → stack(T) × T
length: stack(T) → int
The operations defined for type stack(T) should work uniformly for any possible type T.
Since the type is not known, how can the routines be type-checked?
PLs like Ada, C++ and Eiffel support this by instantiating generic types and/or routines at compile-time.
The generic type parameters are bound to concrete types, enabling type-checking.
- C++ only requires explicit instantiation of generic classes, not routines
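A minimal C++ sketch of such a generic stack follows. It is an illustration under stated assumptions: the class and function names are hypothetical, and the operations update the stack in place instead of returning a new stack as the signatures above do. Each instantiation binds T to a concrete type, and the compiler type-checks that instantiation separately.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

template <typename T>
class Stack {
public:
    void push(const T& item) { items_.push_back(item); }

    // Removes and returns the most recently pushed element.
    T pop() {
        if (items_.empty()) throw std::underflow_error("pop on empty stack");
        T top = items_.back();
        items_.pop_back();
        return top;
    }

    int length() const { return static_cast<int>(items_.size()); }

private:
    std::vector<T> items_;
};

// The class is instantiated explicitly as Stack<int> and Stack<std::string>.
int total_length() {
    Stack<int> numbers;
    numbers.push(1);
    numbers.push(2);
    Stack<std::string> words;
    words.push("type");
    // numbers.push("type");  // rejected at compile time: not an int
    return numbers.length() + words.length();
}
```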
Monomorphic versus Polymorphic Type Systems
A statically typed language can provide a strong, simple type system in which every program entity has a specific type (defined by a declaration), and every operation requires operands of exactly the type appearing in the operation definition.
A monomorphic type system is a type system in which every object belongs to one and only one type, as described above.
A polymorphic type system is a type system in which objects can belong to more than one type.
C, Pascal and Ada all deviate from strict monomorphism to some degree, through:
- compatibility
- coercion
- subtyping
- operator overloading
All practical PLs have some degree of polymorphism, so to differentiate between them we need to differentiate among the various levels and kinds of polymorphism.
The different facets of polymorphism can be classified as shown in figure 3.10.
Let's show how the classification scheme applies in the case of polymorphic functions.
Polymorphic functions are those whose arguments and return values (domain and range) can belong to more than one type.
Level 1 - universal vs. ad hoc polymorphism
- Functions that are universally polymorphic work uniformly for an infinite set of types, all of which have some common structure
  - they execute the same code for all admissible types
- An ad hoc polymorphic function is just a syntactic abbreviation for a small set of different monomorphic functions.
Level 2 - universal :: parametric vs. inclusion
- Parametric polymorphism is the most genuine form of universal polymorphism.
  - in this case the polymorphic function works uniformly on a range of types
  - an implicit or explicit type parameter determines the type of the arguments for each use
  - generic routines as implemented by ML functions are an example of this
  - generic routines as implemented in Ada and C++ are only an apparent kind of polymorphism
    - they can be viewed as ad hoc polymorphism, since the routines are instantiated at compile time with full binding of parameters to specific types
- An example of inclusion polymorphism is subtyping
  - the function is applicable to a given type and any of its subtypes
  - also applicable in the context of object oriented languages
- Dynamic polymorphism is frequently used to classify the case where the binding between language entities and their form varies dynamically.
  PLs which support this cannot have strong type systems.
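Both universal kinds can be sketched in C++ with hypothetical names. As noted above, a C++ template is instantiated per type at compile time, so the first function only approximates parametric polymorphism; the second is inclusion polymorphism, since it accepts a Shape or any of its subtypes.

```cpp
#include <cassert>
#include <string>

// Parametric style: one definition, written once for every type T
// that supports operator<.
template <typename T>
const T& larger(const T& a, const T& b) {
    return (a < b) ? b : a;
}

// Inclusion: doubled_area applies to Shape and to every subtype of it.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};

struct Rectangle : Shape {
    Rectangle(double w, double h) : width(w), height(h) {}
    double area() const override { return width * height; }
    double width;
    double height;
};

double doubled_area(const Shape& s) {
    return 2.0 * s.area();
}
```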
Level 2 - ad hoc :: overloading vs. coercion
- In overloading, the same function name can be used in different contexts to denote different functions
  - Example in C: arithmetic expression a + b
    + is an ad hoc polymorphic function whose behavior depends on its operand types
    - float operands: machine instruction float+
    - int operands: machine instruction int+
    The fact that + is overloaded is purely a syntactic phenomenon
- In coercion, the argument is converted to a type expected by the function
  - the polymorphism is only apparent
  - the conversion is provided statically by code inserted by the compiler, or
  - provided dynamically by runtime tests on type descriptors
  - Example in C: arithmetic expression a + b
    + is an ad hoc polymorphic function whose behavior depends on its operand types, as above.
    If the two operands are of different types, the float+ operator is invoked after coercing the int operand to a float.
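The two ad hoc kinds can be told apart in a short C++ sketch (user-defined functions with hypothetical names stand in for the built-in int+ and float+ instructions). Overloading selects one of several functions by operand type, whereas coercion keeps a single function and converts the argument to fit it.

```cpp
#include <cassert>
#include <type_traits>

// Overloading: one name, two distinct monomorphic functions.
int   sum(int a, int b)     { return a + b; }  // plays the role of int+
float sum(float a, float b) { return a + b; }  // plays the role of float+

// The compiler picks the function from the operand types alone.
static_assert(std::is_same<decltype(sum(1, 2)), int>::value,
              "int operands select the int version");
static_assert(std::is_same<decltype(sum(1.0f, 2.0f)), float>::value,
              "float operands select the float version");
// sum(1, 2.5f) is rejected as ambiguous: each overload needs one conversion.

// Coercion: a single function; an int argument is converted to float.
float half(float v) { return v / 2.0f; }

// For the built-in +, mixed operands are resolved by coercing the int
// operand, so the float addition is used.
static_assert(std::is_same<decltype(1 + 2.5f), float>::value,
              "built-in mixed addition is a float addition");
```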
The Type Structure of Representative Languages
The type structure of a PL is an overall hierarchical classification of the features provided for structuring data.
In order to completely understand the semantics of a PL, this description must be complemented by a precise understanding of the rules of the type system.