Topic 6 - Type Checking

A source program should follow both the syntactic and semantic rules of the source language. Some rules can be checked statically during compile time and other rules can only be checked dynamically during run time. Static checking includes the syntax checks performed by the parser and semantic checks such as type checks, flow-of-control checks, uniqueness checks, and name-related checks. Here we focus on type checking.

[6.1] - Type Systems

Basic types are atomic types that have no internal structure as far as the programmer is concerned. They include types like integer, real, boolean, and character. Subrange types like 1..10 in Pascal and enumerated types like (violet, indigo, blue, green, yellow, orange, red) are also basic types.

Constructed types include arrays, records, sets, and structures constructed from the basic types and/or other constructed types. Pointers and functions are also constructed types.

Type Expressions: A type expression denotes the type of a language construct. It is either a basic type or is formed from other type expressions by applying an operator called a type constructor. Here we use type expressions formed from the following rules:

  1. A basic type is a type expression. Other basic type expressions are type-error to signal the presence of a type error and void to signal the absence of a value.

  2. If a type expression has a name then the name is also a type expression.

  3. A type constructor applied to type expressions is a type expression. Type constructors include:

  4. Section 6.6 of the text shows that type expressions may contain variables whose values are type expressions.

[6.2] - A Simple Type Checker

Most programming languages insist that the type of an ID token be declared before it can be used. A type checker has two kinds of actions: (1) when processing declarations, it stores the appropriate type expressions in the symbol table entries of ID tokens; and (2) when processing statements, it checks that all ID tokens, constants, etc., are of the proper types. Here we describe a translation scheme for treating declarations in the project grammar.

The type expression for an array has three attributes: typ (the type itself), low (the lower index bound), and high (the upper index bound).

For consistency, the type expression for a scalar also has three attributes, but low and high are set to the NULL value. The translation scheme for the type and standard_type nonterminals is shown below. It uses the ChangeToArray function to change a scalar type to an array type and the ChkInt function to report an error if attributes does not point to an integer constant.
type --> standard_type { type.typ := standard_type.typ ;
                         type.low := NULL ; type.high := NULL ; }
type --> ARRAYTOK LBRK { ChkInt() ; type.low := attributes ; } NUM
         DOTDOT { ChkInt() ; type.high := attributes ; } NUM
         RBRK OFTOK standard_type
         { type.typ := ChangeToArray(standard_type.typ) ; }
standard_type --> INTTOK { standard_type.typ := integer ; }
standard_type --> REALTOK { standard_type.typ := real ; }
standard_type --> BOOLTOK { standard_type.typ := boolean ; }
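
As a rough illustration, the two productions for type could be mirrored in Python as follows. The one-character codes ('i', 'r', 'b' for scalars and the upper-case letters for arrays) are inferred from the checks used later in this topic. The TypeAttrs record and the names make_scalar and make_array are hypothetical, and change_to_array is only a stand-in for ChangeToArray.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TypeAttrs:
    """The three synthesized attributes of the type nonterminal."""
    typ: str
    low: Optional[int] = None    # None plays the role of NULL
    high: Optional[int] = None


def change_to_array(scalar_typ: str) -> str:
    """Stand-in for ChangeToArray: 'i' -> 'I', 'r' -> 'R', 'b' -> 'B'."""
    if scalar_typ not in ("i", "r", "b"):
        raise ValueError(f"not a scalar type: {scalar_typ!r}")
    return scalar_typ.upper()


def make_scalar(standard_typ: str) -> TypeAttrs:
    # type --> standard_type : low and high stay NULL
    return TypeAttrs(standard_typ)


def make_array(low: int, high: int, standard_typ: str) -> TypeAttrs:
    # type --> ARRAYTOK LBRK NUM DOTDOT NUM RBRK OFTOK standard_type
    return TypeAttrs(change_to_array(standard_typ), low, high)
```

Under these assumptions, a declaration such as array [1..10] of integer would carry the attributes ('I', 1, 10).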

Declarations of Scalars and Arrays

A declaration of scalars or arrays uses the following productions:

declaration --> ID declaration_rest
declaration_rest --> COMMA ID declaration_rest | COLON type

For example, the parse tree for ID1, ID2 : real is:

 _____________
|             |
| declaration |
|_____________|
 |  \
 |   \__________________
ID1  |                  |
     | declaration_rest |
     |__________________|
     /    |     \
    /     |      \__________________
 COMMA   ID2     |                  |
                 | declaration_rest |
                 |__________________|
                 /           \______
                /            |      |
              COLON          | type |      
                             |______|
                              \_______________
                              |               |
                              | standard_type |
                              |_______________|
                     real ____/
The type node is at the bottom of a chain of declaration_rest nodes so it's a simple matter to move the synthesized attributes, typ, low, and high, up the chain and insert them into the symbol table entries of the ID tokens. InsertType is a function that inserts typ, low, and high, into the appropriate fields of a symbol table entry.

A subroutine in the source program may declare a local variable with the same name as a global variable so a new symbol table entry must be created for the local variable. The translation scheme calls a function, ChkScope, to create such a new entry whenever it is needed. ChkScope checks the scope field of the ID-entry that attributes points to:

  1. If the scope field of the entry equals CurrentScope then the entry was newly created by the lexical analyzer. The lexeme of the entry was never seen before so there is no conflict with any global variable and ChkScope simply returns a pointer to that entry.

  2. If the scope field of the entry doesn't equal CurrentScope then the entry is really for a previously-declared global variable. To prevent a conflict with the global variable, ChkScope creates a new ID entry in the symbol table with the same lexeme as the old entry but with its scope field set to CurrentScope. ChkScope then returns a pointer to the new entry.
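
The two cases above can be sketched in Python. This is only a minimal model: the Entry and SymbolTable classes are hypothetical, the table is a plain list, and insert plays the role of the lexical analyzer creating an entry in the current scope.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    lexeme: str
    scope: int
    typ: Optional[str] = None


class SymbolTable:
    def __init__(self) -> None:
        self.entries: List[Entry] = []
        self.current_scope = 0          # models CurrentScope

    def insert(self, lexeme: str) -> Entry:
        """Create a new entry whose scope field is the current scope."""
        entry = Entry(lexeme, self.current_scope)
        self.entries.append(entry)
        return entry

    def chk_scope(self, entry: Entry) -> Entry:
        """Return the entry a declaration in the current scope should use."""
        if entry.scope == self.current_scope:
            return entry                # case 1: entry is new, no conflict
        # case 2: entry belongs to an outer scope, so shadow it
        return self.insert(entry.lexeme)
```

In this sketch the global entry is left untouched, so it becomes visible again once the local entry is invalidated at the end of the subroutine.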

The parameter_list nonterminal uses declaration to declare the formal parameters of a subroutine and to generate the Cartesian product of all the formal parameters. One declaration may declare multiple formal parameters so a fourth synthesized attribute, prod, is added - declaration.prod is the Cartesian product of all parameter types declared by the declaration. Cartesian is a function that returns the Cartesian product of two type expressions - if type expressions are character strings then the Cartesian product is simply the concatenation of the two character strings. The translation scheme for declarations is:


declaration --> { idptr := ChkScope() ; } ID declaration_rest
    { declaration.prod := declaration_rest.prod ;
      InsertType( idptr, declaration_rest.typ, declaration_rest.low, declaration_rest.high ) ; }
declaration_rest --> COMMA { idptr := ChkScope() ; } ID declaration_rest1
    { declaration_rest.typ := declaration_rest1.typ ;
      declaration_rest.low := declaration_rest1.low ;
      declaration_rest.high := declaration_rest1.high ;
      declaration_rest.prod := Cartesian( declaration_rest1.prod, declaration_rest.typ ) ;
      InsertType( idptr, declaration_rest.typ, declaration_rest.low, declaration_rest.high ) ; }
declaration_rest --> COLON type
    { declaration_rest.typ := type.typ ;
      declaration_rest.low := type.low ;
      declaration_rest.high := type.high ;
      declaration_rest.prod := type.typ ; }
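
To see what the scheme computes, here is a small Python sketch that processes one declaration such as a, b : real. It assumes type expressions are character strings, so Cartesian is concatenation. The function declare and the dictionary used as a symbol table are hypothetical simplifications, and scope handling is left out.

```python
from typing import Dict, List, Optional, Tuple

Attrs = Tuple[str, Optional[int], Optional[int]]   # (typ, low, high)


def cartesian(a: str, b: str) -> str:
    """Cartesian product of two type expressions held as strings."""
    return a + b


def declare(ids: List[str], typ: str, low: Optional[int],
            high: Optional[int], table: Dict[str, Attrs]) -> str:
    """Process 'id1, id2, ... : type' and return declaration.prod."""
    prod = typ                       # declaration_rest --> COLON type
    for _ in ids[1:]:                # one COMMA ID level per extra name
        prod = cartesian(prod, typ)
    for name in ids:                 # InsertType for every ID in the chain
        table[name] = (typ, low, high)
    return prod
```

With real encoded as 'r', the declaration a, b : real yields the product 'rr' and stores ('r', None, None) for both names.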

Declarations of Procedures and Functions

The type expression of a function or a procedure specifies the number and types of its formal parameters (arguments) with a Cartesian product. The project grammar defines the syntax of the formal parameter list with:

parameter_list --> declaration
| parameter_list SEMICOL declaration

When left recursion is eliminated we obtain:

parameter_list --> declaration plistrest
plistrest --> SEMICOL declaration plistrest |

The parameter_list node should return the Cartesian product of the arguments with a synthesized attribute, prod. The following translation scheme can be used:


parameter_list --> declaration plistrest
    { parameter_list.prod := Cartesian( declaration.prod, plistrest.prod ) ; }
plistrest --> SEMICOL declaration plistrest1
    { plistrest.prod := Cartesian( declaration.prod, plistrest1.prod ) ; }
plistrest -->
    { plistrest.prod := void /* the empty string if type expressions are character strings */ ; }

Arguments: The arguments nonterminal has one synthesized attribute, arguments.typ, which is the type expression for the formal parameters followed by the ">" string. The translation scheme for this nonterminal is:
arguments --> LPAR parameter_list RPAR
{ arguments.typ := Cartesian( parameter_list.prod, ">" ) ; }
arguments --> { arguments.typ := ">" ; }
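
A short Python sketch of how parameter_list.prod and arguments.typ are built, again assuming string type expressions with void as the empty string. The function names are hypothetical, and each declaration is represented only by its prod attribute.

```python
from typing import List


def cartesian(a: str, b: str) -> str:
    """Cartesian product of string type expressions is concatenation."""
    return a + b


def parameter_list_prod(declaration_prods: List[str]) -> str:
    """Fold the prod attributes from right to left, as plistrest does."""
    prod = ""                                    # plistrest --> (empty)
    for decl_prod in reversed(declaration_prods):
        prod = cartesian(decl_prod, prod)
    return prod


def arguments_typ(declaration_prods: List[str]) -> str:
    """arguments.typ: the parameter product followed by '>'."""
    return cartesian(parameter_list_prod(declaration_prods), ">")
```

For a header such as (a, b : integer ; c : real) the declarations contribute 'ii' and 'r', so arguments.typ would be 'iir>'. With no parameters it is just '>'.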

Procedures: The declaration of a procedure uses the following production of the project grammar:
sub_head --> PROC ID arguments SEMICOL

The ID token in this production is the name of the procedure being defined: it must be a global symbol so other program units can call it. Any arguments following the name are local variables so a semantic action is needed to increment CurrentScope between the ID token and the arguments. The type expression for the name of the procedure is arguments.typ so the translation scheme for this production is:


sub_head --> PROC { idptr := attributes ; } ID { CurrentScope++ ; } arguments
{ InsertType( idptr, arguments.typ, NULL, NULL ) ; } SEMICOL

Functions: The declaration of a function uses the following production of the project grammar:
sub_head --> FUNC ID arguments COLON
standard_type SEMICOL

Pascal has no return statement to indicate what value a defined function should return to the caller. Instead, the compiler declares a local variable with the same name as the function: the body of the defined function sets that local variable to the proper value before returning. For example, the following Pascal function computes the factorial of any positive integer:

    function factorial( n : integer ) : integer ;
    begin
        if n = 1 then factorial := 1
        else
            factorial := n * factorial(n-1)
    end;

Note that in the else-clause of this function, factorial on the left side of the assignment operator refers to the local integer but factorial on the right side refers to the global function. While compiling the body of a defined function, the compiler must differentiate between calls to execute the function and assignments of values to the returned value of the function. One way to handle this problem is as follows:

  1. Add a second entry to the symbol table for the returned value.

  2. Declare two globals in the compiler: FCallPtr to point to the entry of the function itself; and FRetValPtr to point to the entry of the returned value.

  3. Semantic actions for statements in the grammar compare the pointer of every ID entry to these compiler globals and change the pointer when necessary.

  4. FCallPtr and FRetValPtr are given NULL values except when compiling the body of a function.

The translation scheme uses the INSERT function to add the second entry to the symbol table:
sub_head --> FUNC { FCallPtr := attributes ; } ID
    { CurrentScope++ ; FRetValPtr := INSERT( FCallPtr.lexeme, ID ) ; }
    arguments COLON standard_type
    { InsertType( FRetValPtr, standard_type.typ, NULL, NULL ) ;
      InsertType( FCallPtr, Cartesian( arguments.typ, standard_type.typ ), NULL, NULL ) ; }
    SEMICOL
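
The effect of this scheme on the factorial example can be sketched in Python. The Entry class and declare_function are hypothetical; the sketch assumes string type expressions, so the function's own type is arguments.typ concatenated with the return type.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Entry:
    lexeme: str
    scope: int
    typ: Optional[str] = None


def declare_function(name: str, arguments_typ: str, return_typ: str,
                     current_scope: int) -> Tuple[Entry, Entry]:
    """Return (call entry, return-value entry) for a function header."""
    f_call = Entry(name, current_scope)        # FCallPtr: the global name
    f_ret = Entry(name, current_scope + 1)     # FRetValPtr: after CurrentScope++
    f_ret.typ = return_typ                     # local variable of the result type
    f_call.typ = arguments_typ + return_typ    # Cartesian as concatenation
    return f_call, f_ret
```

For function factorial( n : integer ) : integer, arguments.typ is 'i>', so the global entry gets the type 'i>i' and the local return-value entry gets 'i'.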

The End of a Subroutine: Nonterminal subroutine in the project grammar defines the syntax of a subroutine:

subroutine --> sub_head declarations block

Local symbols are only valid until the end of a subroutine, so a semantic action is needed at that point to negate all scope fields in the symbol table that equal CurrentScope. (As a debugging aid for project 2, this semantic action could also list the lexemes and type expressions of all entries it invalidates.) After that semantic action, CurrentScope should be decremented and the compiler globals FCallPtr and FRetValPtr set to NULL values. The translation scheme looks like:


subroutine --> sub_head declarations block
{ negate all scope fields that equal CurrentScope ;
CurrentScope-- ; FCallPtr := NULL ; FRetValPtr := NULL ; }
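
A minimal Python sketch of this action follows. It assumes the global scope is numbered 0 and subroutine scopes are positive, so a negated scope field can never match CurrentScope again. The Entry class and end_subroutine are hypothetical names, and resetting FCallPtr and FRetValPtr is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    lexeme: str
    scope: int


def end_subroutine(entries: List[Entry], current_scope: int) -> int:
    """Invalidate local entries and return the decremented CurrentScope."""
    for entry in entries:
        if entry.scope == current_scope:
            entry.scope = -current_scope    # negated: no longer visible
    return current_scope - 1
```

Entries from outer scopes are left alone, so a global variable shadowed by a local one is visible again afterwards.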

Type Checking Statements

Left-factoring the productions for the statement nonterminal in the project grammar produces the following:

statement --> ID stmt_rest
statement --> BEGINTOK block_rest
statement --> IFTOK expr THENTOK statement ELSETOK statement
statement --> WHILETOK expr DOTOK statement
stmt_rest --> ASSIGNOP expr
stmt_rest --> LBRK expr RBRK ASSIGNOP expr
stmt_rest --> LPAR expr_list RPAR
stmt_rest -->

Other nonterminals on the right sides of these productions are block_rest, expr, and expr_list, but block_rest needs no semantic actions so we ignore it.

expr: The parent of an expr node in the parse tree needs to know both the lexeme and the type of the expression so the expr nonterminal has a synthesized attribute, expr.ptr, that points to the symbol table entry of the expression. In project 2 the only productions for expr are:

expr --> NUM
expr --> BCONST

A translation scheme for expr in project 2 is simply:

expr --> {expr.ptr := attributes ; } NUM
expr --> {expr.ptr := attributes ; } BCONST

Note that we place the semantic actions before the tokens in these productions as a reminder that attributes should be read before the tokens are matched.

expr_list: The productions for expr_list are:

expr_list --> expr
expr_list --> expr_list COMMA expr

but these productions must be modified to eliminate left recursion:

expr_list --> expr elistrest
elistrest --> COMMA expr elistrest
elistrest -->

The expr_list nonterminal returns the Cartesian product of all expressions in the list as a synthesized attribute, expr_list.typexpr. We assume there is a GetType function that accepts a pointer to a symbol table entry and returns the type expression of that entry. A translation scheme for these productions is:

expr_list --> expr elistrest { expr_list.typexpr :=
Cartesian( GetType( expr.ptr), elistrest.typexpr ) ; }
elistrest --> COMMA expr elistrest1 { elistrest.typexpr :=
Cartesian( GetType( expr.ptr), elistrest1.typexpr ) ; }
elistrest --> { elistrest.typexpr := void
/* the empty string if type expressions are character strings */ ; }

stmt_rest: The stmt_rest nonterminal accepts a pointer to the symbol table entry of an ID token in an inherited attribute, stmt_rest.idptr. We assume the type system described here. Type checking in the four productions for stmt_rest is described in the following paragraphs (t1 and t2 are used as temporary placeholders for type expressions).

The first production assigns the value of an expression to a scalar variable so there is a type-error if stmt_rest.idptr does not point to a scalar. Integer-to-real and real-to-integer type conversions are allowable so the only other type-errors that can occur are when a boolean is assigned to a non-boolean or a non-boolean is assigned to a boolean:

stmt_rest --> ASSIGNOP
    { t1 := GetType(stmt_rest.idptr) ;
      if t1 != 'b' and t1 != 'i' and t1 != 'r' then type-error ; }
    expr
    { t2 := GetType(expr.ptr) ;
      if (t1 != 'b' and t2 == 'b') or
         (t1 == 'b' and t2 != 'b') then type-error ; }
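
The two checks in this production can be restated as a small Python predicate. The function name is hypothetical; it returns True when no type-error would be reported and mirrors the scheme exactly, including the fact that only the boolean/non-boolean mismatch is tested on the expression side.

```python
def check_scalar_assign(t1: str, t2: str) -> bool:
    """True when 'ID := expr' passes the checks of the first production."""
    if t1 not in ("b", "i", "r"):
        return False                 # target is not a scalar
    # Error only when exactly one side is boolean.
    return (t1 == "b") == (t2 == "b")
```

For example, assigning an integer to a real passes, while assigning an integer to a boolean does not.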

The second production assigns the value of an expression to an element of an array so there is a type-error if stmt_rest.idptr does not point to an array. Also there is a type-error if the expression for the index is not an integer. Integer-to-real and real-to-integer type conversions are allowable so the only other type-errors that can occur are when a boolean is assigned to a non-boolean or a non-boolean is assigned to a boolean:

stmt_rest --> LBRK
    { t1 := GetType(stmt_rest.idptr) ;
      if t1 != 'B' and t1 != 'I' and t1 != 'R' then type-error ; }
    expr1
    { if GetType(expr1.ptr) != 'i' then type-error ; }
    RBRK ASSIGNOP expr2
    { t2 := GetType(expr2.ptr) ;
      if (t1 != 'B' and t2 == 'b') or
         (t1 == 'B' and t2 != 'b') then type-error ; }
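
The array case adds the index check. A hedged Python restatement, with a hypothetical function name and the upper-case letters standing for array types as in the scheme:

```python
def check_array_assign(t1: str, index_typ: str, t2: str) -> bool:
    """True when 'ID [ expr1 ] := expr2' passes all three checks."""
    if t1 not in ("B", "I", "R"):
        return False                 # target is not an array
    if index_typ != "i":
        return False                 # the index must be an integer
    # Error only when exactly one of element type and value is boolean.
    return (t1 == "B") == (t2 == "b")
```

Note that a real index is rejected even though reals and integers may be mixed on the two sides of the assignment.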

The third production calls a procedure with one or more arguments. The type expression of stmt_rest.idptr should equal expr_list.typexpr with a '>' character appended to it:

stmt_rest --> LPAR { t1 := GetType(stmt_rest.idptr) ; }
    expr_list
    { t2 := Cartesian(expr_list.typexpr, ">") ;
      if t1 != t2 then type-error ; }
    RPAR

The fourth production calls a procedure with no arguments. The type expression of stmt_rest.idptr should simply be the '>' character:

stmt_rest --> { if GetType(stmt_rest.idptr)
!= ">" then type-error ; }
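
Both procedure-call productions reduce to one string comparison when type expressions are character strings. The sketch below uses a hypothetical check_call function and takes the argument types as a list, with an empty list for a call without arguments.

```python
from typing import List


def check_call(proc_typ: str, arg_typs: List[str]) -> bool:
    """True when the ID's type expression matches the actual arguments."""
    typexpr = "".join(arg_typs)      # expr_list.typexpr; empty if no arguments
    return proc_typ == typexpr + ">"
```

Because the comparison is exact, this check allows no integer-to-real conversion of arguments, and a function name (whose type expression continues after the '>') is rejected when used as a procedure call.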

Note that intermediate code generation adds other semantic actions to all four productions for stmt_rest.

statement: There are four productions for the statement nonterminal. The first production should copy attributes into the inherited attribute, stmt_rest.idptr, before matching the ID token:

statement --> { stmt_rest.idptr := attributes ; } ID stmt_rest

The second production has no semantic actions:

statement --> BEGINTOK block_rest

The third and fourth productions should check that expr is a boolean. Note that intermediate code generation adds other semantic actions to these two productions:

statement --> IFTOK expr
{ if GetType(expr.ptr) != 'b' then type-error ; }
THENTOK statement ELSETOK statement
statement --> WHILETOK expr
{ if GetType(expr.ptr) != 'b' then type-error ; }
DOTOK statement

Kenneth E. Batcher - 1/12/2006