A source program should follow both the syntactic and semantic rules of the source language. Some rules can be checked statically during compile time and other rules can only be checked dynamically during run time. Static checking includes the syntax checks performed by the parser and semantic checks such as type checks, flow-of-control checks, uniqueness checks, and name-related checks. Here we focus on type checking.
[6.1] - Type Systems
Basic types are atomic types that have no internal structure as far as the programmer is concerned. They include types like integer, real, boolean, and character. Subrange types like 1..10 in Pascal and enumerated types like (violet, indigo, blue, green, yellow, orange, red) are also basic types.
Constructed types include arrays, records, sets, and structures constructed from the basic types and/or other constructed types. Pointers and functions are also constructed types.
Type Expressions: A type expression denotes the type of a language construct. It is either a basic type or is formed from other type expressions by applying an operator called a type constructor. Here we use type expressions formed from the following rules:
where, if the name of field i is name_i and the type expression of field i is T_i, then F_i is:
For example, the type expression of the mod operator in Pascal is: integer x integer --> integer because it divides an integer by an integer and returns the integer remainder.
The type expression for the domain of a function with no arguments is void and the type expression for the range of a function with no returned value is void: e.g., void --> void is the type expression for a procedure with no arguments and no returned value.
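As a rough illustration of the function type expressions above, the following sketch builds them as plain strings. The helper names `product` and `arrow` are hypothetical and are not part of the project; the textual notation simply mirrors the `x` and `-->` used in this section.

```python
# Hypothetical helpers: type expressions are modeled as plain strings.
VOID = "void"

def product(*types):
    """Cartesian product of zero or more type expressions."""
    # A function with no arguments has the domain void.
    return " x ".join(types) if types else VOID

def arrow(domain, rng=VOID):
    """Function type 'domain --> range'; the range defaults to void."""
    return f"{domain} --> {rng}"

mod_type = arrow(product("integer", "integer"), "integer")
proc_type = arrow(product())
print(mod_type)   # integer x integer --> integer
print(proc_type)  # void --> void
```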
[6.2] - A Simple Type Checker
Almost all programming languages insist that the type of an ID token be declared before it can be used. A type checker has two kinds of actions: (1) when processing declarations, it stores the appropriate type expressions in the symbol table entries of ID tokens; and (2) when processing statements, it checks that all ID tokens, constants, etc., are of the proper types. Here we describe a translation scheme for treating declarations in the project grammar.
The type expression for an array has three attributes, typ, low, and high, which are set by the following productions:
type | --> | standard_type { type.typ := standard_type.typ ; type.low := NULL ; type.high := NULL ; } |
type | --> | ARRAYTOK LBRK { ChkInt() ; type.low := attributes ; } NUM DOTDOT { ChkInt() ; type.high := attributes ; } NUM RBRK OFTOK standard_type { type.typ := ChangeToArray(standard_type.typ) ; } |
standard_type | --> | INTTOK { standard_type.typ := integer ; } |
standard_type | --> | REALTOK { standard_type.typ := real ; } |
standard_type | --> | BOOLTOK { standard_type.typ := boolean ; } |
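The productions above can be sketched in Python as a function that returns the three attributes. This is only an illustration under several assumptions: tokens are modeled as (kind, value) pairs, ChkInt is assumed to verify that an array bound is an integer NUM token, and ChangeToArray is assumed to map a scalar code to its uppercase letter (the 'I', 'R', 'B' codes that appear in the statement checks later in this section).

```python
# Sketch only: tokens are (kind, value) pairs, an assumed representation.
STANDARD = {"INTTOK": "i", "REALTOK": "r", "BOOLTOK": "b"}

def change_to_array(typ):
    # Assumed encoding: the array form of a scalar code is its uppercase letter.
    return typ.upper()

def chk_int(token):
    # Assumed behaviour of ChkInt: array bounds must be integer NUM tokens.
    kind, value = token
    if kind != "NUM" or not isinstance(value, int):
        raise TypeError(f"array bound must be an integer, got {value!r}")
    return value

def parse_type(tokens):
    """Return (typ, low, high) for a token list matching the type productions."""
    kinds = [kind for kind, _ in tokens]
    if len(tokens) == 1 and kinds[0] in STANDARD:
        # type --> standard_type
        return STANDARD[kinds[0]], None, None
    # type --> ARRAYTOK LBRK NUM DOTDOT NUM RBRK OFTOK standard_type
    expected = ["ARRAYTOK", "LBRK", "NUM", "DOTDOT", "NUM", "RBRK", "OFTOK"]
    if len(tokens) != 8 or kinds[:7] != expected or kinds[7] not in STANDARD:
        raise SyntaxError("malformed type")
    low = chk_int(tokens[2])
    high = chk_int(tokens[4])
    return change_to_array(STANDARD[kinds[7]]), low, high

array_tokens = [("ARRAYTOK", None), ("LBRK", None), ("NUM", 1), ("DOTDOT", None),
                ("NUM", 10), ("RBRK", None), ("OFTOK", None), ("REALTOK", None)]
print(parse_type(array_tokens))        # ('R', 1, 10)
print(parse_type([("INTTOK", None)]))  # ('i', None, None)
```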
A declaration of scalars or arrays uses the following productions:
declaration | --> | ID declaration_rest |
declaration_rest | --> | COMMA ID declaration_rest | COLON type |
For example, the parse tree for ID1, ID2 : real is:
    declaration
    |-- ID1
    +-- declaration_rest
        |-- COMMA
        |-- ID2
        +-- declaration_rest
            |-- COLON
            +-- type
                +-- standard_type
                    +-- real

The type node is at the bottom of a chain of declaration_rest nodes so it's a simple matter to move the synthesized attributes, typ, low, and high, up the chain and insert them into the symbol table entries of the ID tokens. InsertType is a function that inserts typ, low, and high into the appropriate fields of a symbol table entry.
A subroutine in the source program may declare a local variable with the same name as a global variable so a new symbol table entry must be created for the local variable. The translation scheme calls a function, ChkScope, to create such a new entry whenever it is needed. ChkScope checks the scope field of the ID-entry that attributes points to:
The parameter_list nonterminal uses declaration to declare the formal parameters of a subroutine and to generate the Cartesian product of all the formal parameters. One declaration may declare multiple formal parameters, so a fourth synthesized attribute, prod, is added: declaration.prod is the Cartesian product of all parameter types declared by the declaration. Cartesian is a function that returns the Cartesian product of two type expressions. If type expressions are character strings, then the Cartesian product is simply the concatenation of the two character strings. The translation scheme for declarations is:
declaration | --> | { idptr := ChkScope() ; } ID declaration_rest { declaration.prod := declaration_rest.prod ; InsertType( idptr, declaration_rest.typ, declaration_rest.low, declaration_rest.high ) ; } |
declaration_rest | --> | COMMA { idptr := ChkScope() ; } ID declaration_rest1 { declaration_rest.typ := declaration_rest1.typ ; declaration_rest.low := declaration_rest1.low ; declaration_rest.high := declaration_rest1.high ; declaration_rest.prod := Cartesian( declaration_rest1.prod, declaration_rest.typ ) ; InsertType( idptr, declaration_rest.typ, declaration_rest.low, declaration_rest.high) ; } |
declaration_rest | --> | COLON type { declaration_rest.typ := type.typ ; declaration_rest.low := type.low ; declaration_rest.high := type.high ; declaration_rest.prod := type.typ ; } |
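A minimal sketch of how the attributes travel up the chain of declaration_rest nodes is given below. It assumes type expressions are character strings, uses a plain dict as a stand-in for the symbol table, and leaves out ChkScope and all scope handling; the function signatures are illustrative, not the project's.

```python
# Sketch only: a dict stands in for the symbol table; ChkScope is omitted.
symtab = {}

def insert_type(name, typ, low, high):
    symtab[name] = (typ, low, high)

def cartesian(a, b):
    # With string type expressions the Cartesian product is concatenation.
    return a + b

def declaration_rest(names, type_attrs):
    """names: identifiers still to declare; returns (typ, low, high, prod)."""
    if not names:
        # declaration_rest --> COLON type
        typ, low, high = type_attrs
        return typ, low, high, typ
    # declaration_rest --> COMMA ID declaration_rest1
    typ, low, high, prod1 = declaration_rest(names[1:], type_attrs)
    insert_type(names[0], typ, low, high)
    return typ, low, high, cartesian(prod1, typ)

def declaration(names, type_attrs):
    """declaration --> ID declaration_rest; returns declaration.prod."""
    typ, low, high, prod = declaration_rest(names[1:], type_attrs)
    insert_type(names[0], typ, low, high)
    return prod

prod = declaration(["x", "y", "z"], ("r", None, None))
print(prod)  # rrr
```

Each identifier contributes one copy of the declared type to prod, so a declaration of three reals yields the three-character product.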
The type expression of a function or a procedure specifies the number and types of its formal parameters (arguments) with a Cartesian product. The project grammar defines the syntax of the formal parameter list with:
parameter_list | --> | declaration | parameter_list SEMICOL declaration |
When left recursion is eliminated we obtain:
parameter_list | --> | declaration plistrest |
plistrest | --> | SEMICOL declaration plistrest | |
The parameter_list node should return the Cartesian product of the arguments with a synthesized attribute, prod. The following translation scheme can be used:
parameter_list | --> | declaration plistrest { parameter_list.prod := Cartesian(declaration.prod, plistrest.prod ) ; } |
plistrest | --> | SEMICOL declaration plistrest1 { plistrest.prod := Cartesian(declaration.prod, plistrest1.prod ) ; } |
plistrest | --> | { plistrest.prod := void /* the empty string if type expressions are character strings */ ; } |
arguments | --> | LPAR parameter_list RPAR { arguments.typ := Cartesian( parameter_list.prod, ">" ) ; } |
arguments | --> | { arguments.typ := ">" ; } |
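The following sketch shows how the prod attributes of several declarations might combine into arguments.typ. It assumes string type expressions, with void as the empty string, and takes the already computed declaration.prod values as input instead of parsing tokens.

```python
VOID = ""  # void is the empty string when type expressions are strings

def cartesian(a, b):
    return a + b

def plistrest(prods):
    # plistrest --> SEMICOL declaration plistrest1 | (empty)
    if not prods:
        return VOID
    return cartesian(prods[0], plistrest(prods[1:]))

def parameter_list(prods):
    # parameter_list --> declaration plistrest
    return cartesian(prods[0], plistrest(prods[1:]))

def arguments(prods):
    """prods: one declaration.prod per declaration; empty if there is no list."""
    if not prods:
        return ">"
    return cartesian(parameter_list(prods), ">")

# Parameter list "a, b : integer ; c : real" has declaration prods "ii" and "r".
print(arguments(["ii", "r"]))  # iir>
print(arguments([]))           # >
```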
sub_head | --> | PROC ID arguments SEMICOL |
The ID token in this production is the name of the procedure being defined: it must be a global symbol so other program units can call it. Any arguments following the name are local variables so a semantic action is needed to increment CurrentScope between the ID token and the arguments. The type expression for the name of the procedure is arguments.typ so the translation scheme for this production is:
sub_head | --> | PROC { idptr := attributes ; } ID { CurrentScope++ ; } arguments { InsertType( idptr, arguments.typ, NULL, NULL ) ; } SEMICOL |
sub_head | --> | FUNC ID arguments COLON standard_type SEMICOL |
Pascal has no return statement to indicate what value a defined function should return to the caller. Instead the compiler declares a local variable with the same name as the function: the body of the defined function sets that local variable to the proper value before returning. For example, the following Pascal function computes the factorial function of any positive integer:
    function factorial( n : integer ) : integer ;
    begin
      if n = 1 then
        factorial := 1
      else
        factorial := n * factorial(n-1)
    end
Note that in the else-clause of this function, factorial on the left side of the assignment operator refers to the local integer, but factorial on the right side refers to the global function. While compiling the body of a defined function, the compiler must differentiate between calls that execute the function and assignments to the returned value of the function. One way to handle this problem is as follows:
sub_head | --> | FUNC { FCallPtr := attributes ; } ID { CurrentScope++ ; FRetValPtr := INSERT( FCallPtr.lexeme, ID ) ; } arguments COLON standard_type { InsertType( FRetValPtr, standard_type.typ, NULL, NULL ) ; InsertType( FCallPtr, Cartesian( arguments.typ, standard_type.typ ), NULL, NULL ) ; } SEMICOL |
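The effect of this production on the symbol table can be sketched as follows. The table layout (a list of dicts) and the helper `insert` are assumptions made for illustration; the global scope is assumed to be 0, and string type expressions are assumed as before.

```python
# Sketch only: entries are dicts in a list; insert() stands in for INSERT.
symtab = []
current_scope = 0  # assumed value of CurrentScope for the global scope

def insert(lexeme, scope):
    entry = {"lexeme": lexeme, "scope": scope, "typ": None}
    symtab.append(entry)
    return entry

def function_heading(name, arguments_typ, result_typ):
    """Mimic the FUNC production; returns (FCallPtr, FRetValPtr)."""
    global current_scope
    f_call = insert(name, current_scope)    # global entry, used for calls
    current_scope += 1
    f_retval = insert(name, current_scope)  # local entry, holds the result
    f_retval["typ"] = result_typ
    # Cartesian(arguments.typ, standard_type.typ) with string concatenation
    f_call["typ"] = arguments_typ + result_typ
    return f_call, f_retval

f_call, f_retval = function_heading("factorial", "i>", "i")
print(f_call)    # {'lexeme': 'factorial', 'scope': 0, 'typ': 'i>i'}
print(f_retval)  # {'lexeme': 'factorial', 'scope': 1, 'typ': 'i'}
```

The two entries share a lexeme but differ in scope and type, which is what lets the compiler tell a recursive call apart from an assignment to the return value.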
subroutine | --> | sub_head declarations block |
Local symbols are only valid until the end of a subroutine, so a semantic action is needed at that point to negate all scope fields in the symbol table that equal CurrentScope. As a debugging aid for project 2, this semantic action could also list the lexemes and type expressions of all entries it invalidates. After that semantic action, CurrentScope should be decremented and the compiler globals FCallPtr and FRetValPtr set to NULL. The translation scheme looks like:
subroutine | --> | sub_head declarations block { negate all scope fields that equal CurrentScope ; CurrentScope-- ; FCallPtr := NULL ; FRetValPtr := NULL ; } |
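One way to picture the scope-closing action is the sketch below. It assumes the symbol table is a list of dicts with lexeme, scope, and typ fields, and that local scopes are numbered from 1 so that negation always changes the value; neither detail is fixed by the text above.

```python
def close_scope(symtab, current_scope):
    """Negate every scope field equal to current_scope; return the new scope."""
    for entry in symtab:
        if entry["scope"] == current_scope:
            entry["scope"] = -current_scope
            # Debugging aid: list each entry as it is invalidated.
            print("invalidated:", entry["lexeme"], entry["typ"])
    return current_scope - 1

table = [
    {"lexeme": "factorial", "scope": 0, "typ": "i>i"},
    {"lexeme": "factorial", "scope": 1, "typ": "i"},
    {"lexeme": "n", "scope": 1, "typ": "i"},
]
new_scope = close_scope(table, 1)
print(new_scope)                    # 0
print([e["scope"] for e in table])  # [0, -1, -1]
```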
Type Checking Statements
Left-factoring the productions for the statement nonterminal in the project grammar produces the following:
statement | --> | ID stmt_rest |
statement | --> | BEGINTOK block_rest |
statement | --> | IFTOK expr THENTOK statement ELSETOK statement |
statement | --> | WHILETOK expr DOTOK statement |
stmt_rest | --> | ASSIGNOP expr |
stmt_rest | --> | LBRK expr RBRK ASSIGNOP expr |
stmt_rest | --> | LPAR expr_list RPAR |
stmt_rest | --> |
The other nonterminals on the right sides of these productions are block_rest, expr, and expr_list, but block_rest needs no semantic actions so we ignore it.
expr: The parent of an expr node in the parse tree needs to know both the lexeme and the type of the expression so the expr nonterminal has a synthesized attribute, expr.ptr, that points to the symbol table entry of the expression. In project 2 the only productions for expr are:
expr | --> | NUM |
expr | --> | BCONST |
A translation scheme for expr in project 2 is simply:
expr | --> | {expr.ptr := attributes ; } NUM |
expr | --> | {expr.ptr := attributes ; } BCONST |
Note that we place the semantic actions before the tokens in these productions as a reminder that attributes should be read before the tokens are matched.
expr_list: The productions for expr_list are:
expr_list | --> | expr |
expr_list | --> | expr_list COMMA expr |
but these productions must be modified to eliminate left recursion:
expr_list | --> | expr elistrest |
elistrest | --> | COMMA expr elistrest |
elistrest | --> |
The expr_list nonterminal returns the Cartesian product of all expressions in the list as a synthesized attribute, expr_list.typexpr. We assume there is a GetType function that accepts a pointer to a symbol table entry and returns the type expression of that entry. A translation scheme for these productions is:
expr_list | --> | expr elistrest { expr_list.typexpr := Cartesian( GetType( expr.ptr), elistrest.typexpr ) ; } |
elistrest | --> | COMMA expr elistrest1 { elistrest.typexpr := Cartesian( GetType( expr.ptr), elistrest1.typexpr ) ; } |
elistrest | --> | { elistrest.typexpr := void /* the empty string if type expressions are character strings */ ; } |
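As a small illustration, the scheme for expr_list can be written as two mutually dependent functions. The sketch assumes string type expressions and represents each expr.ptr as a dict with a typ field; both are stand-ins for the real symbol table entries.

```python
def get_type(entry):
    # Stand-in for GetType: the entry is assumed to be a dict with a "typ" field.
    return entry["typ"]

def elistrest(exprs):
    # elistrest --> COMMA expr elistrest1 | (empty)
    if not exprs:
        return ""  # void as the empty string
    return get_type(exprs[0]) + elistrest(exprs[1:])

def expr_list(exprs):
    # expr_list --> expr elistrest
    return get_type(exprs[0]) + elistrest(exprs[1:])

actuals = [
    {"lexeme": "3", "typ": "i"},
    {"lexeme": "2.5", "typ": "r"},
    {"lexeme": "true", "typ": "b"},
]
print(expr_list(actuals))  # irb
```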
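stmt_rest: The stmt_rest nonterminal accepts a pointer to the symbol table entry of an ID token in an inherited attribute, stmt_rest.idptr. We assume the type system described here, with type expressions stored as character strings: the checks below use 'b', 'i', and 'r' for boolean, integer, and real scalars, 'B', 'I', and 'R' for the corresponding arrays, and '>' to mark a procedure. Type checking in the four productions for stmt_rest is described in the following paragraphs (t1 and t2 are used as temporary placeholders for type expressions).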
The first production assigns the value of an expression to a scalar variable so there is a type-error if stmt_rest.idptr does not point to a scalar. Integer-to-real and real-to-integer type conversions are allowable so the only other type-errors that can occur are when a boolean is assigned to a non-boolean or a non-boolean is assigned to a boolean:
stmt_rest | --> | ASSIGNOP { t1 := GetType(stmt_rest.idptr) ; if t1 != 'b' and t1 != 'i' and t1 != 'r' then type-error ; } expr { t2 := GetType(expr.ptr) ; if (t1 != 'b' and t2 == 'b') or (t1 == 'b' and t2 != 'b') then type-error ; } |
The second production assigns the value of an expression to an element of an array so there is a type-error if stmt_rest.idptr does not point to an array. Also there is a type-error if the expression for the index is not an integer. Integer-to-real and real-to-integer type conversions are allowable so the only other type-errors that can occur are when a boolean is assigned to a non-boolean or a non-boolean is assigned to a boolean:
stmt_rest | --> | LBRK { t1 := GetType(stmt_rest.idptr) ; if t1 != 'B' and t1 != 'I' and t1 != 'R' then type-error ; } expr1 { if GetType(expr1.ptr) != 'i' then type-error ; } RBRK ASSIGNOP expr2 { t2 := GetType(expr2.ptr) ; if (t1 != 'B' and t2 == 'b') or (t1 == 'B' and t2 != 'b') then type-error ; } |
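The two assignment checks can be sketched as ordinary functions over the single-character type codes. The exception class and the `accepts` helper are hypothetical conveniences for the example; the conditions themselves follow the semantic actions above.

```python
class TypeCheckError(Exception):
    """Raised where the translation scheme says type-error."""

def check_bool_mix(target_is_bool, t2):
    # Error when exactly one side of the assignment is boolean.
    if target_is_bool != (t2 == "b"):
        raise TypeCheckError("boolean mixed with non-boolean")

def check_scalar_assign(t1, t2):
    # stmt_rest --> ASSIGNOP expr
    if t1 not in ("b", "i", "r"):
        raise TypeCheckError("target is not a scalar")
    check_bool_mix(t1 == "b", t2)

def check_array_assign(t1, index_type, t2):
    # stmt_rest --> LBRK expr1 RBRK ASSIGNOP expr2
    if t1 not in ("B", "I", "R"):
        raise TypeCheckError("target is not an array")
    if index_type != "i":
        raise TypeCheckError("array index is not an integer")
    check_bool_mix(t1 == "B", t2)

def accepts(check, *types):
    """Return True if the check passes, False if it reports a type-error."""
    try:
        check(*types)
        return True
    except TypeCheckError:
        return False

print(accepts(check_scalar_assign, "r", "i"))      # True
print(accepts(check_scalar_assign, "b", "i"))      # False
print(accepts(check_array_assign, "R", "r", "i"))  # False
```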
The third production calls a procedure with one or more arguments. The type expression of stmt_rest.idptr should equal expr_list.typexpr with a '>' character appended to it:
stmt_rest | --> | LPAR { t1 := GetType(stmt_rest.idptr) ; } expr_list { t2 := Cartesian(expr_list.typexpr, ">") ; if t1 != t2 then type-error ; } RPAR |
The fourth production calls a procedure with no arguments. The type expression of stmt_rest.idptr should simply be the '>' character:
stmt_rest | --> | { if GetType(stmt_rest.idptr) != ">" then type-error ; } |
Note that intermediate code generation adds other semantic actions to all four productions for stmt_rest.
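Both procedure-call checks reduce to one string comparison, as the sketch below suggests. It assumes string type expressions and takes the type codes of the actual arguments as a list, which is empty for a call with no arguments; the function name is illustrative.

```python
def call_is_well_typed(proc_type, arg_types):
    """Compare a procedure's type expression with the actual argument types.

    proc_type: type expression stored for the procedure name, e.g. "ir>".
    arg_types: type codes of the actual arguments, [] for a call without any.
    """
    expected = "".join(arg_types) + ">"
    return proc_type == expected

print(call_is_well_typed("ir>", ["i", "r"]))  # True
print(call_is_well_typed(">", []))            # True
print(call_is_well_typed("ir>", ["i"]))       # False
print(call_is_well_typed("r>", ["i"]))        # False
```

Because the comparison is exact string equality, the last call is rejected even though integer-to-real conversion is allowed in assignments. That follows directly from the `t1 != t2` test in the third production.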
statement: There are four productions for the statement nonterminal. The first production should copy attributes into the inherited attribute, stmt_rest.idptr, before matching the ID token:
statement | --> | { stmt_rest.idptr := attributes ; } ID stmt_rest |
The second production has no semantic actions:
statement | --> | BEGINTOK block_rest |
The third and fourth productions should check that expr is a boolean. Note that intermediate code generation adds other semantic actions to these two productions:
statement | --> | IFTOK expr { if GetType(expr.ptr) != 'b' then type-error ; } THENTOK statement ELSETOK statement |
statement | --> | WHILETOK expr { if GetType(expr.ptr) != 'b' then type-error ; } DOTOK statement |