Topic 7 - Intermediate Code Generation

Section 1.5 mentions the advantages of dividing a compiler into a front-end to analyze the source program and a back-end to synthesize the target program. Between the front-end and the back-end the program is in an intermediate form.

[8.1] Intermediate Languages

A syntax tree is one way of showing the intermediate form of a program and postfix notation is another. A third way, three-address code, is described here.

Three-address Code: Almost every arithmetic or logical operation in a computer combines two source operands to produce one result operand, so three-address code is a sequence of statements of the form:

x := y op z

where the three addresses x, y, and z are names of variables or values of constants, and op is an operator such as +, -, *, /, and, or or.

There are also some unary operations that use only one source operand: uminus to negate a numeric value; not to complement a boolean value; cvtr2i to convert a real to an integer; cvti2r to convert an integer to a real; etc. Three-address code also allows statements of the form:

x := op y

where op is a unary operator.

Sometimes it is necessary to copy the value of an operand into another location so three-address code also allows copy statements of the form:

x := y

Source statements like if-then-else and while-do cause jumps in the control flow through three-address code so any statement in three-address code can be given a label to make it the target of a jump. The statement:

goto L

causes an unconditional jump to the statement with label L. The statement:

if y false goto L

causes a jump if and only if boolean y is false.

A procedure call like p(x1, x2, ..., xn) may have too many addresses for one statement in three-address code, so it is shown as a sequence of n + 1 statements:

param x1
param x2
. . .
param xn
call p, n

Similarly, an assignment statement with a function call like y := f(x1, x2, ..., xn) is shown as a sequence of n + 1 statements in three-address code:

param x1
param x2
. . .
param xn
y := call f, n
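
As a rough illustration, the two sequences above can be produced by one small routine. The sketch below is in Python; the helper name lower_call is hypothetical and does not come from the text.

```python
def lower_call(name, args, result=None):
    """Return the three-address statements for a call to `name`.

    `args` holds the names of the argument operands; `result` is the
    name that receives the returned value, or None for a procedure call.
    """
    code = [f"param {a}" for a in args]      # one param statement per argument
    call = f"call {name}, {len(args)}"       # the call carries the argument count
    code.append(f"{result} := {call}" if result else call)
    return code


# A function call with three arguments yields 3 + 1 statements.
for line in lower_call("f", ["x1", "x2", "x3"], result="y"):
    print(line)
```

The argument count in the call statement lets a later phase know how many of the preceding param statements belong to this call.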

When the source program contains a defined procedure there must be some way of delimiting the three-address code for the body of the procedure so the first statement of the body is labeled with the name of the procedure and a return statement marks the end of the body.

Similarly, the three-address code for the body of a defined function is delimited with the function name as a label on the first statement of the body and a return y statement at the end to indicate the returned value.

Reading the value of an array element is shown in three-address code with the statement:

x := y[i]

and writing a value into an array element is shown in three-address code with the statement:

x[i] := y

As mentioned in the text, three-address code can also be enriched with statements for address and pointer assignments.

Quadruples: A dialect of three-address code that is closer to machine assembly code re-formats each statement into a structure with four fields called a quadruple:

operator source1 source2 result

The fields of a quadruple contain the names of the operator, the first source operand, the second source operand, and the result, respectively. Machines use different instructions for floating-point and fixed-point arithmetic, so quadruples also use different operator names; e.g., ADDI indicates integer addition while ADDR indicates real addition.

The output file of coding projects 2 and 3 is a sequence of quadruples with this format.
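
One way to picture a quadruple is as a record with four named fields. The Python sketch below is only an illustration: the class name Quad and the helper add_quad are hypothetical, and only the operator names ADDI and ADDR are taken from these notes.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """A quadruple: operator, two source operands, and a result."""
    operator: str
    source1: str = ""
    source2: str = ""
    result: str = ""


def add_quad(y, z, x, typ):
    """Build the quadruple for x := y + z, where typ is 'i' or 'r'."""
    operator = {"i": "ADDI", "r": "ADDR"}[typ]   # typed operator name
    return Quad(operator, y, z, x)


print(add_quad("y", "z", "x", "i"))   # integer addition
print(add_quad("y", "z", "x", "r"))   # real addition
```

Fields that a statement does not use, such as the second source of a copy, are simply left as empty strings here.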

[8.3] Assignment Statements

The text describes translation schemes for generating three-address code. Schemes for generating quadruples are described here assuming that the stmt nonterminal has been left-factored as shown in section 6.2 of these notes:


stmt --> ID stmtrest
stmt --> BEGINTOK block_rest
stmt --> IFTOK expr THENTOK stmt ELSETOK stmt
stmt --> WHILETOK expr DOTOK stmt
stmtrest --> ASSIGNOP expr
stmtrest --> LBRK expr RBRK ASSIGNOP expr
stmtrest --> LPAR expr_list RPAR
stmtrest -->

The simple type system described here is also assumed.

Section 5.2 describes a translation scheme that constructs a syntax tree. A syntax tree must be built bottom-up, from the leaves to the root, because every non-leaf node needs pointers to its children. Intermediate code must also be generated bottom-up because a computer can't use any variable until its value has been computed. Therefore, given a translation scheme that constructs a syntax tree, one can simply change its semantic rules/actions to make it generate intermediate code instead.

Temporary Variables: A syntax tree uses pointers to connect each parent node to its children. In intermediate code the result of every operation in an expression is stored in a temporary variable so any parent node that wants to use the result can refer to it. A function with no arguments, newtemp, returns a new name every time it's called. The text uses t1, t2, t3, ..., for these names but the coding projects use $1000, $1001, $1002, ..., instead.

Each temporary variable should be put into the symbol table as though it were an ID token so it can be given the appropriate type expression. Note that temporary variables aren't needed for ID, NUM, and BCONST tokens since they are already in the symbol table.
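
The role of newtemp can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's code: the tuple-based expression tree and the gen_expr function are hypothetical, and quadruples are modelled as plain 4-tuples. The $1000, $1001, ... naming follows the convention mentioned above.

```python
import itertools

_counter = itertools.count(1000)


def newtemp():
    """Return a fresh name: $1000, $1001, $1002, ..."""
    return f"${next(_counter)}"


def gen_expr(node, code):
    """Generate quadruples bottom-up and return the name holding the value.

    A node is either a name (str) or a tuple (operator, left, right).
    """
    if isinstance(node, str):
        return node                      # leaves need no temporary
    operator, left, right = node
    left_place = gen_expr(left, code)    # children first ...
    right_place = gen_expr(right, code)
    temp = newtemp()                     # ... then the parent's result
    code.append((operator, left_place, right_place, temp))
    return temp


code = []
place = gen_expr(("SUBI", ("ADDI", "a", "b"), "c"), code)   # (a + b) - c
for quad in code:
    print(quad)
```

The parent operation refers to its child's result through the temporary name, which plays the part that a child pointer plays in a syntax tree.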

Functions in the Compiler: We assume the compiler contains a GetLex function that returns the lexeme of a symbol table entry and a GetType function that returns the type expression of a symbol table entry.

The InsertID function accepts a lexeme and a type expression as arguments and inserts a new ID-entry into the symbol table.

The GenQuad function outputs a quadruple to the output file whenever it is called with four string arguments containing the name of the operator, the lexeme of the first source operand, the lexeme of the second source operand, and the lexeme of the result, respectively. Missing arguments generate blank fields in the quadruple.
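
A minimal Python sketch of these four functions follows. It rests on several assumptions that the notes do not state: the symbol table is a dictionary, an entry "pointer" is the entry itself, InsertID returns that entry, and the output fields are separated by tabs.

```python
import io

symtab = {}           # lexeme -> entry; stands in for the real symbol table
out = io.StringIO()   # stands in for the compiler's output file


def InsertID(lexeme, typexpr):
    """Insert a new ID entry; returning the entry is an assumption."""
    entry = {"lexeme": lexeme, "type": typexpr}
    symtab[lexeme] = entry
    return entry


def GetLex(ptr):
    """Return the lexeme of a symbol table entry."""
    return ptr["lexeme"]


def GetType(ptr):
    """Return the type expression of a symbol table entry."""
    return ptr["type"]


def GenQuad(op, src1="", src2="", result=""):
    """Write one quadruple; missing arguments become blank fields."""
    out.write("\t".join([op, src1, src2, result]) + "\n")


x = InsertID("x", "i")
t = InsertID("$1000", "i")
GenQuad("COPYI", GetLex(t), "", GetLex(x))
print(out.getvalue(), end="")
```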

Scalar Variable Assignments: Assignment of an expression to a scalar variable is handled by the

stmtrest --> ASSIGNOP expr

production and generates a COPYB, COPYI, COPYR, COPYI2R, or COPYR2I quadruple depending on the type expressions of stmtrest.idptr (received as an inherited attribute from the stmt nonterminal) and expr.ptr (returned as a synthesized attribute from the expr nonterminal). The following scheme uses tid and tex as place holders for type expressions and op as a place holder for the quadruple operator:

stmtrest --> ASSIGNOP expr { tid := GetType( stmtrest.idptr ) ;
    tex := GetType( expr.ptr ) ;
    if ( tex = 'b' and tid = 'b' ) then op := "COPYB"
    else if ( tex = 'i' and tid = 'i' ) then op := "COPYI"
    else if ( tex = 'r' and tid = 'r' ) then op := "COPYR"
    else if ( tex = 'i' and tid = 'r' ) then op := "COPYI2R"
    else if ( tex = 'r' and tid = 'i' ) then op := "COPYR2I"
    else type-error ;
    GenQuad( op, GetLex( expr.ptr ), "" , GetLex( stmtrest.idptr ) ) ; }
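
The chain of tests in this scheme is equivalent to a table lookup keyed on the pair (tex, tid). The Python sketch below shows that reading; the names COPY_OPS and scalar_assign are hypothetical, and a quadruple is modelled as a 4-tuple rather than written to a file.

```python
# Key: (type of the expression, type of the variable) -> copy operator.
COPY_OPS = {
    ("b", "b"): "COPYB",
    ("i", "i"): "COPYI",
    ("r", "r"): "COPYR",
    ("i", "r"): "COPYI2R",   # integer value stored into a real variable
    ("r", "i"): "COPYR2I",   # real value stored into an integer variable
}


def scalar_assign(expr_lex, tex, id_lex, tid):
    """Return the quadruple for id := expr, or raise on a type error."""
    op = COPY_OPS.get((tex, tid))
    if op is None:
        raise TypeError(f"cannot assign type {tex!r} to type {tid!r}")
    return (op, expr_lex, "", id_lex)


print(scalar_assign("$1000", "i", "x", "r"))
```

Any pair that is missing from the table, such as a boolean expression assigned to an integer variable, is a type error.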

Array Element Assignments: Assignment of an expression to an element of an array is handled by the

stmtrest --> LBRK expr1 RBRK ASSIGNOP expr2

production. Arrays in Pascal are declared with their lowest and highest indices defined so the intermediate code generated by this production must adjust the value of the index expression (expr1) appropriately - a SUBI quadruple must be generated to subtract the value of the lowest index of the array from the value of expr1. A type-conversion quadruple is also generated if the value of expr2 must be converted from real to integer or from integer to real.

The following scheme uses tid and tex2 as place holders for type expressions, lex1 and lex2 as place holders for lexemes, and op as a place holder for a quadruple operator:

stmtrest --> { tid := GetType( stmtrest.idptr ) ; } LBRK expr1
    { if GetType( expr1.ptr ) != 'i' then type-error
      else { lex1 := newtemp ; InsertID( lex1, 'i' ) ;
             GenQuad( "SUBI", GetLex( expr1.ptr ),
                      GetLex( stmtrest.idptr->low ), lex1 ) ; } }
    RBRK ASSIGNOP expr2 {
      tex2 := GetType( expr2.ptr ) ; lex2 := GetLex( expr2.ptr ) ;
      if ( tex2 = 'r' and tid = 'I' )
      then { lex2 := newtemp ; tex2 := 'i' ;
             InsertID( lex2, tex2 ) ; GenQuad( "COPYR2I",
                      GetLex( expr2.ptr ), "" , lex2 ) ; }
      if ( tex2 = 'i' and tid = 'R' )
      then { lex2 := newtemp ; tex2 := 'r' ;
             InsertID( lex2, tex2 ) ; GenQuad( "COPYI2R",
                      GetLex( expr2.ptr ), "" , lex2 ) ; }
      if ( tex2 = 'b' and tid = 'B' ) then op := "STB"
      else if ( tex2 = 'i' and tid = 'I' ) then op := "STI"
      else if ( tex2 = 'r' and tid = 'R' ) then op := "STR"
      else type-error ;
      GenQuad( op, lex1, lex2, GetLex( stmtrest.idptr ) ) ; }
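
The order of the generated quadruples may be easier to see in a small Python sketch. It assumes, as the comparisons in the scheme suggest, that 'B', 'I', and 'R' are the type expressions of boolean, integer, and real arrays. The function array_assign and its parameters are hypothetical, the symbol table is left out, and the code that evaluates expr1 and expr2 is not modelled.

```python
import itertools

_temps = itertools.count(1000)


def newtemp():
    return f"${next(_temps)}"


STORE_OPS = {("b", "B"): "STB", ("i", "I"): "STI", ("r", "R"): "STR"}
# (value type, array type) -> (conversion operator, type after conversion)
CONVERSIONS = {("r", "I"): ("COPYR2I", "i"), ("i", "R"): ("COPYI2R", "r")}


def array_assign(arr_lex, tid, low_lex, index_lex, index_type, val_lex, tex2):
    """Return the quadruples for arr[index] := value."""
    if index_type != "i":
        raise TypeError("array index must be an integer")
    quads = []
    lex1 = newtemp()                                   # adjusted index
    quads.append(("SUBI", index_lex, low_lex, lex1))
    if (tex2, tid) in CONVERSIONS:                     # convert the value if needed
        op, tex2 = CONVERSIONS[(tex2, tid)]
        lex2 = newtemp()
        quads.append((op, val_lex, "", lex2))
        val_lex = lex2
    store = STORE_OPS.get((tex2, tid))
    if store is None:
        raise TypeError(f"cannot store type {tex2!r} into array type {tid!r}")
    quads.append((store, lex1, val_lex, arr_lex))
    return quads


# a[i] := x, where a is a real array with lowest index 1 and x is an integer.
for quad in array_assign("a", "R", "1", "i", "i", "x", "i"):
    print(quad)
```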

[8.4] Boolean Expressions

Here we show translation schemes for the if-then-else and while-do statements.

if-then-else: The if-then-else statement is handled by the:

stmt --> IFTOK expr THENTOK stmt1 ELSETOK stmt2

production and produces four quadruples intermixed with code to evaluate the expression, code to execute stmt1, and code to execute stmt2 as follows:

Code to evaluate expr
. . .
BFALSE GetLex( expr.ptr) ElseLabel
Code to execute stmt1
. . .
JUMP EndLabel
LABEL ElseLabel
Code to execute stmt2
. . .
LABEL EndLabel

The translation scheme calls newtemp to create ElseLabel and EndLabel; these labels are not inserted into the symbol table. The code to evaluate the expression, the code to execute the then-statement, and the code to execute the else-statement are generated when the L-attributed definition traverses these three children, respectively. The scheme simply generates the four quadruples at the appropriate times:

stmt --> IFTOK expr {
      if( GetType( expr.ptr ) != 'b' ) then type-error ;
      ElseLabel := newtemp ; EndLabel := newtemp ;
      GenQuad( "BFALSE", GetLex( expr.ptr ), ElseLabel, "" ) ; }
    THENTOK stmt1 { GenQuad( "JUMP", EndLabel, "" , "" ) ;
      GenQuad( "LABEL", ElseLabel, "" , "" ) ; }
    ELSETOK stmt2 { GenQuad( "LABEL", EndLabel, "" , "" ) ; }
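
The following Python sketch mirrors the ordering of this scheme. The names gen_if, gen_then, and gen_else are hypothetical: the two callbacks stand in for the traversal of the then-statement and the else-statement, and quadruples are collected in a list instead of being written to a file. The code that evaluates expr is assumed to have been emitted already.

```python
import itertools

_names = itertools.count(1000)


def newtemp():
    return f"${next(_names)}"


def gen_if(quads, cond_place, cond_type, gen_then, gen_else):
    """Emit the four control quadruples around the two branches."""
    if cond_type != "b":
        raise TypeError("if condition must be boolean")
    else_label = newtemp()
    end_label = newtemp()
    quads.append(("BFALSE", cond_place, else_label, ""))
    gen_then(quads)                                # code for stmt1
    quads.append(("JUMP", end_label, "", ""))      # skip over the else part
    quads.append(("LABEL", else_label, "", ""))
    gen_else(quads)                                # code for stmt2
    quads.append(("LABEL", end_label, "", ""))


# if flag then x := 1 else x := 0
quads = []
gen_if(quads, "flag", "b",
       lambda q: q.append(("COPYI", "1", "", "x")),
       lambda q: q.append(("COPYI", "0", "", "x")))
for quad in quads:
    print(quad)
```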

while-do: The while-do statement is handled by the:

stmt --> WHILETOK expr DOTOK stmt1

production and produces four quadruples intermixed with code to evaluate expr and code to execute stmt1 as follows:

LABEL BeginLabel
Code to evaluate expr
. . .
BFALSE GetLex( expr.ptr) EndLabel
Code to execute stmt1
. . .
JUMP BeginLabel
LABEL EndLabel

The translation scheme simply generates the four quadruples at the appropriate times:

stmt --> WHILETOK { BeginLabel := newtemp ;
      GenQuad( "LABEL", BeginLabel, "" , "" ) ; }
    expr {
      if( GetType( expr.ptr ) != 'b' ) then type-error ;
      EndLabel := newtemp ;
      GenQuad( "BFALSE", GetLex( expr.ptr ), EndLabel, "" ) ; }
    DOTOK stmt1 { GenQuad( "JUMP", BeginLabel, "" , "" ) ;
      GenQuad( "LABEL", EndLabel, "" , "" ) ; }
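
A Python sketch of the same ordering is given below. The names gen_while, gen_cond, and gen_body are hypothetical: gen_cond stands in for the traversal of expr and returns the lexeme and type of its result, while gen_body stands in for the traversal of the loop body.

```python
import itertools

_names = itertools.count(1000)


def newtemp():
    return f"${next(_names)}"


def gen_while(quads, gen_cond, gen_body):
    """Emit the four control quadruples around the condition and the body."""
    begin_label = newtemp()
    quads.append(("LABEL", begin_label, "", ""))   # the loop re-tests from here
    cond_place, cond_type = gen_cond(quads)        # code to evaluate expr
    if cond_type != "b":
        raise TypeError("while condition must be boolean")
    end_label = newtemp()
    quads.append(("BFALSE", cond_place, end_label, ""))
    gen_body(quads)                                # code for the loop body
    quads.append(("JUMP", begin_label, "", ""))
    quads.append(("LABEL", end_label, "", ""))


# while flag do step   (step is a procedure with no arguments)
quads = []
gen_while(quads,
          lambda q: ("flag", "b"),
          lambda q: q.append(("CALL", "0", "step", "")))
for quad in quads:
    print(quad)
```

BeginLabel has to be emitted before the code for expr so that the jump at the end of the body causes the condition to be evaluated again.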

PROCEDURE CALLS

No Arguments: A procedure call with no arguments is handled by the:

stmtrest -->

production. The translation scheme need only check the type expression of the procedure and generate a CALL quadruple:

stmtrest --> { if( GetType( stmtrest.idptr ) != ">" ) then type-error ;
    GenQuad( "CALL", "0", GetLex( stmtrest.idptr ), "" ) ; }

One or More Arguments: A procedure call with one or more arguments is handled by the:

stmtrest --> LPAR expr_list RPAR

production. We assume that left recursion has been eliminated from the expr_list nonterminal as shown in section 6.2 of these notes:

expr_list --> expr elistrest
elistrest --> COMMA expr elistrest
elistrest -->

The type checking translation scheme for the expr_list and elistrest nonterminals shown in section 6.2 of these notes is modified so an appropriate PARAM quadruple is generated for each expression in the list:

expr_list --> expr { tex := GetType( expr.ptr ) ;
      if( tex = 'b' ) then op := "PARAMB"
      else if( tex = 'i' ) then op := "PARAMI"
      else if( tex = 'r' ) then op := "PARAMR"
      else type-error ;
      GenQuad( op, GetLex( expr.ptr ), "" , "" ) ; }
    elistrest { expr_list.typexpr :=
      Cartesian( tex, elistrest.typexpr ) ; }
elistrest --> COMMA expr { tex := GetType( expr.ptr ) ;
      if( tex = 'b' ) then op := "PARAMB"
      else if( tex = 'i' ) then op := "PARAMI"
      else if( tex = 'r' ) then op := "PARAMR"
      else type-error ;
      GenQuad( op, GetLex( expr.ptr ), "" , "" ) ; }
    elistrest1 { elistrest.typexpr :=
      Cartesian( tex, elistrest1.typexpr ) ; }
elistrest --> { elistrest.typexpr := "" ; }

When the translation scheme for the:

stmtrest --> LPAR expr_list RPAR

production receives expr_list.typexpr, it should make sure that it agrees with the domain of GetType( stmtrest.idptr ) and call:

GenQuad("CALL", n, GetLex( stmtrest.idptr ), "" )

where n is the character string for the number of arguments.
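
Putting the pieces together, a call with arguments yields one PARAM quadruple per argument followed by a CALL quadruple. The Python sketch below flattens the recursive expr_list scheme into a loop. It makes two assumptions that the notes do not state: Cartesian is modelled as concatenation of type letters, and "agrees with the domain" is taken to mean an exact match. The name gen_call is hypothetical.

```python
PARAM_OPS = {"b": "PARAMB", "i": "PARAMI", "r": "PARAMR"}


def gen_call(proc_lex, domain, args):
    """Return the quadruples for proc(args).

    `domain` is the expected string of argument type letters and
    `args` is a list of (lexeme, type) pairs for evaluated arguments.
    """
    quads = []
    typexpr = ""
    for lex, tex in args:
        op = PARAM_OPS.get(tex)
        if op is None:
            raise TypeError(f"argument {lex!r} has unsupported type {tex!r}")
        quads.append((op, lex, "", ""))
        typexpr += tex                      # stands in for Cartesian(...)
    if typexpr != domain:
        raise TypeError(f"arguments {typexpr!r} do not match domain {domain!r}")
    quads.append(("CALL", str(len(args)), proc_lex, ""))   # n is passed as a string
    return quads


for quad in gen_call("p", "ir", [("x", "i"), ("$1000", "r")]):
    print(quad)
```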


Kenneth E. Batcher - 8/8/2002