CS 4/53111 Course Notes

Topic 9 - Run-Time Environments

This topic relates the static source text of a program to the dynamic actions it performs at run-time. During execution the same name in the source text can denote different objects in the target machine. Each execution of a procedure is referred to as an activation of the procedure. A recursive procedure may call itself, so several activations of the same procedure may be alive simultaneously.

[7.1] - Source Language Issues

In this section we consider a program with a number of recursive procedures and/or functions as in Pascal.

Procedures

Basically, a procedure definition associates an identifier (the procedure name) with a statement (the procedure body). For example, figure 7.1 shows a Pascal program with a procedure named readarray defined on lines 3-7; the body of the procedure is shown on lines 5-7.

A procedure is called when its name appears in an executable statement: a procedure call executes the body of the procedure. The main program (lines 21-25 of figure 7.1) calls readarray on line 23 and then calls the quicksort procedure on line 24. Procedure calls may also occur within expressions as in line 16.

Some of the identifiers appearing within a procedure definition are formal parameters (also called formal arguments, dummy arguments, or formals) of the procedure. For example, identifiers m and n on line 12 are the formal parameters of quicksort. Arguments, also called actual parameters, are substituted for the formal parameters when the procedure is called. For example, the call to quicksort in line 24 of the main program substitutes the actual parameters 1 and 9 for the formal parameters m and n, respectively.

Activation Trees

Each execution of a procedure body is an activation of the procedure. The lifetime of an activation of a procedure p is the sequence of steps between the first and last steps in the execution of the procedure body including time spent executing procedures called by p and procedures called by those procedures, etc.

In most languages, each time procedure q is called from procedure p, control eventually returns to procedure p (unless a fatal error occurs). More precisely, each time control flows from an activation of procedure p to an activation of procedure q, it eventually returns to the same activation of procedure p. Thus, if a and b are procedure activations, their lifetimes are either non-overlapping or nested; i.e., if b is entered before a is left, then control must leave b before it can leave a.

One can depict the way control enters and leaves activations with an activation tree. For example, an execution of the program in figure 7.1 may have an activation tree as shown in figure 7.3, where s denotes an activation of the main sort program, r denotes an activation of the readarray procedure, q(m, n) denotes an activation of the quicksort procedure with actual parameters m and n, and p(y, z) denotes an activation of the partition procedure with actual parameters y and z.

Note that when control resides in some activation on the tree then all ancestors of that activation are still alive. For example, if control currently resides in activation p(2,3) of figure 7.3 then activations q(2,3), q(1,3), q(1,9), and s are still alive.

Control Stacks

The flow of control in a program corresponds to a depth-first traversal of an activation tree. The traversal starts at the root, visits each node before its children, recursively visits all children of a node in left-to-right order (returning to the parent between visits to the children), and finally returns to the root.

One can use a stack (called the control stack ) to keep track of live procedure activations. When an activation is started an item is pushed on the stack and the item remains on the stack until that activation ends. For example, if control currently resides in activation p(2,3) of figure 7.3 then the control stack has five items:

p(2,3)	<-- top of stack
q(2,3)
q(1,3)
q(1,9)
s	<-- bottom of stack
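The control stack can be mimicked in C. The sketch below runs a real recursive quicksort over a hypothetical array a[1..9] and, alongside the machine's own stack, pushes an entry onto an explicit array for each activation of quicksort (q) or partition (p), popping it on return. The Lomuto partition scheme, the array contents, and the helper names (enter, leave, print_control_stack) are assumptions, so the activations differ in detail from figure 7.3; the point is that the array always holds exactly the live activations, most recent on top.

```c
#include <stdio.h>

/* Hypothetical shadow of the control stack: one entry per live activation
   of quicksort ('q') or partition ('p'). */
struct activation { char proc; int m, n; };

static struct activation control_stack[64];
static int depth = 0;       /* number of live activations */
static int max_depth = 0;   /* deepest the stack ever got */

static void enter(char proc, int m, int n) {
    control_stack[depth++] = (struct activation){proc, m, n};
    if (depth > max_depth) max_depth = depth;
}

static void leave(void) { depth--; }

/* Print the live activations, top of stack first. */
void print_control_stack(void) {
    for (int k = depth - 1; k >= 0; k--)
        printf("%c(%d,%d)\n", control_stack[k].proc,
               control_stack[k].m, control_stack[k].n);
}

int a[10];   /* a[1..9] is sorted, as in the Pascal program; a[0] is unused */

/* Lomuto partition of a[m..n] around a[n] (an assumed scheme);
   returns the pivot's final index. */
static int partition(int m, int n) {
    enter('p', m, n);
    int v = a[n], i = m - 1;
    for (int j = m; j < n; j++) {
        if (a[j] <= v) {
            i++;
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }
    int t = a[i + 1]; a[i + 1] = a[n]; a[n] = t;
    leave();
    return i + 1;
}

void quicksort(int m, int n) {
    enter('q', m, n);
    if (n > m) {
        int i = partition(m, n);
        quicksort(m, i - 1);
        quicksort(i + 1, n);
    }
    leave();
}
```

Calling print_control_stack() from inside partition prints the live activations top first, in the same shape as the listing above.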

The Scope of a Declaration

A declaration is a language construct that associates information with a name. For example, var i : integer; in a Pascal program declares that i is the name of an integer.

There may be multiple independent declarations of the same name in different parts of a program. For example, i is declared three times in the program of figure 7.1: on lines 4, 9, and 13. The scope rules of the source language determine which declaration of a name applies to each usage of the name. In figure 7.1: the declaration of i on line 4 applies to the usages of i in line 6; the declaration of i on line 9 applies to any usages of i within the body of partition (lines 10-11); and the declaration of i on line 13 applies to the usages of i in lines 16-18.

The scope of a declaration is that portion of a program where that declaration applies. The usage of a name in a procedure is local if it is within the scope of a declaration within that procedure; otherwise, the usage is nonlocal. At compile time, the symbol table can be used to find the declaration that applies to each usage of a name.

Bindings of Names

Even if a name is only declared once in a program it may denote different objects at run-time. We use the term environment to describe the mapping of a name to a storage location and the term state to describe the mapping of a storage location to the value held within it.

For example, the environment might assign the name pi to storage location 100 which might initially hold a value of 0: the assignment statement, pi := 3.14, changes the state of storage location 100 but doesn't change the environment.

There is a distinction between the meaning of identifiers on the left and right sides of an assignment statement. In the statement i := i + 1; the occurrence of i on the left side refers to the storage location of i, while the occurrence on the right side refers to the current value of i. The terms l-value and r-value distinguish these meanings: the l-value (or left-value) of an identifier is the location of the variable, which is what the left side of an assignment needs, and the r-value (or right-value) is the value currently held there, which is what the right side uses.
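In C the same distinctions are directly visible: &x yields a variable's location (the binding the environment gives the name), and x in an expression yields the value held there (the state). A minimal sketch with illustrative names:

```c
/* Hypothetical globals; the environment binds each name to one location. */
double pi = 0.0;

/* Returns the location (l-value) the environment binds to the name pi. */
double *location_of_pi(void) { return &pi; }

void assign_pi(void) {
    pi = 3.14;   /* changes the state of pi's location ... */
}                /* ... but not the environment: &pi is unchanged */

int i = 41;

void increment_i(void) {
    i = i + 1;   /* left i: l-value (where to store); right i: r-value (41) */
}
```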

An environment binds a name to a particular l-value. Note that with recursive procedures there may be multiple bindings of the same name. For example, when the control stack for the program of figure 7.1 looks like:

p(2,3)	<-- top of stack
q(2,3)
q(1,3)
q(1,9)
s	<-- bottom of stack

then the name i has four bindings: one in each of the p(2,3), q(2,3), q(1,3), and q(1,9) activations.

[7.2] - Storage Organization

This section describes an organization of run-time storage suitable for languages like Fortran, Pascal, and C.

Subdivision of Run-Time Memory

We assume that the compiler obtains a block of storage for the compiled program to run in. This block of storage must hold: (1) - the generated target code, (2) - all data objects, and (3) - the control stack to keep track of procedure activations.

Once all target code has been generated its size is fixed at compile time so the compiler can place it in some statically determined area, such as the low end of storage. The size of static data (global data variables) is also fixed at compile time so these data objects can be placed next to the target code.

Fortran (through Fortran 77) doesn't allow recursive procedures, so each local variable needs only one storage location. The size of all local data is also fixed at compile time, so these data objects can be placed next to the global data.

Pascal and C allow recursive procedures so each local variable needs a storage location for each activation that's alive. Storage locations for these variables are placed inside the activation records on the control stack. The number of activation records on the control stack changes as procedure calls/returns occur: the bottom of the control stack can be in a fixed location but the top of the control stack must be allowed to move.

Pascal and C allow the user to allocate/deallocate storage for dynamic data objects. This storage comes from a separate area of run-time memory called the heap.

Run-time storage can be subdivided as shown below:

Target Code
Static Data
Control Stack	<-- bottom of stack (fixed location)
   .
   .	(the stack grows downward)
Control Stack	<-- top of stack (moves)
. . .	(free space)
Heap

Activation Records

Each activation record or frame holds information for one activation of a procedure: the frame is pushed on the control stack when the procedure is called and popped when the procedure returns to its caller. An activation record may look like:

returned value
actual parameters
control link
access link
saved machine status
local data
temporaries

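Some compilers keep some of this information in machine registers rather than in the activation record. The saved machine status holds the state of the machine just before the call, including the program counter value at which to resume (the return address) and any registers that must be restored when control returns. The control link points to the activation record of the caller. The access link is used to access nonlocal data held in other activation records, as described in section 7.4.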

Compile-Time Layout of Local Data

Figure 7.9 shows typical data layouts used by C compilers. Many target machines have alignment restrictions that must be honored, so a compiler may insert unused padding between local data objects.
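Padding is easy to observe with offsetof from <stddef.h>. The struct below is a hypothetical record of local data; the offsets and total size it reports depend on the compiler and target, and the checks rely only on what C guarantees (the first member is at offset 0 and every member is suitably aligned).

```c
#include <stddef.h>   /* offsetof */
#include <stdio.h>

/* Hypothetical local data mixing field sizes. On many targets int needs
   4-byte alignment, so the compiler typically inserts padding after c. */
struct local_data {
    char   c;
    int    i;
    char   d;
    double x;
};

/* Print where the compiler actually placed each field. */
void show_layout(void) {
    printf("c at %zu, i at %zu, d at %zu, x at %zu, sizeof = %zu\n",
           offsetof(struct local_data, c), offsetof(struct local_data, i),
           offsetof(struct local_data, d), offsetof(struct local_data, x),
           sizeof(struct local_data));
}
```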

[7.3] - Storage-Allocation Strategies

Here we describe static allocation, stack allocation, and heap allocation.

Static Allocation

Static allocation is used for data that has only one binding in a fixed storage location. Data items are referenced using absolute addresses. Global variables can be statically allocated. Fortran 77 and earlier versions don't allow recursion, so local variables and arguments in Fortran subroutines can also be statically allocated.
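C's static locals illustrate the idea: a local declared static gets one binding in a fixed location for the whole run, exactly like a global. A sketch with a hypothetical function count_calls:

```c
#include <stddef.h>   /* NULL */

/* Every activation of count_calls shares the same statically allocated
   `calls`, so its value survives from one call to the next. */
int count_calls(int **where) {
    static int calls = 0;   /* one binding, initialized once */
    calls++;
    if (where != NULL)
        *where = &calls;    /* report the (fixed) location of calls */
    return calls;
}
```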

Stack Allocation

Stack allocation is used for data in recursive procedures that may have multiple bindings. Each procedure call creates a new activation record on the control stack in which the actual parameters, the local variables, the temporaries, and the returned value of that activation can be stored. The activation record remains on the stack until the procedure activation returns to its caller. Stack-allocated data items are referenced using displacements from a register, top, pointing to the top of the control stack.
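The effect can be observed in C: each recursive activation gets its own record, so its local variable has a different address from the same variable in every other live activation. The sketch below (function and array names are hypothetical) records those addresses while all activations are alive and checks that they differ. It reads `local` after the recursive call so the call cannot be turned into a tail call that reuses the caller's frame.

```c
#include <stdbool.h>

#define MAX_DEPTH 16

/* Address of `local` in each live activation, indexed by recursion depth. */
static int *local_addr[MAX_DEPTH];

/* Recurse until `limit` activations are live (1 <= limit <= MAX_DEPTH).
   Returns true if every activation's `local` had a distinct location. */
bool distinct_bindings(int depth, int limit) {
    int local = depth;               /* one binding per activation */
    local_addr[depth] = &local;
    if (depth + 1 < limit) {
        bool ok = distinct_bindings(depth + 1, limit);
        return ok && local == depth; /* this binding survived the call */
    }
    /* Deepest activation: every shallower activation is still alive. */
    for (int j = 0; j < limit; j++)
        for (int k = j + 1; k < limit; k++)
            if (local_addr[j] == local_addr[k])
                return false;
    return true;
}
```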

Calling Sequences

A calling sequence is a sequence of machine instructions that is executed every time a procedure is called: it allocates an activation record for the called procedure and enters information into that record. A return sequence is a sequence of machine instructions that is executed every time a procedure returns control to its caller: it restores the machine status so the calling procedure can continue execution.

Activation records and calling sequences differ from machine to machine: the calling sequences are often divided up between the caller and the procedure being called (the callee ). A principle often followed in the design of an activation record is to put fixed-size fields in the middle between the fields communicating data to/from the caller and fields of concern only to the callee: the activation record shown in figure 7.8 follows this principle:

returned value	<-- start of caller's activation record
actual parameters
control link
access link
saved machine status
local data	<-- top_sp (caller)
temporaries
returned value	<-- start of callee's activation record
actual parameters
control link
access link
saved machine status
local data	<-- top_sp (callee)
temporaries

The following calling sequence assumes there is a register, top_sp, holding a pointer to the start of the local data in the current activation record:

The caller evaluates the actual parameters and inserts them into the appropriate places of the callee's activation record.
The caller stores a return address and the value of its top_sp register into the saved machine status of the callee's record.
The top_sp register is set to point to the start of the callee's local data and control is sent to the callee's code.
The callee saves other register values and status information.
The callee initializes its local data and starts execution.

A possible return sequence is:

The callee places a returned value at the start of its activation record, next to the caller's record.
Using the saved machine status in its record, the callee restores top_sp and other registers and branches to the return address in the caller's code.
The caller resumes execution: the location of the returned value is a known displacement from top_sp.
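The calling and return sequences can be simulated in C with an integer array standing in for run-time memory and a variable standing in for the top_sp register. This is only a sketch under stated assumptions: word-sized fields, two words of caller local data, the field offsets, the return address value, and the callee square_body are all hypothetical, and the access link is omitted for brevity.

```c
/* Simulated run-time memory and top_sp "register" (illustrative, not an ABI). */
static int mem[256];
static int top_sp = 8;            /* start of the current (caller's) local data */

enum { CALLER_LOCALS = 2 };       /* caller's local data + temporaries, in words */

/* Callee fields, as offsets from the callee's top_sp. */
enum { RET_VAL = -4, PARAM = -3, CONTROL_LINK = -2, RET_ADDR = -1, LOCAL_X = 0 };

static void square_body(void) {
    /* Callee initializes its local data from the parameter. */
    mem[top_sp + LOCAL_X] = mem[top_sp + PARAM];
    /* Return step 1: place the returned value next to the caller's record. */
    mem[top_sp + RET_VAL] = mem[top_sp + LOCAL_X] * mem[top_sp + LOCAL_X];
    /* Return step 2: restore the caller's top_sp; a real callee would then
       branch to mem[top_sp + RET_ADDR], which C's return stands in for. */
    top_sp = mem[top_sp + CONTROL_LINK];
}

int call_square(int arg) {
    /* Callee's top_sp lies past the caller's data and the 4 fixed fields. */
    int callee_sp = top_sp + CALLER_LOCALS + 4;
    mem[callee_sp + PARAM] = arg;            /* caller evaluates the actual */
    mem[callee_sp + CONTROL_LINK] = top_sp;  /* caller saves its top_sp */
    mem[callee_sp + RET_ADDR] = 1234;        /* hypothetical return address */
    top_sp = callee_sp;                      /* top_sp -> callee's local data */
    square_body();                           /* transfer control to the callee */
    /* Caller resumes: the value is at a known displacement from top_sp. */
    return mem[top_sp + CALLER_LOCALS];
}
```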

Variable-Length Data

Figure 7.15 shows a common strategy for handling variable-length arrays: placing them after the activation record and referencing them through pointers in the record. Besides top_sp there needs to be another pointer, top, to keep track of the start of any new activation record.
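C99's variable-length arrays are one place this strategy shows up. In the sketch below the size of squares depends on the argument, so it cannot have a fixed offset in the activation record; compilers commonly allocate such arrays on the stack beyond the fixed-size fields and reach them through a pointer, but the C standard does not prescribe the mechanism, and C11 made VLAs an optional feature. The function name is hypothetical.

```c
#include <stddef.h>   /* size_t */

/* The length of `squares` is known only at run time, per activation. */
long sum_of_squares(size_t n) {
    if (n == 0)
        return 0;                 /* a VLA must have a positive size */
    long squares[n];              /* variable-length local data */
    for (size_t k = 0; k < n; k++)
        squares[k] = (long)(k * k);
    long total = 0;
    for (size_t k = 0; k < n; k++)
        total += squares[k];
    return total;
}
```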

Dangling References

Figure 7.16 shows an example of a dangling reference. The function dangle returns a pointer to its local integer i, but i and everything else in dangle's activation record are deallocated when dangle returns, so the returned pointer refers to storage that no longer belongs to a live object.
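The dangling version is kept only in a comment below, because using its result is undefined behavior. One fix is to allocate the integer on the heap, whose lifetime is not tied to any activation; the caller then owns the storage and must free it. The name no_dangle and the value 23 are illustrative.

```c
#include <stdlib.h>   /* malloc, free */

/* Dangling version (do not use the result):
       int *dangle(void) { int i = 23; return &i; }
   i lives in dangle's activation record, which is popped on return. */

/* Hypothetical fix: heap storage outlives the activation that created it. */
int *no_dangle(void) {
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = 23;
    return p;   /* caller must free(p) */
}
```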

Heap Allocation

Heap allocation differs from stack allocation because there are no nesting rules: one can allocate space for item a, then allocate space for item b, then deallocate item a before deallocating item b. This tends to fragment heap storage into areas of allocated space and areas of free space. Fragmentation is reduced if all heap items have the same size or just a few different sizes.
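Why equal-size items help can be seen in a fixed-size pool allocator, sketched below with hypothetical names and sizes: every free block fits every request, so blocks can be freed in any order (no nesting rule) and any freed block is immediately reusable, leaving no unusable fragments between allocated blocks.

```c
#include <stddef.h>   /* NULL */

#define POOL_BLOCKS 4   /* hypothetical pool size */

union block {
    union block *next;   /* link used while the block is free */
    double payload[2];   /* storage handed out while the block is allocated */
};

static union block pool[POOL_BLOCKS];
static union block *free_list = NULL;

/* Thread every block onto the free list. */
void pool_init(void) {
    free_list = NULL;
    for (int k = POOL_BLOCKS - 1; k >= 0; k--) {
        pool[k].next = free_list;
        free_list = &pool[k];
    }
}

void *pool_alloc(void) {
    union block *b = free_list;
    if (b != NULL)
        free_list = b->next;
    return b;            /* NULL when the pool is exhausted */
}

/* Freed blocks go back on the list, in any order. */
void pool_free(void *p) {
    union block *b = p;
    b->next = free_list;
    free_list = b;
}
```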

[7.4] - Access To Nonlocal Names

References to nonlocal names are treated according to the scope rules of the source language. Pascal, C, and Ada use lexical scope (or static scope): examine the program text and use the most closely nested declaration of a name. Early Lisp dialects, APL, and Snobol use dynamic scope: examine the current activations to find the appropriate declaration of a name.

Blocks

A block is a statement with its own local data declarations. In C the syntax of a block is:

{ declarations statements }

where the braces delimit the block. Blocks can be nested as shown in figure 7.18:

#include <stdio.h>

int main(void)
{      /* start block 0  */
    int a = 0 ;
    int b = 0 ;
    {      /* start block 1  */
        int b = 1 ;
        {      /* start block 2  */
            int a = 2 ;
            printf("%d %d\n", a, b) ;
        }      /* end block 2  */
        {      /*  start block 3  */
            int b = 3 ;
            printf("%d %d\n", a, b) ;
        }      /* end block 3  */
        printf("%d %d\n", a, b) ;
    }      /* end block 1  */
    printf("%d %d\n", a, b) ;
    return 0 ;
}      /* end block 0  */

The scope of a declaration is given by the most closely nested rule:

The scope of a declaration in block B includes B.
If a name x is used in block B but is not declared in B, then use a declaration of x in an enclosing block B' where B' is more closely nested around B than any other enclosing block with a declaration of x.

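Using this rule, the four print statements of the example C program output the following (the statement in block 2 runs first, then those in blocks 3, 1, and 0):

2 1
0 3
0 1
0 0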

Nesting of block structures can be handled at run time with stack allocation: each time a block is entered, space on a stack is allocated for the variables declared within that block, and each time a block is exited its stack space is popped. (In the example C program, a in block 2 and b in block 3 are never alive at the same time, so they can share the same stack location.)

Lexical Scope Without Nested Procedures

In standard C, the definition of a procedure or function cannot appear within the definition of any other procedure or function. Hence any nonlocal name used within a function is nonlocal to all functions: it must be a global name, which can be statically allocated.

A benefit of statically allocating nonlocals is that declared procedures can be passed as parameters and returned as results: a nonlocal name always refers to the same static storage, wherever the procedure is eventually called. For example, figure 7.21 shows a Pascal program with integer functions of integers, f and g, and a procedure, b, that accepts such a function as a parameter. Both f and g use a global variable, m. The main program initializes m to 0 and then calls b(f) and b(g).
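A C rendering of that program might look like the sketch below. The body of b (applying h to 2 and returning the result rather than writing it) is an assumption, since the figure is not reproduced here. Because m is statically allocated, a bare function pointer is enough to pass f or g; no environment has to travel with it.

```c
int m;                          /* global, statically allocated */

int f(int n) { return m + n; }  /* both read the nonlocal m */
int g(int n) { return m * n; }

/* Hypothetical body for b: h is a function parameter. */
int b(int (*h)(int)) { return h(2); }
```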

Lexical Scope with Nested Procedures

Figure 7.22 shows a Pascal sort program where the definition of partition is within the definition of quicksort. The nonlocal names in partition are a, v, and exchange, of which a and exchange are global and v is declared within quicksort.

Nesting Depth

The notion of nesting depth is used to implement lexical scope. In the example of figure 7.22, the main program sort is at nesting depth 1; procedures readarray, exchange, and quicksort are at nesting depth 2; and function partition is at nesting depth 3. With each occurrence of a name we associate the nesting depth of the procedure in which the name is declared. Thus, within the body of partition, the uses of names a, v, exchange, i, and j have nesting depths 1, 2, 1, 3, and 3, respectively.

Access Links

Lexical scope of nested procedures can be implemented with an access link in each activation record: if the definition of a procedure p is nested immediately inside the definition of a procedure q then the access link in an activation record for p points to the access link in the most recent activation record for q.
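Combined with nesting depths, access links give a simple rule: a use at nesting depth n_p of a name declared at depth n_a (with n_a <= n_p) reaches the right activation record by following n_p - n_a access links from the current record. The sketch below models records as C structs with hypothetical fields; for simplicity each link points at the enclosing record as a whole rather than at its access-link field, which differs only by a fixed offset.

```c
#include <stddef.h>   /* NULL */

/* Hypothetical activation record: access link, depth, and one local value. */
struct frame {
    struct frame *access_link;   /* record of the lexically enclosing procedure */
    int depth;                   /* nesting depth of the procedure */
    int value;                   /* stand-in for that procedure's local data */
};

/* Follow (current depth - name depth) access links. */
struct frame *find_frame(struct frame *current, int name_depth) {
    for (int hops = current->depth - name_depth; hops > 0; hops--)
        current = current->access_link;
    return current;
}
```

For example, with records for sort (depth 1), two nested activations of quicksort (depth 2), and partition (depth 3) called from the inner quicksort, find_frame(&part, 2) returns the inner quicksort's record. Both quicksort records link to sort's record, not to each other, because quicksort is nested directly inside sort.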

Procedure Parameters

If a nested procedure is passed as a parameter its access link must also be passed as shown in figures 7.24 and 7.25.
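In C terms, a nested procedure passed as a parameter can be represented as a pair: a code pointer plus the access link it must run with. The sketch below uses hypothetical struct and function names; add_v plays a procedure nested inside outer that reads outer's local v through that link.

```c
/* Hypothetical data of the enclosing procedure. */
struct env { int v; };

/* A procedure parameter: code plus the access link passed with it. */
struct proc_param {
    int (*code)(const struct env *link, int arg);
    const struct env *link;
};

/* "Nested" procedure: reads the nonlocal v through its access link. */
static int add_v(const struct env *link, int arg) { return link->v + arg; }

/* Receives a procedure parameter and calls it with its saved link. */
int apply(struct proc_param p, int arg) { return p.code(p.link, arg); }

/* Enclosing procedure: builds the pair with a link to its own data. */
int outer(int v, int arg) {
    struct env here = { v };
    struct proc_param p = { add_v, &here };
    return apply(p, arg);
}
```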

Displays

Rather than chase through a string of access links for each nonlocal access, the compiler can maintain an array of pointers to activation records called a display. Figure 7.26 shows an example.
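A minimal display sketch, with hypothetical record and function names: display[i] points to the most recent live record at nesting depth i. Entering a procedure at depth i saves the old display[i] in the new record and installs the new record; exiting restores the saved pointer. A nonlocal at depth i is then a single indirection away.

```c
#include <stddef.h>   /* NULL */

#define MAX_NESTING 8

struct rec {
    int local;          /* stand-in for the procedure's local data */
    struct rec *saved;  /* previous display[depth], restored on exit */
};

/* display[i]: most recent live record at nesting depth i. */
static struct rec *display[MAX_NESTING];

void enter_proc(struct rec *r, int depth) {
    r->saved = display[depth];
    display[depth] = r;
}

void exit_proc(struct rec *r, int depth) {
    display[depth] = r->saved;
}

/* Access a nonlocal declared at the given depth: no chain walking. */
int nonlocal(int depth) { return display[depth]->local; }
```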

Dynamic Scope

With dynamic scope one should access a nonlocal name from the most recent activation record with space allocated for that name. Figure 7.27 shows an example where the output of a program depends on whether lexical scope or dynamic scope is used.
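One way to implement dynamic scope is shallow access: keep the current value of each name in one global slot, and have each activation that declares the name save the old value on entry and restore it on exit. In the sketch below, the hypothetical procedure show uses r nonlocally and records what it sees instead of printing, and small declares its own r. If show were declared at the outer level, lexical scope would make it always see the outer r (0.25); dynamic scope makes it see 0.125 when called from small.

```c
/* Current dynamic binding of r (shallow access). */
static double r;

static double seen[4];   /* values observed by show, for inspection */
static int nseen = 0;

/* Uses r nonlocally: sees the most recent binding. */
static void show(void) { seen[nseen++] = r; }

/* Declares its own r for the duration of its activation. */
static void small(void) {
    double saved = r;    /* entry: save the outer binding */
    r = 0.125;
    show();
    r = saved;           /* exit: restore the outer binding */
}

void run_dynamic(void) {
    nseen = 0;
    r = 0.25;
    show();              /* sees 0.25 */
    small();             /* show inside small sees 0.125 */
}
```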

[7.5] - Parameter Passing

The following Pascal procedure (figure 7.28) exchanges two elements of an integer array, a:

procedure exchange(i, j: integer);
    var x : integer;
    begin
        x := a[i] ;
        a[i] := a[j] ;
        a[j] := x
    end

Communication between this procedure and its caller is through the nonlocal a and through the parameters i and j. Here we discuss several methods of associating actual and formal parameters: call-by-value, call-by-reference, copy-restore, and call-by-name. The left-value (l-value), the right-value (r-value), or the name of a variable may be passed to the called procedure.

Call-by-Value

C always uses call-by-value, and Pascal uses it for every parameter not declared with var. With call-by-value, the r-values of the actual parameters are passed to the called procedure:

A formal parameter is assigned a storage location in the activation record of the called procedure (just like a local variable.)
The caller evaluates the actual parameters and places their r-values in the storage locations of the formal parameters.

If the called procedure changes a formal parameter the change only occurs in the activation record of the called procedure. The change is lost when the record is deallocated so the actual parameter in the caller's record is not changed.
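C shows the behavior directly. In the hypothetical function below, the formal n is a fresh local in bump's activation record, initialized with the caller's r-value, so changing it leaves the caller's variable alone:

```c
/* Call-by-value: n is a copy living in bump's activation record. */
int bump(int n) {
    n = n + 1;   /* changes only bump's copy */
    return n;
}
```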

Call-by-Reference

Call-by-reference (or call-by-address or call-by-location) passes l-values to the called procedure:

If an actual parameter is a name or an expression having an l-value, then that l-value is passed to the called procedure.
If the actual parameter is an expression like a+b or 2 which has no l-value, then the expression is evaluated in a new location, and the address of that location is passed to the called procedure.

The var keyword in the first line of the following Pascal procedure (from figure 7.29) causes call-by-reference instead of call-by-value:

procedure swap(var x, y: integer);
    var temp: integer;
    begin
       temp := x;
       x    := y;
       y    := temp;
    end;

In the swap procedure, all reads and writes of the formal parameters, x and y, will read and write the associated actual parameters in the caller's activation record. Thus, swap will actually swap the actuals.

Example 7.7: If the swap procedure above were called with swap(i, a[i]) then the following steps would occur:

The l-values of i and a[i] would be copied into the activation record of swap.
temp would be set to the r-value of i, say I₀.
i would be set to the r-value of a[I₀].
a[I₀] would be set to the r-value of temp or I₀.

Using call-by-reference, the r-values of the arguments are swapped correctly.
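C has no var parameters, but passing addresses (l-values) explicitly gives call-by-reference behavior. The sketch below mirrors example 7.7 with the assumed values I₀ = 2 and a[2] = 5: both addresses are computed once, before the call, so the swap is correct.

```c
/* Call-by-reference via explicit l-values. */
void swap(int *x, int *y) {
    int temp = *x;   /* r-value of the first actual */
    *x = *y;
    *y = temp;
}
```

Calling swap(&i, &a[i]) with i == 2 and a[2] == 5 leaves i == 5 and a[2] == 2.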

Copy-Restore

Copy-restore is a hybrid of call-by-value and call-by-reference:

The actual parameters are evaluated before the call and their r-values are passed to the called procedure as in call-by-value. But the caller also remembers the l-values of those actuals that have l-values.
When control returns to the caller it copies back the r-values of the formal parameters into the l-values of the actuals.

Usually, copy-restore has the same effect as call-by-reference. It could have a different effect when the called procedure refers to one or more of the actual parameters as a nonlocal. The operation of the Pascal program below (figure 7.31) depends on whether call-by-reference or copy-restore is used:

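program copyout(input,output);
   var a: integer;
   procedure unsafe(var x: integer);
      begin x := 2; a := 0 end;
   begin
      a := 1; unsafe(a); writeln(a)
   end.

With call-by-reference, x and a name the same location, so a := 0 overwrites the 2 and the program prints 0. With copy-restore, x is a separate copy: x := 2 changes only the copy, a := 0 changes a, and on return the value 2 is copied back into a, so the program prints 2.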
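The two mechanisms can be simulated in C, with a pointer standing in for the var parameter and an explicit copy-in/copy-out in the hypothetical copy-restore version; each run function returns the value writeln(a) would print.

```c
int a;   /* the global a of program copyout */

/* Call-by-reference: x aliases the actual a. */
static void unsafe_by_reference(int *x) {
    *x = 2;
    a = 0;
}

/* Copy-restore body: x points at the caller's private copy, not at a. */
static void unsafe_copy_restore_body(int *x_copy) {
    *x_copy = 2;   /* updates only the copy */
    a = 0;         /* updates the global directly */
}

int run_by_reference(void) {
    a = 1;
    unsafe_by_reference(&a);
    return a;
}

int run_copy_restore(void) {
    a = 1;
    int copy = a;                     /* copy in: pass the r-value */
    unsafe_copy_restore_body(&copy);
    a = copy;                         /* restore into the actual's l-value */
    return a;
}
```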

Call-by-Name

Call-by-name is used by Algol 60:

The procedure is treated as if it were a macro; i.e., its body is substituted for the call in the caller, with the actual parameters literally substituted for the formals.
Local names in the procedure are kept distinct from names in the caller.
Actual parameters are surrounded by parentheses if necessary to preserve their integrity.

Example 7.8: Suppose example 7.7 is repeated with call-by-name instead of call-by-reference:

The swap(i, a[i]) call in the caller is replaced with:
```
temp := i ;
i    := a[i] ;
a[i] := temp
```
The first line copies the r-value of i (I₀) into temp.
The second line copies the r-value of a[I₀] into i.
The third line copies the r-value of temp (I₀) into a[a[I₀]].

Using call-by-name, the r-values of the arguments are not swapped correctly.
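A common implementation of call-by-name passes each actual parameter as a thunk: a small parameterless routine that re-evaluates the actual in the caller's environment every time the formal is used. The C sketch below uses function pointers as thunks that return l-values; the globals i and a stand in for the caller's variables, and all names are hypothetical. Because thunk_ai re-indexes a with the current i on each use, the last assignment stores into a[a[I₀]], just as in the expansion above.

```c
/* The caller's variables (hypothetical globals). */
int i;
int a[10];

/* A thunk re-evaluates an actual parameter and yields its l-value. */
typedef int *(*thunk)(void);

int *thunk_i(void)  { return &i; }      /* actual parameter "i" */
int *thunk_ai(void) { return &a[i]; }   /* actual "a[i]", using the current i */

/* swap with call-by-name formals: every use of x or y calls its thunk. */
void swap_by_name(thunk x, thunk y) {
    int temp = *x();   /* temp := i */
    *x() = *y();       /* i := a[i] */
    *y() = temp;       /* a[i] := temp, with the new i */
}
```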

Kenneth E. Batcher - 8/6/2001