This topic relates the static source text of a program to the dynamic actions it performs at run-time. During execution the same name in the source text can denote different objects in the target machine. Each execution of a procedure is referred to as an activation of the procedure. A recursive procedure may call itself multiple times so several activations of a procedure may be alive simultaneously.
[7.1] - Source Language Issues
In this section we consider a program with a number of recursive procedures and/or functions as in Pascal.
Procedures
A procedure definition associates an identifier (the procedure name) with a statement (the procedure body). For example, figure 7.1 shows a Pascal program with a procedure named readarray defined on lines 3-7; the body of the procedure is shown on lines 5-7.
A procedure is called when its name appears in an executable statement: a procedure call executes the body of the procedure. The main program (lines 21-25 of figure 7.1) calls readarray on line 23 and then calls the quicksort procedure on line 24. Procedure calls may also occur within expressions as in line 16.
Some of the identifiers appearing within a procedure definition are formal parameters (or formal arguments, dummy arguments, or formals) of the procedure. For example, identifiers m and n on line 12 are the formal parameters of quicksort. Arguments, also called actual parameters, are substituted for the formal parameters when the procedure is called. For example, the call to quicksort in line 24 of the main program substitutes the actual parameters 1 and 9 for the formal parameters m and n, respectively.
Activation Trees
Each execution of a procedure body is an activation of the procedure. The lifetime of an activation of a procedure p is the sequence of steps between the first and last steps in the execution of the procedure body including time spent executing procedures called by p and procedures called by those procedures, etc.
In most languages, each time procedure q is called from procedure p, control eventually returns to procedure p (unless there is a fatal error). More precisely, each time control flows from an activation of procedure p to an activation of procedure q, it eventually returns to the same activation of procedure p. Thus if a and b are procedure activations, their lifetimes are either non-overlapping or nested; i.e., if b is entered before a is left, then control must leave b before it can leave a.
One can depict the way control enters and leaves activations with an activation tree. For example, an execution of the program in figure 7.1 may have an activation tree as shown in figure 7.3, where s denotes an activation of the main sort program, r denotes an activation of the readarray procedure, q(m, n) denotes an activation of the quicksort procedure with actual parameters m and n, and p(y, z) denotes an activation of the partition procedure with actual parameters y and z.
Note that when control resides in some activation on the tree then all ancestors of that activation are still alive. For example, if control currently resides in activation p(2,3) of figure 7.3 then activations q(2,3), q(1,3), q(1,9), and s are still alive.
Control Stacks
The flow of control in a program corresponds to a depth-first traversal of an activation tree. The traversal starts at the root, visits each node before its children, recursively visits all children of a node in left-to-right order (returning to the parent between visits to the children), and finally returns to the root.
One can use a stack (called the control stack) to keep track of live procedure activations. When an activation is started, an item is pushed on the stack, and the item remains on the stack until that activation ends. For example, if control currently resides in activation p(2,3) of figure 7.3 then the control stack has five items:
p(2,3) | <-- top of stack |
---|---|
q(2,3) | |
q(1,3) | |
q(1,9) | |
s | <-- bottom of stack |
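The push-on-entry, pop-on-exit discipline described above can be sketched in C. This is only an illustration: the names control_stack, enter, leave, current, live_count, and MAX_DEPTH are assumptions made for the sketch and do not come from figure 7.1 or figure 7.3.

```c
#include <stddef.h>

#define MAX_DEPTH 64   /* assumed limit, for illustration only */

/* Hypothetical control stack: one label per live activation. */
static const char *control_stack[MAX_DEPTH];
static size_t depth = 0;   /* number of live activations */

/* An activation starts: push its label. Returns 0 if the stack is full. */
int enter(const char *label) {
    if (depth == MAX_DEPTH) return 0;
    control_stack[depth++] = label;
    return 1;
}

/* The activation on top of the stack ends: pop and return its label. */
const char *leave(void) {
    return depth > 0 ? control_stack[--depth] : NULL;
}

/* Label of the activation in which control currently resides. */
const char *current(void) {
    return depth > 0 ? control_stack[depth - 1] : NULL;
}

/* Number of activations that are still alive. */
size_t live_count(void) {
    return depth;
}
```

Calling enter for s, q(1,9), q(1,3), q(2,3), and p(2,3) in that order reproduces the five-item stack shown in the table; a subsequent leave exposes q(2,3) again.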
The Scope of a Declaration
A declaration is a language construct that associates information with a name. For example, var i : integer; in a Pascal program declares that i is the name of an integer.
There may be multiple independent declarations of the same name in different parts of a program. For example, i is declared three times in the program of figure 7.1: on lines 4, 9, and 13. The scope rules of the source language determine which declaration of a name applies to each usage of the name. In figure 7.1: the declaration of i on line 4 applies to the usages of i in line 6; the declaration of i on line 9 applies to any usages of i within the body of partition (lines 10-11); and the declaration of i on line 13 applies to the usages of i in lines 16-18.
The scope of a declaration is that portion of a program where that declaration applies. The usage of a name in a procedure is local if it is within the scope of a declaration within that procedure; otherwise, the usage is nonlocal. At compile time, the symbol table can be used to find the declaration that applies to each usage of a name.
Bindings of Names
Even if a name is only declared once in a program it may denote different objects at run-time. We use the term environment to describe the mapping of a name to a storage location and the term state to describe the mapping of a storage location to the value held within it.
For example, the environment might assign the name pi to storage location 100 which might initially hold a value of 0: the assignment statement, pi := 3.14, changes the state of storage location 100 but doesn't change the environment.
There is a distinction between the meaning of identifiers on the left and right sides of an assignment statement. For example, in the statement i := i + 1; the identifier i on the left side refers to the location of i in storage, while the identifier i on the right side refers to the current value of i. We use the terms l-value and r-value to distinguish between these meanings. The l-value (or left-value) of an identifier refers to the location of a variable that we use on the left side of an assignment, while the r-value (or right-value) of an identifier refers to the current value of a variable that we use on the right side of an assignment.
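The two roles of i in i := i + 1 can be made explicit in C, where a pointer stands for an l-value and a dereference yields an r-value. The helper names r_value, assign, and increment are hypothetical, introduced only for this sketch.

```c
/* The r-value: the value currently held in the location. */
int r_value(const int *location) {
    return *location;
}

/* Store through the l-value. This changes the state (the contents of
   the location) but not the environment (which location the name has). */
void assign(int *location, int value) {
    *location = value;
}

/* i := i + 1 with both roles of i spelled out. */
void increment(int *l_value_of_i) {
    int old = r_value(l_value_of_i);   /* right side: current value of i */
    assign(l_value_of_i, old + 1);     /* left side: location of i */
}
```

After increment(&i) the variable i holds a new value, but &i is the same address as before the call.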
An environment binds a name to a particular l-value. Note that with recursive procedures there may be multiple bindings of the same name. For example, when the control stack for the program of figure 7.1 looks like:
p(2,3) | <-- top of stack |
---|---|
q(2,3) | |
q(1,3) | |
q(1,9) | |
s | <-- bottom of stack |
then the name, i, has four bindings: a binding in each of the p(2,3), q(2,3), q(1,3), and q(1,9) activations.
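A small C sketch can make the multiple bindings visible by recording where the local i of each live activation is stored. The function nest, the log location_of_i, and the limit MAX_LEVELS are assumptions of this sketch and are unrelated to the sort program of figure 7.1.

```c
#include <stdint.h>

#define MAX_LEVELS 8   /* assumed limit, for illustration only */

/* Hypothetical log: the storage location bound to i in each activation. */
static uintptr_t location_of_i[MAX_LEVELS];

/* Recurses until `levels` activations are alive at once. Every
   activation owns a separate local variable named i. */
int nest(int level, int levels) {
    if (level < 0 || level >= MAX_LEVELS) return -1;
    volatile int i = level;                  /* this activation's i */
    location_of_i[level] = (uintptr_t)&i;    /* record its l-value */
    if (level + 1 < levels) {
        nest(level + 1, levels);
    }
    return i;   /* i still denotes this activation's own location */
}

/* Returns 1 if the first `levels` recorded locations are pairwise distinct. */
int all_distinct(int levels) {
    for (int x = 0; x < levels; x++) {
        for (int y = x + 1; y < levels; y++) {
            if (location_of_i[x] == location_of_i[y]) return 0;
        }
    }
    return 1;
}
```

After nest(0, 4), the four recorded locations differ because all four activations were alive at the same time, each with its own binding of i.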
[7.2] - Storage Organization
This section describes an organization of run-time storage suitable for languages like Fortran, Pascal, and C.
Subdivision of Run-Time Memory
We assume that the compiler obtains a block of storage for the compiled program to run in. This block of storage must hold: (1) - the generated target code, (2) - all data objects, and (3) - the control stack to keep track of procedure activations.
Once all target code has been generated its size is fixed at compile time so the compiler can place it in some statically determined area, such as the low end of storage. The size of static data (global data variables) is also fixed at compile time so these data objects can be placed next to the target code.
Fortran doesn't allow recursive procedures so each local variable needs only one storage location. The size of all local data is also fixed at compile time and these data objects can be placed next to global data.
Pascal and C allow recursive procedures so each local variable needs a storage location for each activation that's alive. Storage locations for these variables are placed inside the activation records on the control stack. The number of activation records on the control stack changes as procedure calls/returns occur: the bottom of the control stack can be in a fixed location but the top of the control stack must be allowed to move.
Pascal and C allow the user to allocate/deallocate storage for dynamic data objects. This storage comes from a separate area of run-time memory called the heap.
Run-time storage can be subdivided as shown below:
Target Code | |
---|---|
Static Data | |
Control Stack | <-- bottom |
 | <-- top |
. . . | |
Heap | |
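The three kinds of data storage can be seen side by side in a short C sketch. The names calls, sum_to, and make_cell are hypothetical; where a particular compiler and operating system actually place each area is not specified here.

```c
#include <stdlib.h>

/* Static data: one location, fixed before the program runs. */
int calls = 0;

/* Stack data: every activation of sum_to has its own n and partial,
   held in that activation's record on the control stack. */
int sum_to(int n) {
    calls++;
    int partial = (n > 0) ? sum_to(n - 1) : 0;
    return partial + n;
}

/* Heap data: allocated on request and alive until the caller frees it,
   regardless of which activations come and go in the meantime. */
int *make_cell(int start) {
    int *cell = malloc(sizeof *cell);
    if (cell != NULL) *cell = start;
    return cell;
}
```

A caller of make_cell is responsible for passing the result to free once the cell is no longer needed.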
Activation Records
Each activation record or frame holds information for one activation of a procedure: the frame is pushed on the control stack when the procedure is called and popped when the procedure returns to its caller. An activation record may look like:
returned value |
---|
actual parameters |
control link |
access link |
saved machine status |
local data |
temporaries |
Some compilers keep some of this information in machine registers. The return address (the program-counter value at which the caller resumes) is part of the saved machine status. The control link points to the activation record of the caller. The access link is used to access nonlocal data as described in section 7.4.
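The fields listed above can be pictured as a C struct. This is a sketch under loud assumptions: the field sizes, the field types, and the helper call_depth are invented for illustration, and a real compiler lays records out in raw stack memory rather than declaring a struct.

```c
#include <stddef.h>

/* Hypothetical activation record with the fields in the order listed. */
struct activation_record {
    int returned_value;
    int actual_parameters[2];                 /* assumed: two int actuals */
    struct activation_record *control_link;   /* record of the caller */
    struct activation_record *access_link;    /* for nonlocal data */
    const void *return_address;               /* saved machine status */
    int local_data[2];                        /* assumed size */
    int temporaries[2];                       /* assumed size */
};

/* Following control links from the top record visits every live
   activation, ending at the record whose caller link is NULL. */
size_t call_depth(const struct activation_record *top) {
    size_t n = 0;
    while (top != NULL) {
        n++;
        top = top->control_link;
    }
    return n;
}
```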
Compile-Time Layout of Local Data
Figure 7.9 shows the typical data layouts of C compilers: many target machines have alignment restrictions that must be honored.
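The effect of alignment restrictions on layout can be observed from within C itself. The struct mixed and the two helper functions below are hypothetical, and the amount of padding they report depends on the target machine and compiler.

```c
#include <stddef.h>

/* A char followed by an int: on many machines the compiler inserts
   padding after c so that i starts at a suitably aligned offset. */
struct mixed {
    char c;
    int  i;
    char d;
};

/* Bytes of padding the compiler added to the struct (target dependent). */
size_t padding_in_mixed(void) {
    return sizeof(struct mixed) - (2 * sizeof(char) + sizeof(int));
}

/* Returns 1 if the offset of i honors the alignment required for int. */
int int_field_is_aligned(void) {
    return offsetof(struct mixed, i) % _Alignof(int) == 0;
}
```

The same consideration applies when a compiler assigns offsets to local data inside an activation record.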
[7.3] - Storage-Allocation Strategies
Here we describe static allocation, stack allocation, and heap allocation.
Static Allocation
Static allocation is used for data that has only one binding in a fixed storage location. Data items are referenced using absolute addresses. Global variables can be statically allocated. Fortran doesn't allow recursion so local variables and arguments in Fortran subroutines can also be statically allocated.
Stack Allocation
Stack allocation is used for data in recursive procedures that may have multiple bindings. Each procedure call creates a new activation record on the control stack in which the actual parameters, the local variables, the temporaries, and the returned value of that activation can be stored. The activation record remains on the stack until the procedure activation returns to its caller. Stack-allocated data items are referenced using displacements from a register, top, pointing to the top of the control stack.
Calling Sequences
A call sequence is a sequence of machine instructions that is executed every time a procedure is called: it allocates an activation record for the called procedure and enters some information into the record. A return sequence is a sequence of machine instructions that is executed every time a procedure returns control to its caller: it restores the machine status so the calling procedure can continue execution.
Activation records and calling sequences differ from machine to machine: the calling sequences are often divided up between the caller and the procedure being called (the callee). A principle often followed in the design of an activation record is to put fixed-size fields in the middle, between the fields communicating data to/from the caller and the fields of concern only to the callee. The activation record shown in figure 7.8 follows this principle:
Field | Activation record | Pointer |
---|---|---|
returned value | caller's | |
actual parameters | caller's | |
control link | caller's | |
access link | caller's | |
saved machine status | caller's | |
local data | caller's | <-- top_sp (caller) |
temporaries | caller's | |
returned value | callee's | |
actual parameters | callee's | |
control link | callee's | |
access link | callee's | |
saved machine status | callee's | |
local data | callee's | <-- top_sp (callee) |
temporaries | callee's | |
The following call sequence assumes there is a register (top_sp) holding a pointer to the start of local data in the current activation record:
Variable-Length Data
Figure 7.15 shows a common strategy for handling variable-length arrays: placing them after the activation record and referencing them through pointers in the record. Besides top_sp there needs to be another pointer, top, to keep track of the start of any new activation record.
Dangling References
Figure 7.16 shows an example of a dangling reference. The function dangle returns a pointer to a local integer, i, but i and everything else in its activation record are deallocated when dangle returns, so the returned pointer refers to storage that is no longer allocated.
Heap Allocation
Heap allocation differs from stack allocation because there are no nesting rules: one can allocate space for item a, then allocate space for item b, then deallocate item a before deallocating item b. This tends to fragment heap storage into areas of allocated space and areas of free space. Fragmentation is reduced if all heap items have the same size or just a few different sizes.
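The remark about equal-sized items can be illustrated with a free list over a pool of fixed-size blocks. This is a minimal sketch, not a general-purpose allocator: pool, BLOCK_COUNT, pool_init, pool_alloc, and pool_free are hypothetical names, and pool_free trusts the caller to pass a block that came from pool_alloc.

```c
#include <stddef.h>

#define BLOCK_COUNT 4   /* assumed pool size, for illustration only */

/* While a block is free, its own storage holds the free-list link. */
union block {
    union block *next_free;
    long payload;
};

static union block pool[BLOCK_COUNT];
static union block *free_list = NULL;

/* Put every block of the pool on the free list. */
void pool_init(void) {
    free_list = NULL;
    for (size_t k = 0; k < BLOCK_COUNT; k++) {
        pool[k].next_free = free_list;
        free_list = &pool[k];
    }
}

/* Take the first free block, or return NULL if none is left. */
void *pool_alloc(void) {
    union block *b = free_list;
    if (b != NULL) free_list = b->next_free;
    return b;
}

/* Return a block to the front of the free list. */
void pool_free(void *p) {
    union block *b = p;
    b->next_free = free_list;
    free_list = b;
}
```

Because every block has the same size, any freed block can satisfy any later request, so freeing a before b leaves no unusable hole.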
[7.4] - Access To Nonlocal Names
References to nonlocal names are treated according to the scope rules of the source language. Pascal, C, and Ada use lexical scope (also called static scope): examine the program text and use the most closely nested declaration of a name. Lisp, APL, and Snobol use dynamic scope: examine the current activations to find the appropriate declaration of a name.
Blocks
A block is a statement with its own local data declarations. In C the syntax of a block is { declarations statements } where the braces delimit the block. Blocks can be nested as shown in figure 7.18:

    #include <stdio.h>

    int main(void)
    {   /* start block 0 */
        int a = 0;
        int b = 0;
        {   /* start block 1 */
            int b = 1;
            {   /* start block 2 */
                int a = 2;
                printf("%d %d\n", a, b);
            }   /* end block 2 */
            {   /* start block 3 */
                int b = 3;
                printf("%d %d\n", a, b);
            }   /* end block 3 */
            printf("%d %d\n", a, b);
        }   /* end block 1 */
        printf("%d %d\n", a, b);
        return 0;
    }   /* end block 0 */

The scope of a declaration is given by the most closely nested rule, so the program prints:

    2 1
    0 3
    0 1
    0 0

Nesting of block structures can be treated at run time with stack allocation: each time a new block is entered, space on a stack is allocated to hold the variables declared within that block, and each time a block is exited its stack space is popped. (Note that in the example C program, a in block 2 and b in block 3 share the same stack location.)
Lexical Scope Without Nested Procedures
In C the definition of a procedure or function cannot appear within the definition of any other procedure or function. Any nonlocal name used within a procedure or function must be nonlocal to all procedures and functions so it must be a global name that can be statically allocated.
A benefit of static allocation of nonlocals is that declared procedures can be passed as parameters and returned as results. For example, figure 7.21 shows a Pascal program with integer functions of integers, f and g, and a procedure, b, that accepts such a function as a parameter. Both f and g use a global variable, m. The main program initializes m to 0 and then calls b(f) and b(g).
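The same arrangement can be sketched in C, where a function is passed by passing a pointer to it. The bodies of f and g, the argument 2 used inside b, and the fact that b returns its result instead of printing it are all assumptions of this sketch; the text above states only that f and g use the global m.

```c
/* Global: statically allocated, so f and g can reach it from anywhere. */
int m;

/* Assumed bodies: both functions refer to the nonlocal m. */
int f(int n) { return m + n; }
int g(int n) { return m * n; }

/* b accepts any function from int to int. Only the code address needs
   to be passed, because the nonlocal m lives at a fixed location. */
int b(int (*h)(int)) {
    return h(2);   /* the argument 2 is an arbitrary choice */
}
```

With m set to 0, b(f) yields 2 and b(g) yields 0.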
Lexical Scope with Nested Procedures
Figure 7.22 shows a Pascal sort program where the definition of partition is within the definition of quicksort. The nonlocal names in partition are a, v, and exchange of which a and exchange are global and v is declared within quicksort.
Nesting Depth
The notion of nesting depth is used to implement lexical scope. In the example of figure 7.22, the main program, sort, is at nesting depth 1; the procedures readarray, exchange, and quicksort are at nesting depth 2; and the function partition is at nesting depth 3. With each occurrence of a name we associate the nesting depth of the procedure in which the name is declared. Thus, within the body of partition the uses of the names a, v, exchange, i, and j have nesting depths 1, 2, 1, 3, and 3, respectively.
Access Links
Lexical scope of nested procedures can be implemented with an access link in each activation record: if the definition of a procedure p is nested immediately inside the definition of a procedure q then the access link in an activation record for p points to the access link in the most recent activation record for q.
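Access links can be simulated in C with explicit frame structs. The sketch assumes the usual lookup rule, which the text above does not spell out: from a procedure at nesting depth use_depth, a name declared at depth decl_depth is reached by following use_depth - decl_depth access links. The names frame, slots, and nonlocal are hypothetical.

```c
#include <stddef.h>

/* Hypothetical frame. The access link is the first field, so a pointer
   to a frame is also a pointer to that frame's access link. */
struct frame {
    struct frame *access_link;   /* most recent frame of the enclosing procedure */
    int slots[4];                /* assumed: room for four local integers */
};

/* Locate a variable declared at depth decl_depth, starting from the
   frame of a procedure at depth use_depth. Returns NULL if the chain
   of access links is too short. */
int *nonlocal(struct frame *current, int use_depth, int decl_depth, size_t slot) {
    struct frame *f = current;
    for (int hops = use_depth - decl_depth; hops > 0 && f != NULL; hops--) {
        f = f->access_link;
    }
    return f != NULL ? &f->slots[slot] : NULL;
}
```

For frames arranged like sort, quicksort, and partition at depths 1, 2, and 3, a use of v inside partition follows one link and a use of a follows two.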
Procedure Parameters
If a nested procedure is passed as a parameter its access link must also be passed as shown in figures 7.24 and 7.25.
Displays
Rather than chase through a string of access links for each nonlocal access, the compiler can maintain an array of pointers to activation records called a display. Figure 7.26 shows an example.
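A display can be sketched in C as an array indexed by nesting depth. The sketch assumes one common maintenance scheme that the text above does not describe: on entry to a procedure at depth d the old display[d] is saved and replaced, and on exit it is restored. The names record, display_enter, display_leave, display_access, and MAX_NESTING are hypothetical.

```c
#define MAX_NESTING 8   /* assumed limit on nesting depth */

struct record {
    int slots[4];   /* assumed: room for four local integers */
};

/* display[d] points to the most recent live record at nesting depth d. */
static struct record *display[MAX_NESTING + 1];

/* Entering a procedure at `depth`: install its record and hand back
   the previous entry so that it can be restored later. */
struct record *display_enter(int depth, struct record *r) {
    struct record *saved = display[depth];
    display[depth] = r;
    return saved;
}

/* Leaving the procedure: restore the entry that was saved on entry. */
void display_leave(int depth, struct record *saved) {
    display[depth] = saved;
}

/* A nonlocal declared at decl_depth is reached with a single indexing
   step, however deeply the using procedure is nested. */
int *display_access(int decl_depth, int slot) {
    return &display[decl_depth]->slots[slot];
}
```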
Dynamic Scope
With dynamic scope one should access a nonlocal name from the most recent activation record with space allocated for that name. Figure 7.27 shows an example where the output of a program depends on whether lexical scope or dynamic scope is used.
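Dynamic scope can be simulated in C with a stack of name bindings that is searched from the most recent entry downward. This sketch is not the program of figure 7.27; the names binding, push_binding, pop_binding, lookup, and MAX_BINDINGS are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_BINDINGS 16   /* assumed limit, for illustration only */

struct binding {
    const char *name;
    double value;
};

static struct binding bindings[MAX_BINDINGS];
static size_t binding_count = 0;

/* An activation that declares `name` pushes a binding on entry.
   Returns 0 if there is no room left. */
int push_binding(const char *name, double value) {
    if (binding_count == MAX_BINDINGS) return 0;
    bindings[binding_count].name = name;
    bindings[binding_count].value = value;
    binding_count++;
    return 1;
}

/* The activation pops its binding when it returns. */
void pop_binding(void) {
    if (binding_count > 0) binding_count--;
}

/* A nonlocal use finds the most recent live binding of the name. */
double *lookup(const char *name) {
    for (size_t k = binding_count; k > 0; k--) {
        if (strcmp(bindings[k - 1].name, name) == 0) {
            return &bindings[k - 1].value;
        }
    }
    return NULL;
}
```

Which value a lookup returns depends on the activations that are alive at that moment, not on where the use appears in the program text.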
[7.5] - Parameter Passing
The following Pascal procedure (figure 7.28) exchanges two elements of an integer array, a:

    procedure exchange(i, j: integer);
      var x: integer;
    begin
      x := a[i]; a[i] := a[j]; a[j] := x
    end;

Communication between this procedure and its caller is through the nonlocal a and through the parameters i and j. Here we discuss several methods of associating actual and formal parameters: call-by-value, call-by-reference, copy-restore, and call-by-name. The left-value (l-value), the right-value (r-value), or the name of a variable may be passed to the called procedure.
Call-by-Value
Usually, C and Pascal use call-by-value: the r-values of the actual parameters are passed to the called procedure, which treats each formal parameter as a local variable initialized with the corresponding value.
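A consequence of passing r-values is that assignments to the formals cannot affect the actuals. The following C sketch shows a swap that has no effect on its caller; the function name swap_by_value and its return value (the callee's view of x - y after the exchange) are assumptions added so that the local effect can be observed.

```c
/* x and y are copies of the actuals, stored in this activation's record. */
int swap_by_value(int x, int y) {
    int temp = x;
    x = y;
    y = temp;
    /* The exchange is visible only inside this activation. */
    return x - y;
}
```

Calling swap_by_value(a, b) with a = 1 and b = 2 returns 1, yet a and b still hold 1 and 2 afterwards.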
Call-by-Reference
Call-by-reference (or call-by-address or call-by-location) passes l-values to the called procedure:

    procedure swap(var x, y: integer);
      var temp: integer;
    begin
      temp := x;
      x := y;
      y := temp;
    end;

In the swap procedure, all reads and writes of the formal parameters, x and y, will read and write the associated actual parameters in the caller's activation record. Thus, swap will actually swap the actuals.
Example 7.7: If the swap procedure above were called with swap(i, a[i]) then the following steps would occur:
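This call can be mimicked in C, where pointers stand in for Pascal var parameters. The point of the sketch is that the location of a[i] is computed once, before the call, using the value i has at that time; the array contents used below are arbitrary.

```c
/* x and y hold l-values (addresses) of the actual parameters. */
void swap(int *x, int *y) {
    int temp = *x;   /* temp := contents of the first location */
    *x = *y;         /* first location := contents of the second */
    *y = temp;       /* second location := temp */
}
```

With a = {0, 2, 99} and i = 1, the call swap(&i, &a[i]) passes the addresses of i and a[1]. Afterwards i is 2 and a[1] is 1: the element that was read is also the element that is written, even though i changed during the call.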
Copy-Restore
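Copy-restore (also called copy-in copy-out or value-result) is a hybrid of call-by-value and call-by-reference: the r-values of the actual parameters are copied into the formals before the call, and when control returns the current values of the formals are copied back into the l-values of the actuals. The following program prints 0 under call-by-reference but 2 under copy-restore, because the copy-back of x overwrites the assignment to a: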

    program copyout(input, output);
      var a: integer;
      procedure unsafe(var x: integer);
        begin x := 2; a := 0 end;
    begin
      a := 1; unsafe(a); writeln(a)
    end.

Call-by-Name
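Call-by-name is used by Algol: the procedure is treated as if it were a macro, so its body is substituted for the call with the actual parameters textually substituted for the formals. For example, under call-by-name the call swap(i, a[i]) would behave like:

    temp := i; i := a[i]; a[i] := temp

Because i is assigned before a[i] is evaluated for the second time, the final assignment stores into a different element than the one that was read.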
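C preprocessor macros substitute argument text in much the same way, so they give a rough picture of this behaviour. The macro SWAP_BY_NAME and the wrapper swap_i_and_a_i are hypothetical, and a macro is only an approximation of Algol's call-by-name.

```c
/* Each use of x or y in the body is replaced by the argument text,
   so the argument expression is re-evaluated at every use. */
#define SWAP_BY_NAME(x, y) \
    do { int temp = (x); (x) = (y); (y) = temp; } while (0)

/* Expands to: temp = *i; *i = a[*i]; a[*i] = temp;
   The caller must ensure that both index values are within bounds. */
void swap_i_and_a_i(int *i, int a[]) {
    SWAP_BY_NAME(*i, a[*i]);
}
```

With a = {0, 2, 99} and i = 1, the expansion sets i to 2 and then stores the old i into a[2], leaving a[1] unchanged. The two actuals are therefore not swapped, unlike the call-by-reference version.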