[4.5] - Bottom-Up Parsing
Consider this grammar:
S | --> | a T U e |
T | --> | T b c | b |
U | --> | d |
and the rightmost derivation of the sentence: a b b c d e:
S | ==> | a T U e |
==> | a T d e | |
==> | a T b c d e | |
==> | a b b c d e |
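As mentioned in section 4.2, a bottom-up parser is an LR parser: it reads the input from left to right and performs a rightmost derivation in reverse order. There are four steps in the rightmost derivation of a b b c d e, so a bottom-up parser performs four reductions that undo those steps in reverse order:
a b b c d e | reduce by T --> b |
a T b c d e | reduce by T --> T b c |
a T d e | reduce by U --> d |
a T U e | reduce by S --> a T U e |
S | |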
Handles: The substring of the sentential form that the parser chooses to reduce in each step of the parse is called the handle for that step. In the previous example the handles are, in the order they are reduced: b, T b c, d, and a T U e.
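Shift-Reduce Parsing: Most bottom-up parsers are implemented as shift-reduce parsers. Such a parser uses a stack to hold grammar symbols (it is convenient to think of a horizontal stack with its bottom on the left and its top on the right) and has four possible actions:
shift | push the next input symbol on to the top of the stack |
reduce | replace the handle on the top of the stack with the left-side of the corresponding production |
accept | announce successful completion of parsing |
error | report a syntax error |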
We use $ to mark the left-end (bottom) of the stack and also the end of the input string. Initially the stack is empty. Parsing ends successfully when the input is empty and the stack contains only the start symbol. As an example we use the following grammar:
E | --> | E + E |
E | --> | E * E |
E | --> | ( E ) |
E | --> | id |
Figure 4.22 shows the actions of a shift-reduce parser to parse the input string id1 + id2 * id3 according to the grammar. Here we parse id1 * ( id2 + id3 ):
Stack | Input | Action |
---|---|---|
$ | id1 * ( id2 + id3 ) $ | shift |
$ id1 | * ( id2 + id3 ) $ | E --> id |
$ E | * ( id2 + id3 ) $ | shift |
$ E * | ( id2 + id3 ) $ | shift |
$ E * ( | id2 + id3 ) $ | shift |
$ E * ( id2 | + id3 ) $ | E --> id |
$ E * ( E | + id3 ) $ | shift |
$ E * ( E + | id3 ) $ | shift |
$ E * ( E + id3 | ) $ | E --> id |
$ E * ( E + E | ) $ | E --> E + E |
$ E * ( E | ) $ | shift |
$ E * ( E ) | $ | E --> ( E ) |
$ E * E | $ | E --> E * E |
$ E | $ | accept |
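The trace above can be replayed mechanically. The following Python sketch is an illustration only: the function name `replay` and the string keys used to name productions are assumptions made for this example, and the subscripts on id are dropped. It applies a scripted list of actions and checks that every reduction finds its handle on the top of the stack.

```python
# Productions of the example grammar, keyed by a short label (labels are hypothetical).
PRODUCTIONS = {
    "E+E": ("E", ["E", "+", "E"]),
    "E*E": ("E", ["E", "*", "E"]),
    "(E)": ("E", ["(", "E", ")"]),
    "id": ("E", ["id"]),
}


def replay(tokens, actions):
    """Apply scripted shift/reduce actions; return (stack, input, action) rows."""
    stack, pos, rows = ["$"], 0, []
    tokens = tokens + ["$"]
    for action in actions:
        # Record the configuration as it is before the action is performed.
        rows.append((" ".join(stack), " ".join(tokens[pos:]), action))
        if action == "shift":
            stack.append(tokens[pos])
            pos += 1
        elif action == "accept":
            # Success: input exhausted and only the start symbol is stacked.
            assert stack == ["$", "E"] and tokens[pos] == "$"
        else:
            left, right = PRODUCTIONS[action]
            # The handle must be on top of the stack before it is reduced.
            assert stack[-len(right):] == right, "handle not on top of stack"
            del stack[-len(right):]
            stack.append(left)
    return rows
```

Calling `replay` with the tokens `id * ( id + id )` and the fourteen actions of the table reproduces its rows one for one. Note that the script supplies the decisions; choosing between shift and reduce is exactly the part that the parser-construction methods of the later sections automate.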
Shift-reduce parsers can be constructed for a large class of grammars, the LR grammars, but the construction is usually so complicated that it is left to parser-construction programs (see section 4.7). However, the next section shows that there is a small but important class of grammars for which shift-reduce parsers can easily be constructed by hand.
[4.6] - Operator-Precedence Parsing
If no production of a grammar has two or more adjacent nonterminals on its right-side and no production is an ε-production then one can easily hand-construct a shift-reduce parser for the grammar. Such a parser is called an operator-precedence parser.
The syntax of arithmetic expressions can usually be described with such a grammar:
E | --> | E + E |
E | --> | E * E |
E | --> | ( E ) |
E | --> | - E |
E | --> | id |
This grammar has 6 terminals:
+ | - | * | ( | ) | id |
and to eliminate ambiguity we must establish precedence relations between certain pairs of terminals. On this Web page precedence relations are denoted as follows:
Relation | Meaning |
---|---|
a << b | terminal a yields precedence to terminal b |
a == b | terminal a has the same precedence as terminal b |
a >> b | terminal a takes precedence over terminal b |
Note that precedence relations are only established between the terminals of the grammar (and the $ markers at both ends of a string); nonterminals are ignored. The customary precedence relations for the terminals of the foregoing grammar are shown in the following table, where the row is the terminal on the left of the relation and the column is the terminal on the right. Note that - is the unary minus operator; this grammar doesn't have a binary subtract operator.
 | id | - | ( | * | + | ) | $ |
---|---|---|---|---|---|---|---|
id | err | err | err | >> | >> | >> | >> |
) | err | err | err | >> | >> | >> | >> |
- | << | << | << | >> | >> | >> | >> |
* | << | << | << | >> | >> | >> | >> |
+ | << | << | << | << | >> | >> | >> |
( | << | << | << | << | << | == | err |
$ | << | << | << | << | << | err | acc |
Note that err entries in this table mark syntax errors and the acc entry marks the accept state when successful completion of parsing can be announced.
Since no production of the grammar has two or more adjacent nonterminals on its right-side and there are no ε-productions, there must always be one or more terminals between any pair of nonterminals in any sentential form. The precedence relation between two terminals holds whether or not there is a nonterminal between them. For example, in the sentential form: $ E * ( E + E ) $ there are the following precedence relations:
$ | << | * | << | ( | << | + | >> | ) | >> | $ |
---|
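The shift and reduce actions of the operator-precedence parser are governed by the precedence relations between the right-most terminal on the stack, a, and the current input symbol, b:
If a << b or a == b then the parser shifts b on to the stack.
If a >> b then the parser reduces: it pops symbols off the stack until the terminal on the top of the stack is << the terminal most recently popped. The popped symbols, together with any nonterminal adjacent to them, are the handle.
If the table entry is err then the parser reports a syntax error, and if it is acc then the parser announces successful completion.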
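As a rough illustration, the decisions just described can be sketched in Python using the precedence table given earlier in this section. The names `rel`, `top_terminal`, and `parse` are assumptions made for this sketch, the subscripts on id are dropped, and handles are found with the usual rule of popping until the terminal on top of the stack yields precedence to the terminal most recently popped.

```python
LT, EQ, GT = "<<", "==", ">>"
COLS = ["id", "-", "(", "*", "+", ")", "$"]
ROWS = {  # row = right-most terminal on the stack, column = current input symbol
    "id": ["err", "err", "err", GT, GT, GT, GT],
    ")": ["err", "err", "err", GT, GT, GT, GT],
    "-": [LT, LT, LT, GT, GT, GT, GT],
    "*": [LT, LT, LT, GT, GT, GT, GT],
    "+": [LT, LT, LT, LT, GT, GT, GT],
    "(": [LT, LT, LT, LT, LT, EQ, "err"],
    "$": [LT, LT, LT, LT, LT, "err", "acc"],
}
RIGHT_SIDES = {("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("-", "E"), ("id",)}


def rel(a, b):
    """Look up the precedence relation between terminals a and b."""
    return ROWS[a][COLS.index(b)]


def top_terminal(stack):
    """Return the right-most terminal on the stack, skipping nonterminal E."""
    return next(s for s in reversed(stack) if s != "E")


def parse(tokens):
    """Return the handles in the order they are reduced; raise SyntaxError on error."""
    stack, pos, handles = ["$"], 0, []
    tokens = tokens + ["$"]
    while True:
        relation = rel(top_terminal(stack), tokens[pos])
        if relation == "acc":
            if stack != ["$", "E"]:
                raise SyntaxError("input is not a complete expression")
            return handles
        if relation in (LT, EQ):
            stack.append(tokens[pos])  # shift
            pos += 1
        elif relation == GT:
            handle = []
            while True:  # pop until the stack-top terminal << the popped terminal
                sym = stack.pop()
                handle.insert(0, sym)
                if sym != "E" and rel(top_terminal(stack), sym) == LT:
                    break
            if stack[-1] == "E":  # a nonterminal just below belongs to the handle
                handle.insert(0, stack.pop())
            if tuple(handle) not in RIGHT_SIDES:
                raise SyntaxError("no production matches " + " ".join(handle))
            handles.append(" ".join(handle))
            stack.append("E")
        else:
            raise SyntaxError("unexpected " + tokens[pos])
```

For the input `id * ( id + id )` the sketch reduces the handles id, id, id, E + E, ( E ), and E * E, in that order, which agrees with the shift-reduce trace in section 4.5.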
Developing the Operator-Precedence Relations
Here we describe how the operator-precedence relations are developed.
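Binary Operators: Let θ1 and θ2 be two infix binary operators:
If θ1 has higher precedence than θ2 then θ1 >> θ2 and θ2 << θ1.
If θ1 and θ2 have equal precedence and are left-associative then θ1 >> θ2 and θ2 >> θ1.
If θ1 and θ2 have equal precedence and are right-associative then θ1 << θ2 and θ2 << θ1.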
Leaves of Syntax Trees: The leaves of a syntax tree are ID tokens, numeric constants, and boolean constants. Leaves must be evaluated before their values can be used by the operators so they are given higher precedence than the operators.
Parentheses: For the parentheses we must have ( == ) so the handle of the E --> ( E ) production can be found. A pair of parentheses can't be removed until all operations between the pair have been performed. A pair of outer parentheses can't be removed until all pairs of inner parentheses have been removed. An operation outside a pair of parentheses can't be performed until the pair has been removed. These rules dictate the operator-precedence relations in the following table (θ is any operator):
 | ( | θ | ) |
---|---|---|---|
( | << | << | == |
θ | << | | >> |
) | err | >> | >> |
Note that it's usually a syntax error to have a ) ( combination with no intervening operator. The relations in this table can also be used for brackets and any other grouping operators.
Function Calls: In most languages the production for a function call is:
where expression_list is one or more expressions separated by commas. The handle for this production includes zero or more commas between the parentheses so the relations in the following table are required:
 | id | ( | , | ) |
---|---|---|---|---|
id | err | == | >> | >> |
( | << | << | == | == |
, | << | << | == | == |
) | err | err | >> | >> |
Reading Array Elements: In many languages, the production for reading an element of an array is:
The precedence relations table for function calls can be used if its parentheses are replaced by brackets.
End Markers: The $-sign at the left-end of the stack is never in any handle so it should be << all following terminals. Similarly, the $-sign at the end of the input is never shifted on to the stack so all preceding terminals should be >> it.
Precedence Functions
Suppose the grammar for an operator-precedence parser has n terminals so the table of operator-precedence relations has n + 1 rows and n + 1 columns (including the $). Often n is so large that coding and debugging the table of (n + 1)^2 entries is a very difficult task. Fortunately, one can usually replace the table with a pair of precedence functions that are much easier to code and debug.
The idea is to try to define two functions, f and g, that map the terminals into integers such that:
f (a ) < g (b ) whenever a << b,
f (a ) = g (b ) whenever a == b, and
f (a ) > g (b ) whenever a >> b.
Functions f and g are called precedence functions. Each function has only one argument with only n + 1 values so the two functions are much easier to code and de-bug than the large precedence table.
Note that there is some loss in error-detection capability when the precedence functions are used: they never notice any syntax errors whereas the table of precedence relations does report some of the errors. A syntax error won't be discovered until the parser tries to reduce a handle that doesn't match the right-side of any production.
Precedence functions f and g do exist for the precedence-relations table near the beginning of this section of the notes. As an example, the table is repeated below with values of the precedence functions in the f -column and the g-row shown next to the corresponding terminals:
 | g | 6 | 6 | 6 | 4 | 2 | 1 | 0 |
---|---|---|---|---|---|---|---|---|
f | | id | - | ( | * | + | ) | $ |
5 | id | err | err | err | >> | >> | >> | >> |
5 | ) | err | err | err | >> | >> | >> | >> |
5 | - | << | << | << | >> | >> | >> | >> |
5 | * | << | << | << | >> | >> | >> | >> |
3 | + | << | << | << | << | >> | >> | >> |
1 | ( | << | << | << | << | << | == | err |
1 | $ | << | << | << | << | << | err | acc |
For every <<, ==, and >> entry in the table one can see that the comparison of the corresponding f and g function values matches the entry. If the parser uses the precedence functions instead of the table it must have a separate test for the accept state and it won't notice a syntax error until it tries to reduce an illegal handle.
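The claim can be checked mechanically. The short Python sketch below copies the table and the f and g values from above (the names `compare` and `mismatches` are assumptions for this example) and compares f (a ) with g (b ) for every entry that is not err or acc.

```python
COLS = ["id", "-", "(", "*", "+", ")", "$"]
TABLE = {  # row = terminal a (left of the relation), column = terminal b
    "id": ["err", "err", "err", ">>", ">>", ">>", ">>"],
    ")": ["err", "err", "err", ">>", ">>", ">>", ">>"],
    "-": ["<<", "<<", "<<", ">>", ">>", ">>", ">>"],
    "*": ["<<", "<<", "<<", ">>", ">>", ">>", ">>"],
    "+": ["<<", "<<", "<<", "<<", ">>", ">>", ">>"],
    "(": ["<<", "<<", "<<", "<<", "<<", "==", "err"],
    "$": ["<<", "<<", "<<", "<<", "<<", "err", "acc"],
}
F = {"id": 5, ")": 5, "-": 5, "*": 5, "+": 3, "(": 1, "$": 1}
G = {"id": 6, "-": 6, "(": 6, "*": 4, "+": 2, ")": 1, "$": 0}


def compare(a, b):
    """Relation implied by the precedence functions for stack terminal a, input b."""
    if F[a] < G[b]:
        return "<<"
    if F[a] == G[b]:
        return "=="
    return ">>"


def mismatches():
    """List the (a, b) pairs where the functions disagree with a table relation."""
    return [
        (a, b)
        for a, row in TABLE.items()
        for b, entry in zip(COLS, row)
        if entry in ("<<", "==", ">>") and compare(a, b) != entry
    ]
```

`mismatches()` returns an empty list for these values. The loss of error detection is also visible: `compare("id", "id")` yields `<<` although the table entry is err, so a parser driven by the functions would shift the second id instead of reporting an error.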
[5.3] - Bottom-Up Evaluation Of S-Attributed Definitions
Section 4.5 describes a shift-reduce parser: the parser shifts input symbols on to a stack until it finds a handle on the top of the stack, which it then reduces by popping off the handle and pushing the left-side of the appropriate production on to the stack. As an example, assume that A --> X Y Z is a production in the grammar and the stack contains:
Z | <-- top |
---|---|
Y | |
X | |
. . . | |
$ | <-- bottom |
If the shift-reduce parser decides that XYZ is indeed a handle then it pops off Z, Y, and X, and pushes A on to the stack:
A | <-- top |
---|---|
. . . | |
$ | <-- bottom |
In an S-attributed definition all attributes are synthesized. An attribute associated with a grammar symbol should remain associated with that grammar symbol when the symbol is stacked so we give each item on the stack two fields: one field holding a grammar symbol and the other field holding the synthesized attributes of that grammar symbol (or a pointer to them.)
Every time the shift-reduce parser performs a reduction it reads the attributes of the symbols popped off the stack, computes the attributes associated with the nonterminal on the left-side of the production, and places them in the attribute field of the item pushed on to the stack.
To continue the foregoing example, assume nonterminals A, X, Y, and Z, are associated with synthesized attributes A.a, X.x, Y.y, and Z.z, respectively, and assume there is a semantic rule associated with the production as follows:
A --> X Y Z | A.a = f (X.x, Y.y, Z.z ) |
Before the reduction the stack contains:
Z | Z.z | <-- top |
---|---|---|
Y | Y.y | |
X | X.x | |
. . . | . . . | |
$ | . . . | <-- bottom |
When it reduces XYZ to A the parser reads Z.z, Y.y, and X.x from the items popped off the stack, computes A.a = f (X.x, Y.y, Z.z ) and pushes one item on the stack containing A and A.a :
A | A.a | <-- top |
---|---|---|
. . . | . . . | |
$ | . . . | <-- bottom |
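The two-field stack items and the reduction step can be sketched in Python as follows. This is only an illustration: the names `Item` and `reduce_handle` are assumptions, and the lambda passed in the usage example merely stands in for the unspecified function f.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Item:
    """One stack item: a grammar symbol plus its synthesized attribute."""
    symbol: str
    attr: Any = None


def reduce_handle(stack: List[Item], left: str, length: int,
                  rule: Callable[..., Any]) -> None:
    """Pop the handle, evaluate the semantic rule, and push the left-side."""
    handle = stack[-length:]
    del stack[-length:]
    # The rule receives the attributes in left-to-right order: X.x, Y.y, Z.z.
    value = rule(*(item.attr for item in handle))
    stack.append(Item(left, value))
```

For example, with the stack `[Item("$"), Item("X", 2), Item("Y", 3), Item("Z", 4)]`, calling `reduce_handle(stack, "A", 3, lambda x, y, z: x + y * z)` leaves `[Item("$"), Item("A", 14)]`: the handle is gone and A carries the attribute computed from X.x, Y.y, and Z.z.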
Operator-Precedence Parsing
An operator-precedence parser finds handles by evaluating the precedence relations between terminals on the stack. Evaluation of these relations is complicated by the fact that the parser must skip over any nonterminals it finds on the stack. Since every stacked nonterminal must be above a terminal (or the $-item marking the bottom of the stack) one can simplify precedence-relation evaluation by placing each nonterminal with the terminal below it. Each stacked item has four fields:
As an example, assume a grammar for expressions where mulop-operators have precedence over addop-operators, nonterminal E has an associated synthesized attribute, E.syn, and every terminal has an attribute (.attr) attached to it. Assume the grammar has the following production and semantic rule:
E --> E1 mulop E2 | E.syn = f ( E1.syn, mulop.attr, E2.syn ) |
Assume the parser stack contains (bottom on the left, top on the right):
$ . . . addop E1 mulop E2
If each stacked item contains field-1, field-2, and field-4 then the stack looks like:
field-1 | field-2 | field-4 | |
---|---|---|---|
mulop | mulop.attr | E2.syn | <-- top |
addop | addop.attr | E1.syn | |
. . . | . . . | . . . | |
$ | . . . | . . . | <-- bottom |
Assume the next input symbol is addop or any other terminal with lower precedence than mulop. The parser can easily identify the handle (E mulop E ) by examining the precedence relations between the terminals in field-1 of the items on the top of the stack and between the topmost terminal and the input symbol. After it pops off the handle and evaluates E.syn = f ( E1.syn, mulop.attr, E2.syn ), the parser places E.syn in field-4 of the addop terminal:
field-1 | field-2 | field-4 | |
---|---|---|---|
addop | addop.attr | E.syn | <-- top |
. . . | . . . | . . . | |
$ | . . . | . . . | <-- bottom |
An S-Attributed Definition for Project 3
Here we show the semantic rules that should be associated with the productions in the grammar of coding project 3.
There is a single nonterminal, expr, in the grammar with a single attribute, expr.ptr, that points to the symbol table entry containing the lexeme and type-expression of the expr. The productions of the grammar are shown in the table below:
expr --> | NUM | {1} |
expr --> | BCONST | {2} |
expr --> | LPAR expr1 RPAR | {3} |
expr --> | NOTOP expr1 | {4} |
expr --> | UNARYOP expr1 | {5} |
expr --> | ID LBRK expr1 RBRK | {6} |
expr --> | expr1 ANDOP expr2 | {7} |
expr --> | expr1 OROP expr2 | {8} |
expr --> | expr1 ADDOP expr2 | {9} |
expr --> | expr1 RELOP expr2 | {10} |
expr --> | ID | {11} |
expr --> | ID LPAR expr_list RPAR | {12} |
expr --> | expr1 MULOP expr2 | {13} |
Every appearance of the expr nonterminal on the right-side of a production is subscripted to distinguish it from the expr nonterminal on the left-side. The expr_list nonterminal in the right-side of production {12} is a list of one or more expr nonterminals separated by COMMA tokens. We assume that type-expressions use the format suggested here. The format of quadruples is shown here. The following items are labeled with numbers in braces that correspond to the numbered productions in the foregoing table: each item describes the semantic rules/actions that should be performed when the parser reduces by the corresponding production.
{1} | Copy the pointer to the NUM-token entry into expr.ptr. |
---|
{2} | Copy the pointer to the BCONST-token entry into expr.ptr. |
---|
{3} | Copy expr1.ptr into expr.ptr. |
---|
{4} | Check that GetType(expr1.ptr ) equals "b".
Call newtemp for the name of a new temporary variable, newname. Insert a new entry into the symbol table with newname as the lexeme and "b" as the type-expression. Generate a NOT quadruple with GetLex(expr1.ptr ) as the source operand and newname as the result. Copy the pointer to the new symbol table entry into expr.ptr. |
---|
{5} | Check that GetType(expr1.ptr ) equals "i" or "r". If the lexeme of the UNARYOP token is the plus-sign then do nothing except copy expr1.ptr into expr.ptr.
Otherwise (if the lexeme of the UNARYOP token is the minus-sign) call newtemp for the name of a new temporary variable, newname. Insert a new entry into the symbol table with newname as the lexeme and GetType(expr1.ptr ) as the type-expression. If the type-expression is "i" then generate a SUBI quadruple with "0" as the first source operand, GetLex(expr1.ptr ) as the second source operand, and newname as the result. If the type-expression is "r" then generate a SUBR quadruple with "0.0" as the first source operand, GetLex(expr1.ptr ) as the second source operand, and newname as the result. Finally, copy the pointer to the new symbol table entry into expr.ptr. |
---|
{6} | Check that GetType(expr1.ptr ) equals "i". Check that the type-expression of the ID-token is "B", "I", or "R".
Call newtemp for the name of a new temporary variable, name1. Insert a new entry into the symbol table with name1 as the lexeme and "i" as the type-expression. Generate a SUBI quadruple with GetLex(expr1.ptr ) as the first source field, the lexeme of the low-index of the array as the second source field, and name1 as the result. Call newtemp for the name of another new temporary variable, name2. Insert a new entry into the symbol table with name2 as the lexeme and "b", "i", or "r" as the type-expression. Generate an LDB, an LDI, or an LDR quadruple with name1 as the first source operand, the lexeme of the ID-token as the second source operand, and name2 as the result. Copy the pointer to the symbol table entry of name2 into expr.ptr. |
---|
{7} | Check that GetType(expr1.ptr ) and GetType(expr2.ptr ) equal "b".
Call newtemp for the name of a new temporary variable, newname. Insert a new entry into the symbol table with newname as the lexeme and "b" as the type-expression. Generate an AND quadruple with GetLex(expr1.ptr ) as the first source operand, GetLex(expr2.ptr ) as the second source operand, and newname as the result. Copy the pointer to the new symbol table entry into expr.ptr. |
---|
{8} | Same as {7} except generate an OR quadruple instead of an AND quadruple. |
---|
{9} | Copy expr1.ptr and expr2.ptr into temporary pointer variables, ptr1 and ptr2, respectively. Check that GetType(ptr1 ) equals "i" or "r". Check that GetType(ptr2 ) equals "i" or "r".
If GetType(ptr1 ) equals "i" and GetType(ptr2 ) equals "r" then the first source operand must be converted to a real. Do this by calling newtemp for the name of a new temporary variable, new1 ; inserting a new entry into the symbol table with new1 as the lexeme and "r" as the type-expression; generating a COPYI2R quadruple with GetLex(ptr1 ) as the source operand and new1 as the result; and then changing ptr1 to point to the new1 entry of the symbol table. If GetType(ptr1 ) equals "r" and GetType(ptr2 ) equals "i" then the second source operand must be converted to a real. Do this by calling newtemp for the name of a new temporary variable, new2 ; inserting a new entry into the symbol table with new2 as the lexeme and "r" as the type-expression; generating a COPYI2R quadruple with GetLex(ptr2 ) as the source operand and new2 as the result; and then changing ptr2 to point to the new2 entry of the symbol table. Both source operands are now of the same type so call newtemp for the name of a new temporary variable, newname. Insert a new entry into the symbol table with newname as the lexeme and GetType(ptr1 ) as the type-expression. Depending on the lexeme of the ADDOP token and GetType(ptr1 ) generate an ADDI, SUBI, ADDR, or SUBR quadruple with GetLex(ptr1 ) as the first source operand, GetLex(ptr2 ) as the second source operand, and newname as the result. Copy the pointer to the newname -entry of the symbol table into expr.ptr. |
---|
{10} | Copy expr1.ptr and expr2.ptr into temporary pointer variables, ptr1 and ptr2, respectively. Check that GetType(ptr1 ) equals "i" or "r". Check that GetType(ptr2 ) equals "i" or "r".
If GetType(ptr1 ) equals "i" and GetType(ptr2 ) equals "r" then the first source operand must be converted to a real. Do this by calling newtemp for the name of a new temporary variable, new1 ; inserting a new entry into the symbol table with new1 as the lexeme and "r" as the type-expression; generating a COPYI2R quadruple with GetLex(ptr1 ) as the source operand and new1 as the result; and then changing ptr1 to point to the new1 entry of the symbol table. If GetType(ptr1 ) equals "r" and GetType(ptr2 ) equals "i" then the second source operand must be converted to a real. Do this by calling newtemp for the name of a new temporary variable, new2 ; inserting a new entry into the symbol table with new2 as the lexeme and "r" as the type-expression; generating a COPYI2R quadruple with GetLex(ptr2 ) as the source operand and new2 as the result; and then changing ptr2 to point to the new2 entry of the symbol table. Both source operands are now of the same type so call newtemp for the name of a new temporary variable, newname. Insert a new entry into the symbol table with newname as the lexeme and "b" as the type-expression. Copy the lexeme of the RELOP token into a temporary string variable, lex. If lex equals ">" or ">=" then exchange ptr1 with ptr2 and change lex to "<" or "<=", respectively. Depending on lex and on GetType(ptr1 ) generate an LTI, LTEQI, EQI, NEQI, LTR, LTEQR, EQR, or NEQR quadruple with GetLex(ptr1 ) as the first source operand, GetLex(ptr2 ) as the second source operand, and newname as the result. Copy the pointer to the newname -entry of the symbol table into expr.ptr. |
---|
{11} | Check that the type-expression of the ID token equals either: "b", "i", "r", ">b", ">i", or ">r".
If the type-expression of the ID token equals either: "b", "i", or "r", then copy the pointer to the ID-token entry into expr.ptr. If the type-expression of the ID token equals either: ">b", ">i", or ">r", then call newtemp for the name of a new temporary variable, newname. Insert a new entry into the symbol table with newname as the lexeme and "b", "i", or "r", as the type-expression, respectively. Generate a CALLB, CALLI, or CALLR quadruple with "0" in the first source field, the lexeme of the ID token in the second source field, and newname in the result field. Copy the pointer to the newname -entry of the symbol table into expr.ptr. |
---|
{12} | Between the LPAR and RPAR tokens there should be a list of one or more expr nonterminals separated by COMMA tokens.
Trace through the expression list from left-to-right and generate a PARAMB, PARAMI, or PARAMR quadruple for each expression in the list. If expri.ptr is the attribute for expression i in the list then use GetType(expri.ptr ) to select the quadruple operation and put GetLex(expri.ptr ) into the first source field of the quadruple. While tracing through the expression list count the number of expressions in the list and concatenate their type-expressions into a single string. Check the type-expression for the ID-token that it does contain a '>' character followed by 'b', 'i', or 'r', and that the string before the '>' character agrees with the string for the expression list. Let last be the last character in the type-expression for the ID-token. Call newtemp for the name of a new temporary variable, newname. Insert a new entry into the symbol table with newname as the lexeme and last as the type-expression. Depending on last generate a CALLB, CALLI, or CALLR quadruple with the number of expressions in the expression list in the first source field, the lexeme of the ID-token in the second source field, and newname as the result. Copy the pointer to the newname -entry of the symbol table into expr.ptr. |
---|
{13} | The four lexemes for the mulop token ("*", "/", "div", and "mod") have different type-checking rules.
The "*" lexeme has the same type-checking rules as the addop token so use the rules in {9} except generate a MULI or MULR quadruple. The "/" lexeme is always a real division. If GetType(expr1.ptr ) equals "i" then generate a COPYI2R quadruple to convert expr1 to a real and store it in a new temporary variable. If GetType(expr2.ptr ) equals "i" then generate a COPYI2R quadruple to convert expr2 to a real and store it in a new temporary variable. Then generate a DIVR quadruple and copy the pointer to its real result into expr.ptr. The "div" and "mod" lexemes are always integer operations and won't accept real source operands. Check to make sure that GetType(expr1.ptr ) and GetType(expr2.ptr ) equal "i". Then generate a DIVI or MOD quadruple and copy the pointer to its integer result into expr.ptr. |
---|
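To make the flavor of these rules concrete, rule {9} (ADDOP) might be sketched in Python as shown below. Everything about the data layout is an assumption made for illustration: the symbol table is a plain dictionary from lexeme to type-expression, a "pointer" is just the lexeme, temporaries are named t1, t2, and so on, and a quadruple is a tuple (operation, source 1, source 2, result). The names GetType, GetLex, and newtemp follow the text; `insert`, `to_real`, and `reduce_addop` are hypothetical helpers.

```python
symtab = {}   # lexeme -> type-expression (hypothetical symbol table layout)
quads = []    # generated quadruples: (op, source1, source2, result)
_temp_count = 0


def newtemp():
    """Return the name of a new temporary variable (naming scheme is assumed)."""
    global _temp_count
    _temp_count += 1
    return f"t{_temp_count}"


def insert(lexeme, type_expr):
    """Insert a symbol table entry and return a 'pointer' to it (the lexeme)."""
    symtab[lexeme] = type_expr
    return lexeme


def GetType(ptr):
    return symtab[ptr]


def GetLex(ptr):
    return ptr


def to_real(ptr):
    """Generate a COPYI2R quadruple and return a pointer to the real temporary."""
    new = insert(newtemp(), "r")
    quads.append(("COPYI2R", GetLex(ptr), None, new))
    return new


def reduce_addop(ptr1, addop_lexeme, ptr2):
    """Semantic action for expr --> expr1 ADDOP expr2; returns expr.ptr."""
    for ptr in (ptr1, ptr2):
        if GetType(ptr) not in ("i", "r"):
            raise TypeError(f"{GetLex(ptr)} must be an integer or a real")
    # Convert the integer operand when the operand types are mixed.
    if GetType(ptr1) == "i" and GetType(ptr2) == "r":
        ptr1 = to_real(ptr1)
    elif GetType(ptr1) == "r" and GetType(ptr2) == "i":
        ptr2 = to_real(ptr2)
    result_type = GetType(ptr1)  # both operands now have the same type
    result = insert(newtemp(), result_type)
    op = ("ADD" if addop_lexeme == "+" else "SUB") + result_type.upper()
    quads.append((op, GetLex(ptr1), GetLex(ptr2), result))
    return result
```

With an integer n and a real x in the symbol table, `reduce_addop("n", "+", "x")` generates a COPYI2R quadruple for n followed by an ADDR quadruple, and the returned pointer refers to a temporary of type "r". Rules {10} and {13} follow the same pattern with different result types and operations.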
[4.7] - LR Parsers
A large class of grammars can be parsed using LR(k ) parsers: the "L" stands for left-to-right scanning of the input, the "R" stands for constructing a rightmost derivation in reverse, and k is the number of input symbols of lookahead used to make parsing decisions. When (k ) is omitted, k is assumed to equal 1. LR parsing has several advantages:
The main disadvantage of LR parsing is that it's too much work to construct a parser by hand: one needs a specialized tool - an LR parser generator.
The LR Parsing Algorithm
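Figure 4.29 shows a block diagram of an LR parser: an input, an output, a stack, a driver program, and a parsing table with two parts (action and goto ). The driver program is the same for all LR parsers: it reads the input string one symbol at a time and maintains a stack of the form:
s0 X1 s1 X2 s2 . . . Xm sm
where each Xi is a grammar symbol, each si is a state, and sm is on the top of the stack. The action of the driver program depends on action [ sm, ai ] where ai is the current input symbol:
shift s | push ai and then state s on to the stack and advance to the next input symbol |
reduce by A --> β | pop 2r items off the stack, where r is the length of β; then, with state s exposed on the top of the stack, push A and then goto [ s, A ] |
accept | parsing is complete |
error | a syntax error has been found |
As an example, the productions of an expression grammar are numbered as follows, and the parsing table for the grammar is shown after them: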
(1) | E --> E + T |
(2) | E --> T |
(3) | T --> T * F |
(4) | T --> F |
(5) | F --> ( E ) |
(6) | F --> id |
State | id | + | * | ( | ) | $ | E | T | F |
---|---|---|---|---|---|---|---|---|---|
0 | s5 | | | s4 | | | 1 | 2 | 3 |
1 | | s6 | | | | acc | | | |
2 | | r2 | s7 | | r2 | r2 | | | |
3 | | r4 | r4 | | r4 | r4 | | | |
4 | s5 | | | s4 | | | 8 | 2 | 3 |
5 | | r6 | r6 | | r6 | r6 | | | |
6 | s5 | | | s4 | | | | 9 | 3 |
7 | s5 | | | s4 | | | | | 10 |
8 | | s6 | | | s11 | | | | |
9 | | r1 | s7 | | r1 | r1 | | | |
10 | | r3 | r3 | | r3 | r3 | | | |
11 | | r5 | r5 | | r5 | r5 | | | |
Columns id through $ are the action part of the table and columns E, T, and F are the goto part. In the action part, si means shift and stack state i, rj means reduce by production numbered j, acc means accept, and blank means error.
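A driver program for this table can be sketched in a few lines of Python. The sketch is a simplification rather than a definitive implementation: it stacks only the states (the grammar symbols between them are implied by the states, so a reduction pops r states instead of 2r items), and the names `lr_parse`, `ACTION`, and `GOTO` are assumptions for this example.

```python
# Production number -> (left side, length of right side).
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}
_REDUCE_2 = {"+": "r2", "*": "s7", ")": "r2", "$": "r2"}
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: _REDUCE_2,
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3}, 6: {"T": 9, "F": 3}, 7: {"F": 10}}


def lr_parse(tokens):
    """Return the production numbers in the order they are reduced."""
    states, pos, output = [0], 0, []
    tokens = tokens + ["$"]
    while True:
        act = ACTION[states[-1]].get(tokens[pos])  # blank entry -> None
        if act is None:
            raise SyntaxError(f"unexpected {tokens[pos]} at position {pos}")
        if act == "acc":
            return output
        if act[0] == "s":  # shift: stack the new state, advance the input
            states.append(int(act[1:]))
            pos += 1
        else:  # reduce: pop one state per right-side symbol, then take the goto
            number = int(act[1:])
            left, length = PRODUCTIONS[number]
            del states[-length:]
            states.append(GOTO[states[-1]][left])
            output.append(number)
```

For the input `id * id + id` the reductions are by productions 6, 4, 6, 3, 2, 6, 4, 1, which is the rightmost derivation of the string read backwards.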
LR Grammars
An LR grammar is a grammar for which one can construct a parsing table. A grammar is LR if a left-to-right shift-reduce parser can recognize handles when they appear on the top of the stack.
An LR parser doesn't need to examine the entire stack for a handle; the state symbol on the top of the stack contains all the information it needs. It can also examine the next k input symbols to help make a decision. The cases k = 0 and k = 1 are of practical interest so we only consider those cases here.
Constructing SLR Parsing Tables
The text shows three methods of constructing LR parsing tables: simple LR or SLR is the simplest method but only works for the simplest of grammars.
An item of a grammar G is a production of G with a dot at some position of the right side; e.g., if T --> X Y Z is a production of G then four items of G are:
T --> . X Y Z |
T --> X . Y Z |
T --> X Y . Z |
T --> X Y Z . |
If G has an ε-production, T --> ε, then T --> . is an item of G. Intuitively, an item indicates how much of a production has been seen so far in a parsing process; e.g., T --> X . Y Z indicates that a string derivable from X has been seen so far on the input and we hope to see a string derivable from Y Z next on the input.
If G is a grammar with start symbol S, then G ', the augmented grammar of G, is G with a new start symbol S ' and production S ' --> S. The purpose of augmenting a grammar is to indicate to the parser when it has reached the accept state: the accept state occurs when the parser tries to reduce by S ' --> S.
The Closure Operation
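If I is a set of items for a grammar then closure (I ) is the set of items constructed from I by the following two rules:
(1) Initially, every item in I is added to closure (I ).
(2) If A --> α . B β is in closure (I ) and B --> γ is a production, then the item B --> . γ is added to closure (I ) if it is not already there. This rule is applied until no more items can be added.
As an example, consider the augmented expression grammar: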
E ' --> E |
E --> E + T |
E --> T |
T --> T * F |
T --> F |
F --> ( E ) |
F --> id |
If set I = { E ' --> . E } then closure (I ) contains the following seven items:
E ' --> . E |
E --> . E + T |
E --> . T |
T --> . T * F |
T --> . F |
F --> . ( E ) |
F --> . id |
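The closure operation is a simple worklist computation. In the Python sketch below an item is represented as a tuple (left side, right side, dot position); this representation and the names `GRAMMAR` and `closure` are assumptions made for illustration.

```python
# Augmented expression grammar: nonterminal -> list of right sides.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """Return closure(items); an item is (left, right, dot)."""
    result = set(items)
    work = list(items)
    while work:
        left, right, dot = work.pop()
        # Only a nonterminal immediately after the dot contributes new items.
        if dot < len(right) and right[dot] in GRAMMAR:
            for production in GRAMMAR[right[dot]]:
                item = (right[dot], production, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result
```

`closure({("E'", ("E",), 0)})` contains exactly the seven items listed above.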
The Goto Operation
If I is a set of items and X is a grammar symbol then goto (I, X ) is the closure of the set of all items [T --> α X . β] such that [T --> α . X β] is in I.
Example 4.35: If I contains the following two items:
E ' --> E . |
E --> E . + T |
then goto (I, + ) contains the following five items:
E --> E + . T |
T --> . T * F |
T --> . F |
F --> . ( E ) |
F --> . id |
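The goto operation moves the dot over one symbol and then takes the closure. The Python sketch below repeats the closure function so that it is self-contained; as before, items are tuples (left side, right side, dot position), and this representation and the function names are assumptions for illustration.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """Add B --> . γ for every nonterminal B that appears right after a dot."""
    result, work = set(items), list(items)
    while work:
        left, right, dot = work.pop()
        if dot < len(right) and right[dot] in GRAMMAR:
            for production in GRAMMAR[right[dot]]:
                item = (right[dot], production, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result


def goto(items, symbol):
    """Move the dot over symbol in every item where it follows the dot."""
    moved = {
        (left, right, dot + 1)
        for left, right, dot in items
        if dot < len(right) and right[dot] == symbol
    }
    return closure(moved)
```

With I = { E ' --> E . , E --> E . + T }, `goto(I, "+")` returns the five items of example 4.35, and `goto(I, "*")` returns the empty set because no item of I has * after the dot.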
The Sets-of-Items Construction
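The canonical collection of sets of items, C, for an augmented grammar G ' is constructed as follows:
(1) Initially C contains the single set closure ({ S ' --> . S }).
(2) For every set of items I in C and every grammar symbol X such that goto (I, X ) is not empty and not already in C, add goto (I, X ) to C. Repeat this step until no more sets of items can be added to C.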
SLR Parsing Tables
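Given an augmented grammar, G ', construct the canonical collection, C, and the function FOLLOW(T ) for every nonterminal T in the grammar. Then for every set, Ii , in C construct state i in the parsing table as follows:
(1) If [T --> α . a β] is in Ii , a is a terminal, and goto (Ii , a ) = Ij , then set action [i, a ] to "shift j ".
(2) If [T --> α . ] is in Ii and T is not S ', then set action [i, a ] to "reduce by T --> α" for every a in FOLLOW(T ).
(3) If [S ' --> S . ] is in Ii then set action [i, $ ] to "accept".
(4) If goto (Ii , U ) = Ij for a nonterminal U, then set goto [i, U ] to j.
(5) Every entry not defined by rules (1) through (4) is an error entry.
The initial state of the parser is the one constructed from the set of items containing [S ' --> . S ].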
If any conflicting actions are generated by the above rules the grammar is not SLR(1). The grammar might be ambiguous or a more complex method such as Canonical LR or LALR must be used: section 4.7 of the text describes these other methods.
Example 4.38: The parsing tables for the expression grammar of example 4.33 can be constructed using the foregoing rules. The result is shown in figure 4.31.
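The whole construction for the expression grammar fits in a short Python sketch. Several things here are assumptions made for illustration: an item is a pair (production number, dot position), the FOLLOW sets are written out by hand rather than computed, and the order in which grammar symbols are tried (`SYMBOLS`) is chosen so that the states come out numbered as in the parsing table of section 4.7; a different order would give the same table with the states renumbered.

```python
GRAMMAR = [  # production 0 is the augmenting production E' --> E
    ("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)), ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {"E'", "E", "T", "F"}
SYMBOLS = ["E", "T", "F", "(", "id", "+", "*", ")"]
FOLLOW = {  # worked out by hand for this grammar
    "E": {"+", ")", "$"}, "T": {"+", "*", ")", "$"}, "F": {"+", "*", ")", "$"},
}


def closure(items):
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        right = GRAMMAR[prod][1]
        if dot < len(right):
            for q, (left, _) in enumerate(GRAMMAR):
                if left == right[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)


def goto(items, symbol):
    return closure({(p, d + 1) for p, d in items
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol})


def canonical_collection():
    collection = [closure({(0, 0)})]
    for item_set in collection:  # the list grows while it is being scanned
        for symbol in SYMBOLS:
            target = goto(item_set, symbol)
            if target and target not in collection:
                collection.append(target)
    return collection


def slr_table():
    collection = canonical_collection()
    action, goto_table = {}, {}

    def set_action(state, terminal, value):
        # Two different actions for one entry mean the grammar is not SLR(1).
        if action.setdefault((state, terminal), value) != value:
            raise ValueError(f"conflict in state {state} on {terminal}")

    for i, item_set in enumerate(collection):
        for prod, dot in item_set:
            left, right = GRAMMAR[prod]
            if dot < len(right):
                j = collection.index(goto(item_set, right[dot]))
                if right[dot] in NONTERMINALS:
                    goto_table[(i, right[dot])] = j
                else:
                    set_action(i, right[dot], f"s{j}")
            elif prod == 0:
                set_action(i, "$", "acc")
            else:
                for terminal in FOLLOW[left]:
                    set_action(i, terminal, f"r{prod}")
    return action, goto_table
```

Running `slr_table()` produces 12 states with no conflicts; entries missing from the two dictionaries are the error entries.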
A Comparison of Predictive Parsers with Shift-Reduce Parsers
Nonrecursive predictive parsers are discussed in section 4.4 and shift-reduce parsers are discussed in section 4.5. Both parsers read the input from left to right and maintain a stack of grammar symbols, but their parsing operations are decidedly different, as shown in the following table:
Predictive Parser | Shift-Reduce Parser |
---|---|
Top-down (LL) Parser | Bottom-up (LR) Parser |
Stack predicts what is to come | Stack shows what has been seen so far |
The stack initially contains the start-symbol of the grammar. | The stack is initially empty. |
The stack is empty when the accept state is reached. | The stack contains the start symbol of the grammar when the accept state is reached. |
Input tokens are popped off the stack. | Input tokens are pushed on the stack. |
Left sides of productions are popped off the stack. | Right sides of productions are popped off the stack. |
Right sides of productions are pushed on the stack. | Left sides of productions are pushed on the stack. |