CS 4/53111 - Project 1
Symbol Table Package & Lexical Analyzer
Due Thursday, February 16, 2006
Design and code a symbol table package, a lexical analyzer, and a main
program to test the project. Projects 2 and 3 will use the symbol table
package and lexical analyzer that you code in this project.
Symbol Table Package: The structure of the symbol table should
follow the suggestions in section 7.6 of the course
notes. At a minimum, each entry in the symbol table should
contain the following fields:
- A field to hold the lexeme (or a pointer to the lexeme) - a string of
16 characters will be large enough for every lexeme in the test files.
- A field to hold the token-type corresponding to the lexeme.
- An integer field to hold the scope of the entry.
- An integer field to hold an entry-number.
- A field to hold a type-expression (or a pointer to a type-expression) -
a string of 12 characters will be large enough for every type-expression
in the test files.
- A pointer to the next entry in the same linked list.
To prepare for projects 2 and 3 you can also add two more fields, Low and High,
to the structure of each table entry - these fields will hold pointers to
other entries in the symbol table.
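The field list above can be sketched as a C structure. This is only one possible layout, not a required one: the names Entry, TokenType, LEXEME_MAX and TYPE_MAX are illustrative assumptions, and the token-type is shown as a plain int because the actual token-type codes come from the course's token-type list.

```c
#include <stddef.h>

#define LEXEME_MAX 16   /* the handout says 16 characters are enough */
#define TYPE_MAX   12   /* the handout says 12 characters are enough */

/* Assumption: token-types are small integer codes. */
typedef int TokenType;

typedef struct Entry {
    char lexeme[LEXEME_MAX + 1];     /* +1 leaves room for the '\0' */
    TokenType token_type;            /* token-type for this lexeme */
    int scope;                       /* scope of the entry */
    int entry_number;                /* unique entry-number */
    char type_expr[TYPE_MAX + 1];    /* type-expression, filled in later */
    struct Entry *next;              /* next entry in the same linked list */
    struct Entry *low;               /* optional, for projects 2 and 3 */
    struct Entry *high;              /* optional, for projects 2 and 3 */
} Entry;
```

Storing the lexeme and type-expression inline keeps allocation simple; holding pointers to separately allocated strings, as the field list also allows, would work equally well.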
The symbol table package should include the following functions:
- FIND accepts a character-string as its argument and looks for a symbol
table entry with a non-negative scope-field whose lexeme matches the argument.
If an entry is found, FIND returns a pointer to that entry so the caller
can read or modify other fields in that entry. If no entry is found, FIND
returns a null-pointer.
- INSERT accepts a character-string as its first argument and a token-type as
its second argument. It inserts a new entry into the symbol table with the
first argument as its lexeme, the second argument as its token-type, and a scope-field
set to the value of a global integer variable, current_scope. INSERT should
also fill in the entry-number field of the new entry with a unique integer: 1 for
the first entry, 2 for the second entry, 3 for the third entry, and so on. INSERT
returns a pointer to the new entry so the caller can modify other fields in that
entry. INSERT returns a null-pointer only if the symbol table is full.
Both FIND and INSERT hash their arguments to select one of the linked-lists in
the symbol table. INSERT should always insert the new entry at the start of the
selected list so if the same lexeme has multiple entries, FIND will pick the entry
that was inserted last.
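The behavior of FIND and INSERT described above might look like the following in C. This is a minimal sketch under several assumptions that are not part of the assignment: the bucket count (NUM_BUCKETS), the capacity limit (MAX_ENTRIES), the multiply-by-31 hash function, and the lower-case function names find and insert are all illustrative choices.

```c
#include <stdlib.h>
#include <string.h>

#define LEXEME_MAX  16
#define TYPE_MAX    12
#define NUM_BUCKETS 211     /* assumed number of linked lists */
#define MAX_ENTRIES 1000    /* assumed capacity of the table */

typedef struct Entry {
    char lexeme[LEXEME_MAX + 1];
    int token_type;
    int scope;
    int entry_number;
    char type_expr[TYPE_MAX + 1];
    struct Entry *next;
} Entry;

static Entry *buckets[NUM_BUCKETS];  /* heads of the linked lists */
static int entry_count = 0;          /* number of entries inserted so far */
int current_scope = 0;               /* global named in the handout */

/* Assumed hash: any function that spreads lexemes over the lists will do. */
static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s != '\0')
        h = h * 31u + (unsigned char)*s++;
    return h % NUM_BUCKETS;
}

/* Return the most recently inserted visible entry for lexeme, or NULL. */
Entry *find(const char *lexeme)
{
    Entry *e;
    for (e = buckets[hash(lexeme)]; e != NULL; e = e->next)
        if (e->scope >= 0 && strcmp(e->lexeme, lexeme) == 0)
            return e;
    return NULL;
}

/* Add a new entry at the head of its list; NULL means the table is full. */
Entry *insert(const char *lexeme, int token_type)
{
    Entry *e;
    unsigned h;

    if (entry_count >= MAX_ENTRIES)
        return NULL;
    e = calloc(1, sizeof *e);        /* zeroed, so strings stay terminated */
    if (e == NULL)
        return NULL;
    strncpy(e->lexeme, lexeme, LEXEME_MAX);
    e->token_type = token_type;
    e->scope = current_scope;
    e->entry_number = ++entry_count; /* 1, 2, 3, ... */
    h = hash(lexeme);
    e->next = buckets[h];            /* insert at the start of the list */
    buckets[h] = e;
    return e;
}
```

Because new entries go at the head of a list, a later insert of the same lexeme shadows the earlier one until its scope-field is made negative. Note that this sketch silently truncates lexemes longer than 16 characters, which the handout says will not occur in the test files.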
Lexical Analyzer: Write the lexical analyzer as a function with no arguments
and no returned value. Each time the function is called, it stores the token-type of
the next input token into a global variable, lookahead, and a pointer to a
symbol table entry into another global variable, attributes.
Comments are delimited by braces, { and }, and should be treated as
white-space. Spaces, tabs, and newlines are also white-space.
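The calling convention and the white-space rule above can be sketched as follows. This is a skeleton only, with several assumptions: the source text is taken to be already loaded into memory and pointed to by a hypothetical global src, the token codes TOK_EOF, TOK_ID and TOK_OTHER are placeholders for the real token-types, and the function name lexan is illustrative. A real lexical analyzer would also recognize numbers, keywords and operators, and would call FIND and INSERT to set attributes.

```c
#include <ctype.h>
#include <stddef.h>

/* Placeholder token codes; the real ones come from the token-type list. */
enum { TOK_EOF = 0, TOK_ID = 1, TOK_OTHER = 2 };

struct Entry;                  /* symbol table entry, defined elsewhere */

int lookahead;                 /* token-type of the next input token */
struct Entry *attributes;      /* symbol table entry for that token */
const char *src;               /* assumed: current position in the source text */

/* Skip spaces, tabs, newlines, and comments written in braces. */
static void skip_blanks(void)
{
    for (;;) {
        if (*src == ' ' || *src == '\t' || *src == '\n') {
            src++;
        } else if (*src == '{') {
            while (*src != '\0' && *src != '}')
                src++;
            if (*src == '}')
                src++;         /* step past the closing brace */
        } else {
            return;
        }
    }
}

/* No arguments, no returned value: results go into the two globals. */
void lexan(void)
{
    skip_blanks();
    attributes = NULL;         /* a real lexer would use FIND/INSERT here */
    if (*src == '\0') {
        lookahead = TOK_EOF;
        return;
    }
    if (isalpha((unsigned char)*src)) {
        while (isalnum((unsigned char)*src))
            src++;             /* letters and digits form one lexeme */
        lookahead = TOK_ID;
        return;
    }
    src++;                     /* treat any other character as one token */
    lookahead = TOK_OTHER;
}
```

An unterminated comment simply runs to the end of the input in this sketch; reporting it as an error is another reasonable choice.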
Testing: Test the code with a main program that writes a text file
containing all tokens found in a Pascal source input file. Your output file will
be checked electronically so be careful how you format it.
Output a line of text for each token found in the source file. Each output line
has at least three fields with spaces and/or tabs between the fields:
- the token-type (spelled as shown in this list);
- the lexeme;
- the entry-number of the symbol table entry.
Other information can be added to an output line as long as it is separated from the
entry-number by spaces and/or tabs.
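One way to produce an output line in the three-field layout described above is sketched below. The function name format_token_line is a hypothetical helper, and the choice of a single tab between fields is an assumption; the handout allows any mix of spaces and tabs.

```c
#include <stdio.h>

/* Build one output line: token-type, lexeme, entry-number, tab-separated.
   Returns the value of snprintf (the length the full line needs). */
int format_token_line(char *buf, size_t size, const char *type_name,
                      const char *lexeme, int entry_number)
{
    return snprintf(buf, size, "%s\t%s\t%d\n",
                    type_name, lexeme, entry_number);
}
```

The main program could call this once per token and write the buffer to the output file with fputs. The token-type name passed in must be spelled exactly as in the course's token-type list, since the output is checked electronically.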
You can debug your project using test1in as a source file;
the output file for this input file should look like test1out. Follow these
guidelines to submit your project.
Some hints you can use are available on a separate page, as is a list of
all token-types and their corresponding lexemes.
Kenneth E. Batcher - 1/3/2006