CS 23001 Computer Science II: Data Structures & Abstraction
Project #4


Objectives:
Problem:
Build a program profiler. Construct a program to instrument C++ source code to support program profiling.

It is often important to determine how many times a function or statement is executed. This is useful not only for debugging but for determining what parts of a program may need to be optimized. This process is called profiling. That is, a execution profile presents how many times each part of a program is executed using a given set of input data (or for some run time scenario). To compute a profile, specialized statements need to be inserted in to the code that keep track of how many times a function or statement is executed. The process of inserting these statements is called instrumenting the code.

To implement a profiler one must first parse the source code and generate an Abstract Syntax Tree (AST) of the code. Each node of the AST describes the syntactic category of the code stored within it (function, statement, while-statement, etc.). So at the top level is a syntactic category corresponding to a program, class, or function (such as in the case of a main). Under that are sub-trees that further detail the syntactic categories of each part of the code. Such things as declarations, parameter lists, while-statement, and expression statements will describe the various parts of the program.

After the AST is generated it can then be traversed and the appropriate syntactic structures can be found that need to be instrumented. Once a construct is found, say a function, new code can be inserted that keeps track of how many times that function is executed.

The most difficult part of constructing a profiler is correctly parsing the source code. Unfortunately, C++ is notoriously difficult to parse. So here we will use a parsing tool called src2srcml. This tool reads in C++ code and marks up the code with XML tags (e.g., block, if, while, condition, name, etc). That is, the output is an AST in XML. The XML representation is called srcML (source code markup language).

Given the code example:
    void foo(int x) {
        ++x;
    }
The following is srcML with tags. It is indented for readability and to see the tree structurue. funciton has 4 children, the funciton type, the name of the function, the parameter_list, and the block. Each <tag> has an end-tag </tag> as in html.
<function>
    <type><name>void</name></type>
    <name>foo</name>
    <parameter_list>(
        <parameter><decl><type><name>int</name></type> <name>x</name></decl></parameter>
    )</parameter_list>
    <block>{
        <expr_stmt><expr><operator>++</operator><name>x</name></expr>;</expr_stmt>
    }</block>
</function>
A number of srcML data files are provided for the project. However, you can use your own program as input. To run srcML on wasp or hornet you will first need to set a PATH variable so the command can be found. You need to execute the command:
export PATH=/local/opt/srcml/bin:$PATH

It is best if you insert this line into your .bash_profile file in your home directory on wasp/hornet.

Then to generate the srcML file for your own code use the following:
src2srcml main.cpp -o main.cpp.xml

Use the following for a list of all options:
src2srcml --help

More information about srcML can be found at www.srcML.org including a list of all the tag names (see Documentation). You can also download srcML if you want it on your own machine.

Tag names needed for the project: Your program will read in srcML files (in XML) and build an internal representation of the AST as a tree data structure. When using XML this internal representation is typically called a Document Object Model (DOM). A set of classes defining the data structure (DOM) is in shared (ASTree.hpp and ASTree.cpp). This part of the project is partially implemented (i.e., the read and write work). The given code compiles and runs. The program currently is able to read in a srcML file and construct the internal tree representation. It can also write the code back out (minus the XML) from the internal tree representation. Your job will be to complete the implementation and build the instrumenter for the profiler.

The specification for the profiler is as follows: In short, your profiler program will read in source code and write out source code that has been instrumented. Then you will compile and run the instrumented version of the code to get the profile information. Below is a full example of the tool chain. First convert main.cpp into srcML. Then run the profiler on that XML file. This will produce a instrumented version of main.cpp called p-main.cpp. Finally, you compile and run this program.

src2srcml main.cpp -o main.cpp.xml
./profiler main.cpp.xml
clang++ -Wall p-main.cpp -c
clang++ -Wall profile.cpp -c
clang++ -Wall p-main.o profile.o -o p-main
./p-main


Requirements:

URL: http://www.cs.kent.edu/~jmaletic/CS23001/programs/proj4.html
Last update: Mon Nov 18 12:18:01 2019 EST