CS 46101/56101 - Fall 2002
Design and Analysis of Algorithms
Project: Comparison-Based Sorting Project
Due Date: November 26, 2002
No projects will be accepted after the due date--no exceptions!!!
Objective:
When asked, as a consultant to industry, to comment on the use of sorting algorithms, Professor Speedy made the following statement:
All comparison-based sorting algorithms are lower bound by Theta(n log n) running times so it would be best and simplest to always use quick-sort, which has an expected running time of Big-O(n log n).
You are to assume you are an employee of the company Professor Speedy is consulting for. Your boss has given you some data sets and asked you to prepare a report that either supports or refutes Professor Speedy's statement.
Implementation Issues:
For this project you will implement the following algorithms:
- quicksort
- insertion-sort or selection-sort
- heap-sort or merge-sort
In order for this to be a fair comparison, you must make each algorithm as fast as you can. The following is a minimal list of issues that you should address when developing your code:
- Put counters into each algorithm to count the number of times the basic sorting operation, a comparison between two elements, is performed.
- Record the "Wall Time" for each run by calling the system clock when your algorithm starts and again when your algorithm finishes, and computing the difference.
- You will be required to turn in hard copies of your source code and electronic copies of your executables for each of your algorithms (see below). You must include instructions for running your executables including the name and version of the operating systems you ran your experiments on.
There are three possible ways to submit your electronic copies:
- Put all of your files on a CD, Zip disk, or 3 1/2" floppy and turn it in with your project write-up. Include a hard copy of your runtime instructions.
- Send an email with your files as attachments and include your runtime instructions in the email.
- Deposit copies of your files into the following directory on trident, "/home/csfaculty/volkert/public_html/cs46101/studentprojects/", and send an email indicating that you have done this along with your runtime instructions. Be sure you have named your files so that I can easily associate them with you.
As you write your programs be sure to document in your report the steps you have taken to make the implementations as fast and fair as you can.
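The comparison counter and wall-time measurement listed above could be sketched as follows. This is only an illustration in Python, not a required design: the names `insertion_sort_counted` and `timed_run` are hypothetical, and insertion sort stands in for whichever algorithm is being instrumented. The input is copied before the clock starts so that the copy is not charged to the algorithm.

```python
import time


def insertion_sort_counted(a):
    """Sort the list `a` in place and return the number of
    element-to-element comparisons that were performed."""
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1          # one comparison between two elements
            if a[j] <= key:
                break
            a[j + 1] = a[j]           # shift the larger element right
            j -= 1
        a[j + 1] = key
    return comparisons


def timed_run(sort_fn, values):
    """Run `sort_fn` on a copy of `values`.

    Returns (sorted_list, comparisons, elapsed_seconds). The copy is made
    outside the timed region so only the sort itself is measured.
    """
    data = list(values)
    start = time.perf_counter()       # clock reading when the sort starts
    comparisons = sort_fn(data)
    elapsed = time.perf_counter() - start
    return data, comparisons, elapsed
```

Any other algorithm can be measured the same way as long as it sorts in place and returns its comparison count. Repeating each run several times and reporting how the timings vary is one way to make the wall-time figures more trustworthy.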
Test Data:
For this project you will use "real" data sets, some of which will be very large. Your experiments must be run on every data set provided. The internet is a useful source of "random" data, which you should use to test your implementations. To get started, download some large files (text or images) from the internet and treat every four bytes as a 32-bit integer. Alternatively, you may use the binary code for an application (e.g., Microsoft Word), which can also be treated as a sequence of integers.
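As a rough illustration of treating every four bytes of a file as a 32-bit integer, the following Python sketch uses the standard `struct` module. The function name `bytes_to_uint32` is hypothetical, and two choices here are assumptions that the project description does not specify: the values are read as unsigned, little-endian by default, and leftover bytes that do not fill a whole 4-byte word are dropped.

```python
import struct


def bytes_to_uint32(raw, byteorder="<"):
    """Interpret `raw` (a bytes object) as consecutive 32-bit unsigned ints.

    `byteorder` is a struct prefix: "<" for little-endian (assumed default)
    or ">" for big-endian. Trailing bytes that do not make up a full
    4-byte word are ignored.
    """
    count = len(raw) // 4
    if count == 0:
        return []
    return list(struct.unpack(f"{byteorder}{count}I", raw[:count * 4]))


# Example use on a downloaded file (the path is a placeholder):
#     with open("some_large_file.bin", "rb") as f:
#         values = bytes_to_uint32(f.read())
```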
You are to run each of your sorting algorithms on all of the data sets provided below. There are a total of 10 data sets named datafilex, where x is a number ranging from 1 to 10. Each file contains a set of 32-bit unsigned integers, each separated by a space. The files may be obtained from the links below. A description of the composition of these data sets follows each link.
- datafile1: file contains 6,250 long unsigned integers generated randomly.
- datafile2: file contains 125,000 long unsigned integers generated randomly.
- datafile3: file contains 250,000 long unsigned integers generated randomly.
- datafile4: file contains 500,000 long unsigned integers generated randomly.
- datafile5: file contains 1,000,000 long unsigned integers generated randomly.
- datafile6: file contains 250,000 long unsigned integers generated randomly.
- datafile7: file contains 250,000 long unsigned integers partially sorted.
- datafile8: file contains 250,000 long unsigned integers in ascending order.
- datafile9: file contains 250,000 long unsigned integers in descending order.
- datafile10: file contains 250,000 identical long unsigned integers.
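Reading one of these files into memory might look like the Python sketch below. It assumes the integers are written in decimal and separated by whitespace (spaces or newlines), which the description above suggests but does not state in full; `parse_uint32_text` and `load_datafile` are hypothetical names. The range check is there only to catch a file that does not match the 32-bit unsigned format.

```python
def parse_uint32_text(text):
    """Parse whitespace-separated decimal integers and check that each
    one fits in an unsigned 32-bit value."""
    values = [int(token) for token in text.split()]
    for v in values:
        if not 0 <= v <= 0xFFFFFFFF:
            raise ValueError(f"{v} is not a 32-bit unsigned integer")
    return values


def load_datafile(path):
    """Read an entire data file and return its integers as a list."""
    with open(path, "r") as f:
        return parse_uint32_text(f.read())
```

Loading the data before starting the clock keeps file I/O out of the measured sorting time.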
These data sets are grouped into two "experiments" as follows:
Experiment I
Using the information captured from each of your sorting algorithms when processing each of the first five data sets (datafile1-datafile5), produce graphs showing how the captured information changes as the size of the data set changes.
Experiment II
Using the information captured from each of your sorting algorithms when processing each of the second five data sets (datafile6-datafile10), produce graphs showing how the captured information changes as the content of the data set changes.
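One possible way to organize the captured information before graphing it is sketched below in Python. It is an illustration only: the function names and the CSV column names are hypothetical, and the row format (algorithm, data set, n, comparisons, seconds) is an assumption about what was recorded. Dividing the comparison count by n log2 n and by n^2 gives ratios that stay roughly constant when the count grows at that rate, which can make trends across data set sizes easier to see.

```python
import csv
import io
import math


def growth_ratios(n, comparisons):
    """Return (comparisons / (n * log2 n), comparisons / n^2) for n > 1."""
    return comparisons / (n * math.log2(n)), comparisons / (n * n)


def results_to_csv(rows):
    """Format measurement rows as CSV text.

    `rows` is an iterable of (algorithm, dataset, n, comparisons, seconds)
    tuples collected from the sorting runs.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["algorithm", "dataset", "n", "comparisons",
                     "seconds", "per_nlogn", "per_n2"])
    for algorithm, dataset, n, comparisons, seconds in rows:
        per_nlogn, per_n2 = growth_ratios(n, comparisons)
        writer.writerow([algorithm, dataset, n, comparisons,
                         f"{seconds:.6f}", f"{per_nlogn:.4f}",
                         f"{per_n2:.6f}"])
    return out.getvalue()
```

The resulting text can be saved to a file and opened in a spreadsheet or plotting tool to produce the required graphs.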
Reporting Requirements:
You should report your results as a technical report of roughly 5 to 10 pages. The report will be graded on its quality, not its length. A good report should present your case clearly and convincingly. The English will be judged according to the standards of a term paper submitted in a liberal arts course. Grammar counts.
The report should contain the following parts:
- Abstract: An abstract is a short description of the contents of the report. The abstract should be written in the third person and should not be longer than half a page.
- Introduction: Describe the project, your approach and summarize the conclusions. A person who understands computer science but has not read this project description should understand this section.
- Implementation: Document how you implemented the algorithms. Report how the implementation issues described above were addressed.
- Experiments: Describe the experiments you performed, how the running times were collected, etc. Report the timing results of your experiments in tables and graphs. The purpose of this section is to give enough information for the reader to repeat your experiments.
- Conclusion: State and justify your conclusions. Among the algorithms you tested, which is fastest? Do you think your conclusions are valid in general or just for the data and systems that you used?
Grading:
Your project will be graded on 4 parts weighted equally.
- Report: The report will be graded according to presentation, as described above.
- Implementation: This portion of the project grade depends on how well you implemented each algorithm. See the implementation issues discussed above.
- Correctness: The implementation, reporting and experimental methodology should conform to this project description.
- Conclusion: You will not be graded on the statement of your conclusion, but on how convincing your conclusions are.
Note that coding is only one part of this project. You will not do well if you simply turn in working programs. You must leave sufficient time after you code to test the code, run your experiments, and write up the report.