Mondays and Wednesdays 9:15-10:30 am; Rm. MSB 228
Office Hours: Mondays and Wednesdays 11:00 am-12:00 pm
and by appointment
---------------------------------------------------------
The course presents the concepts and techniques of data mining. Data mining is the process of discovering information in large data sets. Many commercial and government organizations have huge databases and files with a lot of information in them. Data mining has developed a set of techniques to unlock information from these data. Data mining successes include discovering patterns of traveler behaviour, discovering market associations of the "beer and diaper" type, and comparing the genotypes of people with and without certain medical problems to relate those problems to the presence of specific genes. Data mining is an interdisciplinary field that combines methods from statistics, databases, machine learning, and neural networks. All necessary information from these fields will be given in class. The major difference between data mining and earlier artificial intelligence and statistical methods lies in designing scalable methods that are applicable to large data that cannot be stored entirely in computer memory. Such methods have led to very important applications in bioinformatics, medical informatics, market analysis, financial engineering, web searching, e-commerce, and e-science, among others. In this course we first focus on issues of data extraction and data preparation for data mining. We then analyse basic data mining techniques: association rules, classification, clustering, and mining complex data types. Finally, we apply the learned techniques to specific applications in medicine and market analysis.
It is expected that by the end of the course students will have learned basic data mining techniques and seen examples of how these techniques are applied to specific application data. Students will also learn how to collect data from data warehouses and design data models amenable to scalable data mining techniques.
Weekly Course Outline
| Weeks | Topics | Reading Material |
|---|---|---|
| 1 | Introduction to Data Mining | Ch 1 |
| 2 | Topics Related to Data Mining | Ch 2 |
| 3 | Data Warehousing and Data Preprocessing | Ch 2 of additional textbook |
| 4 | Data Preparation and Discretization | Ch 3 of additional textbook |
| 5 | Introduction to Data Mining Techniques | Ch 3 |
| 6 | Characterization and Comparison | Ch 5 of additional textbook |
| 7-9 | Classification Algorithms | Ch 4 |
| 10-11 | Association Rules | Ch 6 |
| 12-13 | Clustering | Ch 5 |
Exams
Students will be assigned several homeworks. Each homework contains examples of applying data mining algorithms, as well as questions about alternatives to some of those algorithms. There will be a midterm on October 14, 2009. The project is due on December 7, 2009. This is a firm date and no extension will be granted. The final exam will be held on December 15, 2009 at 10:15. Each project will be either an implementation comparing two data mining algorithms discussed in class or an analysis of additional literature assigned by the instructor.
Requirements & Grading Policy
A student's grade is determined as a weighted average of homeworks (20%), project (20%), midterm (25%), and final exam (35%).
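As an illustration of how the weighted average works, here is a minimal sketch in Python. The weights come from the policy above; the function name `course_grade`, the dictionary keys, and the assumption that every component is scored on a 0-100 scale are hypothetical and not part of the official policy.

```python
# Weights from the grading policy above.
# The names and the 0-100 score scale are assumptions for illustration only.
WEIGHTS = {
    "homeworks": 0.20,
    "project": 0.20,
    "midterm": 0.25,
    "final": 0.35,
}


def course_grade(scores):
    """Return the weighted average of the component scores (each 0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

For example, component scores of 90 (homeworks), 80 (project), 70 (midterm), and 85 (final) give 18 + 16 + 17.5 + 29.75, or about 81.25 overall.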
The official registration deadline for this course is 09/13/2009. University policy requires all students to be officially registered for each class they are attending. Students who are not officially registered for a course by published deadlines should not be attending classes and will not receive credit or a grade for the course. Each student must confirm enrollment by checking her/his class schedule (using Student Tools in FlashFast) prior to the deadline indicated. Registration errors must be corrected prior to the deadline. The last day to withdraw is 11/08/2009.