Kent State University 
CS 6/79995 St:Graph Mining
Spring 2007

Instructor: Ruoming Jin

Tuesdays and Thursdays 11:00AM-12:15PM; Rm. MSB 276
Office Hours: Mondays and Wednesdays 10:00-11:00AM

---------------------------------------------------------

Prerequisites

CS 4/56101 Algorithms
CS 6/73105 Data Mining Techniques
Or Consent of the Instructor

Course Overview

The existing database and data mining mainly deal with relational and/or semi-structured data. The recent advance in science and technology has led to a new type of data – graph data – being collected with an unprecedented rate in many fields of human endeavor. Driven by the economic and scientific need, storing, querying, and mining such data are emerging as a new research area.

 

In the graph model, the vertices represent different entities and the edges capture their pair-wise relationships.  For instance, the entire WWW can be treated as a massive direct graph where the nodes represent the web pages. In system biology, various kinds of massive networks, that may easily contain thousands and thousands of genes or gene products, are being constructed for many different species.  The graph models provide a unified and powerful representation for these biological networks, including protein interactions, metabolic networks, gene regulatory networks, and gene co-expression networks, among others.

 

Graph mining and management, referred to as GMM for simplicity, is a highly inter-disciplinary field representing the confluence of several disciplines, including database systems, data warehousing, machine learning, statistics, graph algorithms, and information visualization. This course will provide an in depth discussion of the main topics (including and not limited to scale-free graphs, random graphs, clustering, pattern discovery,  graph queries, and etc.) in GMM as well as a wide spectrum of applications such as web mining, social network analysis, sensor/P2P network monitoring, software engineering, and bioinformatics.

 

This course teaches the fundamental techniques for mining and managing massive graphs (networks). Some basic mathematical tools, such as random graph, spectral methods, and graph theoretical approaches will also be covered.   Each student  will be expected to present a paper and lead the discussion following his/her presentation and do a project on selected topics.

There will be nor exam.

Tentative Schedule:

Weekly schedule

Topics

Presenter

Week 1

Introduction

Jin

Week 2

Statistical Properties of Massive Graphs

Jin

Week 3

Probability Space and Random Variables, Inequalities

Jin

Week 4

Classical Random Graph Theory

Jin

Week 5

Basic Linear Algebra

Jin

Week 6

SVD and Latent Semantic Indexing (LSI)

Jin

Week 7

Markov Chain and Random Walk

Jin

Week 8

Page Rank and HIST

Jin

Week 9

Generative Models

Jin

Week 10

Graph Decomposition/ Clustering

 

One guest lecture by Dr. Dragan

Week 11

Pattern Discovery

Jin

Week 12

Connection Subgraphs and

Labeling Schemes for Reachability/Distances

One guest lecture by Dr. Dragan

Week 13-Week 15

Student Presentations

 

 

Suggested References:

[1] Data Mining --- Concepts and techniques, by Han and Kamber, Morgan Kaufmann, 2001. (ISBN:1-55860-489-8)
[2] Mining the Web --- Discovering Knowledge from Hypertext Data, by Chakrabarti, Morgan Kaufmann, 2003. (ISBN:1-55860-754-4)

[3] Random Graph Dynamics, By Rick Durrett, Cornell U, Cambridge U. Press.

Additional materials will include papers supplied by the instructor.

Requirements & Grading Policy

A student's grade will be determined as a weighted average of homework (30%),  presentation (35%), and project (35%). 

Acknowledgement

          Thanks to Dr. Panayiotis Tsaparas for slides material.

Updated: