Kent State University 
CS 6/79995 St: Data Mining for Complex Network
Spring 2010

Instructor: Ruoming Jin

Mondays and Wednesdays 5:30AM-6:45PM; Rm. MSB 115
Office Hours: Mondays and Wednesdays 5:00-5:30PM

---------------------------------------------------------

Prerequisites

CS 4/56101 Algorithms
CS 6/73105 Data Mining Techniques
Or Consent of the Instructor

Course Overview

The existing database and data mining mainly deal with relational and/or semi-structured data. The recent advance in science and technology has led to a new type of data – graph data – being collected with an unprecedented rate in many fields of human endeavor. Driven by the economic and scientific need, storing, querying, and mining such data are emerging as a new research area.

 

In the graph model, the vertices represent different entities and the edges capture their pair-wise relationships.  For instance, the entire WWW can be treated as a massive direct graph where the nodes represent the web pages. In system biology, various kinds of massive networks, that may easily contain thousands and thousands of genes or gene products, are being constructed for many different species.  The graph models provide a unified and powerful representation for these biological networks, including protein interactions, metabolic networks, gene regulatory networks, and gene co-expression networks, among others.

  

This course will discuss the state-of-art research on mining and analyzing complex networks. Each student will be expected to present a paper and lead the discussion following his/her presentation. Each student will write an independent review on a subject. Two students may form a group to do a course project.

 

Suggested References:

[1] Data Mining --- Concepts and techniques, by Han and Kamber, Morgan Kaufmann, 2001. (ISBN:1-55860-489-8)
[2] Mining the Web --- Discovering Knowledge from Hypertext Data, by Chakrabarti, Morgan Kaufmann, 2003. (ISBN:1-55860-754-4)

[3] Random Graph Dynamics, By Rick Durrett, Cornell U, Cambridge U. Press.

Additional materials will include papers supplied by the instructor.

Requirements & Grading Policy

A student's grade will be determined as a weighted average of class participants (10%),  presentation (30%), review (25%) and project (35%). 

Ø      Project Due Date: 5/12 (Wednesday) Interview Date: 5/12 Wednesday afternoon.

Ø      Paper Due Date: 5/16 (Sunday)

Papers List (Some papers can be downloaded on Campus or using VPN and please select only numbered papers).

Here is the Presentation Assignment and Schedule.

1.      Introduction

Reading notes on probability review

Complex Network Survey:

M. E. J. Newman, The structure and function of complex networks, SIAM Review 45, 167-256 (2003)

S. Boccaletti et al., Complex Networks: Structure and Dynamics, Phys. Rep., 424 (2006), 175-308

S. N. Dorogovtsev and J.F.F. Mendes, Evolution of Networks, Adv. Phys. 51, 1079 (2002)

      C. T. Butts Revisiting the Foundations of Network Analysis, Science 2009 325:414-416 (On campus, or through VPN).

        Lazer, David, Pentland Alex, Adamic Lada A., Aral Sinan, Barabasi Albert-Laszlo, Brewer Devon, Christakis Nicholas, Contractor Noshir, Fowler James, Gutmann Myron, et al. , Computational Social Science,  Science, 02/2009, Volume 323, Issue 5915, p.721 - 723, (2009)

2.      Network Role (Guest Lecture 1 & Lecture 2 by Victor Lee).

1)      G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. In KDD'02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 538-543. ACM Press, 2002.

2)      D. Fogaras and B. Racz. Scaling link-based similarity search. In WWW '05: Proceedings of the 14th international conference on World Wide Web, pages 641--650, New York, NY, USA, 2005. ACM.

3)      EA Leicht, P Holme, MEJ Newman, Vertex similarity in networks, Physical Review E, 2006

4)      Vincent Blondel, Anahi Gajardo, Maureen Heymans, Pierre Senellart, Paul Van Dooren, A Measure of Similarity between Graph Vertices: Applications to Synonym Exaction and Web Searching, SIAM Review, 2004

5)      Jörg Reichardt and Douglas R. White, "Role models for complex networks", [Discussion

6)      E. Airoldi, D. Blei, S. Fienberg, and E. Xing.   Mixed membership stochastic blockmodels.   Journal of Machine Learning Research , 9:1981--2014, 2008 . [Shorter version from NIPS 2008]

7)      M. Shafiei, H. Chipman, Mixed-Membership Stochastic Block-Models for Transactional Data, Workshop on Analyzing Networks and Learning with Graphs NIPS 2009. 

8)      JJ Daudin, F Picard, S Robin ,  A mixture model for random graph,  Statistics and computing, 2008 (You can download this paper on campus, or login through VPN Client).

1.      Modularity/Clustering

9)      A Tutorial on Spectral Clustering. Ulrike von Luxburg, Statistics and Computing 2007.

10)   Modularity and community structure in networks, M. E. J. Newman, Proc. Natl. Acad. Sci. USA 103, 8577-8582 (2006). & Scott White and Padhraic Smyth. A Spectral Clustering Approach To Finding Communities in Graphs, In SDM, 2005.

11)  M. B. Hastings. Community detection as an inference problem. Physical Review E, 74(3):035102, 2006

12)  Zhou, D., J. Huang and B. Schölkopf: Learning from Labeled and Unlabeled Data on a Directed Graph,Proceedings of the 22nd International Conference on Machine Learning, 2005.

13)  B Kulis, S Basu, I Dhillon, R Mooney, Semi-supervised graph clustering: a kernel approach, Machine Learning, 2009

14)  14-a) M. Rosvall and C. T. Bergstrom. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105:1118–1123, 2008  & Martin Rosvall, Daniel Axelsson, and Carl T. Bergstrom, The map equation, 2009.

14-b) Martin Rosvall and Kim Sneppen, Reinforced communication and social navigation generate groups in model networks,  Phys. Rev. E 79, 026111 (2009) [arXiv:0809.4803] [Java simulation]

15)  M Meila, W Pentney, Clustering by weighted cuts in directed graphs, SIAM Conference on Data Mining (SDM), 2007.

16)  Peter J. Bickel and Aiyou Chen, "A nonparametric view of network models and Newman-Girvan and other modularities", Proceedings of the National Academy of Sciences (USA) 106 (2009): 21068--21073

17)  J Reichardt, S Bornholdt, Statistical mechanics of community detection, Physical Review E, 2006

18)    Cosma Rohilla Shalizi, Marcelo F. Camperi, Kristina Lisa Klinkner, Discovering Functional Communities in Dynamical Networks.

2.      Random Walk/Electric Network

            19)  Random Walks on Graphs: A Survey, Lovasz. (This will be split into 2 or 3 classes, each group covers only part of the survey). 

            20)  Random Walks and Electric Networks, Doyle & Snell (Depends on time, this will be split into 2 or 3 classes).

            21) F Fouss, A Pirotte, JM Renders, M Saerens,  Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation, TKDE, 2006.

              22)   The Resistance Distance is Meaningless for Large Random Geometric Graphs by A. Radl, U. von Luxburg, M. Hein

             23) A Measure of the Connection Strengths between Graph Vertices with Applications, Chen, Jie; Safro, Ilya 

             24) DA Spielman, N Srivastava , Graph sparsification by effective resistances, STOC, 2008.

3.      Social Network Analysis

             25)Structure and tie strengths in mobile communication networks, Onnela JP, Saramäki J, Hyvönen J, Szabó G, Lazer D, Kaski K, Kertész J, Barabási AL.,  PNAS 07

             26) S. Aral, L. Muchnik, and A. Sundararajan, Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks, PNAS, 2009. (On campus or remote login using VPN).

      27)  N. Eagle, A. Pentland, and D. Lazer  Inferring friendship network structure by using mobile phone data, Proc. Natl. Acad. Sci. USA 2009 106:15274-15278

       28) Hidalgo. “The dynamics of a mobile phone network.” Physica A Statistical Mechanics and its Applications, v. 387 issue 12, 2008, p. 3017 

             29)   Understanding individual human mobility patterns, Marta C. González, César A. Hidalgo & Albert-László Barabási, nature, 2008

             30) Classes of complex networks defined by role-to-role connectivity profiles, Roger Guimerà, Marta Sales-Pardo and Luís A. N. Amaral, nature physics, 2007.

             31) D. Liben-Nowell, J. Kleinberg. Tracing Information Flow on a Global Scale Using Internet Chain-Letter Data. Proc. National Academy of Sciences, 105(12):4633–4638, 25 March 2008.

             32) G. Kossinets, J. Kleinberg, D. Watts. The Structure of Information Pathways in a Social Communication Network. Proc. 14th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2008.

             33) S. Marvel, J. Kleinberg, S. Strogatz. The Energy Landscape of Social Balance. Physical Review Letters 103(19), 2009.

             34) Microscopic Evolution of Social Networks by J. Leskovec, L. Backstrom, R. Kumar, A. Tomkins. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2008.

             35) Statistical Properties of Community Structure in Large Social and Information Networks by J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. International World Wide Web Conference (WWW), 2008

             36) The Dynamics of Viral Marketing by J. Leskovec, Lada Adamic, Bernardo Huberman. ACM Conference on Electronic Commerce (EC), 2006. [Extended version]

             37) Kronecker Graphs: an approach to modeling networks by J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, Z. Ghahramani. Journal of Machine Learning Research (JMLR), 2010.

             38) The Impact of Boundary Spanning Scholarly Publications and Patents, Shi, Xiaolin, Adamic Lada A., Tseng Belle L., and Clarkson Gavin S. , PLoS ONE, 08/2009, Volume 4, Issue 8, p.e6547, (2009)

           

4.      Power-Law/Scale-Free Property

            39) Barabási, Albert-László and Albert, Réka. "Emergence of scaling in random networks". Science, 286:509-512, October 15, 1999.

            40) Li, L., Alderson, D., Tanaka, R., Doyle, J.C., Willinger, W., Towards a Theory of Scale-Free Graphs: Definition, Properties, and Implications (Extended Version). Internet Mathematics, 2

            41) Power laws, Pareto distributions and Zipf's law MEJ Newman - Arxiv preprint cond-mat/0412004, 2004

5.      Network Search/Routing

            42) J. Kleinberg. The small-world phenomenon: An algorithmic perspective. Proc. 32nd ACM Symposium on Theory of Computing, 2000. Also appears as Cornell Computer Science Technical Report 99-1776 (October 1999) & J. Kleinberg. The Small-World Phenomenon and Decentralized Search. SIAM News 37(3), April 2004

            43) Martin Rosvall, Andreas Grönlund, Petter Minnhagen, and Kim Sneppen, Searchability of networks,
Phys. Rev. E 72, 046117 (2005), [cond-mat/0505400].

            44) Networks and cities: an information perspective, Martin Rosvall, Ala Trusina, Petter Minnhagen, and Kim Sneppen
Phys. Rev. Lett. 94, 028701 (2005), [cond-mat/0407054] & Hide and seek on complex networks, Kim Sneppen, Ala Trusina, and Martin Rosvall,
Europhys. Lett. 69, 853 (2005), [cond-mat/0407055]

 

6.      Graph Dynamics

7.      Graph Database