CS 49995 & CS 63016 ST: Big Data Analytics

Spring 2017

 

Instructor: Xiang Lian

Office Location: Mathematics and Computer Science Building, Room 264

Office Phone Number: (330) 672-9063

Web: http://www.cs.kent.edu/~xlian/index.html

Email: xlian@kent.edu

Course: ST: Big Data Analytics

Prerequisites: Permission of the instructor

Time: 7:00pm ~ 8:15pm, TR

Classroom Location: Smith Hall (SMH), Room 111

Course Webpage: http://www.cs.kent.edu/~xlian/course_archive/2017Spring_CS49995_CS63016.html

 

Instructor's Office Hours: Tuesday and Thursday (1:30pm ~ 4:30pm); or by appointment

 

Graduate Assistant: Zhiqiang Wang

Office: N/A

E-mail: zwang22@kent.edu

Phone: N/A

TA's Office Hours: N/A


Enrollment/Official Registration of this Class

The official registration deadline for this course is Jan. 22, 2017. University policy requires all students to be officially registered in each class they are attending. Students who are not officially registered for a course by published deadlines should not be attending classes and will not receive credit or a grade for the course. Each student must confirm enrollment by checking his/her class schedule (using Student Tools in FlashLine) prior to the deadline indicated. Registration errors must be corrected prior to the deadline.

 

For registration deadlines, enter the requested information for a Detailed Class Search from the Schedule of Classes Search found at:

https://keys.kent.edu:44220/ePROD/bwlkffcs.P_AdvUnsecureCrseSearch?term_in=201680

 

After locating your course/section, click on the Registration Deadlines link on the far right side of the listing.

 

Last day to withdraw: Mar. 26, 2017

 


Textbooks and Reference Books

Kuan-Ching Li, Hai Jiang, Laurence T. Yang, and Alfredo Cuzzocrea. Big Data: Algorithms, Analytics, and Applications. Chapman & Hall/CRC Big Data Series, ISBN 9781482240559, 2015.

 

Thomas Erl, Wajid Khattak, and Dr. Paul Buhler. Big Data Fundamentals: Concepts, Drivers & Techniques. The Prentice Hall Service Technology Series, ISBN-13: 978-0134291079, 2016.

 

Resources of Reading Materials

*    A reading list will appear here :)

o   Indexing for Big Data

v  (Grid file) P. Rigaux, M. Scholl, and A. Voisard. Spatial Databases - with application to GIS. Morgan Kaufmann, San Francisco, 2002. http://bsolano.com/ecci/claroline/backends/download.php/TGlicm9zX2RlX3RleHRvL1NwYXRpYWxEQnNXaXRoQXBwbGljYXRpb25Ub0dJUy5wZGY%3D?cidReset=true&cidReq=CI1314

v  (Bitmap index) P. Nagarkar, K. S. Candan, and A. Bhat. Compressed Spatial Hierarchical Bitmap (cSHB) Indexes for Efficiently Processing Spatial Range Query Workloads. In PVLDB, 8(12), pages 1382-1393, 2015. http://dl.acm.org/citation.cfm?id=2824038

v  (Quad-tree) H. Samet and R. E. Webber. Storing a collection of polygons using quadtrees. In ACM Trans. Graph, 1985. https://pdfs.semanticscholar.org/65ee/4429b5509173f12309539e809ac533e84690.pdf

v  (K-D-B-tree) J. T. Robinson. The K-D-B-Tree: a search structure for large multidimensional dynamic indexes. In SIGMOD, 1981. http://repository.cmu.edu/cgi/viewcontent.cgi?article=3451&context=compsci

v  (K-D-tree) R. A. Brown. Building a Balanced k-d Tree in O(kn log n) Time. In Journal of Computer Graphics Techniques (JCGT), vol. 4, no. 1, 50-68, 2015. http://jcgt.org/published/0004/01/03/paper.pdf

v  (R-tree) A. Guttman. R-trees: a dynamic index structure for spatial searching. In SIGMOD, 1984. http://www-db.deis.unibo.it/courses/SI-LS/papers/Gut84.pdf

v  (R+-tree) T. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-Tree: A dynamic index for multi-dimensional objects. In VLDB, 1987. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.3272

v  (R*-tree) N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: an efficient and robust access method for points and rectangles. In SIGMOD, 1990. https://www.cs.umd.edu/class/fall2002/cmsc818s/Readings/rstar-tree.pdf

v  (X-tree) S. Berchtold, D. A. Keim, and H.-P. Kriegel. The X-tree: An Index Structure for High-Dimensional Data. In VLDB, 1996. https://pdfs.semanticscholar.org/83f6/d2b79b68af1115db013907df78b96dd82ea7.pdf

v  (SS-tree) D. A. White and R. Jain. Similarity Indexing with the SS-tree. In ICDE, 1996. http://www.cs.uml.edu/~cchen/580-S06/reading/WJ96.pdf

v  (SR-tree) N. Katayama and S. Satoh. The SR-tree: an index structure for high-dimensional nearest neighbor queries. In SIGMOD, 1997. https://pdfs.semanticscholar.org/20b5/fc20821968a2e990183ee4613c591951597c.pdf

v  (M-tree) P. Ciaccia, M. Patella, and P. Zezula. M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. In VLDB, 1997. http://www.vldb.org/conf/1997/P426.PDF

v  (OMNI-family) R.F.S. Filho, A. Traina, C. Traina, and C. Faloutsos. Similarity search without tears: the OMNI-family of all-purpose access methods. In ICDE, 2001. http://repository.cmu.edu/cgi/viewcontent.cgi?article=1565&context=compsci or http://ieeexplore.ieee.org/document/914877/?reload=true

v  (High-dimensional indexing) C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast subsequence matching in time-series databases. In SIGMOD, 1994. http://dl.acm.org/citation.cfm?id=191925

v  (Locality Sensitive Hashing) A. Gionis, P. Indyk, and R. Motwani. Similarity Search in High Dimensions via Hashing. In VLDB, 1999. http://www.vldb.org/conf/1999/P49.pdf

v  H. Samet. Foundations of Multidimensional and Metric Data Structures. The Morgan Kaufmann Series in Computer Graphics and Geometric Modeling, ISBN: 0123694469, 2005. http://dl.acm.org/citation.cfm?id=1076819

v  ...

o   Queries Over Big Data

v  (Range Query) http://www.bowdoin.edu/~ltoma/teaching/cs340/spring08/Papers/Rtree-chap1.pdf

v  (Nearest Neighbor Query; Depth-First) N. Roussopoulos, S. Kelly, and F. Vincent. Nearest Neighbor Queries. In SIGMOD, 1995. http://www.postgis.org/support/nearestneighbor.pdf

v  (Nearest Neighbor Query; Best-First) A. Henrich. A Distance Scan Algorithm for Spatial Access Structures. In ACM GIS, 1994. https://pdfs.semanticscholar.org/37cc/e942c8f4a7d501f15bbfe41700cf98be2173.pdf

v  (k-Nearest Neighbor Query; VoR-Tree) M. Sharifzadeh and C. Shahabi. VoR-Tree: R-trees with Voronoi Diagrams for Efficient Processing of Spatial Nearest Neighbor Queries. In VLDB, 2010. http://infolab.usc.edu/papers/VorTree.pdf

v  (Group Nearest Neighbor Query) D. Papadias, Q. Shen, Y. Tao, and K. Mouratidis. Group Nearest Neighbor Queries. In ICDE, 2004. http://www.cs.ust.hk/~dimitris/PAPERS/ICDE04-GNN.pdf

v  (Reverse Nearest Neighbor Query; KM) F. Korn and S. Muthukrishnan. Influence Sets Based on Reverse Nearest Neighbor Queries. In SIGMOD, 2000. https://graphics.stanford.edu/courses/cs468-06-fall/Papers/19%20reverse%202.pdf

v  (Reverse Nearest Neighbor Query; YL) C. Yang and K. Lin. An Index Structure for Efficient Reverse Nearest Neighbor Queries. In ICDE, 2001. http://ieeexplore.ieee.org/document/914862/

v  (Reverse Nearest Neighbor Query; SAA) I. Stanoi, D. Agrawal, and A. Abbadi. Reverse Nearest Neighbor Queries for Dynamic Databases. In SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2000. http://infolab.usc.edu/csci599/Fall2007/papers/b-2.pdf

v  (Reverse Nearest Neighbor Query; TPL) Y. Tao, D. Papadias, and X. Lian. Reverse kNN Search in Arbitrary Dimensionality. In VLDB, 2004. http://www.cs.kent.edu/~xlian/papers/VLDB04-RNN.pdf

v  (Top-k Query; Onion) Y.-C. Chang, L. D. Bergman, V. Castelli, C.-S. Li, M.-L. Lo, and J. R. Smith. The Onion Technique: Indexing for Linear Optimization Queries. In SIGMOD, 2000. http://dl.acm.org/citation.cfm?id=335433

v  (Top-k Query; PREFER) V. Hristidis, N. Koudas, and Y. Papakonstantinou. PREFER: A System for the Efficient Execution of Multi-Parametric Ranked Queries. In SIGMOD, 2001. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.151.6379&rep=rep1&type=pdf

v  (Skyline Query; BNL & D&C) S. Börzsönyi, D. Kossmann, and K. Stocker. The Skyline Operator. In ICDE, 2001. http://infolab.usc.edu/csci599/Fall2007/papers/e-1.pdf

v  (Skyline Query; Bitmap & Index) K. Tan, P. Eng, and B. Ooi. Efficient Progressive Skyline Computation. In VLDB, 2001. http://www.vldb.org/conf/2001/P301.pdf

v  (Skyline Query; NN) D. Kossmann, F. Ramsak, and S. Rost. Shooting Stars in the Sky: an Online Algorithm for Skyline Queries. In VLDB, 2002. https://pdfs.semanticscholar.org/10fe/ecb5eebbb958439aabb2e10bd56739e315c9.pdf

v  (Skyline Query; BBS) D. Papadias, Y. Tao, G. Fu, and B. Seeger. An Optimal and Progressive Algorithm for Skyline Queries. In SIGMOD, 2003. http://www.cs.ust.hk/~dimitris/PAPERS/SIGMOD03-Skyline.pdf

v  (Spatial Skyline) M. Sharifzadeh and C. Shahabi. The Spatial Skyline Queries. In VLDB, 2006. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.105.858&rep=rep1&type=pdf

v  (Multi-Source Skyline) K. Deng, X. Zhou, and H. T. Shen. Multi-Source Skyline Query Processing in Road Networks. In ICDE, 2007. http://ieeexplore.ieee.org/document/4221728/

v  (Metric Skyline) L. Chen and X. Lian. Dynamic Skyline Queries in Metric Spaces. In EDBT, 2008. http://dl.acm.org/citation.cfm?id=1353386

v  (Top-k Dominating Query) M. L. Yiu and N. Mamoulis. Efficient Processing of Top-k Dominating Queries on Multi-Dimensional Data. In VLDB, 2007. https://pdfs.semanticscholar.org/2817/630b8e919c8ecb61b7397f4dc11ca7d93e91.pdf

v  (aR-Tree) I. Lazaridis and S. Mehrotra. Progressive Approximate Aggregate Queries with a Multi-Resolution Tree Structure. In SIGMOD, 2001. ftp://ftp.cse.buffalo.edu/users/azhang/disc/SIGMOD/pdf-files/401/250-progressive.pdf

v  (Reverse Skyline Query) E. Dellis and B. Seeger. Efficient Computation of Reverse Skyline Queries. In VLDB, 2007.

v  (Inverse Ranking Query) C. Li. Enabling data retrieval: by ranking and beyond. In Ph.D. Dissertation, University of Illinois at Urbana-Champaign, 2007.

v  (Aggregate Query) I. Lazaridis and S. Mehrotra. Progressive Approximate Aggregate Queries with a Multi-Resolution Tree Structure. In SIGMOD, 2001.

v  (Histogram) M. Muralikrishna and D. J. DeWitt. Equi-depth multidimensional histograms. In SIGMOD, 1988.

v  (Sampling) R. J. Lipton, J. F. Naughton, and D. A. Schneider. Practical Selectivity Estimation through Adaptive Sampling. In SIGMOD, 1990.

v  (Wavelet) M. Garofalakis and P. B. Gibbons. Wavelet Synopses with Error Guarantees. In SIGMOD, 2002.

v  (Keyword Search; BANKS) A. Hulgeri and C. Nakhe. Keyword Searching and Browsing in Databases using BANKS. In ICDE, 2002.

v  (Keyword Search; BANKS) V. Kacholia, S. Pandit, S. Chakrabarti, S. Sudarshan, R. Desai, and H. Karambelkar. Bidirectional expansion for keyword search on graph databases. In VLDB, 2005.

v  (Keyword Search; BLINKS) H. He, H. Wang, J. Yang, and P. S. Yu. BLINKS: ranked keyword searches on graphs. In SIGMOD, 2007. http://db.cs.duke.edu/papers/2007-SIGMOD-hwyy-kwgraph.pdf

v  ...

o   Big Graph Data

o   Big Probabilistic Data

o   Big Data Management in Real Applications (e.g., time-series, sensor networks, road networks, social networks, bioinformatics, Semantic Web, etc.)

*    Research papers/surveys from database conferences/journals (SIGMOD, PVLDB, ICDE, TODS, VLDBJ, and TKDE)

o   Database Journals

v  TODS: http://dblp.uni-trier.de/db/journals/tods/index.html

v  VLDBJ: http://dblp.uni-trier.de/db/journals/vldb/

v  TKDE: http://dblp.uni-trier.de/db/journals/tkde/index.html

o   Database Conferences

v  SIGMOD: http://dblp.uni-trier.de/db/conf/sigmod/

v  VLDB: http://www.vldb.org/pvldb/, or http://dblp.uni-trier.de/db/journals/pvldb/index.html

v  ICDE: http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000178, or http://dblp.uni-trier.de/db/conf/icde/

o   ACM Computing Surveys

v  http://csur.acm.org/

o   Samples of surveys:

v  Indexing: https://www.slac.stanford.edu/pubs/slacpubs/16250/slac-pub-16460.pdf

v  A Survey of Large-Scale Analytical Query Processing in MapReduce: http://link.springer.com/article/10.1007/s00778-013-0319-9

v  A Survey on Parallel and Distributed Data Warehouses: https://pdfs.semanticscholar.org/4f3e/d0d4dfbd0bf4648a7feda94e3176e33ad088.pdf

*    Online resources

o   Datasets and Source Code

v  Spatial data sets and index source code: http://chorochronos.datastories.org/

v  Road network and stream data: https://www.cs.utah.edu/~lifeifei/datasets.html

v  DBpedia RDF data: http://www.dbpedia.org

v  Freebase RDF data: https://developers.google.com/freebase/

v  YAGO1, YAGO2s, YAGO3 RDF data: https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/archive/ (YAGO2 paper: https://people.mpi-inf.mpg.de/~kberberi/publications/2010-mpii-tra.pdf)

o   Apache Hadoop

v  http://hadoop.apache.org/

o   Amazon AWS

v  https://aws.amazon.com/

o   Tutorial

v  https://www.lynda.com/ (Sign in with the organization portal)

*    Topics for undergraduate teams (note: to select one of the topics below, you must email me (xlian@kent.edu) and obtain my permission; first come, first served)

o   Visualization of R*-tree index construction [required skills: C++ or C#, visual programming with C++/C#], R*-tree source code: http://chorochronos.datastories.org/ (topic status: unavailable)

v  N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: an efficient and robust access method for points and rectangles. In SIGMOD, 1990. https://www.cs.umd.edu/class/fall2002/cmsc818s/Readings/rstar-tree.pdf

v  UG Team #19 (Xiangxu's team)

o   Visualization of queries over spatio-temporal data (e.g., GPS data, sensor data, etc.) [required skills: C++ or C#, visual programming with C++/C#], R*-tree source code: http://chorochronos.datastories.org/ (topic status: unavailable)

v  Skyline query with R*-tree (BBS algorithm): https://pdfs.semanticscholar.org/c1ee/b9fc4b58031f71cb6926a470b6cb60646c15.pdf, UG Team # 11 (Jamie's team)

o   Visualization of time-series data (e.g., stock time series, or trajectory) [required skills: C++, C#, Java, other visual programming tools, or mobile programming] (topic status: available)

v  Stock data prediction: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4358952, UG Team #12 (Shane's team)

v  Trajectory: https://pdfs.semanticscholar.org/8fcf/68502233c7cbf2360f126781fd14b54d265a.pdf, UG Team #15&#19

o   Visualization of query processing over large-scale road networks (e.g., trip planner, the shortest traveling time query, and gas station query) [required skills: C++, C#, Java, or other visual programming tools] (topic status: available)

v  Trip planner: http://ieeexplore.ieee.org/document/6613474/, UG Team #2 (Joshua's team): Query processing over large scale road networks

o   Visualization of RDF graph queries in the Semantic Web (e.g., subgraph matching) [required skills: C++, C#, Java, or other visual programming tools] (topic status: available)

v  K-nearest keyword search: http://www.sciencedirect.com/science/article/pii/S1570826813000371

v  Keyword search over probabilistic RDF graphs: http://ieeexplore.ieee.org/document/6940261/

v  Subgraph matching over probabilistic RDF graphs: http://dl.acm.org/citation.cfm?id=1989341

o   Visualization of social networks (including social network data extraction and keyword search) [required skills: C++, C#, Java, or other visual programming tools], UG Team #3 (Aron's team): Social networks

o   Visualization of Web data (including Web crawling and Web page visualization) [required skills: C++, C#, Java, or other visual/network programming tools], UG Team #6 (Tyrone's team): Visualization of Web Data

o   ...

*    Survey/research topics for graduate teams

o   Distributed indexing

o   Queries over (distributed) spatio-temporal data

v  G Team #8 (Weichuan's team): Queries over (distributed) spatio-temporal data

o   Queries over (distributed) time-series data

o   Queries over (distributed) stream data

v  G Team #13 (Gayatri's team): Query Over Distributed Stream Data

v  G Team #16 (Saikumar's team): Query Over Distributed Stream Data

v  G Team #5 (Jampana's team): Query Over Distributed Stream Data (Survey); Continuous NN, taxi data (Project)

o   Queries over (distributed) graph data (e.g., RDF graphs, social networks, road networks, biological networks, chemical compounds, etc.)

v  G Team #18 (Kyle's team): Queries over graph data

v  G Team #9 (Muhammad's team): Queries over graph data in social networks

o   Queries over (distributed) probabilistic/uncertain data

v  G Team #1 (Niranjan's team): Queries over (distributed) probabilistic/uncertain data

o   Data privacy preserving

v  (k-Anonymity) K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Incognito: Efficient full-domain k-anonymity. In Proc. of the ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 49 - 60, 2005. G Team #? (Surabhee's team): Data privacy preserving (k-Anonymity)

v  (l-Diversity) A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. In IEEE International Conference on Data Engineering (ICDE), page 24, 2006.

v  (t-Closeness) N. Li, T. Li, and S. Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. In IEEE International Conference on Data Engineering (ICDE), pages 106 - 115, 2007.

o   Graph privacy preserving

v  Zach Jorgensen, Ting Yu, Graham Cormode. Publishing Attributed Social Graphs with Formal Privacy Guarantees. In SIGMOD, 2016.

v  Wei-Yen Day, Ninghui Li, Min Lyu. Publishing Graph Degree Distribution with Node Differential Privacy. In SIGMOD, 2016.

v  Zhao Chang, Lei Zou, Feifei Li. Privacy Preserving Subgraph Matching on Large Graphs in Cloud. In SIGMOD, 2016.

o   Big data visualization

 


 

Catalog Description

This course covers a series of important Big Data problems and their solutions. Specifically, we will introduce the characteristics and challenges of Big Data, state-of-the-art computing paradigms/platforms (e.g., MapReduce), big data programming tools (e.g., Hadoop and MongoDB), big data extraction/integration, big data storage, scalable indexing for big data, big graph processing, big data stream techniques and algorithms, big probabilistic data management, big data privacy, big data visualization, and big data applications (e.g., spatial, finance, multimedia, medical, health, and social data).

 


 

Tentative Schedule

Week

Topics

Notes

Week 1 (Jan. 17)

Introduction

 

Week 1 (Jan. 19)

 

 

Week 2 (Jan. 24)

Indexing Big Data (1)

 

Week 2 (Jan. 26)

 

Homework 1 (Due on Feb. 9)

Week 3 (Jan. 31)

 

 

Week 3 (Feb. 2)

Indexing Big Data (2)

Feb. 2: Deadline to form a study group with 2-4 members

 

Project Template

Week 4 (Feb. 7)

 

 

Week 4 (Feb. 9)

 

Homework 2 (Due on Feb. 23)

Week 5 (Feb. 14)

 

 

Week 5 (Feb. 16)

 

 

Week 6 (Feb. 21)

MapReduce

 

Week 6 (Feb. 23)

Survey/Project Discussions, Q/A

Homework 3 (Due on Mar. 9)

Week 7 (Feb. 28)

 

 

Week 7 (Mar. 2)

 

 

Week 8 (Mar. 7)

Queries Over Big Data (1)

 

Week 8 (Mar. 9)

 

Homework 4 (Due on Mar. 21; extended to Mar. 25)

Week 9 (Mar. 14)

 

Meetings:

2:30pm - Team #1

3:00pm - Team #8

 

Week 9 (Mar. 16)

Survey/Project Discussions, Q/A

Meetings:

3pm - Team #9

3:30pm - Team #5

4pm - Team #10

 

 

For graduate teams, please submit a reading list of survey papers for the topic you choose (due on Mar. 16). I will give you my comments.

Week 10 (Mar. 21)

Queries Over Big Data (2)

Meetings:

1:30pm - Team #13

3:00pm - Team #1

8:15pm - Team #16

 

Week 10 (Mar. 23)

 

Homework 5 (Due on Apr. 6; extended to Apr. 8)

 

Meetings:

10:00am - Team #9

1:30pm - Surabhee's team

 

Last Day to Withdraw: 3/26/2017

Week 11 (Mar. 28)

--

Spring Recess: Mar. 27-Apr. 2; No Classes

Week 11 (Mar. 30)

Week 12 (Apr. 4)

Big Data Applications

 

Week 12 (Apr. 6)

Preparation for Projects; No class

--

Project Report (Sections 1-4) (Due on Apr. 6; Better to include Section 5)

 

Homework 6 (Due on Apr. 20)

Week 13 (Apr. 11)

Project Presentations (Teams #3, #4, #5)

 

UG Team #3: Sentimental Twitter, demo: http://geczy.tech/bigdata

UG Team #4: Keyword Search over RDF Graphs, demo: https://webdev.cs.kent.edu/~rleppelm/RDF.html

G Team #5: Ride Analytics on New York City Taxi Data

Please send me slides of your project talks one week before your presentations!

Week 13 (Apr. 13)

Project Presentations (Teams #6, #7, #8)

 

UG Team #6: Data Exploration Of Wikipedia, demo: https://wikinavigation.github.io/

UG Team #7: Visualization of Spatial-Temporal Data Through R*-Tree

G Team #8: aR-Tree based Hierarchical Clustering: A New Approach of Analyzing Social Media Data, demo: http://personal.kent.edu/~qliu20/courseprojects/2017spring_bigdata/

 

Survey (Due on Apr. 13)

Week 14 (Apr. 18)

--

 

Week 14 (Apr. 20)

--

 

Week 15 (Apr. 25)

Project Presentations (Teams #1, #2, #9)

 

G Team #1: Range-Aggregate Query on Distributed Uncertain Database

UG Team #2: Visualization of query processing over large-scale road networks

G Team #9: A Method to Overcome Influence Maximization: Identifying Influential Users in Twitter Based on User Ranking

 

Week 15 (Apr. 27)

Project Presentations (Teams #10, #11, #12)

 

G Team #10: MapReduce: Data Distribution for Reduce

UG Team #11: Skyline query with R*-Tree: Branch and Bound Skyline (BBS) Algorithm

Team #12:

 

Week 16 (May 2)

Project Presentations (Teams #13, #15&#19)

 

Team #13: Sentiment Analysis in Unstructured Text Data

Team #15 & #19: Time Series Data and Moving Object Trajectory

 

Week 16 (May 4)

Project Presentations (Teams #16, #17, #18)

 

Team #16: Video Surveillance Framework Based on BIGDATA Management for Road Transport System

Team #17: Preserving Private Data Among Social Networking Sites Through k-Anonymity

Team #18: Queries Over Graph Data: Presidential Election

Final Project Report (due on May 4; extended to May 8 - hard deadline). Please include the project report, presentation slides, source code, and readme file in one single package.

 

Please rate your team members (0 ~ 5) on Blackboard (by default, you will get 5 bonus points if no other team members enter the ratings)

Week 17 (May 8-14)

--

 

 

Academic calendar: https://www.kent.edu/sites/default/files/academic-calendar-2014-2018_0.pdf

Final exam schedule: http://www.kent.edu/registrar/spring-final-exam-schedule

NOTE: Presentation dates and deadlines are tentative. Exact dates will be announced in class!!!


Scoring and Grading

Undergraduate students

 

5% - Attendance

60% - Assignments

25% - Project

10% - Presentation & Q/A

5% - Bonus Points, rated by other team members

 

Graduate students

 

5% - Attendance

50% - Assignments

10% - Survey

25% - Research Project

10% - Presentation & Q/A

5% - Bonus Points, rated by other team members

 

A = 90 - 105

B = 80 - 89

C = 70 - 79

D = 60 - 69

F = <60
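The weights and cutoffs above can be checked with a short calculation. The following Python sketch is only an illustration and not an official grading tool: the function names are hypothetical, each component is assumed to be recorded as the fraction of its credit earned (0.0 to 1.0), and each letter cutoff is assumed to be an inclusive lower bound with no rounding (the scale above does not say how a score such as 89.5 is treated).

```python
# Component weights in percent, copied from the lists above.
UNDERGRAD_WEIGHTS = {
    "attendance": 5,
    "assignments": 60,
    "project": 25,
    "presentation": 10,
    "bonus": 5,
}
GRAD_WEIGHTS = {
    "attendance": 5,
    "assignments": 50,
    "survey": 10,
    "project": 25,
    "presentation": 10,
    "bonus": 5,
}


def final_score(earned, weights):
    """Return the weighted total on the 0-105 scale.

    `earned` maps a component name to the fraction of that component's
    credit received (0.0 to 1.0); components that are missing count as 0.
    """
    return sum(weight * earned.get(name, 0.0) for name, weight in weights.items())


def letter_grade(score):
    """Map a total score to a letter grade.

    Assumption: each cutoff is an inclusive lower bound and scores are
    not rounded, so 89.5 falls in the B range.
    """
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    # Hypothetical undergraduate: full credit everywhere except half of
    # the assignment credit.
    earned = {name: 1.0 for name in UNDERGRAD_WEIGHTS}
    earned["assignments"] = 0.5
    total = final_score(earned, UNDERGRAD_WEIGHTS)
    print(total, letter_grade(total))
```

Both weight lists sum to 105 because of the 5 bonus points, which is why the A range extends to 105.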

 


 

Guidelines for Assignments/Surveys/Projects

 

All assignments/surveys/projects will be submitted electronically only. Instructions are given separately.

 

  Assignments must be submitted to Blackboard by the due date. Note that, for team assignments (e.g., surveys or projects), only one team member should submit on behalf of the team (otherwise, we cannot tell which submission is the correct version).

  An assignment/project turned in within two weeks after the due date will be considered late and will lose 30% of its grade.

  No assignment will be accepted for grading more than two weeks after the due date.

  Late submissions require the prior consent of the instructor.
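The late-submission rules above amount to a three-case calculation. The Python sketch below is one possible reading, not the official procedure: the function name is hypothetical, lateness is assumed to be counted in whole calendar days, the 30% is assumed to be deducted from the grade the work would otherwise earn, and a submission that is not accepted is represented as a grade of 0. The requirement of prior instructor consent is not modeled.

```python
from datetime import date

LATE_WINDOW_DAYS = 14  # "within two weeks after the due date"
LATE_PENALTY = 0.30    # a late submission "will lose 30% of its grade"


def adjusted_grade(raw_grade, due, submitted):
    """Apply the late policy to a raw grade.

    Assumptions: lateness is measured in whole calendar days, and work
    submitted after the two-week window is recorded as 0.
    """
    days_late = (submitted - due).days
    if days_late <= 0:
        return float(raw_grade)                    # on time: no penalty
    if days_late <= LATE_WINDOW_DAYS:
        return raw_grade * (1 - LATE_PENALTY)      # late: loses 30%
    return 0.0                                     # too late: not accepted


if __name__ == "__main__":
    due = date(2017, 2, 9)  # Homework 1 due date from the schedule
    for submitted in (date(2017, 2, 9), date(2017, 2, 16), date(2017, 2, 24)):
        print(submitted, adjusted_grade(100, due, submitted))
```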

For surveys/projects, please form a team of 2-4 members. All members of a team must be either undergraduate students or graduate students (i.e., no mixed groups). Graduate teams need to do more research work (i.e., one survey, which replaces one homework). The workload should be distributed evenly among team members. All team members should participate in the survey or project, and all will receive the same score for it. However, there are 5 extra bonus points, which are based on how your teammates rate your performance in the team work.

* Please send the full names, student IDs, emails, and graduate/undergraduate status of all team members to the TA (Zhiqiang Wang, zwang22@kent.edu) by Feb. 2, 2017. The TA will confirm your team by replying with your team number.

 

 


Lecture Attendance Policy

Attendance in the lecture is mandatory. Students are expected to attend lectures, study the text, and contribute to discussions. You need to write your name on attendance sheets throughout the course, so please attend every lecture.

Students are expected to attend all scheduled classes and may be dropped from the course for excessive absences. Legitimate reasons for an "excused" absence include, but are not limited to, illness and injury, disability-related concerns, military service, death in the immediate family, religious observance, academic field trips, participation in an approved concert or athletic event, and direct participation in university disciplinary hearings.

Even though any absence can potentially interfere with the planned development of a course, and the student bears the responsibility for fulfilling all course requirements in a timely and responsible manner, instructors will, without prejudice, provide students returning to class after a legitimate absence with appropriate assistance and counsel about completing missed assignments and class material. Neither academic departments nor individual faculty members are required to waive essential or fundamental academic requirements of a course to accommodate student absences. However, each circumstance will be reviewed on a case-by-case basis.

For more details, please refer to University policy 3-01.2: http://www.kent.edu/policyreg/administrative-policy-regarding-class-attendance-and-class-absence.


Make-up Presentation Policy

No make-up presentation will be given except for university-sanctioned excused absences. If you miss a presentation (for a good reason), it is your responsibility to contact me before the presentation, or as soon after the presentation as possible.


Academic Dishonesty Policy

The University expects a student to maintain a high standard of individual honor in his/her scholastic work. Unless otherwise required, each student is expected to complete his or her assignment individually and independently (even within a team, the workload should be divided so that each member completes his or her own part). Although studying together is encouraged, the work handed in for grading by each student is expected to be his or her own. Any form of academic dishonesty is strictly forbidden and will be punished to the maximum extent. Copying an assignment from another student (or team) in this class, or obtaining a solution from some other source, will lead to automatic failure of this course and to disciplinary action. Allowing another student to copy one's work will be treated as an act of academic dishonesty, leading to the same penalty as copying.

University policy 3-01.8 deals with the problem of academic dishonesty, cheating, and plagiarism. None of these will be tolerated in this class. The sanctions provided in this policy will be used to deal with any violations. If you have any questions, please read the policy at http://www.kent.edu/policyreg/administrative-policy-regarding-student-cheating-and-plagiarism and/or ask.


Students with Disabilities

University policy 3-01.3 requires that students with disabilities be provided reasonable accommodations to ensure their equal access to course content. If you have a documented disability and require accommodations, please contact the instructor at the beginning of the semester to make arrangements for necessary classroom adjustments. Please note, you must first verify your eligibility for these through Student Accessibility Services (contact 330-672-3391 or visit www.kent.edu/sas for more information on registration procedures).


Statements for the Course

This course may be used to satisfy the University Diversity requirement. Diversity courses provide opportunities for students to learn about such matters as the history, culture, values and notable achievements of people other than those of their own national origin, ethnicity, religion, sexual orientation, age, gender, physical and mental ability, and social class. Diversity courses also provide opportunities to examine problems and issues that may arise from differences, and opportunities to learn how to deal constructively with them.

 

This course may be used to satisfy the Writing Intensive Course (WIC) requirement. The purpose of a writing-intensive course is to assist students in becoming effective writers within their major discipline. A WIC requires a substantial amount of writing, provides opportunities for guided revision, and focuses on writing forms and standards used in the professional life of the discipline.

 

This course may be used to fulfill the university's Experiential Learning Requirement (ELR) which provides students with the opportunity to initiate lifelong learning through the development and application of academic knowledge and skills in new or different settings. Experiential learning can occur through civic engagement, creative and artistic activities, practical experiences, research, and study abroad/away.

 


 

Disclaimer

The instructor reserves the right to alter this syllabus as necessary.