Mondays 5:30PM-8:15PM; Rm. SMH - 00110
Office Hours: Mondays 4:30PM-5:30PM or by appointment
TA: Xinyu Chang (xchang AT kent email domain)
---------------------------------------------------------
Prerequisites
CS 4/56101 Algorithms
CS 33001 Data Structures
CS 4/53005 Database Design
CS 6/73015 Data Mining Techniques
Or Consent of the Instructor
This course introduces state-of-the-art computing platforms, with a focus on
how to use them to process (manage and analyze) massive datasets.
Specifically, we will discuss the MapReduce (Hadoop) framework, which
provides the most accessible and practical means of computing in the Cloud. We
will also introduce emerging distributed databases and services, such as
HBase and Cassandra, and cover Pig Latin and Hive for large-scale data
analysis. Finally, we will use several key data processing tasks as case
studies for large-scale data processing: simple statistics, data aggregation,
join processing, frequent pattern mining, data clustering, information
retrieval, PageRank, and massive graph analytics.
Hadoop: The Definitive Guide, Tom White, O'Reilly
Hadoop in Action, Chuck Lam, Manning
Data-Intensive Text Processing with MapReduce, Jimmy Lin and Chris Dyer (www.umiacs.umd.edu/~jimmylin/MapReduce-book-final.pdf)
Data Mining: Concepts and Techniques, Third Edition, Jiawei Han et al., Morgan Kaufmann
Slides
Lecture 1: Introduction to Big Data
Lecture 2: Statistics 101 & Exploratory Data Analysis (Homework 1 on the last slide)
Lecture 3: Business Intelligence: OLAP, Data Warehouse, and Column Store (Homework 2, Homework 3, Testing Datasets for Join: Sailor & Reserve)
Lecture 4: Frequent Pattern Mining (Homework 4, Testing Datasets: Dataset1, Dataset2, Due Date: March 3rd)
Lecture 5: Intro to MapReduce/Hadoop (AWS Tutorial by Nicholas Tietz, Hadoop Install & Quick Start by Xinyu Chang)
Lecture 6: Hadoop Programming Tutorial & MapReduce Programming Patterns (Homework 5)
Lecture 7: Information Retrieval & MapReduce (Debug Hadoop by Xinyu Chang, Homework 6)
Lecture 8: Relational Database Operators & MapReduce
Lecture 9: Machine Learning: Clustering (Unsupervised Learning)
Lecture 10: Machine Learning: Classification (Supervised Learning)
Lecture 11: Machine Learning & MapReduce
Lecture 12: Graph Algorithms & MapReduce (Homework 7)