Kent State University 
CS 4/6/79995: ST: Big Data & Analytics
Spring 2014

Instructor: Ruoming Jin

Mondays 5:30PM-8:15PM; Rm. SMH - 00110
Office Hours: Mondays 4:30PM-5:30PM or by appointment

TA: Xinyu Chang (xchang AT kent email domain)

---------------------------------------------------------

Prerequisites

CS 4/56101 Algorithms
CS 33001 Data Structures

CS 4/53005 Database Design

CS 6/73015 Data Mining Techniques
Or Consent of the Instructor

Course Overview


This course introduces state-of-the-art computing platforms, with a focus on how to use them to process (manage and analyze) massive datasets. Specifically, we will discuss the MapReduce (Hadoop) framework, which provides the most accessible and practical means of computing in the Cloud. We will also introduce emerging distributed databases and services, such as HBase and Cassandra, and cover Pig Latin and Hive for large-scale data analysis. Finally, we will use several key data processing tasks as case studies for large-scale data processing: simple statistics, data aggregation, join processing, frequent pattern mining, data clustering, information retrieval, PageRank, and massive graph analytics.
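As a rough illustration of the MapReduce model named in the overview, the sketch below simulates the map, shuffle, and reduce steps of a word count on a single machine. It is not Hadoop code and is not taken from the course materials; the function names (map_phase, shuffle, reduce_phase, run_job) and the sample input are assumptions chosen for illustration only.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Aggregate all values that were emitted for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical two-line input, standing in for a large distributed file.
    lines = ["big data big analytics", "data mining"]
    print(run_job(lines))
```

In a real cluster the map and reduce functions run in parallel on many machines and the shuffle moves data over the network; here all three steps run in one process so that the data flow is easy to follow.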

 

Reference Textbooks

Hadoop: The Definitive Guide, Tom White, O'Reilly

Hadoop In Action, Chuck Lam, Manning

Data-Intensive Text Processing with MapReduce, Jimmy Lin and Chris Dyer (www.umiacs.umd.edu/~jimmylin/MapReduce-book-final.pdf)

Data Mining: Concepts and Techniques, Third Edition, by Jiawei Han et al., Morgan Kaufmann

 

Slides

Lecture 1: Introduction to Big Data

Lecture 2: Statistics 101 & Exploratory Data Analysis (Homework 1 on the last slide)

Lecture 3: Business Intelligence: OLAP, Data Warehouse, and Column Store (Homework 2, Homework 3, Testing Datasets for Join: Sailor & Reserve)

Lecture 4: Frequent Pattern Mining (Homework 4, Testing Datasets: Dataset1, Dataset2, Due Date: March 3rd)

Lecture 5: Intro to MapReduce/Hadoop (AWS Tutorial by Nicholas Tietz, Hadoop Install & Quick Start by Xinyu Chang)

Lecture 6: Hadoop Programming Tutorial & MapReduce Programming Patterns (Homework 5)

Lecture 7: Information Retrieval & MapReduce (Debug Hadoop by Xinyu Chang, Homework 6)

Lecture 8: Relational Database Operators & MapReduce

Lecture 9: Machine Learning: Clustering (Unsupervised Learning)

Lecture 10: Machine Learning: Classification (Supervised Learning)

Lecture 11: Machine Learning & MapReduce

Lecture 12: Graph Algorithms & MapReduce (Homework 7)