CS 69995 & CS 79995 ST: Probabilistic Data Management
Fall 2017
Instructor: Xiang Lian
Office Location: Mathematics and Computer Science Building, Room 264
Office Phone Number: (330) 672-9063
Web: http://www.cs.kent.edu/~xlian/index.html
Email: xlian@kent.edu
Course: ST: Probabilistic Data Management
CRN: 12633 & 12650
Prerequisites: Permission of the instructor
Time: 11:00am - 12:15pm, TR
Classroom Location: White Hall (WTH) 110
Course Webpage: http://www.cs.kent.edu/~xlian/2017Fall_CS69995_CS79995.html
Instructor's Office Hours: 1:00pm - 4:00pm, TR; or by appointment
Graduate Assistant: N/A
Office: N/A
E-mail: N/A
Phone: N/A
TA's Office Hours: N/A
The official registration deadline for this course is 09/03/2017. University
policy requires all students to be officially registered in each class they are
attending. Students who are not officially registered for a course by published
deadlines should not be attending classes and will not receive credit or a grade
for the course. Each student must confirm enrollment by checking his/her class
schedule (using Student Tools in FlashLine) prior to the deadline indicated.
Registration errors must be corrected prior to the deadline.
http://www.kent.edu/registrar/calendars-deadlines
For registration deadlines, enter the requested information
for a Detailed Class Search from the Schedule of Classes Search found at:
https://keys.kent.edu:44220/ePROD/bwlkffcs.P_AdvUnsecureCrseSearch?term_in=201680
After locating your course/section, click on the Registration
Deadlines link on the far right side of the listing.
Last day to withdraw: 11/05/2017
Reference Books
Charu C. Aggarwal. Managing and Mining Uncertain Data. Springer Publishing Company, 2009. ISBN: 978-0-387-09689-6 (Print), 978-0-387-09690-2 (Online), https://link.springer.com/book/10.1007%2F978-0-387-09690-2
Lei Chen and Xiang Lian. Query Processing over Uncertain Databases. In Synthesis Lectures on Data Management, Vol. 4, No. 6, pages 1-101, Morgan & Claypool Publishers, 2012. ISBN: 9781608458929, http://www.morganclaypool.com/doi/abs/10.2200/S00465ED1V01Y201212DTM033
Dan Suciu, Dan Olteanu, Christopher Re, and Christoph Koch. Probabilistic Databases. In Synthesis Lectures on Data Management, Morgan & Claypool Publishers, 2011. ISBN-13: 978-1608456802, ISBN-10: 1608456803, http://www.morganclaypool.com/doi/abs/10.2200/S00362ED1V01Y201105DTM016
Resources of Reading Materials
Online resources of research
papers/surveys, including database conferences/journals (SIGMOD, PVLDB, ICDE,
TODS, VLDBJ, and TKDE), etc.
o TODS: http://dblp.uni-trier.de/db/journals/tods/index.html
o VLDBJ: http://dblp.uni-trier.de/db/journals/vldb/
o TKDE: http://dblp.uni-trier.de/db/journals/tkde/index.html
o SIGMOD: http://dblp.uni-trier.de/db/conf/sigmod/
o VLDB: http://www.vldb.org/pvldb/, or http://dblp.uni-trier.de/db/journals/pvldb/index.html
o ICDE: http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000178, or http://dblp.uni-trier.de/db/conf/icde/
o Indexing: https://www.slac.stanford.edu/pubs/slacpubs/16250/slac-pub-16460.pdf
o A survey of probabilistic data management: http://ieeexplore.ieee.org/document/4597041/
o A Survey of Large-Scale Analytical Query Processing in MapReduce: http://link.springer.com/article/10.1007/s00778-013-0319-9
o A Survey on Parallel and Distributed Data Warehouses: https://pdfs.semanticscholar.org/4f3e/d0d4dfbd0bf4648a7feda94e3176e33ad088.pdf
o Datasets and Source Code
❖ Spatial data sets and index source code: http://chorochronos.datastories.org/
❖ Road network and stream data: https://www.cs.utah.edu/~lifeifei/datasets.html
❖ DBpedia RDF data: http://www.dbpedia.org
❖ Freebase RDF data: https://developers.google.com/freebase/
❖ YAGO1, YAGO2s, YAGO3 RDF data: https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/archive/ (YAGO2 paper: https://people.mpi-inf.mpg.de/~kberberi/publications/2010-mpii-tra.pdf)
o Apache Hadoop: http://hadoop.apache.org/
o Amazon AWS: https://aws.amazon.com/
o Tutorial: https://www.lynda.com/ (Sign in with the organization portal)
A reading list is here ☺
Catalog Description
The purpose of this course is to learn the fundamental concepts and techniques
for probabilistic data management in the area of databases. Probabilistic data
are pervasive in many real-world applications, such as sensor networks, GPS
systems, location-based services, mobile computing, multimedia databases, data
extraction/integration, trajectory data analysis, the Semantic Web, and privacy
preservation. Managing such large-scale probabilistic data efficiently and
effectively is challenging. In this class, we will cover major research topics
such as probabilistic/uncertain data models, probabilistic queries,
probabilistic query answering techniques, and data quality issues in databases.
Students are expected to survey papers from recent database
journals/conferences on a selected research direction, and to write research
papers or reports that present new problems or solutions. Students will also
give presentations to the class to demonstrate their outcomes. It is also
expected that the resulting surveys/papers can be extended to database
conference/journal papers.
Learning Outcomes
At the end of this course, the
students should be able to:
Tentative Schedule
| Week | Topic | Notes |
| Week 1 (Aug. 29) | Please form study groups, each with 2-3 members, and send your IDs, names, and emails to me (xlian@kent.edu); Due on Sept. 14 | |
| Week 1 (Aug. 31) | | |
| Week 2 (Sept. 5) | | |
| Week 2 (Sept. 7) | | Homework 1 (Due on Sept. 21) |
| Week 3 (Sept. 12) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (1) | |
| Week 3 (Sept. 14) | | |
| Week 4 (Sept. 19) | Presentation: Managing Uncertainty in Social Networks. [paper] [slides] Presenter: Md | |
| Week 4 (Sept. 21) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (2) | Homework 2 (Due on Oct. 10) |
| Week 5 (Sept. 26) | | |
| Week 5 (Sept. 28) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (3) | |
| Week 6 (Oct. 3) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (4) | |
| Week 6 (Oct. 5) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (5) | Homework 3 (Due on Oct. 26) |
| Week 7 (Oct. 10) | | |
| Week 7 (Oct. 12) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (6) | Deadline for submitting the survey (Oct. 12) |
| Week 8 (Oct. 17) | | |
| Week 8 (Oct. 19) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (7) | Project Report (template; Due on Dec. 7) |
| Week 9 (Oct. 24) | | |
| Week 9 (Oct. 26) | | Homework 4 (Due on Nov. 9) |
| Week 10 (Oct. 31) | | |
| Week 10 (Nov. 2) | | Last Day to Withdraw: 11/05/2017 |
| Week 11 (Nov. 7) | | |
| Week 11 (Nov. 9) | Project Q/A | Homework 5 (Due on Nov. 30) |
| Week 12 (Nov. 14) | | |
| Week 12 (Nov. 16) | | |
| Week 13 (Nov. 21) | | |
| Week 13 (Nov. 23) | -- | Nov. 22-26, Thanksgiving Recess; No classes |
| Week 14 (Nov. 28) | Project Q/A | |
| Week 14 (Nov. 30) | Presentations for Project Report; Group #1: Range Query for Uncertain Moving Objects | Course Evaluation (Nov. 30) |
| Week 15 (Dec. 5) | Group #3: Probabilistic Range Query on Uncertain Raster Cells of National Insect & Disease Risk Maps; Group #4: Clustering Uncertain Taxi Data | |
| Week 15 (Dec. 7) | Group #5: Distributed Probabilistic Range-Aggregate Query on Uncertain Data; Group #6: TBA | Deadline for submitting the project report |
| Week 16 (Dec. 11-17) | No Final Exam | |
Academic calendar: https://www.kent.edu/sites/default/files/academic-calendar-2014-2018_0.pdf
Final exam schedule: http://www.kent.edu/registrar/fall-final-exam-schedule
NOTE: Presentation dates and deadlines are tentative. Exact dates will be announced in class!!!
5% - Attendance & Questions
50% - 5 Homeworks (10 points each)
15% - Survey
o A survey on papers for the selected research topics in recent database conferences/journals
30% - Research Projects & Presentations
o Research project report (including introduction, related works, problem definition, solutions, experiments, and conclusions) (20%)
o Presentation and demonstration for the proposed research project (10%)
5% - Bonus Points, rated by other team members
10% - (Optional) Bonus for presenting research papers
A = 90 or higher
B = 80 - 89
C = 70 - 79
D = 60 - 69
F = <60
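As a rough illustration of how the weights and the letter scale above combine, the following sketch computes a final score and maps it to a letter. The function names are hypothetical, and two points are assumptions rather than stated policy: bonus points are simply added on top of the 100% base, and scores are not rounded before the letter cutoffs are applied.

```python
# Component weights in percent, copied from the grading breakdown above.
WEIGHTS = {"attendance": 5, "homework": 50, "survey": 15, "project": 30}


def final_score(fractions, bonus_points=0.0):
    """Return the weighted score in percent.

    fractions maps each component name to the fraction earned (0.0 to 1.0).
    bonus_points are added on top of the base (assumption, see lead-in).
    """
    base = sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)
    return base + bonus_points


def letter_grade(score):
    """Map a percent score to a letter using the scale above (no rounding assumed)."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


# Hypothetical student: full attendance, 45 of 50 homework points,
# 80% on the survey, 85% on the project.
example = {"attendance": 1.0, "homework": 0.9, "survey": 0.8, "project": 0.85}
print(letter_grade(final_score(example)))
print(letter_grade(final_score(example, bonus_points=3.0)))
```

In this hypothetical case the base score falls in the B range, and three bonus points would be enough to move it into the A range.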
Guidelines for Surveys/Papers/Projects
All surveys/papers/projects will be submitted electronically
only. Instructions are given separately.
➢ Assignments must be submitted to Blackboard by the due date.
➢ A survey or paper report turned in within two weeks after the due date will be considered late and will lose 30% of its grade (10% for the first week, and 20% more for the second week).
➢ No assignment will be accepted for grading more than two weeks after the due date.
➢ Late submissions require the prior consent of the instructor.
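One possible reading of the late-submission rule above can be sketched as follows. This is an interpretation, not stated policy: it assumes that a submission up to 7 days late loses 10% of its earned grade, a submission 8 to 14 days late loses 30% in total, and anything later receives no credit. The function name and the day-based cutoffs are hypothetical, and the sketch presumes the instructor has already consented to the late submission.

```python
def late_adjusted_grade(grade, days_late):
    """Apply the assumed late penalty to an earned grade.

    Assumed reading of the policy: first week late loses 10%,
    second week late loses 30% in total, later is not accepted.
    """
    if days_late <= 0:
        penalty_percent = 0      # on time
    elif days_late <= 7:
        penalty_percent = 10     # first week after the due date
    elif days_late <= 14:
        penalty_percent = 30     # second week: 10% plus 20% more
    else:
        return 0.0               # not accepted for grading
    return grade * (100 - penalty_percent) / 100


for days in (0, 3, 10, 15):
    print(days, late_adjusted_grade(100, days))
```

If the penalty is instead meant as percentage points of the assignment's maximum score, the last line of the function would subtract rather than scale; the instructor's separate instructions take precedence either way.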
Attendance in the lecture is mandatory. Students are expected to attend
lectures, study the text, and contribute to discussions. You need to write your
name on attendance sheets throughout the course, so please attend every lecture.
Students are expected to attend all scheduled classes and may be dropped from
the course for excessive absences. Legitimate reasons for an "excused" absence
include, but are not limited to, illness and injury, disability-related
concerns, military service, death in the immediate family, religious
observance, academic field trips, participation in an approved concert or
athletic event, and direct participation in university disciplinary hearings.
Even though any absence can potentially interfere with the planned development of a
course, and the student bears the responsibility for fulfilling all course
requirements in a timely and responsible manner, instructors will, without
prejudice, provide students returning to class after a legitimate absence with
appropriate assistance and counsel about completing missed assignments and
class material. Neither academic departments nor individual faculty members are
required to waive essential or fundamental academic requirements of a course to
accommodate student absences. However, each circumstance will be reviewed on a
case-by-case basis.
For more details, please refer to University policy 3-01.2: http://www.kent.edu/policyreg/administrative-policy-regarding-class-attendance-and-class-absence.
No make-up presentation will be given except for university-sanctioned excused
absences. If you miss a presentation (for a good reason), it is your
responsibility to contact me before the presentation, or as soon after the
presentation as possible.
The University expects a student to maintain a high standard of individual
honor in his/her scholastic work. Unless otherwise required, each student is
expected to complete his or her assignment individually and independently (even
within a team, the workload should be divided among team members, each of whom
completes his or her part individually). Although studying together is
encouraged, the work handed in for grading by each student is expected to be
his or her own. Any form of academic dishonesty is strictly forbidden and will
be punished to the maximum extent. Copying an assignment from another student
(team) in this class or obtaining a solution from some other source will lead
to automatic failure of this course and to disciplinary action. Allowing
another student to copy one's work will be treated as an act of academic
dishonesty, leading to the same penalty as copying.
University policy 3-01.8 deals with the problem of academic dishonesty, cheating, and
plagiarism. None of these will be tolerated in this class. The sanctions
provided in this policy will be used to deal with any violations. If you have
any questions, please read the policy at http://www.kent.edu/policyreg/administrative-policy-regarding-student-cheating-and-plagiarism and/or ask.
University policy 3-01.3 requires that students with disabilities be provided reasonable
accommodations to ensure their equal access to course content. If you have a
documented disability and require accommodations, please contact the instructor
at the beginning of the semester to make arrangements for necessary classroom
adjustments. Please note, you must first verify your eligibility for these
through Student Accessibility Services
(contact 330-672-3391 or visit www.kent.edu/sas for more information on registration procedures).
This course may be used to satisfy the University Diversity
requirement. Diversity courses provide opportunities for students to learn
about such matters as the history, culture, values and notable achievements of
people other than those of their own national origin, ethnicity, religion,
sexual orientation, age, gender, physical and mental ability, and social class.
Diversity courses also provide opportunities to examine problems and issues
that may arise from differences, and opportunities to learn how to deal
constructively with them.
This course may be used to satisfy the Writing Intensive
Course (WIC) requirement. The purpose of a writing-intensive course is to
assist students in becoming effective writers within their major discipline. A
WIC requires a substantial amount of writing, provides opportunities for guided
revision, and focuses on writing forms and standards used in the professional
life of the discipline.
This course may be used to fulfill the university's
Experiential Learning Requirement (ELR) which provides students with the
opportunity to initiate lifelong learning through the development and
application of academic knowledge and skills in new or different settings.
Experiential learning can occur through civic engagement, creative and artistic
activities, practical experiences, research, and study abroad/away.
The instructor reserves the right to alter this syllabus as necessary.