CS 63018 & CS 73018 Probabilistic Data Management

Fall 2021

 

Instructor: Xiang Lian

Office Location: Mathematics and Computer Science Building, Room 264

Office Phone Number: (330) 672-9063

Web: http://www.cs.kent.edu/~xlian/index.html

Email: xlian@kent.edu

Course: Probabilistic Data Management

CRN: 12577 & 12605

Prerequisites: Permission of the instructor

Time: 2:15pm - 3:30pm, MW

Classroom Location: Remote Web Meeting (Blackboard Collaborate Ultra; https://learn.kent.edu/)

Course Webpage: http://www.cs.kent.edu/~xlian/2021Fall_CS63018_CS73018.html

 

Instructor's Virtual Office Hours: By email appointment only (preferably 10:00am - 12:00pm, TR; xlian@kent.edu)

 

Graduate Assistant: N/A

Office: N/A

E-mail: N/A

Phone: N/A

TA's Office Hours: N/A


Enrollment/Official Registration of this Class

The official registration deadline for this course is 09/01/2021. University policy requires all students to be officially registered in each class they are attending. Students who are not officially registered for a course by published deadlines should not be attending classes and will not receive credit or a grade for the course. Each student must confirm enrollment by checking his/her class schedule (using Student Tools in FlashLine) prior to the deadline indicated. Registration errors must be corrected prior to the deadline.

https://www.kent.edu/academic-calendar

 

For registration deadlines, enter the requested information for a Detailed Class Search from the Schedule of Classes Search found at:

https://keys.kent.edu:44220/ePROD/bwlkffcs.P_AdvUnsecureCrseSearch?term_in=201680

 

After locating your course/section, click on the Registration Deadlines link on the far right side of the listing.

 

Last day to withdraw: 11/03/2021

 


Reference Books

Charu C. Aggarwal. Managing and Mining Uncertain Data. Springer Publishing Company, 2009. ISBN: 978-0-387-09689-6 (Print) 978-0-387-09690-2 (Online), https://link.springer.com/book/10.1007%2F978-0-387-09690-2

Lei Chen and Xiang Lian. Query Processing over Uncertain Databases. In Synthesis Lectures on Data Management, Vol. 4, No. 6, pages 1-101, Morgan & Claypool Publishers, 2012. ISBN: 9781608458929, http://www.morganclaypool.com/doi/abs/10.2200/S00465ED1V01Y201212DTM033

Dan Suciu, Dan Olteanu, Christopher Re, and Christoph Koch. Probabilistic Databases. In Synthesis Lectures on Data Management, Morgan & Claypool Publishers, 2011. ISBN-13: 978-1608456802, ISBN-10: 1608456803, http://www.morganclaypool.com/doi/abs/10.2200/S00362ED1V01Y201105DTM016

Resources of Reading Materials

Online resources of research papers/surveys, including database conferences/journals (SIGMOD, PVLDB, ICDE, TODS, VLDBJ, and TKDE), etc.

 

o   TODS: http://dblp.uni-trier.de/db/journals/tods/index.html

o   VLDBJ: http://dblp.uni-trier.de/db/journals/vldb/

o   TKDE: http://dblp.uni-trier.de/db/journals/tkde/index.html

o   SIGMOD: http://dblp.uni-trier.de/db/conf/sigmod/

o   VLDB: http://www.vldb.org/pvldb/, or http://dblp.uni-trier.de/db/journals/pvldb/index.html

o   ICDE: http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000178, or http://dblp.uni-trier.de/db/conf/icde/

o   ACM Computing Surveys (CSUR): http://csur.acm.org/

o   Indexing: https://www.slac.stanford.edu/pubs/slacpubs/16250/slac-pub-16460.pdf

o   A survey of probabilistic data management: http://ieeexplore.ieee.org/document/4597041/

o   A Survey of Large-Scale Analytical Query Processing in MapReduce: http://link.springer.com/article/10.1007/s00778-013-0319-9

o   A Survey on Parallel and Distributed Data Warehouses: https://pdfs.semanticscholar.org/4f3e/d0d4dfbd0bf4648a7feda94e3176e33ad088.pdf

o   Datasets and Source Code

    Spatial data sets and index source code: http://chorochronos.datastories.org/

    Road network and stream data: https://www.cs.utah.edu/~lifeifei/datasets.html

    U.S. Government's open data: https://www.data.gov/

    DBpedia RDF data: http://www.dbpedia.org

    Freebase RDF data: https://developers.google.com/freebase/

    YAGO1, YAGO2s, YAGO3 RDF data: https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/archive/ (YAGO2 paper: https://people.mpi-inf.mpg.de/~kberberi/publications/2010-mpii-tra.pdf)

o   Apache Hadoop: http://hadoop.apache.org/

o   Amazon AWS: https://aws.amazon.com/

o   Tutorial: https://www.lynda.com/ (Sign in with the organization portal)

 

A reading list is here

 


 

Catalog Description

The purpose of this course is to learn the fundamental concepts and techniques of probabilistic data management in the area of databases. Probabilistic data are pervasive in many real-world applications, such as sensor networks, GPS systems, location-based services, mobile computing, multimedia databases, data extraction/integration, trajectory data analysis, the Semantic Web, and privacy preservation. Managing such large-scale probabilistic data efficiently and effectively is challenging. In this class, we will cover major research topics such as probabilistic/uncertain data models, probabilistic queries, probabilistic query answering techniques, and data quality issues in databases. Students are expected to survey papers from recent database journals/conferences on a selected research direction, and to write research papers or reports that present new problems or solutions. Students will also give presentations to the class to demonstrate their outcomes. The resulting surveys/papers are expected to be extensible to database conference/journal papers.

Learning Outcomes

At the end of this course, the students should be able to:

  1. Explain real applications of probabilistic and uncertain data management in databases.
  2. Know the classifications of data uncertainties according to different criteria.
  3. Explain the causes and importance of studying probabilistic data management.
  4. Describe data uncertainty models, possible worlds semantics, correlations in probabilistic data, and probabilistic graph models.
  5. Know various types of probabilistic queries in probabilistic/uncertain databases.
  6. Describe the models, problem definitions, and the proposed techniques for each probabilistic query type in the literature.
  7. Learn to read/write research papers, and understand the general trend of the research in probabilistic data management.
  8. Summarize and analyze the pros and cons of existing works in probabilistic/uncertain databases.
  9. Identify one or two future directions in probabilistic databases that have not been studied, or have not been studied extensively, to work on.
  10. Write a survey on related works of probabilistic data management.
  11. Propose new solutions to existing problems or novel solutions to new problems in probabilistic and uncertain data management.
  12. Write a research report or research project/paper on the proposed problems or solutions.
  13. Do experiments on the proposed ideas in probabilistic data management.
  14. Give a presentation on the project report to present the outcome of the research project. Optionally, give a presentation on research papers in the literature (with bonus points).
  15. Work in a team (each with 2-3 members) to collaboratively write the survey and research papers.

 

 


 

Tentative Schedule

Week | Topic | Notes
Week 2 (Aug. 30) | Introduction | Please form study groups, each with 3-4 members, and send your IDs, names, and emails to me (xlian@kent.edu); Due on Sept. 8
Week 2 (Sept. 1) | An Overview of Probabilistic Data Management |
Week 3 (Sept. 6) | -- | Labor Day; No classes
Week 3 (Sept. 8) | Data Uncertainty Model | Homework 1 (Due on Sept. 22)
Week 4 (Sept. 13) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (1) |
Week 4 (Sept. 15) | |
Week 5 (Sept. 20) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (2) |
Week 5 (Sept. 22) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (3) | Homework 2 (Due on Oct. 6)
Week 6 (Sept. 27) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (4) | Reading Materials: Index (1) (2)
Week 6 (Sept. 29) | |
Week 7 (Oct. 4) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (5) |
Week 7 (Oct. 6) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (6) | Homework 3 (Due on Oct. 27)
Week 8 (Oct. 11) | Q/A Session |
Week 8 (Oct. 13) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (7) |
Week 9 (Oct. 18) | Q/A Session |
Week 9 (Oct. 20) | Probabilistic Graph Databases | Project Report (template); Deadline to submit the survey (Oct. 16; modified to Oct. 20, Wednesday)
Week 10 (Oct. 25) | Project Q/A |
Week 10 (Oct. 27) | Data Quality in Probabilistic Databases (1) | Homework 4 (Due on Nov. 10)
Week 11 (Nov. 1) | Project Q/A |
Week 11 (Nov. 3) | Data Quality in Probabilistic Databases (2) | Last Day to Withdraw: 11/3/2021
Week 12 (Nov. 8) | Project Q/A |
Week 12 (Nov. 10) | Q/A Session | Homework 5 (Due on Nov. 24; extended to Nov. 29)
Week 13 (Nov. 15) | Project Q/A | Submission of Sections 1-4 in Project Report Template (Deadline: 11/15/2021)
Week 13 (Nov. 17) | Q/A Session |
Week 14 (Nov. 22) | Project Q/A |
Week 14 (Nov. 24) | -- | Nov. 24 - 28, 2021, Thanksgiving Break; No classes
Week 15 (Nov. 29) | Project Q/A |
Week 15 (Dec. 1) | Presentations & Demos for Projects |
Week 16 (Dec. 6) | Presentations & Demos for Projects | Course Evaluation
Week 16 (Dec. 8) | Presentations & Demos for Projects | Preparation for Project Reports; Deadline for submitting the project report (Hard deadline: Dec. 10; only one member of each group submits to the Blackboard the project report, source code, data sets, presentation slides, and demos in a single zip package)
Week 17 (Dec. 13-19) | No Final Exam |

 

 

Academic calendar: https://www.kent.edu/academic-calendar

Final exam schedule: https://www.kent.edu/registrar/fall-final-exam-schedule

NOTE: Presentation dates and deadlines are tentative. Exact dates will be announced in class!!!


Scoring and Grading

5% - Attendance & Questions

50% - 5 Homeworks (10 points each)

15% - Survey

o   A survey on papers for the selected research topics in recent database conferences/journals

30% - Research Projects & Presentations

o   Research project report (including introduction, related works, problem definition, solutions, experiments, and conclusions) (20%)

o   Presentation and demonstration for the proposed research project (10%)

5% - Bonus Points, rated by other team members

10% - (Optional) Bonus for presenting research papers

A = 90 or higher

B = 80 - 89

C = 70 - 79

D = 60 - 69

F = <60
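
As a rough illustration of how the weights and letter cutoffs above combine, the following Python sketch computes a final score and letter grade. The function and key names are hypothetical, and two points are assumptions rather than stated policy: bonus points are simply added to the weighted score, and fractional scores are compared against the cutoffs without rounding. The instructor's actual computation may differ.

```python
# Component weights as percentages, taken from the list above.
WEIGHTS = {
    "attendance_and_questions": 5,
    "homework": 50,
    "survey": 15,
    "project_report": 20,
    "project_presentation": 10,
}


def final_score(earned, bonus=0.0):
    """Return the weighted final score.

    earned maps each component name to the fraction of credit earned
    (0.0 to 1.0). bonus is the total bonus points; adding it directly
    to the weighted score is an assumption of this sketch.
    """
    base = sum(weight * earned[name] for name, weight in WEIGHTS.items())
    return base + bonus


def letter_grade(score):
    """Map a numeric score to a letter using the cutoffs listed above."""
    # Assumption: fractional scores are compared without rounding.
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


if __name__ == "__main__":
    # Hypothetical student: fractions of credit earned per component.
    example = {
        "attendance_and_questions": 1.0,
        "homework": 0.9,
        "survey": 0.8,
        "project_report": 0.85,
        "project_presentation": 0.9,
    }
    score = final_score(example)
    print(score, letter_grade(score))
```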

 


 

Guidelines for Surveys/Papers/Projects

 

All surveys/papers/projects will be submitted electronically only. Instructions are given separately.

 

     Assignments must be submitted to Blackboard by the due date.

     A survey or paper report turned in within two weeks after the due date will be considered late and will lose up to 30% of its grade (10% if submitted within the first week, and 20% more, for 30% in total, if submitted within the second week).

     No assignment will be accepted for grading more than two weeks after the due date.

     Late submission requires the prior consent of the instructor.
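
The late-penalty arithmetic above can be sketched in Python as follows. This is only an illustration under stated assumptions: a "week" is taken to be 7 calendar days, lateness is measured in whole days, and the function name is hypothetical. The requirement of prior instructor consent is not modeled.

```python
def apply_late_policy(raw_score, days_late):
    """Return the score after the late penalty, or None if not accepted.

    Assumptions of this sketch: a week is 7 calendar days and
    days_late is a whole number of days past the due date.
    """
    if days_late <= 0:
        penalty_percent = 0   # on time: no penalty
    elif days_late <= 7:
        penalty_percent = 10  # first week late: lose 10%
    elif days_late <= 14:
        penalty_percent = 30  # second week late: 10% + 20% more
    else:
        return None           # more than two weeks late: not graded
    return raw_score * (100 - penalty_percent) / 100
```

For example, under these assumptions a report that would have earned 80 points keeps 72 points if submitted three days late and 56 points if submitted ten days late.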


Lecture Attendance Policy

Attendance in the lecture is mandatory. Students are expected to attend lectures, study the text, and contribute to discussions. You need to write your name on attendance sheets throughout the course, so please attend every lecture.

Students are expected to attend all scheduled classes and may be dropped from the course for excessive absences. Legitimate reasons for an "excused" absence include, but are not limited to, illness and injury, disability-related concerns, military service, death in the immediate family, religious observance, academic field trips, participation in an approved concert or athletic event, and direct participation in university disciplinary hearings.

Even though any absence can potentially interfere with the planned development of a course, and the student bears the responsibility for fulfilling all course requirements in a timely and responsible manner, instructors will, without prejudice, provide students returning to class after a legitimate absence with appropriate assistance and counsel about completing missed assignments and class material. Neither academic departments nor individual faculty members are required to waive essential or fundamental academic requirements of a course to accommodate student absences. However, each circumstance will be reviewed on a case-by-case basis.

For more details, please refer to University policy 3-01.2: http://www.kent.edu/policyreg/administrative-policy-regarding-class-attendance-and-class-absence.


Make-up Presentation Policy

No make-up presentation will be given except for university-sanctioned excused absences. If you miss a presentation (for a good reason), it is your responsibility to contact me before the presentation, or as soon after the presentation as possible.


Academic Dishonesty Policy

The University expects a student to maintain a high standard of individual honor in his/her scholastic work. Unless otherwise required, each student is expected to complete his or her assignment individually and independently (even within a team, the workload should be divided among team members, and each member should complete his or her part individually). Although studying together is encouraged, the work handed in for grading by each student is expected to be his or her own. Any form of academic dishonesty is strictly forbidden and will be punished to the maximum extent. Copying an assignment from another student (team) in this class or obtaining a solution from some other source will lead to an automatic failure for this course and to disciplinary action. Allowing another student to copy one's work will be treated as an act of academic dishonesty, leading to the same penalty as copying.

University policy 3-01.8 deals with the problem of academic dishonesty, cheating, and plagiarism. None of these will be tolerated in this class. The sanctions provided in this policy will be used to deal with any violations. If you have any questions, please read the policy at http://www.kent.edu/policyreg/administrative-policy-regarding-student-cheating-and-plagiarism and/or ask.


Students with Disabilities

University policy 3-01.3 requires that students with disabilities be provided reasonable accommodations to ensure their equal access to course content. If you have a documented disability and require accommodations, please contact the instructor at the beginning of the semester to make arrangements for necessary classroom adjustments. Please note, you must first verify your eligibility for these through Student Accessibility Services (contact 330-672-3391 or visit www.kent.edu/sas for more information on registration procedures).


Statements for the Course

This course may be used to satisfy the University Diversity requirement. Diversity courses provide opportunities for students to learn about such matters as the history, culture, values and notable achievements of people other than those of their own national origin, ethnicity, religion, sexual orientation, age, gender, physical and mental ability, and social class. Diversity courses also provide opportunities to examine problems and issues that may arise from differences, and opportunities to learn how to deal constructively with them.

 

This course may be used to satisfy the Writing Intensive Course (WIC) requirement. The purpose of a writing-intensive course is to assist students in becoming effective writers within their major discipline. A WIC requires a substantial amount of writing, provides opportunities for guided revision, and focuses on writing forms and standards used in the professional life of the discipline.

 

This course may be used to fulfill the university's Experiential Learning Requirement (ELR) which provides students with the opportunity to initiate lifelong learning through the development and application of academic knowledge and skills in new or different settings. Experiential learning can occur through civic engagement, creative and artistic activities, practical experiences, research, and study abroad/away.

 


 

Disclaimer

The instructor reserves the right to alter this syllabus as necessary.