CS 63018 & CS 73018 Probabilistic Data Management

Fall 2023

 

Instructor: Xiang Lian

Office Location: Mathematics and Computer Science Building, Room 264

Office Phone Number: (330) 672-9063

Web: http://www.cs.kent.edu/~xlian/index.html

Email: xlian@kent.edu

Course: Probabilistic Data Management

CRN: 12333 & 12364

Prerequisites: Permission of the instructor

Time: 2:15pm - 3:30pm, MW

Classroom Location: Room 107, Merrill Hall

Course Webpage: http://www.cs.kent.edu/~xlian/course_archive/2023Fall_CS63018_CS73018.html

 

Instructor's Office Hours: 9:30am - 12:00pm, MW; or any other convenient time for both you and the instructor by email appointment (xlian@kent.edu)

 

Graduate Assistant: Racheal Mukisa

Office: TBA

E-mail: rmukisa1@kent.edu

Phone: N/A

GA's Office Hours: TBA

 

For grading issues, please contact the GA to clarify the details of the grading. Whenever you have any questions about the course materials or the homework/survey/project, please feel free to contact me by email (xlian@kent.edu) to schedule a meeting. You are also encouraged to post commonly encountered questions/answers or resources on the Canvas discussion board, where they may benefit your classmates.


Enrollment/Official Registration of this Class

The official registration deadline for this course is 08/27/2023. University policy requires all students to be officially registered in each class they are attending. Students who are not officially registered for a course by published deadlines should not be attending classes and will not receive credit or a grade for the course. Each student must confirm enrollment by checking his/her class schedule (using Student Tools in FlashLine) prior to the deadline indicated. Registration errors must be corrected prior to the deadline.

https://www.kent.edu/academic-calendar

 

For registration deadlines, enter the requested information for a Detailed Class Search from the Schedule of Classes Search found at:

https://keys.kent.edu:44220/ePROD/bwlkffcs.P_AdvUnsecureCrseSearch?term_in=201680

 

After locating your course/section, click on the Registration Deadlines link on the far right side of the listing.

 

Last day to withdraw: 10/29/2023

 


Reference Books

This course does not require any textbook, but there are several reference books below that you can find online or borrow from the Kent State Library.

Charu C. Aggarwal. Managing and Mining Uncertain Data. Springer Publishing Company, 2009. ISBN: 978-0-387-09689-6 (Print) 978-0-387-09690-2 (Online), https://link.springer.com/book/10.1007%2F978-0-387-09690-2

Lei Chen and Xiang Lian. Query Processing over Uncertain Databases. In Synthesis Lectures on Data Management, Vol. 4, No. 6, pages 1-101, Springer, 2012. ISBN: 9781608458929, https://link.springer.com/book/10.1007/978-3-031-01896-1

Dan Suciu, Dan Olteanu, Christopher Re, and Christoph Koch. Probabilistic Databases. In Synthesis Lectures on Data Management, Springer, 2011. ISBN-13: 978-1608456802, ISBN-10: 1608456803, https://link.springer.com/book/10.1007/978-3-031-01879-4

 

Resources of Reading Materials

In this course, you need to read some research papers, and most papers are available through the digital library at Kent State University. You can access them either through the on-campus network or, for off-campus access, by installing a VPN (GlobalProtect) from https://www.kent.edu/tusc/connecting-vpn.

 

Online resources of research papers/surveys, including database conferences/journals (SIGMOD, PVLDB, ICDE, TODS, VLDBJ, and TKDE), etc.

 

o   TODS: http://dblp.uni-trier.de/db/journals/tods/index.html

o   VLDBJ: http://dblp.uni-trier.de/db/journals/vldb/

o   TKDE: http://dblp.uni-trier.de/db/journals/tkde/index.html

o   SIGMOD: http://dblp.uni-trier.de/db/conf/sigmod/

o   VLDB: http://www.vldb.org/pvldb/, or http://dblp.uni-trier.de/db/journals/pvldb/index.html

o   ICDE: http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000178, or http://dblp.uni-trier.de/db/conf/icde/

o   ACM Computing Surveys (CSUR): http://csur.acm.org/

o   Indexing: https://www.slac.stanford.edu/pubs/slacpubs/16250/slac-pub-16460.pdf

o   A survey of probabilistic data management: http://ieeexplore.ieee.org/document/4597041/

o   A Survey of Large-Scale Analytical Query Processing in MapReduce: http://link.springer.com/article/10.1007/s00778-013-0319-9

o   A Survey on Parallel and Distributed Data Warehouses: https://pdfs.semanticscholar.org/4f3e/d0d4dfbd0bf4648a7feda94e3176e33ad088.pdf

o   Datasets and Source Code

    Spatial data sets and index source code: http://chorochronos.datastories.org/

    Road network and stream data: https://www.cs.utah.edu/~lifeifei/datasets.html

    U.S. Government's open data: https://www.data.gov/

    DBpedia RDF data: http://www.dbpedia.org

    Freebase RDF data: https://developers.google.com/freebase/

    YAGO1, YAGO2s, YAGO3 RDF data: https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/archive/ (YAGO2 paper: https://people.mpi-inf.mpg.de/~kberberi/publications/2010-mpii-tra.pdf)

o   Apache Hadoop: http://hadoop.apache.org/

o   Amazon AWS: https://aws.amazon.com/

o   Tutorial: https://www.lynda.com/ (Sign in with the organization portal)

 

A reading list is here

 


 

Catalog Description

The purpose of this course is to learn the fundamental concepts and techniques of probabilistic data management in the area of databases. Probabilistic data are pervasive in many real-world applications, such as sensor networks, GPS systems, location-based services, mobile computing, multimedia databases, data extraction/integration, trajectory data analysis, the Semantic Web, and privacy preservation. Managing such large-scale probabilistic data efficiently and effectively is challenging. In this class, we will cover major research topics such as probabilistic/uncertain data models, probabilistic queries, probabilistic query answering techniques, and data quality issues in databases. Students are expected to survey a selected research direction, drawing on papers from recent database journals/conferences, and to write research papers or reports that present new problems or solutions. Students will also give presentations to the class to demonstrate their outcomes. It is also expected that the resulting surveys/papers can be extended into database conference/journal papers.

Learning Outcomes

At the end of this course, the students should be able to:

  1. Explain real applications of probabilistic and uncertain data management in databases.
  2. Know the classifications of data uncertainties according to different criteria.
  3. Explain the causes and importance of studying probabilistic data management.
  4. Describe data uncertainty models, possible worlds semantics, correlations in probabilistic data, and probabilistic graph models.
  5. Know various types of probabilistic queries in probabilistic/uncertain databases.
  6. Describe the models, problem definitions, and the proposed techniques for each probabilistic query type in the literature.
  7. Learn to read/write research papers, and understand the general trend of the research in probabilistic data management.
  8. Summarize and analyze the pros and cons of existing works in probabilistic/uncertain databases.
  9. Identify one or two future directions in probabilistic databases, which have not been studied before, or not been extensively studied before, to work on.
  10. Write a survey on related works of probabilistic data management.
  11. Propose new solutions to existing problems or novel solutions to new problems in probabilistic and uncertain data management.
  12. Write a research report or research project/paper on the proposed problems or solutions.
  13. Do experiments on the proposed ideas in probabilistic data management.
  14. Give a presentation on the project report to present the outcome of the research project. Optionally, give a presentation on research papers in the literature (with bonus points).
  15. Work in a team (each with 4-5 members) to collaboratively write the survey and research papers.

 

 


 

Tentative Schedule

| Week | Topic | Notes |
| --- | --- | --- |
| Week 1 (Aug. 21) | Introduction | Please form study groups, each with 4-5 members, and send your names and emails to me (xlian@kent.edu); Due on Aug. 30 |
| Week 1 (Aug. 23) | An Overview of Probabilistic Data Management | |
| Week 2 (Aug. 28) | Data Uncertainty Model | |
| Week 2 (Aug. 30) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (1) | Homework 1 (Due on Sept. 13) |
| Week 3 (Sept. 4) | -- | Labor Day; No classes |
| Week 3 (Sept. 6) | | |
| Week 4 (Sept. 11) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (2) | |
| Week 4 (Sept. 13) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (3) | Homework 2 (Due on Sept. 27) |
| Week 5 (Sept. 18) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (4) | Reading Materials: Index (1) (2). Deadline to submit a reading list for the survey (Sept. 18, Monday) |
| Week 5 (Sept. 20) | | |
| Week 6 (Sept. 25) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (5) | |
| Week 6 (Sept. 27) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (6) | Homework 3 (Due on Oct. 18) |
| Week 7 (Oct. 2) | Q/A Session | |
| Week 7 (Oct. 4) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (7) | |
| Week 8 (Oct. 9) | Q/A Session | |
| Week 8 (Oct. 11) | Probabilistic Graph Databases | Project Report (template) |
| Week 9 (Oct. 16) | Project Q/A | |
| Week 9 (Oct. 18) | Data Quality in Probabilistic Databases (1) | Homework 4 (Due on Nov. 1). Deadline to submit the survey (Oct. 18, Wednesday) |
| Week 10 (Oct. 23) | Project Q/A | |
| Week 10 (Oct. 25) | Data Quality in Probabilistic Databases (2) | Last Day to Withdraw: 10/29/2023 |
| Week 11 (Oct. 30) | Project Q/A | |
| Week 11 (Nov. 1) | Q/A Session | Homework 5 (Due on Nov. 15) |
| Week 12 (Nov. 6) | Project Q/A | Submission of Sections 1-4 in Project Report Template (Deadline: 11/6/2023) |
| Week 12 (Nov. 8) | Q/A Session | |
| Week 13 (Nov. 13) | Project Q/A | |
| Week 13 (Nov. 15) | Project Q/A | |
| Week 14 (Nov. 20) | Presentations & Demos for Projects: Group #8, Group #1, Group #2, Group #3, Group #4 | |
| Week 14 (Nov. 22) | -- | Nov. 22 - 26, 2023, Thanksgiving Break; No classes |
| Week 15 (Nov. 27) | Presentations & Demos for Projects: Group #6, Group #7, Group #9, Group #11 | |
| Week 15 (Nov. 29) | Presentations & Demos for Projects: Group #5, Group #10, Group #12, Group #13, Group #19 | |
| Week 16 (Dec. 4) | Presentations & Demos for Projects: Group #14, Group #15, Group #16, Group #17, Group #18 | Course Evaluation |
| Week 16 (Dec. 6) | Presentations & Demos for Projects: Group #20, Group #21, Group #22. Preparation for Project Reports | Deadline for submitting the project report (Hard deadline: Dec. 8; only one member of each group submits the project report, source code, data sets, presentation slides, and demos to Canvas in a single zip package) |
| Week 17 (Dec. 11-17) | No Final Exam | |

 

Academic calendar: https://www.kent.edu/academic-calendar

Final exam schedule: https://www.kent.edu/registrar/fall-final-exam-schedule

NOTE: Presentation dates and deadlines are tentative. Exact dates will be announced in class!!!


Scoring and Grading

50% - 5 Homeworks (10 points each)

20% - Survey

o   A survey on papers for the selected research topics in recent database conferences/journals

30% - Research Projects & Presentations

o   Research project report (including introduction, related works, problem definition, solutions, experiments, and conclusions) (20%)

o   Presentation and demonstration for the proposed research project (10%)

5% - Bonus Points, rated by other team members

10% - (Optional) Bonus for presenting research papers

A = 90 or higher

B = 80 - 89

C = 70 - 79

D = 60 - 69

F = <60
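
The weights and letter-grade cutoffs above can be sketched as a small calculation. This is only an illustrative sketch, not the official grading procedure: the function names are hypothetical, and it assumes that bonus points are simply added on top of the 100 regular points, that each bonus is capped at its stated maximum, and that fractional totals are compared directly against the cutoffs.

```python
# Illustrative sketch only; names and bonus handling are assumptions.
SURVEY_WEIGHT = 20.0        # survey: 20% of the grade
REPORT_WEIGHT = 20.0        # research project report: 20%
PRESENTATION_WEIGHT = 10.0  # presentation and demonstration: 10%
PEER_BONUS_CAP = 5.0        # bonus rated by other team members
PAPER_BONUS_CAP = 10.0      # optional bonus for presenting research papers


def final_score(homework_points, survey, report, presentation,
                peer_bonus=0.0, paper_bonus=0.0):
    """Combine the components into a score out of 100 (plus bonus).

    homework_points: five scores, each out of 10 points (50 points total).
    survey, report, presentation: fraction earned in [0, 1].
    peer_bonus, paper_bonus: bonus points earned (assumed additive).
    """
    if len(homework_points) != 5:
        raise ValueError("expected exactly 5 homework scores")
    total = float(sum(homework_points))
    total += SURVEY_WEIGHT * survey
    total += REPORT_WEIGHT * report
    total += PRESENTATION_WEIGHT * presentation
    total += min(peer_bonus, PEER_BONUS_CAP)
    total += min(paper_bonus, PAPER_BONUS_CAP)
    return total


def letter_grade(score):
    """Map a numeric score to a letter using the cutoffs listed above."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"
```

For example, homework scores of 8, 9, 7, 10, and 6 with 90% on the survey, 80% on the report, and full marks on the presentation give 40 + 18 + 16 + 10 = 84 points under these assumptions, which falls in the B range.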

 

For homework assignments, please write down the intermediate steps of your answers. Partial marks will be given for your intermediate steps, even if the final answers are not correct.

 


 

Guidelines for Surveys/Papers/Projects

 

All surveys/papers/projects will be submitted electronically only. Instructions are given separately.

 

    Assignments must be submitted to Canvas by the due date.

    A survey or paper report turned in within two weeks after the due date will be considered late and will lose up to 30% of its grade (10% for the first week, and 20% more for the second week).

    No assignment will be accepted for grading more than two weeks after the due date.

    Late submissions require the prior consent of the instructor.
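
The late-submission rule above can be read as a simple step function. The sketch below is one possible interpretation, not an official rule: it assumes the penalty is 10% during the first week after the due date and 30% in total during the second week, that lateness is measured in whole days, and that a submission more than two weeks late receives no credit. The function names are hypothetical.

```python
# One possible reading of the late policy; the day-based cutoffs are assumptions.
def late_penalty(days_late):
    """Return the fraction of the grade lost, or None if not accepted."""
    if days_late <= 0:
        return 0.0   # on time
    if days_late <= 7:
        return 0.10  # first week: 10%
    if days_late <= 14:
        return 0.30  # second week: 10% + 20% more
    return None      # more than two weeks late: not accepted


def apply_late_penalty(score, days_late):
    """Apply the penalty to a score; unaccepted work counts as 0 here."""
    penalty = late_penalty(days_late)
    if penalty is None:
        return 0.0
    return score * (1.0 - penalty)
```

Under this reading, a survey worth 20 points that is submitted 10 days late would be graded out of 14 points.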


Lecture Attendance Policy

Attendance in the lecture is mandatory. Students are expected to attend lectures, study the text, and contribute to discussions. You need to write your name on attendance sheets throughout the course, so please attend every lecture.

Students are expected to attend all scheduled classes and may be dropped from the course for excessive absences. Legitimate reasons for an "excused" absence include, but are not limited to, illness and injury, disability-related concerns, military service, death in the immediate family, religious observance, academic field trips, participation in an approved concert or athletic event, and direct participation in university disciplinary hearings.

Even though any absence can potentially interfere with the planned development of a course, and the student bears the responsibility for fulfilling all course requirements in a timely and responsible manner, instructors will, without prejudice, provide students returning to class after a legitimate absence with appropriate assistance and counsel about completing missed assignments and class material. Neither academic departments nor individual faculty members are required to waive essential or fundamental academic requirements of a course to accommodate student absences. However, each circumstance will be reviewed on a case-by-case basis.

For more details, please refer to University policy 3-01.2: http://www.kent.edu/policyreg/administrative-policy-regarding-class-attendance-and-class-absence.


Make-up Presentation Policy

No make-up presentation will be given except for university-sanctioned excused absences. Feel free to contact me (xlian@kent.edu) before the presentation, or as soon after the presentation as possible.

 


Academic Dishonesty Policy

The University expects a student to maintain a high standard of individual honor in his/her scholastic work. Unless otherwise required, each student is expected to complete his or her assignment individually and independently (even within a team, the workload should be distributed so that each member completes his or her part individually). Although studying together is encouraged, the work handed in for grading by each student is expected to be his or her own. Any form of academic dishonesty is strictly forbidden and will be punished to the maximum extent. Copying an assignment from another student (or team) in this class or obtaining a solution from some other source will lead to automatic failure of this course and to disciplinary action. Allowing another student to copy one's work will be treated as an act of academic dishonesty, leading to the same penalty as copying.

University policy 3-01.8 deals with the problem of academic dishonesty, cheating, and plagiarism. None of these will be tolerated in this class. The sanctions provided in this policy will be used to deal with any violations. If you have any questions, please read the policy at http://www.kent.edu/policyreg/administrative-policy-regarding-student-cheating-and-plagiarism and/or ask.


Students with Disabilities

University Policy 3342-3-01.3 requires that students with disabilities be provided reasonable accommodations to ensure their equal access to course content. If you have a documented disability and require accommodations, please contact the instructor at the beginning of the semester to make arrangements for necessary classroom adjustments. Please note, you must first verify your eligibility for these through Student Accessibility Services (contact 330-672-3391 or visit www.kent.edu/sas for more information on registration procedures).


Statements for the Course

This course may be used to satisfy the University Diversity requirement. Diversity courses provide opportunities for students to learn about such matters as the history, culture, values and notable achievements of people other than those of their own national origin, ethnicity, religion, sexual orientation, age, gender, physical and mental ability, and social class. Diversity courses also provide opportunities to examine problems and issues that may arise from differences, and opportunities to learn how to deal constructively with them.

 

This course may be used to satisfy the Writing Intensive Course (WIC) requirement. The purpose of a writing-intensive course is to assist students in becoming effective writers within their major discipline. A WIC requires a substantial amount of writing, provides opportunities for guided revision, and focuses on writing forms and standards used in the professional life of the discipline.

 

This course may be used to fulfill the university's Experiential Learning Requirement (ELR) which provides students with the opportunity to initiate lifelong learning through the development and application of academic knowledge and skills in new or different settings. Experiential learning can occur through civic engagement, creative and artistic activities, practical experiences, research, and study abroad/away.

 


 

Request for Religious Accommodations

The University welcomes individuals from all different faiths, philosophies, religious traditions, and other systems of belief, and supports their respective practices. In compliance with University policy and the Ohio Revised Code, the University permits students to request class absences for up to three (3) days, per semester, in order to participate in organized activities conducted under the auspices of a religious denomination, church, or other religious or spiritual organization. Students will not be penalized as a result of any of these excused absences.

 

The request for excusal must be made, in writing, during the first fourteen (14) days of the semester and include the date(s) of each proposed absence or request for alternative religious accommodation. The request must clearly state that the proposed absence is to participate in religious activities. The request must also provide the particular accommodation(s) you desire.

 

I will notify you if your request is approved, or if it is approved with modification. I will work with you in an effort to arrange a mutually agreeable alternative arrangement. For more information regarding this policy, you may contact the Student Ombuds (ombuds@kent.edu).

 

Academic Support

Kent State recognizes many students face challenges and we are committed to supporting your academic journey when you need help.  Please check out these resources to help as you build your support system:

·       What is the first step I should take to get academic support for this class?

        o   Reach out to your instructor!

·       Where can I get help from another student who earned a good grade in this class?

        o   Tutoring

·       Where can I go if I need assistance with how to study and meet my academic goals?

        o   Academic Coaching

·       Who can review my writing and help me properly cite my work?

        o   Writing Commons

·       Where should I go when I don’t know where to go?

        o   Academic Advising

        o   TRIO Student Support Services

        o   There may be additional resources, just ask.

 

Diversity, Equity, and Inclusion

Kent State University is committed to the creation and maintenance of equitable and inclusive learning spaces. This course is a learning environment where all will be treated with respect and dignity, and where all individuals will have an equitable opportunity to succeed. The diversity that each student brings to this course is viewed as a strength and a benefit. Dimensions of diversity and their intersections include but are not limited to: race, ethnicity, national origin, primary language, age, gender identity and expression, sexual orientation, religious affiliation, mental and physical abilities, socio-economic status, family/caregiver status, and veteran status.

 


 

Disclaimer

The instructor reserves the right to alter this syllabus as necessary.