CS 63018 & CS 73018 Probabilistic Data Management
Fall 2024
Instructor: Xiang Lian
Office Location: Mathematics and Computer Science Building, Room 264
Office Phone Number: (330) 672-9063
Web: http://www.cs.kent.edu/~xlian/index.html
Email: xlian@kent.edu
Course: Probabilistic Data Management
CRN: 12358 & 12391
Prerequisites: Permission of the instructor
Time: 2:15pm - 3:30pm, MW
Classroom Location: Room 107, Merrill Hall
Course Webpage: http://www.cs.kent.edu/~xlian/2024Fall_CS63018_CS73018.html
Instructor's Office Hours: 9:30am - 12:00pm, MW; or any other convenient time for both you and the instructor by email appointment (xlian@kent.edu)
Graduate Assistant: Racheal Mukisa
Office: TBA
E-mail: rmukisa1@kent.edu
Phone: N/A
TA's Office Hours: TBA
For grading issues, please contact the GA to clarify the details of the
grading. Whenever you have any questions about the course materials or
homework/survey/project, please feel free to contact me by email (xlian@kent.edu) to schedule a meeting. You are also encouraged to post commonly encountered questions/answers or resources on the
Canvas discussion board, which may benefit your classmates.
The official
registration deadline for this course is 08/25/2024. University policy requires all students to be
officially registered in each class they are attending. Students who are not
officially registered for a course by published deadlines should not be
attending classes and will not receive credit or a grade for the course. Each
student must confirm enrollment by checking his/her class schedule (using
Student Tools in FlashLine) prior to the deadline indicated. Registration errors must
be corrected prior to the deadline.
https://www.kent.edu/academic-calendar
For registration deadlines, enter the requested information
for a Detailed Class Search from the Schedule of Classes Search found at:
https://keys.kent.edu:44220/ePROD/bwlkffcs.P_AdvUnsecureCrseSearch?term_in=201680
After locating your course/section, click on the
Registration Deadlines link on the far right side of the listing.
Last day to withdraw: 10/27/2024
Reference Books
This course
does not require any textbook, but there are several reference books below that
you can find online or borrow from the Kent State Library.
Charu C. Aggarwal.
Managing and Mining Uncertain Data. Springer Publishing Company, 2009. ISBN: 978-0-387-09689-6 (Print) 978-0-387-09690-2 (Online), https://link.springer.com/book/10.1007%2F978-0-387-09690-2
Lei Chen and
Xiang Lian. Query Processing over Uncertain Databases. In Synthesis Lectures on
Data Management, Vol. 4, No. 6, pages 1-101, Springer, 2012. ISBN:
9781608458929, https://link.springer.com/book/10.1007/978-3-031-01896-1
Dan Suciu,
Dan Olteanu, Christopher Re, and Christoph Koch. Probabilistic Databases. In
Synthesis Lectures on Data Management, Springer, 2011. ISBN-13: 978-1608456802,
ISBN-10: 1608456803, https://link.springer.com/book/10.1007/978-3-031-01879-4
Resources of Reading Materials
In this course, you need to read some research papers, and
most papers are available through the digital library at Kent State University.
You can access them either through the on-campus network or, for off-campus access, by installing a VPN (GlobalProtect) as described at https://www.kent.edu/tusc/connecting-vpn.
Online resources of research papers/surveys, including
database conferences/journals (SIGMOD, PVLDB, ICDE, TODS, VLDBJ, and TKDE),
etc.
o TODS: http://dblp.uni-trier.de/db/journals/tods/index.html
o VLDBJ: http://dblp.uni-trier.de/db/journals/vldb/
o TKDE: http://dblp.uni-trier.de/db/journals/tkde/index.html
o SIGMOD: http://dblp.uni-trier.de/db/conf/sigmod/
o VLDB: http://www.vldb.org/pvldb/, or http://dblp.uni-trier.de/db/journals/pvldb/index.html
o ICDE: http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000178, or http://dblp.uni-trier.de/db/conf/icde/
o Indexing: https://www.slac.stanford.edu/pubs/slacpubs/16250/slac-pub-16460.pdf
o A survey of probabilistic data management: http://ieeexplore.ieee.org/document/4597041/
o A Survey of Large-Scale Analytical Query Processing in MapReduce: http://link.springer.com/article/10.1007/s00778-013-0319-9
o A Survey on Parallel and Distributed Data Warehouses: https://pdfs.semanticscholar.org/4f3e/d0d4dfbd0bf4648a7feda94e3176e33ad088.pdf
o Datasets and Source Code
❖ Spatial data sets and index source code: http://chorochronos.datastories.org/
❖ Road network and stream data: https://www.cs.utah.edu/~lifeifei/datasets.html
❖ U.S. Government's open data: https://www.data.gov/
❖ DBpedia RDF data: http://www.dbpedia.org
❖ Freebase RDF data: https://developers.google.com/freebase/
❖ YAGO1, YAGO2s, YAGO3 RDF data: https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/archive/ (YAGO2 paper: https://people.mpi-inf.mpg.de/~kberberi/publications/2010-mpii-tra.pdf)
o Apache Hadoop: http://hadoop.apache.org/
o Amazon AWS: https://aws.amazon.com/
o Tutorial: https://www.lynda.com/ (Sign in with the organization portal)
A reading list is here ☺
Catalog Description
The purpose of this course is to learn the fundamental concepts and
techniques for probabilistic data management in the area of databases.
Probabilistic data are pervasive in many real-world applications, such as
sensor networks, GPS systems, location-based services, mobile computing,
multimedia databases, data extraction/integration, trajectory data analysis,
the Semantic Web, privacy preservation, and so on. Managing such large-scale
probabilistic data efficiently and effectively is rather challenging. In this
class, we will cover major research topics such as probabilistic/uncertain
data models, probabilistic queries, probabilistic query answering techniques,
data quality issues in databases, and so on. Students are expected to survey
papers from recent database journals/conferences on a selected research
direction, and to write research papers or reports that propose new problems
or solutions. Students will also give presentations to the class to
demonstrate their outcomes. It is also expected that the resulting
surveys/papers can be extended into database conference/journal papers.
Learning Outcomes
At the end of this course, the students should be able to:
Tentative Schedule
Week | Topic | Notes
Week 1 (Aug. 19) | Please form study groups, each with 5-6 members, and send your names and emails to me (xlian@kent.edu); Due on Aug. 28 |
Week 1 (Aug. 21) | |
Week 2 (Aug. 26) | |
Week 2 (Aug. 28) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (1) | Homework 1 (Due on Sept. 11)
Week 3 (Sept. 2) | -- | Labor Day; No classes
Week 3 (Sept. 4) | |
Week 4 (Sept. 9) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (2) |
Week 4 (Sept. 11) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (3) | Homework 2 (Due on Sept. 25)
Week 5 (Sept. 16) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (4) | Reading Materials: Index (1) (2); Deadline to submit a reading list for the survey (Sept. 18, Wednesday)
Week 5 (Sept. 18) | |
Week 6 (Sept. 23) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (5) |
Week 6 (Sept. 25) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (6) | Homework 3 (Due on Oct. 16)
Week 7 (Sept. 30) | Q/A Session |
Week 7 (Oct. 2) | Probabilistic Query Answering Over Probabilistic/Uncertain Databases (7) |
Week 8 (Oct. 7) | Q/A Session |
Week 8 (Oct. 9) | Project Report (template) |
Week 9 (Oct. 14) | Project Q/A |
Week 9 (Oct. 16) | Homework 4 (Due on Oct. 30); Deadline to submit the survey (Oct. 16, Wednesday) |
Week 10 (Oct. 21) | Project Q/A |
Week 10 (Oct. 23) | Last Day to Withdraw: 10/27/2024 |
Week 11 (Oct. 28) | Project Q/A |
Week 11 (Oct. 30) | Q/A Session | Homework 5 (Due on Nov. 13)
Week 12 (Nov. 4) | Project Q/A | Submission of Sections 1-4 in Project Report Template (Deadline: 11/6/2024)
Week 12 (Nov. 6) | Q/A Session |
Week 13 (Nov. 11) | -- | Veterans Day; No classes
Week 13 (Nov. 13) | Project Q/A |
Week 14 (Nov. 18) | Presentations & Demos for Projects: Group #1, Group #2, Group #3, Group #4, Group #5 |
Week 14 (Nov. 20) | Presentations & Demos for Projects: Group #6, Group #7, Group #8, Group #9, Group #10 |
Week 15 (Nov. 25) | Presentations & Demos for Projects: Group #11, Group #12, Group #13, Group #14, Group #15 |
Week 15 (Nov. 27) | -- | Nov. 27 - Dec. 1, 2024, Thanksgiving Break; No classes
Week 16 (Dec. 2) | -- (no class) | Course Evaluation
Week 16 (Dec. 4) | Presentations & Demos for Projects: Group #16, Group #17, Group #18, Group #19, Group #20; Preparation for Project Reports | Deadline for submitting the project report (Hard deadline: Dec. 6; only one member of each group submits to Canvas the project report, source code, data sets, presentation slides, and demos in a single zip package)
Week 17 (Dec. 9-15) | No Final Exam |
Academic calendar: https://www.kent.edu/academic-calendar
Final exam schedule: https://www.kent.edu/fbe-center?au=final-exam-schedule&x#important-dates
NOTE: Presentation dates and
deadlines are tentative. Exact dates will be announced in class!!!
5% - Attendance
50% - 5 Homeworks (10 points each)
15% - Survey
o A survey on papers for the selected research topics in recent database conferences/journals
30% - Research Projects & Presentations
o Research project report (including introduction, related works, problem definition, solutions, experiments, and conclusions) (20%)
o Presentation and demonstration for the proposed research project (10%)
5% - Bonus Points, rated by other team members
10% - (Optional) Bonus for presenting research papers
A = 90 or higher
B = 80 - 89
C = 70 - 79
D = 60 - 69
F = <60
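As a rough illustration of how the weights and letter-grade cutoffs above combine, the following Python sketch computes a final score. It is only a sketch under stated assumptions: each component is given as a fraction between 0 and 1, bonus points are simply added on top of the 100 base points, and the total is neither capped nor rounded. None of these details are specified in this syllabus, and the names `final_score` and `letter_grade` are hypothetical.

```python
# Weights (in percentage points) taken from the grading breakdown above.
BASE_WEIGHTS = {
    "attendance": 5,
    "homework": 50,        # 5 homeworks, 10 points each
    "survey": 15,
    "project_report": 20,
    "presentation": 10,
}
BONUS_WEIGHTS = {
    "peer_bonus": 5,       # rated by other team members
    "paper_bonus": 10,     # optional bonus for presenting research papers
}


def final_score(fractions):
    """Return the weighted total for per-component fractions in [0, 1].

    Assumption: bonus components are added on top of the 100 base points,
    and any component missing from `fractions` counts as 0.
    """
    total = 0.0
    for name, weight in {**BASE_WEIGHTS, **BONUS_WEIGHTS}.items():
        total += weight * fractions.get(name, 0.0)
    return total


def letter_grade(score):
    """Map a numeric score to a letter using the cutoffs listed above.

    Assumption: scores are not rounded, so 89.5 is still a B.
    """
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if score >= cutoff:
            return letter
    return "F"
```

For example, full attendance, 80% on homework, a full-credit survey, half credit on the project report, and a full-credit presentation give 5 + 40 + 15 + 10 + 10 = 80 points under these assumptions, which falls in the B range.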
For homework assignments, please write down the intermediate
steps of your answers. Partial marks will be given for your intermediate steps,
even if the final answers are not correct.
Guidelines for Surveys/Papers/Projects
All surveys/papers/projects will be
submitted electronically only. Instructions are given separately.
➢ Assignments must be submitted to Canvas by the due date.
➢ A survey or paper report turned in up to two weeks after the due date is considered late: it loses 10% of its grade if submitted within the first week, and a further 20% (30% in total) if submitted in the second week.
➢ No assignment will be accepted for grading more than two weeks after the due date.
➢ Late submissions require the prior consent of the instructor.
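The late-submission rule above can be sketched in Python as follows. This is one reading of the policy, not an official calculator: it assumes that a "week" means 7 calendar days, that lateness is measured in whole days, that the second-week penalty is 30% in total, and that the instructor's prior consent has already been obtained. The function name `late_penalty_fraction` is hypothetical.

```python
from datetime import date


def late_penalty_fraction(due, submitted):
    """Return the fraction of the grade lost, or None if not accepted.

    Assumptions: a week is 7 calendar days, lateness is counted in whole
    days, and the second-week penalty is 30% in total (10% + 20%).
    """
    days_late = (submitted - due).days
    if days_late <= 0:
        return 0.0      # on time
    if days_late <= 7:
        return 0.10     # first week late
    if days_late <= 14:
        return 0.30     # second week late
    return None         # more than two weeks late: not accepted


# Example with the survey deadline from the schedule (Oct. 16, 2024).
due = date(2024, 10, 16)
penalty = late_penalty_fraction(due, date(2024, 10, 25))  # 9 days late
```

Under these assumptions, a survey submitted 9 days late would keep 70% of the grade it would otherwise have earned.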
Attendance in the lecture is
mandatory. Students are expected to attend lectures, study the text, and
contribute to discussions. You need to write your name on attendance sheets
throughout the course, so please attend every lecture.
Students are expected to attend all
scheduled classes and may be dropped from the course for excessive absences.
Legitimate reasons for an "excused" absence include, but are not
limited to, illness and injury, disability-related concerns, military service,
death in the immediate family, religious observance, academic field trips,
participation in an approved concert or athletic event, and direct
participation in university disciplinary hearings.
Even though any absence can
potentially interfere with the planned development of a course, and the student
bears the responsibility for fulfilling all course requirements in a timely and
responsible manner, instructors will, without prejudice, provide students
returning to class after a legitimate absence with appropriate assistance and
counsel about completing missed assignments and class material. Neither
academic departments nor individual faculty members are required to waive
essential or fundamental academic requirements of a course to accommodate student
absences. However, each circumstance will be reviewed on a case-by-case basis.
For more details, please refer to
University policy 3-01.2: http://www.kent.edu/policyreg/administrative-policy-regarding-class-attendance-and-class-absence.
No make-up presentation will be given except for university-sanctioned
excused absences. Feel free to contact me (xlian@kent.edu) before the
presentation, or as soon after the presentation as possible.
The University expects a student to
maintain a high standard of individual honor in his/her scholastic work. Unless
otherwise required, each student is expected to complete his or her assignment
individually and independently (even within a team, the workload should be
distributed so that each member completes his or her share individually). Although it is
encouraged to study together, the work handed in for grading by each student is
expected to be his or her own. Any form of academic dishonesty will be strictly
forbidden and will be punished to the maximum extent. Copying an assignment
from another student (team) in this class or obtaining a solution from some
other source will lead to an automatic failure for this course and to a
disciplinary action. Allowing another student to copy one's work will be
treated as an act of academic dishonesty, leading to the same penalty as copying.
University policy 3-01.8 deals with
the problem of academic dishonesty, cheating, and plagiarism. None of these
will be tolerated in this class. The sanctions provided in this policy will be
used to deal with any violations. If you have any questions, please read the
policy at http://www.kent.edu/policyreg/administrative-policy-regarding-student-cheating-and-plagiarism and/or ask.
University Policy 3342-3-01.3
requires that students with disabilities be provided
reasonable accommodations to ensure their equal access to course content. If
you have a documented disability and require accommodations, please contact the
instructor at the beginning of the semester to make
arrangements for necessary classroom adjustments. Please note, you must
first verify your eligibility for these through Student Accessibility Services (contact 330-672-3391 or visit www.kent.edu/sas for more information on registration procedures).
This course may be used to satisfy
the University Diversity requirement. Diversity courses provide opportunities
for students to learn about such matters as the history, culture, values and
notable achievements of people other than those of their own national origin,
ethnicity, religion, sexual orientation, age, gender, physical and mental
ability, and social class. Diversity courses also provide opportunities to
examine problems and issues that may arise from differences, and opportunities
to learn how to deal constructively with them.
This course may be used to satisfy
the Writing Intensive Course (WIC) requirement. The purpose of a writing-intensive
course is to assist students in becoming effective writers within their major
discipline. A WIC requires a substantial amount of writing, provides
opportunities for guided revision, and focuses on writing forms and standards
used in the professional life of the discipline.
This course may be used to fulfill
the university's Experiential Learning Requirement (ELR) which provides
students with the opportunity to initiate lifelong learning through the
development and application of academic knowledge and skills in new or
different settings. Experiential learning can occur through civic engagement,
creative and artistic activities, practical experiences, research, and study
abroad/away.
The University welcomes individuals
from all different faiths, philosophies, religious traditions, and other
systems of belief, and supports their respective practices. In compliance with
University policy and the Ohio Revised Code, the University permits students to
request class absences for up to three (3) days, per semester, in order to
participate in organized activities conducted under the auspices of a religious
denomination, church, or other religious or spiritual organization. Students
will not be penalized as a result of any of these
excused absences.
The request for excusal must be
made, in writing, during the first fourteen (14) days of the semester and
include the date(s) of each proposed absence or request for alternative
religious accommodation. The request must clearly state that the proposed
absence is to participate in religious activities. The request must also
provide the particular accommodation(s) you desire.
You will be notified by me if your request is approved, or if it is
approved with modification. I will work with you in an effort to arrange
a mutually agreeable alternative arrangement. For more information regarding
this policy, you may contact the Student Ombuds (ombuds@kent.edu).
Kent State recognizes many students face challenges and we are committed
to supporting your academic journey when you need help. Please check out
these resources to help as you build your support system:
· What is the first step I should take to get academic support for this class?
❖ Reach out to your instructor!
· Where can I get help from another student who earned a good grade in this class?
❖ Tutoring
· Where can I go if I need assistance with how to study and meet my academic goals?
· Who can review my writing and help me properly cite my work?
· Where should I go when I don’t know where to go?
❖ TRIO Student Support Services
❖ There may be additional resources, just ask.
Kent State University is committed to the creation and maintenance of
equitable and inclusive learning spaces. This course is a learning environment
where all will be treated with respect and dignity, and where all individuals
will have an equitable opportunity to succeed. The diversity that each student
brings to this course is viewed as a strength and a benefit. Dimensions of
diversity and their intersections include but are not limited to: race, ethnicity, national origin, primary language, age,
gender identity and expression, sexual orientation, religious affiliation,
mental and physical abilities, socio-economic status, family/caregiver status,
and veteran status.
The instructor reserves the right to alter this syllabus as necessary.