CSDS 600: Special Topics - AI for Databases (AI4DB)
Fall 2025
Instructor: Xiang Lian
Office Location: Olin 706
Web: http://www.cs.kent.edu/~xlian/index.html
Email: xxl1584@case.edu
Course: CSDS 600: Special Topics - AI for Databases (AI4DB)
Section: 102
Class Nbr: 13379
Prerequisites: None
Time: 10am-11:15am, Tuesday and Thursday
Classroom Location: Glennan 400
Course Webpage: http://www.cs.kent.edu/~xlian/2025Fall_CSDS600_AI4DB.html
Instructor's Office Hours: 9am-10am, 11:15am-12:30pm, Tuesday and Thursday; or any other convenient time for both you and the instructor by email appointment (xxl1584@case.edu)
Textbook, References, and Online Resources
This course does not require a textbook, but you are required to read many
research papers from the latest database/AI conferences and journals. Most
papers are available through the digital library or on the Internet.
Online resources for research papers and surveys include database
conferences/journals (e.g., SIGMOD, PVLDB, ICDE, TODS, VLDBJ, and TKDE) and AI
conferences/journals (e.g., ICLR, NeurIPS, ICML, CVPR, AAAI, and IJCAI):
o ACM Transactions on Database Systems (TODS): http://dblp.uni-trier.de/db/journals/tods/index.html
o The International Journal on Very Large Data Bases (VLDBJ): http://dblp.uni-trier.de/db/journals/vldb/
o IEEE Transactions on Knowledge and Data Engineering (TKDE): http://dblp.uni-trier.de/db/journals/tkde/index.html
o The ACM Special Interest Group on Management of Data (SIGMOD): http://dblp.uni-trier.de/db/conf/sigmod/
o Proceedings of the VLDB Endowment (PVLDB): http://www.vldb.org/pvldb/ or http://dblp.uni-trier.de/db/journals/pvldb/index.html
o IEEE International Conference on Data Engineering (ICDE): http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000178 or http://dblp.uni-trier.de/db/conf/icde/
o International Conference on Learning Representations (ICLR): https://dblp.org/db/conf/iclr/index.html
o Neural Information Processing Systems (NeurIPS): https://dblp.org/db/conf/nips/index.html
o International Conference on Machine Learning (ICML): https://dblp.org/db/conf/icml/index.html
o IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): https://dblp.org/db/conf/cvpr/index.html
o The Association for the Advancement of Artificial Intelligence (AAAI): https://aaai.org/ or https://dblp.org/db/conf/aaai/index.html
o International Joint Conference on Artificial Intelligence (IJCAI): https://dblp.org/db/conf/ijcai/index.html
o ACM Computing Surveys: https://dl.acm.org/journal/csur
o Samples of surveys:
❖ Indexing: https://www.slac.stanford.edu/pubs/slacpubs/16250/slac-pub-16460.pdf
❖ A survey of probabilistic data management: http://ieeexplore.ieee.org/document/4597041/
❖ A Survey of Large-Scale Analytical Query Processing in MapReduce: http://link.springer.com/article/10.1007/s00778-013-0319-9
❖ A Survey on Parallel and Distributed Data Warehouses: https://pdfs.semanticscholar.org/4f3e/d0d4dfbd0bf4648a7feda94e3176e33ad088.pdf
o AI/DB/AI4DB/DB4AI surveys:
❖ Survey of Vector Database Management Systems
James Jie Pan, Jianguo Wang, Guoliang Li
ArXiv URL: https://arxiv.org/abs/2310.14021
❖ A Survey of Graph Meets Large Language Model: Progress and Future Directions
Yuhan Li, Zhixun Li, Peisong Wang, Jia Li, Xiangguo Sun, Hong Cheng, Jeffrey Xu Yu
ArXiv URL: https://arxiv.org/abs/2311.12399
❖ How good are multi-dimensional learned indexes? An experimental survey
Qiyu Liu, Maocheng Li, Yuxiang Zeng, Yanyan Shen, Lei Chen
The VLDB Journal, 34(2), 2025
URL: https://arxiv.org/abs/2405.05536
❖ Learned Index: A Comprehensive Experimental Evaluation
Zhaoyan Sun, Xuanhe Zhou, and Guoliang Li
PVLDB, 2023
URL: https://www.vldb.org/pvldb/vol16/p1992-li.pdf
❖ A Survey of Learned Indexes for the Multi-dimensional Space
Abdullah Al-Mamun, Hao Wu, Qiyang He, Jianguo Wang, and Walid G. Aref
ArXiv, 2024.
URL: https://arxiv.org/abs/2403.06456
❖ Database meets deep learning: Challenges and opportunities.
W. Wang, M. Zhang, G. Chen, H. V. Jagadish, B. C. Ooi, et al.
SIGMOD Rec., 2016.
URL: https://arxiv.org/pdf/1906.08986
❖ Machine Unlearning: A Survey
Heng Xu, Tianqing Zhu, Lefeng Zhang, Wanlei Zhou, Philip S. Yu
ACM Computing Surveys, 2023
URL: https://dl.acm.org/doi/pdf/10.1145/3603620
❖ Graph Neural Networks for Databases: A Survey
Ziming Li, Youhuan Li, Yuyu Luo, Guoliang Li, Chuxu Zhang
URL: https://arxiv.org/abs/2502.12908
❖ Database Meets Artificial Intelligence: A Survey.
Xuanhe Zhou, Chengliang Chai, Guoliang Li, Ji Sun
IEEE Trans. Knowl. Data Eng. 34(3): 1096-1116, 2022
URL: https://ieeexplore.ieee.org/document/9094012
❖ A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?
Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuxin Zhang, Ju Fan, Guoliang Li, Nan Tang, Yuyu Luo
URL: https://arxiv.org/abs/2408.05109
o Data+AI: LLM4Data and Data4LLM (Tutorial)
Guoliang Li (Tsinghua University), Jiayi Wang (Tsinghua University), Chenyang Zhang (Tsinghua University), Jiannan Wang (Huawei Technologies, Simon Fraser University)
SIGMOD, 2025.
URL: https://dl.acm.org/doi/pdf/10.1145/3722212.3725641
o Machine Learning for Data Management: A System View (Tutorial)
Guoliang Li, Xuanhe Zhou
ICDE 2022
URL: https://dbgroup.cs.tsinghua.edu.cn/ligl/papers/icde22-tutorial-paper.pdf
o AI Meets Database: AI4DB and DB4AI (Tutorial)
Guoliang Li, Xuanhe Zhou, Lei Cao.
SIGMOD 2021
URL: https://dbgroup.cs.tsinghua.edu.cn/ligl/papers/sigmod21-tutorial-paper.pdf
o Machine Learning for Databases (Tutorial)
Guoliang Li, Xuanhe Zhou, Lei Cao
VLDB 2021.
URL: https://dbgroup.cs.tsinghua.edu.cn/ligl/papers/vldb21-tutorial-paper.pdf
o Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning, An MIT Press book. URL: https://www.deeplearningbook.org/
o Mathematics for Machine Learning. URL: https://mml-book.github.io/book/mml-book.pdf
o Dive into Deep Learning. URL: https://d2l.ai/index.html
o Datasets and Source Code
❖ ImageNet: https://www.image-net.org/
❖ MNIST: https://www.kaggle.com/datasets/hojjatk/mnist-dataset or https://en.wikipedia.org/wiki/MNIST_database
❖ Spatial Data Sets and Index Source Code: http://chorochronos.datastories.org/
❖ Road Network and Stream Data: https://www.cs.utah.edu/~lifeifei/datasets.html
❖ Intel Lab Data: https://db.csail.mit.edu/labdata/labdata.html
❖ U.S. Government's Open Data: https://www.data.gov/
❖ Stanford Large Network Dataset Collection: https://snap.stanford.edu/data/
❖ DBpedia RDF Data: http://www.dbpedia.org
❖ Freebase RDF Data: https://developers.google.com/freebase/
❖ YAGO1, YAGO2s, YAGO3 RDF Data: https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/archive/ (YAGO2 paper: https://people.mpi-inf.mpg.de/~kberberi/publications/2010-mpii-tra.pdf)
o Apache Hadoop: http://hadoop.apache.org/
o Amazon AWS: https://aws.amazon.com/
o LinkedIn Learning (Video Tutorial): https://www.linkedin.com/learning/ (Sign in with the organization portal)
o Learning-based Cardinality Estimation
o Learning-based Value Estimation
o Learned Index
o Learning-based Data Quality Improvement
o Learning-based Privacy Preserving
o AI for DB Systems
o AI for DB Configuration
o AI for DB Optimization
o AI for DB Design
o AI for DB Queries
o AI for Data Mining
o Learning-based Data (Graph) Generation
o Explainable AI for DB
o Learning-based Anomaly Detection
o AI for DB Applications
o Machine Unlearning for Databases
Catalog Description
The purpose of this course is to learn about the latest developments in
database and AI techniques and to investigate recent trends in applying AI
techniques to solve various database problems (i.e., AI for DB problems). AI
technology has been widely and successfully applied to real-world applications
such as computer vision, natural language processing, robotics, healthcare,
autonomous vehicles, financial services, cybersecurity, and personalized
recommendations. Recently, many database research papers have applied machine
learning methods in the data science domain, which has led to newly emerging
tracks such as AI for DB (AI4DB). In this class, we will therefore cover major
research topics on the latest AI/DB/AI4DB techniques, applications, and
systems, including, but not limited to, learned indexes, learning-based
cost/cardinality estimation, learning-based approaches for (graph) database
queries, and learning-based approaches for anomaly detection. Students are
expected to survey a selected research direction using papers from recent
AI/database journals and conferences, and to write research papers or reports
with new problems or solutions. Students will also present AI/DB/AI4DB research
papers and the outcomes of their own projects to the class. It is also expected
that the resulting surveys/papers can be extended to conference/journal papers.
Learning Outcomes
At the end of this course, the students should be able to:
Tentative Schedule
Week | Topic | Notes |
Week 1 (Aug. 25-29) | | Please form study groups, each with 2-4 members, and send your names and emails to me (xxl1584@case.edu); Due on Sept. 5 |
Week 2 (Sept. 1-5) | | Sept. 1, Labor Day; No classes |
Week 3 (Sept. 8-12) | Chapter 3: AI4DB Problems, Techniques, and Challenges | DB Reading Materials: Index (1) (2); Homework 1 (Due on Sept. 12) |
Week 4 (Sept. 15-19) | Chapter 4: Learned Index | Deadline to submit a reading list for the survey (Sept. 19, Friday) |
Week 5 (Sept. 22-26) | Chapter 5: Learning-based Cost/Cardinality Estimation | Homework 2 (Due on Sept. 26) |
Week 6 (Sept. 29-Oct. 3) | Chapter 6: Learning-based Approaches for Database Queries | |
Week 7 (Oct. 6-10) | | Project Report (template) |
Week 8 (Oct. 13-17) | Chapter 6: Graph Neural Networks (GNNs) | Homework 3 (Due on Oct. 17); Deadline to submit the survey (Oct. 17, Friday) |
Week 9 (Oct. 20-24) | Chapter 7: GNN for Relational Databases | Oct. 20-21, Fall Break; No classes |
Week 10 (Oct. 27-31) | Chapter 8: GNN for Graphs | Homework 4 (Due on Oct. 31) |
Week 11 (Nov. 3-7) | Chapter 9: Learning-based Approaches for Anomaly Detection | Submission of Sections 1-4 in Project Report Template (Deadline: Nov. 7); Last Day to Withdraw: 11/7/2025 |
Week 12 (Nov. 10-14) | Project Q/A | |
Week 13 (Nov. 17-21) | Presentations & Demos for Projects | |
Week 14 (Nov. 24-28) | Presentations & Demos for Projects | Nov. 27-28, Thanksgiving Holidays; No classes |
Week 15 (Dec. 1-5) | Presentations & Demos for Projects | |
Weeks 16-17 (Dec. 8-12, 15-17) | No Final Exam | Deadline for submitting the project report (Hard deadline: Dec. 8; only one member of each group submits the project report, source code, data sets, presentation slides, and demos in a single zip package) |
Academic calendar: https://case.edu/registrar/dates-deadlines/academic-calendar
Final exam schedule: https://case.edu/registrar/dates-deadlines/final-exam-schedule
NOTE: Presentation dates and deadlines
are tentative. Exact dates will be announced in class!!!
40% - 4 Homeworks (10 points each)
15% - Survey
o A survey on papers for the selected research topics in recent database/AI conferences/journals
10% - Paper Presentation [10-15 min]
o Presentation of one (1) AI and one (1) DB terminology/problem/approach (2%)
o Presentation of an AI4DB research paper (8%)
30% - Research Project Presentation, Demonstration, and Project Report
o Presentation and demonstration for the proposed research project (10%) [15-20 min]
o Research project report (including introduction, related work, problem definition, solutions, experiments, and conclusions) (20%)
5% - Rating by other team members
5% - Bonus presentation of any AI4DB paper [10-15 min]
A = 90 or higher
B = 80 - 89
C = 70 - 79
D = 60 - 69
F = <60
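As a rough illustration of how the weights and letter scale above combine, the following minimal Python sketch computes a course total and maps it to a letter. It is not an official grading tool. The component names, the `final_score` and `letter_grade` helpers, the treatment of the 5% bonus as points added on top of the other 100%, and the absence of rounding at the letter boundaries are all assumptions made for illustration.

```python
# Weights (percent of the final grade) copied from the grading breakdown.
# The dictionary keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "homework": 40,
    "survey": 15,
    "paper_presentation": 10,
    "project": 30,
    "peer_rating": 5,
    "bonus_presentation": 5,  # assumed to be extra credit on top of 100
}


def final_score(fractions):
    """Return the weighted total.

    `fractions` maps a component name to the fraction of its points earned
    (a number in [0, 1]); components that are missing count as 0.
    """
    return sum(WEIGHTS[name] * fractions.get(name, 0.0) for name in WEIGHTS)


def letter_grade(score):
    """Map a numeric total to a letter, assuming no rounding at boundaries."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


example = {
    "homework": 0.9,
    "survey": 0.8,
    "paper_presentation": 1.0,
    "project": 0.85,
    "peer_rating": 1.0,
}
total = final_score(example)
print(round(total, 2), letter_grade(total))
```

How totals that fall between two bands (for example, 89.5) are handled is not stated in the scale, so the sketch simply compares against the lower bound of each band.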
For homework assignments, please write down the intermediate
steps of your answers. Partial marks will be given for your intermediate steps,
even if the final answers are not correct.
Guidelines for Surveys/Papers/Projects
All surveys/papers/projects will be
submitted electronically only. Instructions are given separately.
➢ Assignments must be submitted to Canvas by the due date.
➢ A survey or paper report turned in within two weeks after the due date will be considered late and will lose 30% of its grade (10% for the first week, and 20% more for the second week).
➢ No assignment will be accepted for grading more than two weeks after the due date.
➢ Late submissions require the prior consent of the instructor.
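The late-submission rule above can be sketched in a few lines of Python. This is one possible reading of the policy, not an official calculator: it assumes that a submission up to 7 days late loses 10% of its grade, a submission 8 to 14 days late loses 30% in total, and anything later receives no credit. The function name `late_adjusted` and the day-based cutoffs are hypothetical.

```python
def late_adjusted(score, days_late):
    """Apply the late penalty under the assumed reading of the policy.

    score: the grade the work would have earned if submitted on time.
    days_late: whole days after the due date (0 or negative means on time).
    """
    if days_late <= 0:
        return score          # on time: no penalty
    if days_late <= 7:
        return score * 0.90   # first week: lose 10%
    if days_late <= 14:
        return score * 0.70   # second week: lose 30% in total (10% + 20%)
    return 0.0                # more than two weeks late: not accepted


for days in (0, 3, 10, 15):
    print(days, late_adjusted(100, days))
```

The sketch does not model the requirement that a late submission be approved by the instructor in advance.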
Attendance in the lecture is
mandatory. Students are expected to attend lectures, study the text, and
contribute to discussions. You need to write your name on attendance sheets
throughout the course, so please attend every lecture.
Students are expected to attend all
scheduled classes and may be dropped from the course for excessive absences.
Legitimate reasons for an "excused" absence include, but are not
limited to, illness and injury, disability-related concerns, military service,
death in the immediate family, religious observance, academic field trips,
participation in an approved concert or athletic event, and direct
participation in university disciplinary hearings.
Even though any absence can
potentially interfere with the planned development of a course, and the student
bears the responsibility for fulfilling all course requirements in a timely and
responsible manner, instructors will, without prejudice, provide students
returning to class after a legitimate absence with appropriate assistance and
counsel about completing missed assignments and class material. Neither
academic departments nor individual faculty members are required to waive
essential or fundamental academic requirements of a course to accommodate
student absences. However, each circumstance will be reviewed on a case-by-case
basis.
No make-up presentation will be given except for university-sanctioned excused
absences. Feel free to contact me (xxl1584@case.edu) before the presentation,
or as soon after the presentation as possible.
The University expects a student to
maintain a high standard of individual honor in his or her scholastic work. Unless
otherwise required, each student is expected to complete his or her assignment
individually and independently (even within a team, the workload should be
distributed among team members, each completing their part individually).
Although studying together is encouraged, the work handed in for grading by
each student is expected to be his or her own. Any form of academic dishonesty
is strictly forbidden and will be punished to the maximum extent. Copying an
assignment from another student (team) in this class or obtaining a solution
from some other source will lead to an automatic failure for this course and to
disciplinary action. Allowing another student to copy one's work will be
treated as an act of academic dishonesty, leading to the same penalty as
copying.
For more details, please refer to the
graduate studies academic honesty policy: https://case.edu/gradstudies/sites/default/files/2018-04/SGS-Academic-Integrity-Policies-and-Rules.pdf or the Policies & Procedures for graduate studies: https://case.edu/gradstudies/current-students/policies-procedures.
The instructor reserves the right to alter this syllabus as necessary.