CSDS 600: Special Topics - AI for Databases (AI4DB)

Fall 2025

 

Instructor: Xiang Lian

Office Location: Olin 706

Web: http://www.cs.kent.edu/~xlian/index.html

Email: xxl1584@case.edu

Course: CSDS 600: Special Topics - AI for Databases (AI4DB)

Section: 102

Class Nbr: 13379

Prerequisites: None

Time: 10am-11:15am, Tuesday and Thursday

Classroom Location: Glennan 400

Course Webpage: http://www.cs.kent.edu/~xlian/2025Fall_CSDS600_AI4DB.html

 

Instructor's Office Hours: 9am-10am and 11:15am-12:30pm, Tuesday and Thursday; or any other time convenient for both you and the instructor, by email appointment (xxl1584@case.edu)

 


Textbook, References, and Online Resources

This course does not require a textbook, but you are required to read many research papers from recent database/AI conferences and journals. Most papers are available through the digital library or on the Internet.

Online resources for research papers and surveys include database conferences/journals (e.g., SIGMOD, PVLDB, ICDE, TODS, VLDBJ, TKDE) and AI conferences/journals (e.g., ICLR, NeurIPS, ICML, CVPR, AAAI, IJCAI):

 

o   ACM Transactions on Database Systems (TODS): http://dblp.uni-trier.de/db/journals/tods/index.html

o   The International Journal on Very Large Data Bases (VLDBJ): http://dblp.uni-trier.de/db/journals/vldb/

o   IEEE Transactions on Knowledge and Data Engineering (TKDE): http://dblp.uni-trier.de/db/journals/tkde/index.html

o   The ACM Special Interest Group on Management of Data (SIGMOD): http://dblp.uni-trier.de/db/conf/sigmod/

o   Proceedings of the VLDB Endowment (PVLDB): http://www.vldb.org/pvldb/ or http://dblp.uni-trier.de/db/journals/pvldb/index.html

o   IEEE International Conference on Data Engineering (ICDE): http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000178 or http://dblp.uni-trier.de/db/conf/icde/

o   International Conference on Learning Representations (ICLR): https://dblp.org/db/conf/iclr/index.html

o   Neural Information Processing Systems (NeurIPS): https://dblp.org/db/conf/nips/index.html

o   International Conference on Machine Learning (ICML): https://dblp.org/db/conf/icml/index.html

o   IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): https://dblp.org/db/conf/cvpr/index.html

o   The Association for the Advancement of Artificial Intelligence (AAAI): https://aaai.org/ or https://dblp.org/db/conf/aaai/index.html

o   International Joint Conference on Artificial Intelligence (IJCAI): https://dblp.org/db/conf/ijcai/index.html

o   ACM Computing Surveys: https://dl.acm.org/journal/csur

o   Samples of surveys:

    Indexing: https://www.slac.stanford.edu/pubs/slacpubs/16250/slac-pub-16460.pdf

    A survey of probabilistic data management: http://ieeexplore.ieee.org/document/4597041/

    A Survey of Large-Scale Analytical Query Processing in MapReduce: http://link.springer.com/article/10.1007/s00778-013-0319-9

    A Survey on Parallel and Distributed Data Warehouses: https://pdfs.semanticscholar.org/4f3e/d0d4dfbd0bf4648a7feda94e3176e33ad088.pdf

o   AI/DB/AI4DB/DB4AI surveys:

    Survey of Vector Database Management Systems

James Jie Pan, Jianguo Wang, Guoliang Li

ArXiv URL: https://arxiv.org/abs/2310.14021

    A Survey of Graph Meets Large Language Model: Progress and Future Directions

Yuhan Li, Zhixun Li, Peisong Wang, Jia Li, Xiangguo Sun, Hong Cheng, Jeffrey Xu Yu

ArXiv URL: https://arxiv.org/abs/2311.12399

    How good are multi-dimensional learned indexes? An experimental survey

Qiyu Liu, Maocheng Li, Yuxiang Zeng, Yanyan Shen, Lei Chen

The VLDB Journal, 34(2), 2025

URL: https://arxiv.org/abs/2405.05536

    Learned Index: A Comprehensive Experimental Evaluation

Zhaoyan Sun, Xuanhe Zhou, and Guoliang Li

PVLDB, 2023

URL: https://www.vldb.org/pvldb/vol16/p1992-li.pdf

    A Survey of Learned Indexes for the Multi-dimensional Space

Abdullah Al-Mamun, Hao Wu, Qiyang He, Jianguo Wang, and Walid G. Aref

ArXiv, 2024.

URL: https://arxiv.org/abs/2403.06456

    Database meets deep learning: Challenges and opportunities.

W. Wang, M. Zhang, G. Chen, H. V. Jagadish, B. C. Ooi, et al.

SIGMOD Rec., 2016.

URL: https://arxiv.org/pdf/1906.08986

    Machine Unlearning: A Survey

Heng Xu, Tianqing Zhu, Lefeng Zhang, Wanlei Zhou, Philip S. Yu

ACM Computing Surveys, 2023

URL: https://dl.acm.org/doi/pdf/10.1145/3603620

    Graph Neural Networks for Databases: A Survey

Ziming Li, Youhuan Li, Yuyu Luo, Guoliang Li, Chuxu Zhang

URL: https://arxiv.org/abs/2502.12908

    Database Meets Artificial Intelligence: A Survey.

Xuanhe Zhou, Chengliang Chai, Guoliang Li, Ji Sun

IEEE Trans. Knowl. Data Eng. 34(3): 1096-1116, 2022

URL: https://ieeexplore.ieee.org/document/9094012

    A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?

Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuxin Zhang, Ju Fan, Guoliang Li, Nan Tang, Yuyu Luo

URL: https://arxiv.org/abs/2408.05109

o   Data+AI: LLM4Data and Data4LLM (Tutorial)

Guoliang Li (Tsinghua University), Jiayi Wang (Tsinghua University), Chenyang Zhang (Tsinghua University), Jiannan Wang (Huawei Technologies, Simon Fraser University)

SIGMOD, 2025.

URL: https://dl.acm.org/doi/pdf/10.1145/3722212.3725641

o   Machine Learning for Data Management: A System View (Tutorial)

Guoliang Li, Xuanhe Zhou

ICDE 2022

URL: https://dbgroup.cs.tsinghua.edu.cn/ligl/papers/icde22-tutorial-paper.pdf

o   AI Meets Database: AI4DB and DB4AI (Tutorial)

Guoliang Li, Xuanhe Zhou, Lei Cao.

SIGMOD 2021

URL: https://dbgroup.cs.tsinghua.edu.cn/ligl/papers/sigmod21-tutorial-paper.pdf

o   Machine Learning for Databases (Tutorial)

Guoliang Li, Xuanhe Zhou, Lei Cao

VLDB 2021.

URL: https://dbgroup.cs.tsinghua.edu.cn/ligl/papers/vldb21-tutorial-paper.pdf

o   Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning, An MIT Press book. URL: https://www.deeplearningbook.org/

o   Mathematics for Machine Learning. URL: https://mml-book.github.io/book/mml-book.pdf

o   Dive into Deep Learning. URL: https://d2l.ai/index.html

o   Datasets and Source Code

    ImageNet: https://www.image-net.org/

    MNIST: https://www.kaggle.com/datasets/hojjatk/mnist-dataset or https://en.wikipedia.org/wiki/MNIST_database

    Spatial Data Sets and Index Source Code: http://chorochronos.datastories.org/

    Road Network and Stream Data: https://www.cs.utah.edu/~lifeifei/datasets.html

    Intel Lab Data: https://db.csail.mit.edu/labdata/labdata.html

    U.S. Government's Open Data: https://www.data.gov/

    Stanford Large Network Dataset Collection: https://snap.stanford.edu/data/

    DBpedia RDF Data: http://www.dbpedia.org

    Freebase RDF Data: https://developers.google.com/freebase/

    YAGO1, YAGO2s, YAGO3 RDF Data: https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/archive/ (YAGO2 paper: https://people.mpi-inf.mpg.de/~kberberi/publications/2010-mpii-tra.pdf)

o   Apache Hadoop: http://hadoop.apache.org/

o   Amazon AWS: https://aws.amazon.com/

o   LinkedIn Learning (Video Tutorial): https://www.linkedin.com/learning/ (Sign in with the organization portal)

Topics

o   Learning-based Cardinality Estimation

o   Learning-based Value Estimation

o   Learned Index

o   Learning-based Data Quality Improvement

o   Learning-based Privacy Preserving

o   AI for DB Systems

o   AI for DB Configuration

o   AI for DB Optimization

o   AI for DB Design

o   AI for DB Queries

o   AI for Data Mining

o   Learning-based Data (Graph) Generation

o   Explainable AI for DB

o   Learning-based Anomaly Detection

o   AI for DB Applications

o   Machine Unlearning for Databases

 


 

Catalog Description

The purpose of this course is to learn the latest developments in database and AI techniques and to investigate recent trends in applying AI techniques to solve database problems (i.e., AI for DB problems). AI technology has been widely and successfully applied to real-world applications such as computer vision, natural language processing, robotics, healthcare, autonomous vehicles, financial services, cybersecurity, and personalized recommendations. Recently, many database research papers have applied machine learning methods in the data science domain, which has led to newly emerging tracks such as AI for DB (AI4DB). In this class, we will therefore cover major research topics on the latest AI/DB/AI4DB techniques, applications, and systems, including, but not limited to, learned indexes, learning-based cost/cardinality estimation, learning-based approaches for (graph) database queries, and learning-based approaches for anomaly detection. Students are expected to survey a selected research direction, drawing on papers from recent AI/database journals and conferences, and to write research papers or reports that present new problems or solutions. Students will also present AI/DB/AI4DB research papers and the outcomes of their own projects to the class. The resulting surveys and papers are also expected to be extensible into conference or journal papers.

Learning Outcomes

At the end of this course, the students should be able to:

  1. Explain real-world applications of existing AI and DB problems.
  2. Know the advantages and disadvantages of AI and DB techniques.
  3. Explain the challenges of applying AI techniques to solve database problems.
  4. Describe the detailed structures, training, and testing (usage) of different variants of (graph) neural networks.
  5. Know the classification of different AI/DB/AI4DB problems and techniques.
  6. Learn to read and write surveys and research papers, and understand the general trend of research in AI4DB.
  7. Identify one or two future directions in AI4DB to work on that have not been studied, or not studied extensively, before.
  8. Propose new solutions to existing AI4DB problems or novel solutions to new AI4DB problems.
  9. Learn to implement the proposed solutions using AI4DB techniques and evaluate the algorithms through experiments.
  10. Write a research report or research project/paper on the proposed problems or solutions.
  11. Give presentations on AI/DB/AI4DB papers and the course project to showcase the outcomes.
  12. Work in a team (each with 2-4 members) to collaboratively write the survey and research papers.

 

 


 

Tentative Schedule

Week | Topic | Notes
Week 1 (Aug. 25-29) | Chapter 1: Introduction | Please form study groups, each with 2-4 members, and send your names and emails to me (xxl1584@case.edu), due on Sept. 5; Template for AI/DB Term or Approach
Week 2 (Sept. 1-5) | Chapter 2-1: DB Background | Sept. 1, Labor Day (no classes)
Week 3 (Sept. 8-12) | Chapter 2-2: AI Background; Chapter 3: AI4DB Problems, Techniques, and Challenges | DB Reading Materials: Index (1) (2); Homework 1 (due on Sept. 12)
Week 4 (Sept. 15-19) | Chapter 4: Learned Index | Deadline to submit a reading list for the survey (Sept. 19, Friday)
Week 5 (Sept. 22-26) | Chapter 5: Learning-based Cost/Cardinality Estimation | Homework 2 (due on Sept. 26)
Week 6 (Sept. 29-Oct. 3) | Chapter 6: Learning-based Approaches for Database Queries |
Week 7 (Oct. 6-10) | | Project Report (template)
Week 8 (Oct. 13-17) | Chapter 6: Graph Neural Networks (GNNs) | Homework 3 (due on Oct. 17); Deadline to submit the survey (Oct. 17, Friday)
Week 9 (Oct. 20-24) | Chapter 7: GNN for Relational Databases | Oct. 20-21, Fall Break (no classes)
Week 10 (Oct. 27-31) | Chapter 8: GNN for Graphs | Homework 4 (due on Oct. 31)
Week 11 (Nov. 3-7) | Chapter 9: Learning-based Approaches for Anomaly Detection | Submission of Sections 1-4 in Project Report Template (deadline: Nov. 7); Last Day to Withdraw: 11/7/2025
Week 12 (Nov. 10-14) | Project Q/A |
Week 13 (Nov. 17-21) | Presentations & Demos for Projects |
Week 14 (Nov. 24-28) | Presentations & Demos for Projects | Nov. 27-28, Thanksgiving Holidays (no classes)
Week 15 (Dec. 1-5) | Presentations & Demos for Projects |
Weeks 16-17 (Dec. 8-12, 15-17) | No Final Exam | Deadline for submitting the project report (hard deadline: Dec. 8; only one member of each group submits the project report, source code, data sets, presentation slides, and demos in a single zip package)

 

 

Academic calendar: https://case.edu/registrar/dates-deadlines/academic-calendar

Final exam schedule: https://case.edu/registrar/dates-deadlines/final-exam-schedule  

NOTE: Presentation dates and deadlines are tentative. Exact dates will be announced in class!!!


 

Scoring and Grading

40% - 4 Homework Assignments (10 points each)

15% - Survey

o   A survey on papers for the selected research topics in recent database/AI conferences/journals

10% - Paper Presentation [10-15 min]

o   Presentation of one (1) AI and one (1) DB term/problem/approach (2%)

o   Presentation of an AI4DB research paper (8%)

30% - Research Project Presentation, Demonstration, and Project Report

o   Presentation and demonstration for the proposed research project (10%) [15-20 min]

o   Research project report (including introduction, related works, problem definition, solutions, experiments, and conclusions) (20%)

5% - Rating by other team members

5% - Bonus presentation of any AI4DB paper [10-15 min]

 

A = 90 or higher

B = 80 - 89

C = 70 - 79

D = 60 - 69

F = <60
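
As a rough illustration of how the weights and cutoffs above combine, here is a minimal Python sketch. The component names, the use of fractional scores in [0, 1], and the example student are assumptions made for illustration; the syllabus does not say how fractional totals are rounded or whether the bonus can push a total above 100, so the sketch applies no rounding and no cap.

```python
# Weights (in percentage points) as listed in the scoring breakdown above.
# The dictionary keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "homework": 40,
    "survey": 15,
    "paper_presentation": 10,
    "project": 30,
    "peer_rating": 5,
    "bonus_presentation": 5,  # optional bonus on top of the base 100
}


def total_points(fractions):
    """Combine per-component scores, each given as a fraction in [0, 1].

    Components missing from `fractions` count as 0.
    """
    return sum(weight * fractions.get(name, 0.0) for name, weight in WEIGHTS.items())


def letter_grade(points):
    """Map total points to a letter using the cutoffs listed above.

    Assumption: no rounding is applied, so 89.5 maps to "B".
    """
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if points >= cutoff:
            return letter
    return "F"


# Hypothetical student; the scores are illustrative, not real data.
example = {
    "homework": 0.85,
    "survey": 0.90,
    "paper_presentation": 0.80,
    "project": 0.90,
    "peer_rating": 1.00,
}
points = total_points(example)
print(round(points, 1), letter_grade(points))
```

Because the bonus presentation is weighted separately, a student who skips it can still reach the full base 100 points.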

 

For homework assignments, please write down the intermediate steps of your answers. Partial marks will be given for your intermediate steps, even if the final answers are not correct.

 


 

Guidelines for Surveys/Papers/Projects

 

All surveys/papers/projects must be submitted electronically. Instructions are given separately.

 

     Assignments must be submitted to Canvas by the due date.

     A survey or paper report turned in within two weeks after the due date will be considered late and will lose up to 30% of its grade (10% for the first week, and 20% more for the second week).

     No assignment will be accepted for grading more than two weeks after the due date.

     Any late submission requires the prior consent of the instructor.
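
One possible reading of the late-penalty rule above, sketched in Python. The function name and the day-level granularity are assumptions: the sketch treats 1-7 days late as the first week (10% lost) and 8-14 days late as the second week (30% lost in total), and it does not model the requirement of prior consent from the instructor.

```python
def late_multiplier(days_late):
    """Return the fraction of the grade kept, or None if not accepted.

    Assumed reading of the policy: 1-7 days late loses 10%,
    8-14 days late loses 30% in total, and later work is not graded.
    """
    if days_late <= 0:
        return 1.0  # submitted on time
    if days_late <= 7:
        return 0.9  # first week: 10% penalty
    if days_late <= 14:
        return 0.7  # second week: 10% + 20% = 30% penalty
    return None     # more than two weeks late: not accepted


# Hypothetical example: a survey scored 80/100 and handed in 9 days late.
kept = late_multiplier(9)
print(None if kept is None else 80 * kept)
```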


 

Lecture Attendance Policy

Attendance in the lecture is mandatory. Students are expected to attend lectures, study the text, and contribute to discussions. You need to write your name on attendance sheets throughout the course, so please attend every lecture.

Students are expected to attend all scheduled classes and may be dropped from the course for excessive absences. Legitimate reasons for an "excused" absence include, but are not limited to, illness and injury, disability-related concerns, military service, death in the immediate family, religious observance, academic field trips, participation in an approved concert or athletic event, and direct participation in university disciplinary hearings.

Even though any absence can potentially interfere with the planned development of a course, and the student bears the responsibility for fulfilling all course requirements in a timely and responsible manner, instructors will, without prejudice, provide students returning to class after a legitimate absence with appropriate assistance and counsel about completing missed assignments and class material. Neither academic departments nor individual faculty members are required to waive essential or fundamental academic requirements of a course to accommodate student absences. However, each circumstance will be reviewed on a case-by-case basis.


 

Make-up Presentation Policy

No make-up presentation will be given except for university-sanctioned excused absences. Feel free to contact me (xxl1584@case.edu) before the presentation, or as soon after the presentation as possible.


 

Academic Dishonesty Policy

The University expects a student to maintain a high standard of individual honor in his or her scholastic work. Unless otherwise required, each student is expected to complete his or her assignment individually and independently (even within a team, the workload should be divided so that each member completes his or her part individually). Although studying together is encouraged, the work handed in for grading by each student is expected to be his or her own. Any form of academic dishonesty is strictly forbidden and will be punished to the maximum extent. Copying an assignment from another student (or team) in this class, or obtaining a solution from some other source, will lead to automatic failure in this course and to disciplinary action. Allowing another student to copy one's work will be treated as an act of academic dishonesty, leading to the same penalty as copying.

 

For more details, please refer to the graduate studies academic honesty policy: https://case.edu/gradstudies/sites/default/files/2018-04/SGS-Academic-Integrity-Policies-and-Rules.pdf or the Policies & Procedures for graduate studies: https://case.edu/gradstudies/current-students/policies-procedures.


 

Disclaimer

The instructor reserves the right to alter this syllabus as necessary.