Abstracts
Prof. Ioana Manolescu
Ecole Polytechnique, France
Talk 1A & 1B: Using AI to get to Data and its Meaning
Abstract
A core goal of data management is to enhance the accessibility and ease of understanding of the data. Human activity naturally produces data that exhibits heterogeneous semantics, structure, or syntax. Heterogeneous data can be seen as stretching from text to data trees, graphs, tables, or multidimensional (aggregate) tables.
A particular class of users of complex, heterogeneous data is journalists. They can use it for investigations, for fact-checking, or for documenting an article on a societal issue such as adaptation to climate change, education, or fiscal policies. While members of the press self-select for curiosity, intuition, people skills, and story-telling skills, modern journalistic work requires them to work with digital data. This tutorial will cover problems we faced and methods we developed to build tools that help journalists make sense of heterogeneous data (ConnectionLens, Abstra, ConnectionStudio) and efficiently and effectively query multidimensional data (StatCheck). These methods were developed in interaction with journalists from Le Monde and RadioFrance, respectively the leading French newspaper and the French public broadcast operator. The common thread is the combination of explicit (database-style) algorithms and statistical (trained language model) techniques towards explainable, frugal, efficient, and effective data search and discovery methods.
Joint work with Angelos Anadiotis, Oana Balalau, Helena Galhardas, Antoine Gauquier, Madhulika Mohanty, Pierre Senellart and others.
Bio:
Ioana Manolescu is a senior researcher at Inria Saclay and a part-time professor at Ecole Polytechnique, France. She is the lead of the CEDAR INRIA team focusing on rich data analytics at cloud scale. She is also the president of BDA, the French national scientific association focused on data management. She has been a member of the PVLDB Endowment Board of Trustees, an Associate Editor for PVLDB, president of the ACM SIGMOD PhD Award Committee, chair of the IEEE ICDE conference, and a program chair of EDBT, SSDBM, and ICWE, among others. A Senior ACM member since 2021, she is a recipient of the ACM SIGMOD 2020 Contribution Award.
Ioana has co-authored more than 150 articles in international journals and conferences and co-authored books on “Web Data Management” and on “Cloud-based RDF Data Management”. Her main research interests include algebraic and storage optimizations for semistructured data, in particular Semantic Web graphs, novel data models and languages for complex data management, and data models and algorithms for fact-checking and data journalism, a topic where she is collaborating with journalists from Le Monde. She is also a recipient of the ANR AI Chair titled “SourcesSay: Intelligent Analysis and Interconnexion of Heterogeneous Data in Digital Arenas” (2020-2024).
Prof. Dr. Volker Markl
Chair of the Database Systems and Information Management (DIMA) Group at TU Berlin
Director of the Berlin Institute for the Foundations of Learning and Data (BIFOLD)
Chief Scientist and Head of the Intelligent Analytics for Massive Data Research Group at German Research Center for Artificial Intelligence (DFKI)
Talk 2A: NebulaStream – Data Stream Processing for the Edge-Cloud-Continuum
Abstract
Modern data-driven applications arising in domains such as smart manufacturing, healthcare, and the Internet of Things pose new challenges to data processing systems. Traditional stream processing systems, such as Flink, Spark, and Kafka Streams, are ill-suited to cope with the massive scale of distribution, the heterogeneous computing landscape, and requirements such as timely processing and actuation. Classical approaches, like managed runtimes, interpretation-based query processing, and the optimization of single queries that neglect interactions, greatly limit throughput, latency, energy-efficiency, and the general usability of these systems for emerging applications involving distributed data processing at scale in a sensor-edge-cloud environment.
To overcome these limitations, we are researching and building NebulaStream, a novel open-source data stream processing system for massively distributed, heterogeneous environments. NebulaStream supports (potentially resource-constrained) heterogeneous devices, a hierarchical topology (with the distribution of computation and data flow in a cloud-edge-continuum), and the sharing of computations and data across multiple concurrent queries. This presentation discusses the design goals and core concepts of NebulaStream and looks back at inspirations drawn from our prior work on Stratosphere and Apache Flink, among others.
Bio:
Volker Markl is a German Professor of Computer Science. He leads the Chair of Database Systems and Information Management (DIMA) at TU Berlin and the Intelligent Analytics for Massive Data Research Department at the German Research Center for Artificial Intelligence (DFKI). In addition, he is Director of the Berlin Institute for the Foundations of Learning and Data (BIFOLD). He is a database systems researcher, conducting research at the intersection of distributed systems, scalable data processing, and machine learning. Between 2010 and 2015, Volker led the DFG-funded Stratosphere project, which resulted in the creation of Apache Flink. He has received numerous honors and prestigious awards, including two ACM SIGMOD Research Highlight Awards and best paper awards at leading conferences, such as ACM SIGMOD, VLDB, IEEE ICDE, and EDBT. In 2020, he was named an ACM Fellow for his contributions to query optimization, scalable data processing, and data programmability, and in 2023 he earned the ACM SIGMOD Systems Award for Apache Flink. In 2014, he was elected one of Germany’s leading “Digital Minds” (Digitale Köpfe) by the German Informatics Society. He is also a member of the Berlin-Brandenburg Academy of Sciences (BBAW) and serves as advisor to academic institutions, governmental organizations, and technology companies. Volker holds eighteen patents and has been co-founder and mentor to several startups.
Yannik Schröder
Database Systems and Information Management (DIMA) Group at TU Berlin
Berlin Institute for the Foundations of Learning and Data (BIFOLD)
Talk 2B: A Hands-On Tutorial on NebulaStream
Abstract
NebulaStream is a novel, open-source data stream processing system for distributed, heterogeneous data streams in the cloud-edge continuum. It adheres to the design goals of ease-of-use, extensibility, and efficiency to provide a framework for users and developers to implement diverse Internet of Things (IoT) use cases. Equipped with essential built-in functionalities, NebulaStream allows users to customize the system easily while ensuring efficient execution even on low-end devices. In this tutorial, we showcase NebulaStream’s extensibility capabilities with a specific focus on integrating and processing multi-modal data. Visitors of the tutorial will learn how to extend NebulaStream by implementing functions, sources (data ingestion), sinks (data export), and data types operating on multi-modal data. After the tutorial, a visitor should be able to extend NebulaStream on their own, e.g., by creating a new function, without the need to modify or even understand the rest of the codebase.
Bio:
Yannik Schroeder is a Ph.D. student in his first year at BIFOLD/TU Berlin. Yannik holds a B.Sc. and M.Sc. in Computer Science/Data Engineering. His research focuses on declarative streaming time series analysis.
Prof. Sihem Amer-Yahia
CNRS, University of Grenoble Alpes, France
Talk 3A & 3B: AI Planning for Data Exploration
Abstract
Exploratory Data Analysis (EDA) is an online decision-making process in which the next best step is chosen based on the latest observed insights. AI Planning is a long-standing sub-area of AI, dealing with sequential decision making. This course examines the applicability of two main AI Planning approaches to EDA: Reinforcement Learning (RL) and Agentic AI.
RL4EDA trains an agent to choose the best query, a.k.a. action, on data, and rewards the agent for found insights. The first part of this course will cover the use of various RL algorithms such as SARSA and Actor/Critic, to solve common EDA questions such as finding galaxies of interest or navigating customer reviews.
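As a rough illustration of the kind of algorithm named above, the following sketch shows the standard tabular SARSA update with epsilon-greedy action selection. It is a generic textbook rendition, not the course material: the state and action labels (for example "filter" or "group" as query types) and the function names are hypothetical.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the best known one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_update(q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply one on-policy SARSA step:
    Q(s, a) <- Q(s, a) + alpha * (reward + gamma * Q(s', a') - Q(s, a))."""
    td_target = reward + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]


# Hypothetical usage: states are data views, actions are query types,
# and the reward stands for the value of an insight found by the query.
q_table = defaultdict(float)
rng = random.Random(0)
actions = ["filter", "group"]
a0 = epsilon_greedy(q_table, "view0", actions, epsilon=0.1, rng=rng)
a1 = epsilon_greedy(q_table, "view1", actions, epsilon=0.1, rng=rng)
sarsa_update(q_table, "view0", a0, reward=1.0, s_next="view1", a_next=a1)
```

The update uses the action actually chosen in the next state, which is what makes SARSA on-policy.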
The second part of the course will introduce Agentic AI, a nascent area where a supervisor agent orchestrates multiple specialized ones to achieve a task. We will introduce a reference architecture for Agentic AI that includes task decomposition and agent discovery.
The parallel with RL immediately comes to mind and that is what we plan to explore in the hands-on part of this course where students will deploy both RL and Agentic AI and get to experience their potential and limitations for finding items of interest.
The last part of the course will dive into multiple open problems that arise when seeking to close the gap between RL4EDA and Agentic AI for EDA.
Special Instructions
Python:
- install anaconda:
- com
- Distribution Installers
- Choose Python 3.12
- 64-Bit (Apple silicon) Graphical Installer (704.7M)
- launch terminal in anaconda (open anaconda navigator → environments → base)
- install Python libraries:
pip install torch numpy pandas pfrl gym statistics langgraph langchain_ollama langchain_community langchain_core
Ollama:
- install ollama:
https://ollama.com/download/mac
- launch ollama: ollama serve
- download LLM: ollama pull llama3.1
- check that llama3.1 has been added: ollama ls
Tutorial if needed: https://github.com/ollama/ollama/blob/main/docs/linux.md
For LangSmith, create your own LangChain API key:
- Create an account on https://www.langchain.com
- Settings → API Keys → Create API Key
- The key must be copied into a file
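To check the Python setup before the hands-on part, a small script along the following lines may help. It is only a sketch: it assumes that the import names match the pip package names listed above, and the function name is hypothetical.

```python
import importlib.util

# Import names assumed to match the pip packages from the install step.
REQUIRED = [
    "torch", "numpy", "pandas", "pfrl", "gym", "statistics",
    "langgraph", "langchain_ollama", "langchain_community", "langchain_core",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing libraries: " + ", ".join(missing))
    else:
        print("All required libraries were found.")
```

Run it in the same terminal in which the libraries were installed, so that it inspects the same environment.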
Bio:
Sihem Amer-Yahia is a Silver Medal CNRS Research Director and Deputy Director of the Lab of Informatics of Grenoble. She works on exploratory data analysis and algorithmic upskilling. Prior to that, she was Principal Scientist at QCRI, Senior Scientist at Yahoo! Research, and Member of Technical Staff at AT&T Labs. Sihem served as PC chair for SIGMOD 2023 and as the coordinator of the Diversity, Equity and Inclusion initiative for the database community. In 2024, she received the IEEE TCDE Impact Award, the SIGMOD Contributions Award, and the VLDB Women in Database Award.
Dr. Charalampos Tsourakakis
RelationalAI & Boston University, USA
Talk 4A: Algorithmic Techniques in Graph Analytics
Abstract
This talk explores foundational algorithmic techniques in graph analytics, focusing on both classical and modern approaches. We begin with the deceptively simple triangle query, which opens the door to a rich array of algorithmic ideas—from the classic Chiba-Nishizeki algorithm and its use of graph arboricity, to analytical tools like the Loomis-Whitney inequality. We then explore modern advances in triangle sparsification, including our work on Doulion and colorful triangle counting, which highlight the deep interplay between sampling, structure, and scalability in graph analytics. We then turn to clustering, presenting the intuition behind spectral methods, extensions using motif-based Laplacians, and practical heuristics. Finally, we address settings where full clustering is not feasible, motivating the discovery of dense subgraphs. Throughout, we emphasize tools from graph theory, combinatorial optimization, and randomized algorithms that underpin scalable graph analytics.
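The sampling idea behind triangle sparsification can be sketched in a few lines. The example below is a minimal, Doulion-style estimator under stated assumptions: each edge is kept independently with probability p, triangles are counted exactly in the sparsified graph, and the count is rescaled by 1/p^3, since a triangle survives only if all three of its edges are kept. The function names are hypothetical, node labels are assumed to be mutually comparable, and this is not the speaker's implementation.

```python
import random
from itertools import combinations


def count_triangles(edges):
    """Count triangles exactly in an undirected simple graph."""
    unique = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    adj = {}
    for u, v in unique:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is seen once per edge, hence the division by three.
    return sum(len(adj[u] & adj[v]) for u, v in unique) // 3


def doulion_estimate(edges, p, seed=None):
    """Estimate the triangle count from an edge sample kept with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    rng = random.Random(seed)
    kept = [edge for edge in edges if rng.random() < p]
    return count_triangles(kept) / p ** 3


# The complete graph on 5 nodes has C(5, 3) = 10 triangles.
k5 = list(combinations(range(5), 2))
exact = count_triangles(k5)
estimate = doulion_estimate(k5, p=0.5, seed=42)
```

On a graph this small the estimate is very noisy; the approach pays off on large graphs, where counting on the much smaller sample is far cheaper than counting on the full edge set.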
Talk 4B: Machine Learning in Graph Analytics
Abstract
Node embeddings, which lie at the intersection of network analysis and machine learning, have transformed how we represent and analyze complex networks. These methods map nodes to low-dimensional vectors in continuous space using tools such as random walks and deep learning architectures.
In this talk, I will explore fundamental questions about the expressiveness and limitations of node embeddings. What structural information do methods like DeepWalk and node2vec capture? How does this relate to performance in downstream tasks? Can embeddings be inverted to recover the original graph, approximately or exactly? We address these questions and show that low-dimensional embeddings can provably capture key properties of real-world networks. We also examine edge-independent generative models—such as NetGAN—that rely on node proximity in embedding space to predict links. While popular, we prove intrinsic limitations of these models in generating triangle-rich graphs, and introduce a hierarchy of models with increasing expressiveness.
Bio:
Dr. Charalampos Tsourakakis received his Ph.D. from the Algorithms, Combinatorics and Optimization (ACO) program at Carnegie Mellon University, and served as a Postdoctoral Fellow at Harvard University. He holds a Diploma in Electrical and Computer Engineering from the National Technical University of Athens and a Master of Science from the Machine Learning Department at Carnegie Mellon University. Before joining Boston University, he worked as a researcher in the Google Brain team.
He won a best paper award in IEEE Data Mining, has delivered three tutorials in the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, and has designed two libraries for large-scale graph mining, one of which has been officially included in Windows Azure. His research focuses on large-scale graph mining and machine learning.
Prof. Flora Salim
University of New South Wales, Australia
Talk 5A & 5B: Foundational AI for Time Series and Multimodal Sensors
Abstract
This talk explores foundational AI approaches for time-series and multimodal sensor data, addressing the real-world challenges of missing values, heterogeneity, irregular sampling, noise, and limited labeled data. These issues are especially acute in dynamic environments such as transport and energy systems, where different sensors exhibit varying characteristics and data distributions.
To tackle these, we focus on recent advances in unsupervised and self-supervised learning, including contrastive learning, latent masking strategies, cross-modal alignment, long-tailed learning, continual learning, and Neural ODEs for modeling partially observed and streaming time-series.
We present state-of-the-art methods that enable learning from multimodal and time-series signals without relying on large annotated datasets, including techniques for detecting change points, aligning cross-sensor representations, and adapting to evolving data streams.
In traffic flow forecasting, a case of multivariate time-series forecasting, we explore how dynamic sensor relationships beyond spatial dependency can be modelled, how to generalize to entirely new roads, and how large-scale pretrained models improve spatial generalization.
We also highlight the role of continual learning for handling distributional shifts and new tasks, and the effectiveness of ODE-based architectures in handling irregular, noisy, and streaming inputs.
Finally, we situate this discussion in broader trends outlined in our comprehensive survey on foundation models for spatio-temporal data science, which articulates how pretraining, cross-domain transfer, and unified architectures are reshaping the field.
Bio:
Flora Salim is a Professor in the School of Computer Science and Engineering (CSE), the inaugural Cisco Chair of Digital Transport & AI, University of New South Wales (UNSW) Sydney, and the Deputy Director (Engagement) of UNSW AI Institute. Her research is on machine learning for time-series and multimodal sensor data and on trustworthy AI. She has received several prestigious fellowships including Humboldt-Bayer Fellowship, Humboldt Fellowship, Victoria Fellowship, and ARC Australian Postdoctoral (Industry) Fellowship.
She has attracted more than $20m in research and industry funding in the last 10 years, as lead or sole CI for more than half of these grants, including research funded by ARC, Microsoft Research US, Northrop Grumman Corporation US, Qatar National Priorities Research Program, Cisco, IBM Research, several city councils and many other industry and government partners/funders. She is a Chief Investigator on the ARC Centre of Excellence for Automated Decision Making and Society (ADM+S), co-leading the Machines Program and the Mobilities Focus Area. She was the recipient of the Women in AI Awards 2022 Australia and New Zealand in the Defence and Intelligence Category.
She is a member of the Australian Research Council (ARC) College of Experts. She serves as an Editor of IMWUT, Associate-Editor-in-Chief of IEEE Pervasive Computing, Associate Editor of ACM Transactions on Spatial Algorithms and Systems, and a Steering Committee member of ACM UbiComp. She has served as a Senior Area Chair / Area Chair of AAAI, WWW, NeurIPS, and many other top-tier conferences in AI and ubiquitous computing.
She is an Associate of ELLIS Alicante and holds an Honorary Professor appointment at RMIT University. She was a Visiting Professor at University of Kassel, Germany, and University of Cambridge, England, in 2019.
Group website: https://cruiseresearchgroup.github.io; Personal website: florasalim.com
Prof. Constantine Dovrolis
Georgia Tech, USA & Cyprus Institute, Cyprus
Talk 6A: Neuro-inspired AI for efficient learning
Abstract
There is a growing overlap between Machine Learning, Neuroscience, and Network Theory. These three disciplines create a fertile inter-disciplinary cycle: a) inspiration from neuroscience leads to novel machine learning models and deep neural networks in particular, b) these networks can be better understood and designed using network theory, and c) machine learning and network theory provide new modeling tools to understand the brain’s structure and function, closing the cycle. In this talk, we will “tour” this cross-disciplinary research agenda by focusing on three recent works: a) the design of sparse neural networks that can learn fast and generalize well, b) the use of structural adaptation (plasticity) for continual learning, and c) online data selection for efficient training and fine-tuning.
Talk 6B: The new mathematics of deep learning
Abstract
Over the last decade, deep learning has evolved from being an enigmatic “black box” to a field where mathematics provides clear insights into its remarkable success. In this talk, we will explore how modern analysis has shed light on key questions, including:
1. Why overparameterized neural networks generalize well (despite earlier results from classical learning theory).
2. The critical role of depth in the neural network architecture.
3. How deep learning avoids the curse of dimensionality.
4. The surprising efficiency of optimization methods despite the non-convex nature of the problem.
Bio:
Prof. Constantine Dovrolis is the Director of the center for Computational Science and Technology (CaSToRC) at The Cyprus Institute (CyI) as of 1/1/2023. He is also a Professor at the School of Computer Science at the Georgia Institute of Technology (Georgia Tech). He is a graduate of the Technical University of Crete (Engr.Dipl. 1995), University of Rochester (M.S. 1996), and University of Wisconsin-Madison (Ph.D. 2000).
His research is highly inter-disciplinary, combining Network Theory, Data Mining and Machine Learning. Together with his collaborators and students, he has published in a wide range of scientific disciplines, including climate science, biology, and neuroscience. More recently, his group has been focusing on neuro-inspired architectures for machine learning based on what is currently known about the structure and function of brain networks.
According to Google Scholar, his publications have received more than 15,000 citations with an h-index of 56. His research has been sponsored by US agencies such as NSF, NIH, DOE, and DARPA, and by companies such as Google, Microsoft and Cisco. He has published in diverse peer-reviewed conferences and journals such as the International Conference on Machine Learning (ICML), the ACM SIGKDD conference, PLOS Computational Biology, Network Neuroscience, Climate Dynamics, the Journal of Computational Social Networks, and others.
Prof. Mohamed F. Mokbel
University of Minnesota, USA
Talk 7A: Machine Learning for Big Spatial Data and Applications
Abstract
This talk will focus on our efforts in adopting machine learning (ML) techniques for big spatial data and applications. We pursue two orthogonal, but related, directions. In the first direction, we show that traditional ML-based applications like knowledge-base construction and data cleaning are missing a great opportunity by not incorporating the distinguishing characteristics of spatial data in their core operations. We then show that injecting spatial awareness into the core ML operations behind these applications significantly boosts their accuracy. In the second direction, we show that traditional spatial applications can benefit from the recent advances in ML techniques to significantly boost their scalability and accuracy. We will focus on two widely used spatial applications, namely, map services (e.g., shortest path queries) and spatial data analysis.
Talk 7B: Large Language Models for Spatio-temporal Queries
Abstract
This talk gives a comprehensive overview of the research landscape of employing Large Language Models (LLMs) for spatio-temporal queries. The research landscape is categorized based on how LLMs are employed to serve various queries. This goes from employing LLMs as a black box with a bit of prompt engineering, to fine-tuning LLMs to fit spatio-temporal queries, to completely retraining a vanilla LLM architecture with spatio-temporal data, to modifying the internal LLM loss function to fit spatio-temporal applications. The seminar concludes by presenting a set of benchmarking and evaluation work while pointing out research gaps, open problems, and future research directions for applying LLMs to spatio-temporal applications.
Bio:
Mohamed F. Mokbel received BSc and MSc degrees in Computer Science from Alexandria University, Egypt, in 1996 and 1999, respectively, and a PhD degree in Computer Science from Purdue University in 2005. He is a Distinguished McKnight University Professor at the University of Minnesota. His research interests include database systems, spatial data, and GIS. His research work has been recognized by the ACM SIGSPATIAL 10-Year Impact Award and the VLDB 10-Year Best Paper Award. He is the Editor-in-Chief for ACM Transactions on Spatial Algorithms and Systems (ACM TSAS). Mohamed is an ACM Distinguished Scientist and IEEE Fellow.
Prof. Li Xiong
Emory University, USA
Talk 8A & 8B: Data Privacy in the Age of AI and Large Language Models
Abstract
As artificial intelligence (AI) and large language models (LLMs) increasingly influence every facet of our lives, ensuring the privacy of user data has become paramount. In this talk, I will review the common privacy attacks for inferring training data from a trained AI model and common defenses for building privacy-enhanced models using privacy sensitive data. I will then present our recent works and discuss open challenges related to: 1) new privacy attacks under the fine-tuning paradigm using pre-trained LLMs, and 2) end-to-end privacy defenses across the life cycle of AI and LLMs.
Bio:
Li Xiong is a Samuel Candler Professor of Computer Science and Biomedical Informatics at Emory University. She has a Ph.D. from Georgia Institute of Technology, an MS from Johns Hopkins University, and a BS from the University of Science and Technology of China. Her research lab, Assured Information Management and Sharing (AIMS), conducts research on trustworthy and privacy-enhancing data-driven AI solutions for healthcare, public health, and spatial intelligence. She is recognized as an IEEE fellow (2022) and AAAS fellow (2024) for her contributions to privacy-preserving and secure data sharing and analytics. She has published over 200 papers and received seven best paper or runner-up awards. Her research has been supported by both governments (NSF, NIH, IARPA, AFOSR) and industry/foundations (Mitsubishi, Cisco, AT&T, Google, IBM). More details are at http://www.cs.emory.edu/~lxiong.
Prof. Cyrus Shahabi
University of Southern California, USA
Title: Releasing Mobility Data Safely
Lecture 9a: Privacy Models and Neural Aggregation
Lecture 9b: From Synthetic Mobility to Geo-Scale Foundation Models
Abstract
This two-part lecture series explores both foundational concepts and state-of-the-art techniques for enabling privacy-preserving access to and utility of mobility data. We begin by motivating the importance of mobility data across domains such as urban planning, transportation, public health, and security, while highlighting the privacy risks that constrain its use.
Lecture 1 focuses on protecting individual and aggregate location information, introducing key privacy models, including k-anonymity, geo-indistinguishability, and differential privacy, and presenting our deep learning–based approach for the safe release of aggregated mobility statistics.
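To make the differential privacy model concrete, the sketch below applies the classical Laplace mechanism to per-cell visit counts. It is a textbook baseline, not the deep learning–based approach presented in the lecture. The assumptions are ours: each individual contributes to at most one cell, so that the sensitivity is 1, and the cell names and function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_counts(counts, epsilon, sensitivity=1.0, seed=None):
    """Release noisy counts under epsilon-differential privacy.

    Assumes that adding or removing one individual changes the counts by at
    most `sensitivity` in total (L1 norm), e.g. one visit in a single cell.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return {cell: value + laplace_noise(scale, rng) for cell, value in counts.items()}


# Hypothetical usage: visit counts per grid cell of a city map.
visits = {"cell_a": 120, "cell_b": 4, "cell_c": 0}
released = private_counts(visits, epsilon=0.5, seed=7)
```

A smaller epsilon gives stronger privacy but noisier counts, which is particularly harmful for sparse cells such as cell_c.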
Lecture 2 turns to synthetic trajectory generation, addressing why and how to generate realistic sequences of location visits, including arrival times and durations, when raw data cannot be shared. We present a transformer-based generative model and explore its potential as a core component in geo-scale foundation models.
Throughout both sessions, we bridge theory and practice, and conclude with open research challenges at the intersection of location privacy, spatiotemporal querying and modeling, and generative AI. Attendees will gain a strong foundation in location privacy concepts, practical understanding of modern methods for aggregate data release and synthetic trajectory generation, and a clear view of emerging research challenges in the field.
Bio:
Cyrus Shahabi is a Professor of Computer Science, Electrical & Computer Engineering and Spatial Sciences; Helen N. and Emmett H. Jones Professor of Engineering; and the director of the Integrated Media Systems Center (IMSC) at USC’s Viterbi School of Engineering. He also served as chair of USC’s Thomas Lord Department of Computer Science from 2017 to 2022. He was co-founder of two startups, Geosemble Technologies and TallyGo, which were acquired in July 2012 and March 2019, respectively. He received his B.S. in Computer Engineering from Sharif University of Technology in 1989 and then his M.S. and Ph.D. degrees in Computer Science from the University of Southern California. He authored two books and more than three hundred research papers in databases, GIS, and multimedia, and he has over 14 US patents.
Dr. Shahabi has received funding from several agencies such as NSF, NIJ, NASA, NIH, DARPA, AFRL, IARPA, NGA, and DHS, as well as several industries such as Chevron, Cisco, Google, HP, Intel, Microsoft, NCR, NGC, and Oracle. He chaired the founding nomination committee of ACM SIGSPATIAL (2008-2011 term) and served as the chair of ACM SIGSPATIAL for the 2017-2020 term. He was an Associate Editor of IEEE Transactions on Parallel and Distributed Systems (TPDS) from 2004 to 2009, IEEE Transactions on Knowledge and Data Engineering (TKDE) from 2010 to 2013, VLDB Journal from 2009 to 2015 and PVLDB (Vol. 16) in 2023. He is on the ACM Transactions on Spatial Algorithms and Systems (TSAS) editorial board and ACM Computers in Entertainment. He was the founding chair of the IEEE NetDB workshop and the general co-chair of SSTD’15, ACM GIS 2007, 2008, and 2009. He has been PC co-chair of several conferences, such as APWeb+WAIM’2017, BigComp’2016, MDM’2016, DASFAA 2015, IEEE MDM 2013, IEEE BigData 2013 and VLDB 2024. He regularly serves on the program committee of major conferences such as VLDB, SIGMOD, IEEE ICDE, ACM SIGKDD, and IEEE ICDM.
Dr. Shahabi is a fellow of IEEE and NAI (National Academy of Inventors). He received the ACM Distinguished Scientist Award in 2009, the 2003 U.S. Presidential Early Career Award for Scientists and Engineers (PECASE), the NSF CAREER award in 2002, and the 2001 Okawa Foundation Research Award. He received the ACM SIGSPATIAL 10-Year Impact Award in 2023. He was also a recipient of the US Vietnam Education Foundation (VEF) faculty fellowship award in 2011 and 2012, an organizer of the 2011 National Academy of Engineering “Japan-America Frontiers of Engineering” program, an invited speaker in the 2010 National Research Council (of the National Academies) Committee on New Research Directions for the National Geospatial-Intelligence Agency, and a participant in the 2005 National Academy of Engineering “Frontiers of Engineering” program.