Charilaos I. Kanatsoulis

Research Scientist · Stanford SNAP Group

I am a Research Scientist in the Department of Computer Science at Stanford, working with Prof. Jure Leskovec at the SNAP Group. I am also an instructor for CS224W (Machine Learning with Graphs) and CS246 (Mining Massive Datasets).

I build foundation models for structured data, including relational databases, graphs, single-cell transcriptomics, and protein sequences. I also develop principled pretraining and adaptation methods for large language models.


Recent Research Projects

Relational Transformer
Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data
ICLR 2026 Oral · EurIPS AI for Tabular Data
R. Ranjan, V. Hudovernik, M. Znidar, C. I. Kanatsoulis, et al., J. Leskovec

Problem. Foundation models for tabular and relational data lag behind LLMs and vision foundation models.

Contribution. A transformer architecture for forecasting on relational databases, enabling large-scale pretraining and zero-shot generalization across schemas.

Relational Graph Transformer
Relational Graph Transformer
ICLR 2026 Best Paper · KDD TGL Workshop 2025
V. P. Dwivedi, S. Jaladi, Y. Shen, F. López, C. I. Kanatsoulis, R. Puri, M. Fey, J. Leskovec

Problem. Graph transformers don't naturally handle multi-table relational data.

Contribution. The first graph transformer architecture with relational attention and PEARL positional encoding tailored to multi-table data.

RelGNN
RelGNN: Composite Message Passing for Relational Deep Learning
ICML 2025
T. Chen, C. I. Kanatsoulis, J. Leskovec

Problem. Standard GNNs underperform on relational deep learning tasks.

Contribution. A novel GNN architecture for predictive queries on relational databases achieving SOTA performance with up to 25% improvement over baselines.

RDL Survey
Relational Deep Learning: Challenges, Foundations and Next-Generation Architectures
ACM KDD 2025
V. P. Dwivedi, C. I. Kanatsoulis, S. Huang, J. Leskovec

Contribution. A survey of Relational Deep Learning: its challenges and foundations, and a vision toward next-generation architectures and foundation models for relational databases.

KGGen 1.1k★
KGGen: Extracting Knowledge Graphs from Plain Text with Language Models
NeurIPS 2025
B. Mo, K. Yu, J. Kazdan, et al., C. I. Kanatsoulis, S. Koyejo

Problem. Existing pipelines produce sparse, noisy KGs from text.

Contribution. A Python package (pip install kg-gen) producing high-fidelity KGs via LLM extraction + entity clustering. Released MINE, the first benchmark for KG extraction.

Impact. 1.1k+ GitHub stars; widely used for biomedical and scientific reasoning.

TeX-Graph
COVID-19
TeX-Graph: Coupled Tensor-Matrix Knowledge-Graph Embedding for COVID-19 Drug Repurposing
SIAM SDM 2021
C. I. Kanatsoulis, N. D. Sidiropoulos

Problem. KG embeddings for biomedical discovery require fusing structured and textual evidence.

Contribution. Coupled tensor-matrix factorization for biomedical knowledge graphs, applied to COVID-19 drug repurposing as an early AI-driven hypothesis-generation pipeline.
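The coupled model can be sketched in a few lines. Everything below (shapes, symbols, the plain least-squares objective) is an illustrative assumption, not the paper's exact model or fitting algorithm: a KG tensor is factorized with a CP model whose entity factor is shared with a side-information matrix.

```python
import numpy as np

# Hedged sketch: KG tensor X[i, j, k] (subject, object, relation) follows a
# CP model sharing its entity factor A with a feature matrix M = A @ G.T
# (e.g., text-derived features of entities).
rng = np.random.default_rng(0)
n_ent, n_rel, n_feat, r = 50, 5, 16, 4

A = rng.normal(size=(n_ent, r))   # entity embeddings (shared across both views)
C = rng.normal(size=(n_rel, r))   # relation embeddings
G = rng.normal(size=(n_feat, r))  # feature loadings

X = np.einsum("ir,jr,kr->ijk", A, A, C)  # KG tensor under the CP model
M = A @ G.T                              # coupled feature matrix

def coupled_loss(A, C, G, X, M, mu=1.0):
    """Least-squares fit of both views, coupled through the entity factor A."""
    X_hat = np.einsum("ir,jr,kr->ijk", A, A, C)
    return np.sum((X - X_hat) ** 2) + mu * np.sum((M - A @ G.T) ** 2)

print(coupled_loss(A, C, G, X, M))  # 0.0 at the generating factors
```

Sharing A is what lets structured (tensor) and textual (matrix) evidence inform a single entity embedding.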

GREmLN
CZI
GREmLN: A Cellular Regulatory Network-Aware Transcriptomics Foundation Model
ICLR 2026 Workshop on ML for Genomics Explorations
M. Zhang, V. Swamy, R. Cassius, L. Dupire, C. I. Kanatsoulis, et al., A. Califano

Problem. Single-cell transcriptomics foundation models treat genes as independent tokens, ignoring regulatory structure.

Contribution. A multimodal graph transformer that injects gene-regulatory-network structure as positional information.

Key insight. Biological priors injected during pretraining capture long-range gene-token dependencies, improving downstream tasks in cancer, Alzheimer's, and beyond.

LoRTA
LoRTA: Low Rank Tensor Adaptation of Large Language Models
Under review · Microsoft collaboration
I. Hounie, C. I. Kanatsoulis, A. Tandon, A. Ribeiro

Problem. LoRA's per-matrix factorization leaves parameter efficiency on the table when adapting along multiple axes at once.

Contribution. Generalizes LoRA to tensor decomposition, reducing trainable parameters by up to two orders of magnitude at matched performance.

Applications. DPO, instruction tuning, vision, and protein folding fine-tuning.
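A rough illustration of where the savings come from (the dimensions, rank, and CP factorization below are assumptions for the sketch, not the paper's exact parameterization): stacking the per-layer weight updates into one tensor and factorizing it jointly shares parameters across layers.

```python
import numpy as np

# LoRA stores a rank-r matrix update per layer; a tensor variant factorizes
# the stacked per-layer updates jointly with a CP decomposition.
L, d_out, d_in, r = 12, 64, 64, 4

# LoRA: one factor pair per layer -> L * r * (d_out + d_in) parameters.
lora_params = L * r * (d_out + d_in)

# CP-factorized adaptation: factors shared across all layers.
rng = np.random.default_rng(0)
A = rng.normal(size=(L, r))      # layer factor
B = rng.normal(size=(d_out, r))  # output-dimension factor
C = rng.normal(size=(d_in, r))   # input-dimension factor
lorta_params = A.size + B.size + C.size

# Full stack of updates: Delta[l] = sum_k A[l, k] * outer(B[:, k], C[:, k])
Delta = np.einsum("lk,ok,ik->loi", A, B, C)  # shape (L, d_out, d_in)

print(lora_params, lorta_params)  # 6144 560
```

Even at this toy scale the tensor parameterization is an order of magnitude smaller at the same rank; the gap grows with depth and with additional tensor axes (heads, projection matrices).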

PEARL
Learning Efficient Positional Encodings with Graph Neural Networks (PEARL)
ICLR 2025
C. I. Kanatsoulis, E. Choi, S. Jegelka, J. Leskovec, A. Ribeiro

Problem. Eigenvector-based positional encodings (e.g., Laplacian PEs) give strong inductive bias to graph transformers but require expensive eigendecomposition and suffer from instability and limited expressivity.

Contribution. PEARL is a learnable PE framework that approximates equivariant functions of eigenvectors with linear complexity, matching or surpassing full eigenvector PEs at one-to-two orders of magnitude lower cost.

Key insight. Message-passing GNNs initialized with random or basis-vector node features compute nonlinear maps of eigenvectors, unlocking expressive positional encodings without explicit eigendecomposition. A foundational primitive for our graph foundation model (GFM) work.
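A minimal sketch of the spirit of this insight (the toy ring graph and the diag(A^K) statistic are assumptions for illustration, not the PEARL architecture): propagating random node features by plain message passing and averaging a simple nonlinear statistic over samples recovers a spectral/structural node encoding with no eigendecomposition.

```python
import numpy as np

# For random x with E[x x^T] = I, the sample mean of (A^K x)_v * x_v
# estimates diag(A^K), i.e., closed-walk counts, a structural encoding.
rng = np.random.default_rng(0)
n, m, K = 8, 50_000, 4

# Ring graph on n nodes.
A = np.zeros((n, n))
for v in range(n):
    A[v, (v + 1) % n] = A[(v + 1) % n, v] = 1.0

X = rng.normal(size=(n, m))   # m random feature samples per node
H = X.copy()
for _ in range(K):
    H = A @ H                 # plain message passing, no learned weights

pe_estimate = (H * X).mean(axis=1)                # Monte-Carlo ~ diag(A^K)
pe_exact = np.diag(np.linalg.matrix_power(A, K))  # exact closed-walk counts

print(np.max(np.abs(pe_estimate - pe_exact)))     # small estimation error
```

The whole computation is matrix-vector products, hence linear in the number of edges per sample, which is the complexity argument behind eigendecomposition-free positional encodings.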

GNN ➜ WL
Graph Neural Networks Are More Powerful Than We Think
ICASSP 2024
C. I. Kanatsoulis, A. Ribeiro

Problem. Common belief: Weisfeiler-Lehman bounds GNN expressivity.

Contribution. We show that GNNs can discriminate the majority of real-world graphs, so WL is not the practical limit, and we design convolutional architectures that are provably more expressive than WL.
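For context, the classical limitation is easy to demonstrate in a few lines: 1-WL color refinement cannot tell a 6-cycle from two disjoint triangles, even though only one of the two graphs contains a triangle. A self-contained sketch:

```python
# 1-WL color refinement: iteratively relabel each node by its own color
# plus the multiset of its neighbors' colors, then compare color histograms.
def wl_histogram(adj, rounds=3):
    colors = {v: 0 for v in adj}  # uniform initial coloring
    for _ in range(rounds):
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        # Relabel distinct signatures with fresh integer colors.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj}
    return sorted(colors.values())

cycle6 = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}

print(wl_histogram(cycle6) == wl_histogram(two_triangles))  # True
```

Both graphs are 2-regular, so refinement never splits the single color class, and the histograms coincide despite the graphs being non-isomorphic.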

Counting Substructures
Counting Graph Substructures with Graph Neural Networks
ICLR 2024
C. I. Kanatsoulis, A. Ribeiro

Problem. Counting substructures (triangles, cycles, paths, cliques) is fundamental to chemistry, biology, and graph reasoning, yet standard message-passing GNNs are bounded by 1-WL and provably cannot count most of them.

Contribution. A theoretical and architectural framework characterizing exactly when GNNs can count substructures, together with provably expressive architectures that go beyond 1-WL.

Key insight. Substructure counting is a much finer-grained yardstick for GNN expressivity than the 1-WL test, and reveals the latent power of message-passing models.
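For instance, triangle counts have a classical closed form, trace(A^3)/6, that 1-WL-bounded message passing provably cannot compute on all graphs; a quick check of the formula:

```python
import numpy as np

# Each triangle contributes six closed walks of length 3 (three starting
# nodes, two directions), so #triangles = trace(A^3) / 6.
def count_triangles(A):
    return int(round(np.trace(A @ A @ A) / 6))

# K4 (complete graph on 4 nodes) has C(4, 3) = 4 triangles.
K4 = np.ones((4, 4)) - np.eye(4)
print(count_triangles(K4))  # 4
```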

Teaching

CS224W
Stanford University · 2024 – present
with Jure Leskovec
CS246
Stanford University · 2024 – present
with Jure Leskovec
ESE5140
University of Pennsylvania · 2022 – 2023

See the full CV for workshop organization, invited talks, service, and complete awards list.