Biography

Every cell in our bodies tells a story — I develop the machine learning tools that help scientists read millions of such stories at once. I serve as an Assistant Professor of Computer Science and Biology at the Courant Institute of Mathematical Sciences and the Department of Biology at New York University, where I lead the Biological Machine Learning group. My research develops probabilistic machine learning methods to uncover the biological mechanisms that govern cellular behavior and disease.

From 2021 to 2025, I was a Postdoctoral Fellow at Genentech and Stanford Medicine, hosted by Jonathan Pritchard and Aviv Regev. I received my PhD in Computer Science from UC Berkeley with Mike Jordan and Nir Yosef, and my MSc from École polytechnique. I grew up in Bédarieux, France.

My research develops Deep Generative Models (DGMs) that can provide insights into complex biological systems. I pioneered probabilistic approaches for single-cell analysis with scVI and co-developed scvi-tools, both now widely adopted for modeling cellular heterogeneity; I cover these foundations in my MIT guest lecture on variational autoencoders for biology. More recently, we’ve extended these approaches to model the dynamics of complex systems, using flow matching to predict how cells respond to perturbations over time. Beyond generative models, I develop causal machine learning methods, including DCD-FG, which can discover causal relationships among thousands of variables, enabling large-scale causal discovery in high-dimensional biological data (Broad Institute talk).

More broadly, I’m interested in advancing machine learning for scientific discovery. While DGMs provide an appealing paradigm for learning from biological data, significant work remains to fully exploit them as part of scientific hypothesis testing pipelines—particularly around causality, interpretability, disentanglement, and decision-making.

These methodological advances enable us to tackle fundamental questions in cellular biology. We develop methods to predict how cells respond to perturbations, using optimal transport to match cellular states across different experimental conditions—work I presented at MLCB 2024. We also analyze spatial organization within tissues, developing DestVI to map continuous cell type variations in spatial transcriptomics data. In tumor microenvironments, this reveals how immunosuppressive macrophages localize to hypoxic regions near necrotic cores, providing insights into how spatial organization drives immune dysfunction in cancer.

I am excited about how machine learning can unlock entirely new ways of understanding biology—turning the overwhelming complexity of cellular data into discoveries that could transform medicine.

Interests

  • Machine Learning + Science
  • Computational Biology
  • Causal Inference
  • Deep Generative Models
  • Applied Statistics

Education

  • PhD in Electrical Engineering & Computer Sciences, 2021

    University of California, Berkeley

  • MSc in Applied Mathematics, 2016

    École polytechnique, France