Learning to Match Distributions: Optimal Transport, Flow Matching & Applications
CSCI-GA 3033-148 · New York University · Fall 2026
In a growing range of problems across statistics and machine learning, the object of interest is not a single data point but an entire distribution. A patient is summarized by the cloud of their single cells; an experimental condition by the population of measurements it produces; a generative model by its output distribution. Once distributions become the primary entity, a few basic questions come to mind: How far apart are two distributions? How do we transform one into another? What lies on the path between them?
This class develops the mathematical machinery for exactly these three operations: measuring distances between distributions, learning maps that optimally move one onto another, and constructing flows that interpolate between them. Alongside this theory, it covers the statistical machine learning approaches that make these operations practical at scale, including maximum mean discrepancy (MMD), optimal transport, and flow matching. It also aims to equip students with the practical skills needed for further research in this rapidly evolving field.
Instructor
Prerequisites
Students should have a graduate-level machine learning background plus solid probability and statistics (at the level of Fernández-Granda’s Probability and Statistics for Data Science, 2024). Comfort with Python and basic deep-learning tooling is expected for the project and demos.
Logistics
Time: Mondays, 4:55–6:55 PM ET
Location: 60 Fifth Avenue, Room C10
Format: The course combines lectures with a seminar. The first weeks consist of instructor lectures that give a broad overview and context. The class then shifts to student-led presentations and panel discussions, following Alec Jacobson and Colin Raffel’s role-play seminar approach.
Communication: We will use Discord to facilitate discussion. The instructor will provide the link during the first class.
Course Schedule
The Calendar will be regularly updated with the full week-by-week schedule, readings and topics.
Welcome Homework
A short, ungraded warm-up assignment will be released during the first class. It gets everyone on the same tooling and refreshes the prerequisite background, so we can surface any gaps early. Difficulty with this assignment is a strong signal that the technical content of the course will be challenging.
Details and release date: TBD — to be announced at the first class.
Grading
Grading will be based on:
- Semester-long project (70%): an application/demo or a research project, in teams of 1–4. See the Project Logistics page for options, milestones, and deadlines.
- Paper presentation & panel participation (30%): each student signs up to present papers and take on reading-group roles. Grading is based on presentations and live panel participation; there are no written reviews. See the Role-play Seminar page to learn more about the format.
Use of AI
You are encouraged to use AI assistants (LLMs, coding copilots) as tools throughout the course, for coding, brainstorming, and learning. Two conditions apply:
- You are fully responsible for everything you submit or present: correctness, claims, and citations are on you.
- Briefly disclose how you used AI in each deliverable.
AI is no substitute for genuine understanding — in panels and discussions you must be able to defend your work without it.