501(c)(3) independent research laboratory · EIN 87-3219584
Founded June 2019 · Berkeley, CA
Vol. VII, Issue 003 · November 2025 · Latest preprint published Nov 09
Citations: 28,914 · h-index: 31

Research on the foundations of machine learning.

An independent laboratory studying interpretability, alignment, and the theoretical basis of large-scale neural network behavior. We publish openly, share code and weights, and turn down corporate funding that requires us to withhold either.

§ 01 — Research directions

Four questions we think need answering. By someone. Possibly us.

Reverson Labs runs four long-horizon research programs. We define a program as a question that we expect a small team to need at least three years to make meaningful progress on. We change programs rarely.

Program 01

Interpretability at scale

Mechanistic methods for understanding the computations performed by frontier transformer models. Circuit discovery, feature attribution, and the systematic mapping of model behavior to model internals.

14 papers · 6 researchers
Program 02

Alignment theory

Formal frameworks for specifying, monitoring, and steering the behavior of trained systems. Particular focus on settings where the human supervisory signal is incomplete, ambiguous, or systematically biased.

9 papers · 5 researchers
Program 03

Learning dynamics

The mathematical structure of large-scale training: feature emergence, grokking phenomena, scaling laws, and the implicit biases of stochastic gradient descent on overparametrized networks.

11 papers · 4 researchers
Program 04

Evaluation methods

Honest, hard-to-game benchmarks for high-stakes capabilities. Held-out test design, contamination detection, and the construction of evaluations that scale with model capability.

7 papers · 3 researchers
§ 02 — Recent publications

A few papers we're proud of, and a few we're still mildly anxious about.

Selected recent publications, in reverse chronological order. Every paper here has been posted to arXiv, and we release code and trained weights for every empirical result. Our complete publication list lives in the library.

001

NEW Sparse Circuit Discovery at Scale: Tracing 12-Layer Decision Pathways in Transformer Models

K. Aoki, M. Quincy, R. Ohlsen, P. Reverson

Preprint · arXiv:2511.04827
002

A Formal Framework for Outer Alignment Under Bounded Supervision

M. Quincy, T. Ife, P. Reverson

NeurIPS 2025 · Oral · Spotlight
003

Phase Transitions in Skill Acquisition: A Bayesian Account of Grokking

R. Ohlsen, S. Nadar, K. Aoki

ICLR 2025 · Oral
004

Contamination Audits for Long-Tail Evaluation: How Much Did the Model Already See?

T. Ife, M. Quincy, R. Ohlsen

ICML 2025 · Best paper runner-up
005

Feature Universality in Frontier Transformers: Evidence from Cross-Model Activation Patching

K. Aoki, P. Reverson, plus contributors at Google DeepMind

NeurIPS 2024 · Oral
006

A Note on the Geometry of Gradient Flow Near Loss-Landscape Saddles

S. Nadar, R. Ohlsen

JMLR 2024 · Vol. 25 · 47 pp
007

Eliciting Latent Knowledge via Self-Consistent Probing of Hidden States

M. Quincy, P. Reverson

ICLR 2024 · Spotlight
§ 03 — Researchers

Fourteen people, mostly in one building, mostly working on hard problems.

Reverson Labs is permanently small. We hire only when a research program has a specific gap we can't fill internally, and we hire people who could otherwise get tenure-track offers — but who prefer to write papers, share code, and not teach undergraduate calculus.

Director · Joined 2019

Priya Reverson

PhD Stanford '14, postdoc at OpenAI & DeepMind. Founder of the lab. Research focus: interpretability and alignment theory.

Senior · Joined 2020

Marlowe Quincy

PhD MIT '17, alignment theorist. Co-author of seven NeurIPS spotlight papers since joining. Leads the alignment program.

Senior · Joined 2022

Kenji Aoki

PhD Tokyo '18, previously on the interpretability team at Anthropic. Leads the interpretability program. First author on Sparse Circuit Discovery (Nov 2025).

Senior · Joined 2021

Rivka Ohlsen

PhD Carnegie Mellon '19, learning theorist. Leads work on training dynamics and the mathematical structure of generalization.

§ 04 — Open positions

We're hiring for four roles. Applications are open and reviewed on a rolling basis.

All positions are full-time, based in Berkeley, and come with the standard package: competitive salary, full benefits, full compute allocation, and full credit on published work. We sponsor visas. We hire on quality, not pedigree.

R · 01

Research Scientist · Interpretability

Program 01 · Mechanistic interpretability
PhD + 0–5 yrs
R · 02

Research Scientist · Alignment theory

Program 02 · Formal alignment
PhD + 2–10 yrs
E · 03

Research Engineer · Infrastructure

Cross-program · Training & eval tools
5+ yrs ML systems
V · 04

Visiting Scholar · Evaluation methods

Program 04 · 6–12 month residency
Faculty / postdoc
§ 05 — In the literature

When other people cite us. Or write about us.

The Reverson group's circuit-discovery framework has become the de-facto vocabulary for the field — a small lab punching far above its weight on mechanistic interpretability.

Nature Machine Intelligence Editorial, May 2025
cit. 287

Reverson Labs continues to publish some of the most theoretically rigorous alignment work in the field — and to release all of it openly, which is rarer than it should be.

The Economist AI special report, Sept 2024
cit. 142
28,914 total citations across all published work
h-index of 31 across the lab, as of November 2025
41 peer-reviewed papers since founding in 2019
100% of empirical results released with open code & weights
Funded by: Open Philanthropy · The Sloan Foundation · Schmidt Futures · NSF (awards 2384812, 2390221) · DARPA (agreement W912-25-C-0184)
§ 06 — Get in touch

Read our papers. Read the code. Get in touch.

We publish everything openly. We're glad to talk with researchers, journalists, and policy people who want to dig into specific questions — or who think we might be wrong.