Research

This page is roughly chronologically ordered. See my Google Scholar for a more up-to-date view.

If you’re interested in collaborating, please email me at rylanschaeffer@gmail.com following my instructions. For those curious, I’ve posted a (work-in-progress) summary of my research approach.

Under Review

Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? NeurIPS 2024 (Datasets & Benchmarks Track).

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data. arXiv 2024.

Many-shot Jailbreaking. arXiv 2024.

Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks. Under Review.

Bridging Associative Memory and Probabilistic Modeling. Under Review.

Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations. Under Review.

What Causes Polysemanticity? An Alternative Origin Story of Mixed Selectivity from Incidental Causes. Under Review.

Accepted

Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle. ICLR 2024 (Blog Track).

Does Data Contamination Make a Difference? Insights from Intentionally Contaminating Pre-training Data for Language Models. Under Review.

Disentangling Fact from Grid Cell Fiction in Trained Deep Path Integrators. bioRxiv 2023.

Pretraining on the Test Set Is All You Need. arXiv 2023.

Testing Assumptions Underlying a Unified Theory for the Origin of Grid Cells.

Are Emergent Abilities of Large Language Models a Mirage? NeurIPS 2023 (Outstanding Paper).

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models. NeurIPS 2023 Benchmark Track.

Self-Supervised Learning of Representations for Space Generates Multi-Modular Grid Cells. NeurIPS 2023.

Divergence at the Interpolation Threshold: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle. NeurIPS 2023 Workshops: ATTRIB, Mathematics of Modern Machine Learning.

An Information-Theoretic Understanding of Maximum Manifold Capacity Representations. NeurIPS 2023 Workshops: UniReps (Oral), InfoCog (Spotlight), NeurReps, SSL.

Associative Memory Under the Probabilistic Lens: Improved Transformers & Dynamic Memory Creation. NeurIPS 2023 Workshop: Associative Memories & Hopfield Networks.

Testing Assumptions Underlying a Unified Theory for the Origin of Grid Cells. NeurIPS 2023 Workshops: UniReps, NeurReps, AI4Science.

Beyond Expectations: Model-Driven Amplification of Dataset Biases in Data Feedback Loops. NeurIPS 2023 Workshop: Algorithmic Fairness through the Lens of Time.

Emergence of Sparse Representations from Noise. ICML 2023.

Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting. ICML 2023 Workshop: Knowledge and Logical Reasoning in the Era of Data-driven Learning.

Deceptive Alignment Monitoring. ICML 2023 AdvML Workshop (Blue Sky Oral).

FACADE: A Framework for Adversarial Circuit Anomaly Detection and Evaluation. ICML 2023 AdvML Workshop.

No Free Lunch from Deep Learning in Neuroscience: A Case Study through Models of the Entorhinal-Hippocampal Circuit. NeurIPS 2022.

Streaming Inference for Infinite Non-Stationary Clustering. CoLLAs 2022.

Streaming Inference for Infinite Latent Feature Models. ICML 2022.

No Free Lunch from Deep Learning in Neuroscience: A Case Study through Models of the Entorhinal-Hippocampal Circuit. ICML 2022 Workshop: AI for Science.

Streaming Inference for Infinite Non-Stationary Clustering. ICLR 2022 Workshop: Agent Learning in Open Endedness.

An Algorithmic Theory of Metacognition in Minds and Machines. NeurIPS 2021 Workshop: Metacognition in the Age of AI.

Efficient Online Inference for Nonparametric Mixture Models. UAI 2021.

Neural population dynamics for hierarchical inference in mice performing the International Brain Lab task. Society for Neuroscience 2021.

Neural network model of amygdalar memory engram formation and function. COSYNE 2021.

Reverse-engineering recurrent neural network solutions to a hierarchical inference task for mice. NeurIPS 2020.

Under Review

Brain-wide population codes for hierarchical inference in mice. SfN 2024.

Brain-wide representations of prior information in mouse decision-making. bioRxiv 2023.

A Brain-Wide Map of Neural Activity during Complex Behaviour. bioRxiv 2023.

Class Projects

Towards Unifying Smooth Neural Codes with Adversarially Robust Representations. 2019.

One Day

Memory engrams perform nonparametric non-stationary latent state associative learning.

Recovering low dimensional, interpretable mechanistic models via Representations and Dynamics Distillation (RADD).

Rylan Schaeffer
