Catherine Olsson and Nelson Elhage: Anthropic, Understanding Transformers

The Gradient: Perspectives on AI

A conversation with Catherine Olsson and Nelson Elhage, members of technical staff on Anthropic's interpretability team

Andrey Kurenkov

Aug 26, 2022

In episode 39 of The Gradient Podcast, Andrey Kurenkov speaks to Catherine Olsson and Nelson Elhage.

Catherine and Nelson are both members of technical staff at Anthropic, an AI safety and research company working to build reliable, interpretable, and steerable AI systems. Their focus is on interpretability, and this interview discusses several of their recent works.

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter

Outline:

(00:00) Intro
(01:10) Catherine’s Path into AI
(03:25) Nelson’s Path into AI
(05:23) Overview of Anthropic
(08:21) Mechanistic Interpretability
- Mechanistic Interpretability, Variables, and the Importance of Interpretable Bases
(15:15) Transformer Circuits
- A Mathematical Framework for Transformer Circuits
(21:30) Toy Transformer
(27:25) Induction Heads
- In-context Learning and Induction Heads
(31:00) In-Context Learning
(35:10) Evidence for Induction Heads Enabling In-Context Learning
(39:30) What’s Next
(43:10) Replicating Results
- PySvelte
(46:00) Outro

Links:
