The Gradient
The Gradient: Perspectives on AI
Hattie Zhou: Lottery Tickets and Algorithmic Reasoning in LLMs

On lottery tickets and forgetting in neural networks, endowing LLMs with algorithmic reasoning, and ML research culture.

In episode 60 of The Gradient Podcast, Daniel Bashir speaks to Hattie Zhou.

Hattie is a PhD student at the Université de Montréal and Mila. Her research focuses on understanding how and why neural networks work, based on the belief that the performance of modern neural networks exceeds our understanding and that building more capable and trustworthy models requires bridging this gap. Prior to Mila, she spent time as a data scientist at Uber and did research with Uber AI Labs.

Have suggestions for future podcast guests (or other feedback)? Let us know here!

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter


  • (00:00) Intro

  • (01:55) Hattie’s Origin Story, Uber AI Labs, empirical theory and other sorts of research

  • (10:00) Intro to the Lottery Ticket Hypothesis & Deconstructing Lottery Tickets

    • (14:30) Lottery tickets as lucky initialization

    • (17:00) Types of masking and the “masking is training” claim

    • (24:00) Type-0 masks and weight evolution over long training trajectories

    • (27:00) Can you identify good masks or training trajectories a priori?

    • (29:00) The role of signs in neural net initialization

    • (35:27) The Supermask

    • (41:00) Masks to probe pretrained models and model steerability

  • (47:40) Fortuitous Forgetting in Connectionist Networks

    • (54:00) Relationships to other work (double descent, grokking, etc.)

    • (1:01:00) The iterative training process in fortuitous forgetting, scale and value of exploring alternatives

  • (1:03:35) In-Context Learning and Teaching Algorithmic Reasoning

    • (1:09:00) Learning + algorithmic reasoning, prompting strategy

  • (1:13:50) What’s happening with in-context learning?

    • (1:14:00) Induction heads

    • (1:17:00) ICL and gradient descent

    • (1:22:00) Algorithmic prompting vs discovery

    • (1:24:45) Future directions for algorithmic prompting

  • (1:26:30) Interesting work from NeurIPS 2022

  • (1:28:20) Hattie’s perspective on scientific questions people pay attention to, underrated problems

  • (1:34:30) Hattie’s perspective on ML publishing culture

  • (1:42:12) Outro


Deeply researched, technical interviews with experts thinking about AI and technology. Hosted, recorded, researched, and produced by Daniel Bashir.