The Gradient
The Gradient: Perspectives on AI
Jonathan Frankle: From Lottery Tickets to LLMs

On sparsity in neural networks, what matters in research, and what AI practitioners and society should be thinking about.

In episode 96 of The Gradient Podcast, Daniel Bashir speaks to Jonathan Frankle.

Jonathan is the Chief Scientist at MosaicML and (as of release). He completed his PhD at MIT, where, through his lottery ticket hypothesis, he investigated the properties of sparse neural networks that allow them to train effectively. He also spends a portion of his time working on technology policy and currently works with the OECD to implement the AI principles he helped develop in 2019.

Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter


  • (00:00) Intro

  • (02:35) Jonathan’s background and work

    • (04:25) Origins of the Lottery Ticket Hypothesis

    • (06:00) Jonathan’s empiricism and approach to science

    • (08:25) More Karl Popper discourse + hot takes

  • (09:45) Walkthrough of the Lottery Ticket Hypothesis

    • (12:00) Issues with the Lottery Ticket Hypothesis as a statement

    • (12:30) Jonathan’s advice for PhD students, on asking good questions

    • (15:55) Strengths and promise of the Lottery Ticket Hypothesis

  • (18:55) More Lottery Ticket Hypothesis papers

    • (19:10) Comparing Rewinding and Fine-tuning

      • (23:00) Care in making experimental choices

    • (25:05) Linear Mode Connectivity and the Lottery Ticket Hypothesis

      • (27:50) On what is being measured and how

      • (28:50) “The outcome of optimization is determined to a linearly connected region”

      • (31:15) On good metrics

    • (32:54) On the Predictability of Pruning Across Scales — scaling laws for pruning

      • (34:40) The paper’s takeaway

    • (38:45) Pruning Neural Networks at Initialization — on a scientific disagreement

      • (45:00) On making takedown papers useful

      • (46:15) On what can be known early in training

  • (49:15) Jonathan’s perspective on important research questions today

  • (54:40) MosaicML

    • (55:19) How Mosaic got started

    • (56:17) Mosaic highlights

    • (57:33) Customer stories

  • (1:00:30) Jonathan’s work and perspectives on AI policy

    • (1:05:45) The key question: what we want

  • (1:07:35) Outro


Deeply researched, technical interviews with experts thinking about AI and technology. Hosted, recorded, researched, and produced by Daniel Bashir.