
Davidad Dalrymple: Towards Provably Safe AI

On provably safe and beneficial AI and the UK's new research funding agency.

Episode 137

I spoke with Davidad Dalrymple about:

  • His perspectives on AI risk

  • ARIA (the UK’s Advanced Research and Invention Agency) and its Safeguarded AI Programme

Enjoy—and let me know what you think!


Davidad is a Programme Director at ARIA. He was most recently a Research Fellow in technical AI safety at Oxford. He co-invented the top-40 cryptocurrency Filecoin, led an international neuroscience collaboration, and was a senior software engineer at Twitter and multiple startups.


Find me on Twitter for updates on new episodes, and reach me at editor@thegradient.pub with feedback, ideas, and guest suggestions.

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter

Outline:

  • (00:00) Intro

  • (00:36) Calibration and optimism about breakthroughs

  • (03:35) Calibration and AGI timelines, effects of AGI on humanity

  • (07:10) Davidad’s thoughts on the Orthogonality Thesis

  • (10:30) Understanding how our current direction relates to AGI and breakthroughs

  • (13:33) What Davidad thinks is needed for AGI

    • (17:00) Extracting knowledge

    • (19:01) Cyber-physical systems and modeling frameworks

  • (20:00) Continuities between Davidad’s earlier work and ARIA

  • (22:56) Path dependence in technology, race dynamics

  • (26:40) More on Davidad’s perspective on what might go wrong with AGI

  • (28:57) Vulnerable world, interconnectedness of computers and control

  • (34:52) Formal verification and world modeling, Open Agency Architecture

    • (35:25) The Semantic Sufficiency Hypothesis

    • (39:31) Challenges for modeling

    • (43:44) The Deontic Sufficiency Hypothesis and mathematical formalization

    • (49:25) Oversimplification and quantitative knowledge

    • (53:42) Collective deliberation in expressing values for AI

  • (55:56) ARIA’s Safeguarded AI Programme

  • (59:40) Anthropic’s ASL levels

  • (1:03:12) Guaranteed Safe AI

    • (1:03:38) AI risk and (in)accurate world models

    • (1:09:59) Levels of safety specifications for world models and verifiers — steps to achieve high safety

  • (1:12:00) Davidad’s portfolio research approach and funding at ARIA

  • (1:15:46) Earlier concerns about ARIA — Davidad’s perspective

  • (1:19:26) Where to find more information on ARIA and the Safeguarded AI Programme

  • (1:20:44) Outro

