The Gradient
The Gradient: Perspectives on AI
Suhail Doshi: The Future of Computer Vision

On (AI) art, why computer vision isn't getting enough attention (not that attention), building a powerful image editor, and open source.

Episode 123

I spoke with Suhail Doshi about:

  • Why benchmarks aren’t prepared for tomorrow’s AI models

  • How he thinks about artists in a world with advanced AI tools

  • Building a unified computer vision model that can generate, edit, and understand pixels

Suhail is a software engineer and entrepreneur known for founding Mixpanel, Mighty Computing, and Playground AI (they’re hiring!).

Reach me at for feedback, ideas, guest suggestions.


  • (00:00) Intro

  • (00:54) Ad read — MLOps conference

  • (01:30) Suhail is *not* in pivot hell but he *is* all-in on 50% AI-generated music

  • (03:45) AI and music, similarities to Playground

  • (07:50) Skill vs. creative capacity in art

  • (12:43) What we look for in music and art

  • (15:30) Enabling creative expression

  • (18:22) Building a unified computer vision model, underinvestment in computer vision

  • (23:14) Enhancing the aesthetic quality of images: color and contrast, benchmarks vs user desires

  • (29:05) “Benchmarks are not prepared for how powerful these models will become”

  • (31:56) Personalized models and personalized benchmarks

  • (36:39) Engaging users and benchmark development

  • (39:27) What a foundation model for graphics requires

  • (45:33) Text-to-image is insufficient

  • (46:38) DALL-E 2 and Imagen comparisons, FID

  • (49:40) Compositionality

  • (50:37) Why Playground focuses on images vs. 3D, video, etc.

  • (54:11) Open source and Playground’s strategy

  • (57:18) When to stop open-sourcing?

  • (1:03:38) Suhail’s thoughts on AGI discourse

  • (1:07:56) Outro


The Gradient: Perspectives on AI
Deeply researched, technical interviews with experts thinking about AI and technology. Hosted, recorded, researched, and produced by Daniel Bashir.