Suhail Doshi: The Future of Computer Vision

The Gradient: Perspectives on AI

On (AI) art, why computer vision isn't getting enough attention (not that attention), building a powerful image editor, and open source.

May 16, 2024

Episode 123

I spoke with Suhail Doshi about:

Why benchmarks aren’t prepared for tomorrow’s AI models
How he thinks about artists in a world with advanced AI tools
Building a unified computer vision model that can generate, edit, and understand pixels

Suhail is a software engineer and entrepreneur known for founding Mixpanel, Mighty Computing, and Playground AI (they’re hiring!).

Reach me at editor@thegradient.pub with feedback, ideas, or guest suggestions.

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter

Outline:

(00:00) Intro
(00:54) Ad read — MLOps conference
(01:30) Suhail is *not* in pivot hell but he *is* all-in on 50% AI-generated music
(03:45) AI and music, similarities to Playground
(07:50) Skill vs. creative capacity in art
(12:43) What we look for in music and art
(15:30) Enabling creative expression
(18:22) Building a unified computer vision model, underinvestment in computer vision
(23:14) Enhancing the aesthetic quality of images: color and contrast, benchmarks vs user desires
(29:05) “Benchmarks are not prepared for how powerful these models will become”
(31:56) Personalized models and personalized benchmarks
(36:39) Engaging users and benchmark development
(39:27) What a foundation model for graphics requires
(45:33) Text-to-image is insufficient
(46:38) DALL-E 2 and Imagen comparisons, FID
(49:40) Compositionality
(50:37) Why Playground focuses on images vs. 3D, video, etc.
(54:11) Open source and Playground’s strategy
(57:18) When to stop open-sourcing?
(1:03:38) Suhail’s thoughts on AGI discourse
(1:07:56) Outro

Links: