<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The Gradient: The Update]]></title><description><![CDATA[Biweekly updates covering recent AI news and research]]></description><link>https://thegradientpub.substack.com/s/the-update</link><image><url>https://substackcdn.com/image/fetch/$s_!qOyT!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F33e22926-7401-4e09-8c7c-1e6b0f179f76_1196x1196.png</url><title>The Gradient: The Update</title><link>https://thegradientpub.substack.com/s/the-update</link></image><generator>Substack</generator><lastBuildDate>Sat, 09 May 2026 09:51:58 GMT</lastBuildDate><atom:link href="https://thegradientpub.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[The Gradient]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[gradientpub@gmail.com]]></webMaster><itunes:owner><itunes:email><![CDATA[gradientpub@gmail.com]]></itunes:email><itunes:name><![CDATA[The Gradient]]></itunes:name></itunes:owner><itunes:author><![CDATA[The Gradient]]></itunes:author><googleplay:owner><![CDATA[gradientpub@gmail.com]]></googleplay:owner><googleplay:email><![CDATA[gradientpub@gmail.com]]></googleplay:email><googleplay:author><![CDATA[The Gradient]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Update #83: AI Music Fraud and PlanSearch]]></title><description><![CDATA[We look at the mechanics of AI-assisted music streaming fraud; researchers develop a new algorithm for LLM code generation that leverages inference time search over high-level 
plans]]></description><link>https://thegradientpub.substack.com/p/update-83-ai-music-fraud-and-plansearch</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/update-83-ai-music-fraud-and-plansearch</guid><dc:creator><![CDATA[Cole Frank]]></dc:creator><pubDate>Wed, 11 Sep 2024 13:55:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-QCR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe750e887-97f2-4cdd-b470-838058f8ab70_1600x987.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 83rd update from the Gradient! If you&#8217;re new and like what you see, <a href="https://thegradientpub.substack.com/">subscribe</a> and follow us on <a href="https://twitter.com/gradientpub">Twitter</a>. <strong>Our newsletters run long, so you&#8217;ll need to view this post on Substack to see everything!</strong></p><p>Recently there has been a lot of discourse about whether AI art is real art. While we had neither the nerve nor the subject-matter expertise to resolve this question in this week&#8217;s update, the following pieces are worth checking out:</p><ul><li><p>Ted Chiang&#8217;s New Yorker essay &#8220;<a href="https://www.newyorker.com/culture/the-weekend-essay/why-ai-isnt-going-to-make-art">Why A.I. 
Isn&#8217;t Going to Make Art</a>&#8221;, which kicked off this round of discourse</p></li><li><p>The Read Max <a href="https://maxread.substack.com/p/the-war-over-ai-writing">dispatch</a> responding to Chiang&#8217;s essay and the National Novel Writing Month (NaNoWriMo) LLM controversy</p></li><li><p>Celine Nguyen&#8217;s wonderful <a href="https://www.personalcanon.com/p/good-artists-copy-ai-artists-____">exploration</a> of where AI art stands in relation to the rest of art</p></li></ul><p>As always, if you want to write with us, send a pitch using <a href="https://goo.gl/forms/whYRKEzMZJox6FaH2">this form</a>.</p><h2><strong>News Highlight</strong>: <a href="https://www.nytimes.com/2024/09/05/nyregion/nc-man-charged-ai-fake-music.html">First criminal charges for AI-abetted music streaming fraud</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-QCR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe750e887-97f2-4cdd-b470-838058f8ab70_1600x987.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-QCR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe750e887-97f2-4cdd-b470-838058f8ab70_1600x987.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-QCR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe750e887-97f2-4cdd-b470-838058f8ab70_1600x987.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-QCR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe750e887-97f2-4cdd-b470-838058f8ab70_1600x987.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!-QCR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe750e887-97f2-4cdd-b470-838058f8ab70_1600x987.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-QCR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe750e887-97f2-4cdd-b470-838058f8ab70_1600x987.jpeg" width="1456" height="898" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e750e887-97f2-4cdd-b470-838058f8ab70_1600x987.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:898,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-QCR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe750e887-97f2-4cdd-b470-838058f8ab70_1600x987.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-QCR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe750e887-97f2-4cdd-b470-838058f8ab70_1600x987.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-QCR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe750e887-97f2-4cdd-b470-838058f8ab70_1600x987.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!-QCR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe750e887-97f2-4cdd-b470-838058f8ab70_1600x987.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h4>Summary&nbsp;</h4><p>Federal prosecutors <a href="https://www.justice.gov/usao-sdny/pr/north-carolina-musician-charged-music-streaming-fraud-aided-artificial-intelligence">unveiled</a> the first ever criminal charges for a scheme involving &#8220;Artificially Inflated Music Streaming&#8221;. 
The indictment alleges that Michael Smith, a musician living in North Carolina, purchased AI-generated tracks, uploaded them to various streaming platforms, then used thousands of &#8220;bots&#8221; to repeatedly stream the tracks. The scheme allegedly netted him more than $10 million in royalty payments over the course of seven years. He is charged with wire fraud, wire fraud conspiracy, and money laundering conspiracy. Each charge carries a maximum sentence of 20 years in prison.</p><h4>Overview</h4><p>The scheme is pretty simple:&nbsp;</p><ol><li><p>Purchase thousands of fake email addresses.</p></li><li><p>Use these email addresses to create and register thousands of fake accounts on music platforms like Spotify, Apple Music, and YouTube Music.</p></li><li><p>The royalty rate per stream is higher for paid accounts. So, find a &#8220;Manhattan-based service&#8221; whose business is to &#8220;provide large numbers of debit cards, typically corporate debit cards for employees of a company&#8221; to make it appear that each fake account uses a different source of payment. Setting up thousands of paid accounts costs money, but it will be worth it.</p></li><li><p>Start streaming music you own, and stream it a lot. 
More specifically:</p></li></ol><blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BcY2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7130f544-63c7-4ab4-a675-3a03efc63819_1294x588.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BcY2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7130f544-63c7-4ab4-a675-3a03efc63819_1294x588.png 424w, https://substackcdn.com/image/fetch/$s_!BcY2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7130f544-63c7-4ab4-a675-3a03efc63819_1294x588.png 848w, https://substackcdn.com/image/fetch/$s_!BcY2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7130f544-63c7-4ab4-a675-3a03efc63819_1294x588.png 1272w, https://substackcdn.com/image/fetch/$s_!BcY2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7130f544-63c7-4ab4-a675-3a03efc63819_1294x588.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BcY2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7130f544-63c7-4ab4-a675-3a03efc63819_1294x588.png" width="1294" height="588" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7130f544-63c7-4ab4-a675-3a03efc63819_1294x588.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:588,&quot;width&quot;:1294,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BcY2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7130f544-63c7-4ab4-a675-3a03efc63819_1294x588.png 424w, https://substackcdn.com/image/fetch/$s_!BcY2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7130f544-63c7-4ab4-a675-3a03efc63819_1294x588.png 848w, https://substackcdn.com/image/fetch/$s_!BcY2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7130f544-63c7-4ab4-a675-3a03efc63819_1294x588.png 1272w, https://substackcdn.com/image/fetch/$s_!BcY2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7130f544-63c7-4ab4-a675-3a03efc63819_1294x588.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div></blockquote><ol start="5"><li><p>Profit</p></li></ol><p>In the beginning there was no AI. Instead Smith unleashed his streaming bots on a music publicist&#8217;s large catalog of existing music. Later he offered his streaming army as a service to other musicians to boost their streams. But you can only stream a single song so many times before someone notices. To evade detection, Smith needed a bigger catalog, more material. 
So in 2018 he partnered with an as-yet-unnamed &#8220;AI music company&#8221; that provided him with thousands of songs each week, which he uploaded to the platforms and whose stream counts he then manipulated.</p><p>The company would provide Smith with the tracks, and he&#8217;d generate random but oddly plausible track titles and artist names:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NlG7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa9b35a-a1b3-4fa1-a0a1-18572a72dc2c_1294x522.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NlG7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa9b35a-a1b3-4fa1-a0a1-18572a72dc2c_1294x522.png 424w, https://substackcdn.com/image/fetch/$s_!NlG7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa9b35a-a1b3-4fa1-a0a1-18572a72dc2c_1294x522.png 848w, https://substackcdn.com/image/fetch/$s_!NlG7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa9b35a-a1b3-4fa1-a0a1-18572a72dc2c_1294x522.png 1272w, https://substackcdn.com/image/fetch/$s_!NlG7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa9b35a-a1b3-4fa1-a0a1-18572a72dc2c_1294x522.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NlG7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa9b35a-a1b3-4fa1-a0a1-18572a72dc2c_1294x522.png" width="1294" height="522" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2fa9b35a-a1b3-4fa1-a0a1-18572a72dc2c_1294x522.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:522,&quot;width&quot;:1294,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NlG7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa9b35a-a1b3-4fa1-a0a1-18572a72dc2c_1294x522.png 424w, https://substackcdn.com/image/fetch/$s_!NlG7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa9b35a-a1b3-4fa1-a0a1-18572a72dc2c_1294x522.png 848w, https://substackcdn.com/image/fetch/$s_!NlG7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa9b35a-a1b3-4fa1-a0a1-18572a72dc2c_1294x522.png 1272w, https://substackcdn.com/image/fetch/$s_!NlG7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa9b35a-a1b3-4fa1-a0a1-18572a72dc2c_1294x522.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Despite his best efforts, Smith had his account flagged or removed from platforms numerous times over the course of the scheme. 
Really he could have been a lot more subtle about the whole thing:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zeqQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc45224e1-134f-4a0b-8dbb-db137f171c2b_1292x486.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zeqQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc45224e1-134f-4a0b-8dbb-db137f171c2b_1292x486.png 424w, https://substackcdn.com/image/fetch/$s_!zeqQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc45224e1-134f-4a0b-8dbb-db137f171c2b_1292x486.png 848w, https://substackcdn.com/image/fetch/$s_!zeqQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc45224e1-134f-4a0b-8dbb-db137f171c2b_1292x486.png 1272w, https://substackcdn.com/image/fetch/$s_!zeqQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc45224e1-134f-4a0b-8dbb-db137f171c2b_1292x486.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zeqQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc45224e1-134f-4a0b-8dbb-db137f171c2b_1292x486.png" width="1292" height="486" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c45224e1-134f-4a0b-8dbb-db137f171c2b_1292x486.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:486,&quot;width&quot;:1292,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zeqQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc45224e1-134f-4a0b-8dbb-db137f171c2b_1292x486.png 424w, https://substackcdn.com/image/fetch/$s_!zeqQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc45224e1-134f-4a0b-8dbb-db137f171c2b_1292x486.png 848w, https://substackcdn.com/image/fetch/$s_!zeqQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc45224e1-134f-4a0b-8dbb-db137f171c2b_1292x486.png 1272w, https://substackcdn.com/image/fetch/$s_!zeqQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc45224e1-134f-4a0b-8dbb-db137f171c2b_1292x486.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>The <a href="https://www.nytimes.com/2024/09/05/nyregion/nc-man-charged-ai-fake-music.html">indictment</a> does not provide details on how the music was actually generated. The only indication that the company was using bona fide deep-learning generative AI rather than some simpler procedural method is an email excerpt in the indictment, sent to Smith by one of the AI music company&#8217;s employees: &#8220;Song quality is 10x-20x better now, and we also have vocal generation capabilities. . . . Have a listen to the attached for an idea of what I'm talking about.&#8221; In some sense, AI is incidental to this story; there is nothing illegal about uploading AI-generated music to Spotify. 
AI just scaled the pre-existing fraud.</p><h4>Our Take</h4><p>Federal prosecutors <a href="https://www.justice.gov/usao-sdny/pr/north-carolina-musician-charged-music-streaming-fraud-aided-artificial-intelligence">describe</a> Smith&#8217;s crime as stealing from &#8220;musicians, songwriters, and other rights holders whose songs were legitimately streamed.&#8221; Mechanically that&#8217;s true: because streaming royalties are paid out of a shared pool, Smith&#8217;s inflated stream counts resulted in smaller royalty payments to other musicians. But I&#8217;m somewhat sympathetic to Mr. Smith (as are <a href="https://www.nytimes.com/2024/09/05/nyregion/nc-man-charged-ai-fake-music.html#commentsContainer">NYT commenters</a>): he took advantage of a broken system in <a href="https://www.wired.com/story/streaming-bots-spotify/#:~:text=Between%201%20billion%20and%203,by%20France's%20National%20Music%20Center.">a way that many others do</a>.</p><p>Matt Levine has a characteristically good and relevant <a href="https://www.bloomberg.com/opinion/articles/2024-09-09/fake-songs-made-real-money">take</a>:</p><p>&#8220;Basically much of modern economics, and life, has the following characteristics:</p><ul><li><p>Everything is intermediated through some depersonalized automated electronic exchange.</p></li><li><p>The automated electronic exchange has a mechanism &#8212; how it actually works, what the exchange&#8217;s software allows you to do &#8212; and also rules, the terms of service regulating how you can use the mechanism, which are fuzzier than the mechanism and written in small print, things like &#8220;don&#8217;t do fraud&#8221; or &#8220;you have to be a human&#8221; or whatever.</p></li><li><p>The mechanism is much more legible and salient than the rules, and in a depersonalized electronic world people treat the mechanism as the rules: They don&#8217;t believe that the rules exist, because the rules seem to contradict how the service works. 
The basic description of Spotify&#8217;s mechanics suggests Smith&#8217;s alleged arbitrage; if he didn&#8217;t do it surely someone else would.&#8221;</p></li></ul><p>&#8211; Cole</p><h2><strong>Research Highlight: </strong><a href="https://arxiv.org/abs/2409.03733">Planning In Natural Language Improves LLM Search For Code Generation</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hIyr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295a3def-e594-4eff-a3c3-2b6e7d2b46b0_1024x1024.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hIyr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295a3def-e594-4eff-a3c3-2b6e7d2b46b0_1024x1024.jpeg 424w, https://substackcdn.com/image/fetch/$s_!hIyr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295a3def-e594-4eff-a3c3-2b6e7d2b46b0_1024x1024.jpeg 848w, https://substackcdn.com/image/fetch/$s_!hIyr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295a3def-e594-4eff-a3c3-2b6e7d2b46b0_1024x1024.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!hIyr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295a3def-e594-4eff-a3c3-2b6e7d2b46b0_1024x1024.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hIyr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295a3def-e594-4eff-a3c3-2b6e7d2b46b0_1024x1024.jpeg" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/295a3def-e594-4eff-a3c3-2b6e7d2b46b0_1024x1024.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:311356,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hIyr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295a3def-e594-4eff-a3c3-2b6e7d2b46b0_1024x1024.jpeg 424w, https://substackcdn.com/image/fetch/$s_!hIyr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295a3def-e594-4eff-a3c3-2b6e7d2b46b0_1024x1024.jpeg 848w, https://substackcdn.com/image/fetch/$s_!hIyr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295a3def-e594-4eff-a3c3-2b6e7d2b46b0_1024x1024.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!hIyr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295a3def-e594-4eff-a3c3-2b6e7d2b46b0_1024x1024.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h4>Summary&nbsp;</h4><p>Researchers from Scale AI (including Gradient co-founder <a href="https://hughbzhang.com/">Hugh Zhang</a> &#128512; ) published PlanSearch, a novel search algorithm for LLM code generation tasks. Unlike traditional methods that scale inference compute by searching over similar code solutions, PlanSearch explores the space of problem-solving plans in natural language. This approach leads to a more diverse exploration of potential solutions. The algorithm shows promising results on several coding benchmarks, including HumanEval+, MBPP+, and LiveCodeBench. 
This work addresses the challenge of effectively scaling inference compute in LLMs for code generation and offers a new direction: searching over &#8220;concept&#8221; space rather than over code.</p><h4>Overview</h4><p>The authors observe that a lack of diversity in SoTA LLM outputs can hinder search algorithms (defined as any method that leverages additional compute at inference time to improve overall performance): search benefits from exploring a diverse set of possibilities, and homogeneous outputs confine it to a narrow one. There is evidence that post-training methods like DPO and RLHF reduce output diversity, and the authors in fact show that some models&#8217; base versions outperform their instruct versions when allowed to generate multiple candidate solutions.</p><p>The key idea of PlanSearch is to search over higher-level, conceptual natural language descriptions of solutions rather than over the solution code itself. The authors first test the premise behind this idea by checking whether prompting an LLM with a correct natural language sketch of a solution improves code generation performance. They generate &#8220;backtranslated&#8221; sketches by feeding an LLM both a problem and a correct code solution and asking it for a natural language description of the solution. 
They find that the sketches significantly improve performance, with longer sketches helping even more.</p><p>Next, they demonstrate the importance of having a good sketch, not just any sketch, by showing that the accuracy of an LLM conditioned on a particular sketch trends towards either 0% or 100%:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QNd8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ff6786-672b-40ac-b5aa-1fa70f71926f_449x401.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QNd8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ff6786-672b-40ac-b5aa-1fa70f71926f_449x401.png 424w, https://substackcdn.com/image/fetch/$s_!QNd8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ff6786-672b-40ac-b5aa-1fa70f71926f_449x401.png 848w, https://substackcdn.com/image/fetch/$s_!QNd8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ff6786-672b-40ac-b5aa-1fa70f71926f_449x401.png 1272w, https://substackcdn.com/image/fetch/$s_!QNd8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ff6786-672b-40ac-b5aa-1fa70f71926f_449x401.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QNd8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ff6786-672b-40ac-b5aa-1fa70f71926f_449x401.png" width="449" height="401" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/39ff6786-672b-40ac-b5aa-1fa70f71926f_449x401.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:401,&quot;width&quot;:449,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QNd8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ff6786-672b-40ac-b5aa-1fa70f71926f_449x401.png 424w, https://substackcdn.com/image/fetch/$s_!QNd8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ff6786-672b-40ac-b5aa-1fa70f71926f_449x401.png 848w, https://substackcdn.com/image/fetch/$s_!QNd8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ff6786-672b-40ac-b5aa-1fa70f71926f_449x401.png 1272w, https://substackcdn.com/image/fetch/$s_!QNd8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ff6786-672b-40ac-b5aa-1fa70f71926f_449x401.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Having established that sketches improve performance, and that a good sketch can make or break it, the authors present a search algorithm that capitalizes on this. For a given LLM and coding problem, their algorithm, PlanSearch, involves:</p><ol><li><p>Generating many first-order observations about the problem</p></li><li><p>Sampling combinations of the first-order observations and prompting the LLM to use or merge each combination into second-order observations</p></li><li><p>Generating a natural language description of a strategy (i.e. 
a sketch) to solve the problem based on the first- and second-order observations</p></li><li><p>Generating more solution sketches with the prompt &#8220;Your idea is wrong&#8221;</p></li><li><p>Generating a code solution from each solution sketch</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iff9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ceb1f99-e21f-4cb9-b7a6-2ca9fa7740ed_1502x1082.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iff9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ceb1f99-e21f-4cb9-b7a6-2ca9fa7740ed_1502x1082.png 424w, https://substackcdn.com/image/fetch/$s_!iff9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ceb1f99-e21f-4cb9-b7a6-2ca9fa7740ed_1502x1082.png 848w, https://substackcdn.com/image/fetch/$s_!iff9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ceb1f99-e21f-4cb9-b7a6-2ca9fa7740ed_1502x1082.png 1272w, https://substackcdn.com/image/fetch/$s_!iff9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ceb1f99-e21f-4cb9-b7a6-2ca9fa7740ed_1502x1082.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iff9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ceb1f99-e21f-4cb9-b7a6-2ca9fa7740ed_1502x1082.png" width="1456" height="1049" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ceb1f99-e21f-4cb9-b7a6-2ca9fa7740ed_1502x1082.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1049,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iff9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ceb1f99-e21f-4cb9-b7a6-2ca9fa7740ed_1502x1082.png 424w, https://substackcdn.com/image/fetch/$s_!iff9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ceb1f99-e21f-4cb9-b7a6-2ca9fa7740ed_1502x1082.png 848w, https://substackcdn.com/image/fetch/$s_!iff9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ceb1f99-e21f-4cb9-b7a6-2ca9fa7740ed_1502x1082.png 1272w, https://substackcdn.com/image/fetch/$s_!iff9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ceb1f99-e21f-4cb9-b7a6-2ca9fa7740ed_1502x1082.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>PlanSearch is evaluated on three coding benchmarks (LiveCodeBench, HumanEval+, and MBPP+) using four models (GPT-4o, GPT-4o-mini, DeepSeek-Coder-V2, and Claude 3.5 Sonnet). 
The authors compare PlanSearch with 200 generated solutions (&#8220;PlanSearch@200&#8221;) against three baselines: basic repeated sampling 200 times (&#8220;Pass@200&#8221;), a single generation with no search (&#8220;Pass@1&#8221;), and IdeaSearch, which simply asks for a sketch and then separately prompts the LLM to generate code that follows the proposed sketch (&#8220;IdeaSearch@200&#8221;):</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3eyg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1fd798b-19e4-4dad-8a98-1f8bda68f215_915x522.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3eyg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1fd798b-19e4-4dad-8a98-1f8bda68f215_915x522.png 424w, https://substackcdn.com/image/fetch/$s_!3eyg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1fd798b-19e4-4dad-8a98-1f8bda68f215_915x522.png 848w, https://substackcdn.com/image/fetch/$s_!3eyg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1fd798b-19e4-4dad-8a98-1f8bda68f215_915x522.png 1272w, https://substackcdn.com/image/fetch/$s_!3eyg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1fd798b-19e4-4dad-8a98-1f8bda68f215_915x522.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3eyg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1fd798b-19e4-4dad-8a98-1f8bda68f215_915x522.png" width="915" height="522" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a1fd798b-19e4-4dad-8a98-1f8bda68f215_915x522.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:522,&quot;width&quot;:915,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3eyg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1fd798b-19e4-4dad-8a98-1f8bda68f215_915x522.png 424w, https://substackcdn.com/image/fetch/$s_!3eyg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1fd798b-19e4-4dad-8a98-1f8bda68f215_915x522.png 848w, https://substackcdn.com/image/fetch/$s_!3eyg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1fd798b-19e4-4dad-8a98-1f8bda68f215_915x522.png 1272w, https://substackcdn.com/image/fetch/$s_!3eyg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1fd798b-19e4-4dad-8a98-1f8bda68f215_915x522.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>PlanSearch performs extremely well, consistently outperforming the non-search baseline by 25-35 percentage points and &#8220;Pass@200&#8221; by 10-20 percentage points on LiveCodeBench.</p><h4>Our Take</h4><p>Exploring concept space rather than code solution space makes a lot of intuitive sense. Nine out of ten computer science professors recommend sitting down with pen and paper to sketch out a plan before beginning to code. It doesn&#8217;t matter how good you are at writing code if your approach is wrong. Computer science legend Donald Knuth famously said, &#8220;Premature optimization is the root of all evil.&#8221; Likewise, just last week AI legend Noam Brown tweeted: &#8220;1 engineer working in the right direction beats 100 geniuses working in the wrong direction.&#8221; The principle is the same. 
The fact that this sort of bedrock comp sci wisdom ports to LLM performance is somehow comforting.</p><p>&#8211; Cole</p><h2>New from the Gradient</h2><h3><a href="https://thegradientpub.substack.com/p/whats-missing-from-llm-chatbots-a">What's Missing From LLM Chatbots: A Sense of Purpose</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Iyqc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Iyqc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!Iyqc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!Iyqc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!Iyqc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Iyqc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp" width="1456" height="832" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:832,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Iyqc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!Iyqc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!Iyqc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!Iyqc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h3><strong><a href="https://thegradientpub.substack.com/p/davidad-dalrymple-towards-provably">Davidad Dalrymple: Towards Provably Safe AI</a></strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Stnj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2af8fc0-c8c1-4c93-992b-008439836168_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Stnj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2af8fc0-c8c1-4c93-992b-008439836168_1200x628.png 424w, 
https://substackcdn.com/image/fetch/$s_!Stnj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2af8fc0-c8c1-4c93-992b-008439836168_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!Stnj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2af8fc0-c8c1-4c93-992b-008439836168_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!Stnj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2af8fc0-c8c1-4c93-992b-008439836168_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Stnj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2af8fc0-c8c1-4c93-992b-008439836168_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b2af8fc0-c8c1-4c93-992b-008439836168_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Stnj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2af8fc0-c8c1-4c93-992b-008439836168_1200x628.png 424w, 
https://substackcdn.com/image/fetch/$s_!Stnj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2af8fc0-c8c1-4c93-992b-008439836168_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!Stnj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2af8fc0-c8c1-4c93-992b-008439836168_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!Stnj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2af8fc0-c8c1-4c93-992b-008439836168_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h3><strong><a href="https://thegradientpub.substack.com/p/clive-thompson-tales-of-technology">Clive Thompson: Tales of Technology</a></strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oj3Z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc033a2c-f755-416a-bc83-7840e2d904cd_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oj3Z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc033a2c-f755-416a-bc83-7840e2d904cd_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!oj3Z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc033a2c-f755-416a-bc83-7840e2d904cd_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!oj3Z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc033a2c-f755-416a-bc83-7840e2d904cd_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!oj3Z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc033a2c-f755-416a-bc83-7840e2d904cd_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oj3Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc033a2c-f755-416a-bc83-7840e2d904cd_1200x628.png" width="1200" height="628" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bc033a2c-f755-416a-bc83-7840e2d904cd_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oj3Z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc033a2c-f755-416a-bc83-7840e2d904cd_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!oj3Z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc033a2c-f755-416a-bc83-7840e2d904cd_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!oj3Z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc033a2c-f755-416a-bc83-7840e2d904cd_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!oj3Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc033a2c-f755-416a-bc83-7840e2d904cd_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h2>Other Things That Caught Our Eyes</h2><h3>News</h3><p><a href="https://www.nytimes.com/2024/09/09/technology/apple-event-iphone-16-watch.html">Apple Unveils New iPhones With Built-In Artificial Intelligence</a></p><p>Apple has unveiled its new iPhones, the iPhone 16 lineup, which come with built-in artificial intelligence (AI). The iPhone 16 comes in four different models and is designed to run Apple's generative AI system, called Apple Intelligence. The phones will have features such as message sorting, writing suggestions, and an improved Siri virtual assistant. 
This marks a departure from the predictable design of previous iPhones and introduces AI capabilities to enhance user experience.</p><p><a href="https://www.theverge.com/2024/9/5/24236980/us-signs-legally-enforceable-ai-treaty">US, EU, UK, and others sign legally enforceable AI treaty</a></p><p>The US, UK, and European Union, along with several other countries, have signed the first "legally binding" treaty on AI called the Framework Convention on Artificial Intelligence. The treaty aims to ensure that the use of AI aligns with human rights, democracy, and the rule of law. It lays out key principles that AI systems must follow, including protecting user data, respecting the law, and maintaining transparency. Each country that signs the treaty must adopt appropriate measures reflecting the framework. While the treaty is legally binding, enforcement primarily relies on monitoring, which is considered a relatively weak form of enforcement.</p><p><a href="https://www.bloomberg.com/news/articles/2024-09-05/openai-hits-1-million-paid-users-for-business-version-of-chatgpt">OpenAI Hits 1 Million Paid Users For Business Versions of ChatGPT</a></p><p>OpenAI has reached a milestone of over 1 million paid users for its corporate versions of ChatGPT, indicating a growing demand for its chatbot among businesses. This number includes users of ChatGPT Team and Enterprise services, as well as those using ChatGPT Edu at universities. OpenAI introduced ChatGPT Enterprise a year ago with enhanced features and privacy measures to generate revenue and offset the high costs of AI development. While the increase in paid corporate users is significant, it is unclear how many new businesses have signed up. OpenAI has not disclosed the average number of paid users per corporate customer. 
The majority of OpenAI's corporate users are based in the US, with Germany, Japan, and the United Kingdom being the most popular countries outside the US.</p><p><a href="https://www.theverge.com/23610427/chatbots-chatgpt-new-bing-google-bard-conversational-ai">From ChatGPT to Gemini: how AI is rewriting the internet</a></p><p>The article discusses how big players like Microsoft, Google, and OpenAI are making AI chatbot technology more accessible to the general public. These companies are developing large language model (LLM) programs such as Copilot, Gemini, and GPT-4o. These AI tools work by using autocomplete-like programs to learn language and analyze the statistical properties of the language to make educated guesses based on previously typed words. However, it is important to note that these AI tools do not have a hard-coded database of facts and may present false information as truth since their focus is on generating plausible-sounding statements rather than guaranteeing factuality.</p><p><a href="https://www.404media.co/big-tech-clients-of-jacob-wohls-secret-ai-lobbying-firm-lobbymatic-say-theyve-never-heard-of-it/">Big Tech &#8216;Clients&#8217; of Jacob Wohl&#8217;s Secret AI Lobbying Firm Say They've Never Heard of It</a></p><p>Jacob Wohl and Jack Burkman, convicted fraudsters and right-wing activists, have been operating a company called LobbyMatic that claims to offer AI-powered lobbying services. However, it has been revealed that many of the major companies listed as clients of LobbyMatic have never heard of the company. LobbyMatic purports to use AI to help companies and lobbyists create lobbying strategies, analyze hearings and bills, and track legislative progress. The company was run under the pseudonyms "Jay Klein" and "Bill Sanders" by Wohl and Burkman. Despite claiming to have signed up Toyota, Boundary Stone Partners, and Lantheus as clients, these companies have denied any association with LobbyMatic. 
The company has since removed screenshots from its website that suggested major companies were using its software. Boundary Stone Partners, one of the few companies that did use the platform, terminated its contract due to the tool's ineffectiveness. Wohl and Burkman were convicted of felony telecom fraud in 2022 and were fined $5 million by the FCC.</p><p><a href="https://www.nytimes.com/interactive/2024/09/03/technology/zoox-self-driving-cars-remote-control.html">How Self-Driving Cars Get Help From Humans Hundreds of Miles Away</a></p><p>Self-driving cars are not completely autonomous and often require human assistance to navigate challenging situations. Companies like Zoox, owned by Amazon, have command centers where technicians remotely guide self-driving cars when they encounter obstacles or unfamiliar scenarios. Technicians receive alerts and can send new routes to the cars using a computer mouse. They can also view video feeds from the car's cameras and make real-time adjustments to the car's path. While companies like Waymo and Cruise have started to acknowledge the need for human assistance, they have not disclosed the number of technicians employed or the associated costs. Remote assistance is one reason why robot taxis may struggle to replace traditional ride-hailing fleets operated by Uber and Lyft. Despite advancements in self-driving technology, human intervention is still necessary for safe and efficient operation.</p><p><a href="https://www.nytimes.com/2024/09/03/technology/openai-chatgpt-revenue.html">OpenAI, Still Haunted by Its Chaotic Past, Is Trying to Grow Up</a></p><p>OpenAI, the prominent player in the field of artificial intelligence, is undergoing significant changes in its management team and organizational structure as it seeks investments from major companies. The company has hired notable tech executives, disinformation experts, and AI safety researchers, and has added seven board members, including a former four-star Army general. 
OpenAI is also in discussions with potential investors such as Microsoft, Apple, Nvidia, and Thrive, with a potential valuation of $100 billion. Additionally, the company is considering altering its corporate structure to attract more investors. These moves reflect OpenAI's efforts to present itself as a serious and responsible leader in the AI industry, while resolving past conflicts and focusing on its future goals.</p><h3>Closing Thoughts</h3><p>Have something to say about this edition&#8217;s topics? Shoot us an email at editor@thegradient.pub and we will consider sharing the most interesting thoughts from readers in the next newsletter! For feedback, you can also reach Daniel directly at <a href="mailto:dbashir@hmc.edu">dbashir@hmc.edu</a> or on <a href="https://twitter.com/spaniel_bashir">Twitter</a>. If you enjoyed this newsletter, consider donating to The Gradient via a Substack subscription, which helps keep this volunteer-run project afloat. Thanks for reading the latest Update from the Gradient!</p>]]></content:encoded></item><item><title><![CDATA[Mini-Update #47: First International AI Safety Treaty and the WavTokenizer Codec ]]></title><description><![CDATA[The UK signs on to the first international AI safety treaty and a novel audio codec model allows for high-quality information transmission in fewer tokens.]]></description><link>https://thegradientpub.substack.com/p/mini-update-47-first-international</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/mini-update-47-first-international</guid><dc:creator><![CDATA[Ather Fawaz]]></dc:creator><pubDate>Tue, 10 Sep 2024 16:30:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-jpI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb997cb38-91ed-40df-b67e-6b67ce6849c0_774x435.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 47th mini-update from the Gradient! 
This is our exclusive newsletter edition specifically for paying subscribers and is our way to show you our appreciation for your support.</p><h1><strong><a href="https://www.gov.uk/government/news/uk-signs-first-international-treaty-addressing-risks-of-artificial-intelligence">News Highl&#8230;</a></strong></h1>
      <p>
          <a href="https://thegradientpub.substack.com/p/mini-update-47-first-international">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Update #82: AI Lawsuits and SOPHON]]></title><description><![CDATA[We look at the numerous ongoing lawsuits against AI giants; researchers develop a method to address the risk of pre-trained models being repurposed for unethical or harmful tasks.]]></description><link>https://thegradientpub.substack.com/p/update-82-ai-lawsuits-and-sophon</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/update-82-ai-lawsuits-and-sophon</guid><dc:creator><![CDATA[Justin Landay]]></dc:creator><pubDate>Tue, 27 Aug 2024 15:30:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!6PJX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1734fddd-f9b2-42eb-ae4c-2d51097c291d_1600x914.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 82nd update from the Gradient! If you&#8217;re new and like what you see, <a href="https://thegradientpub.substack.com/">subscribe</a> and follow us on <a href="https://twitter.com/gradientpub">Twitter</a>. 
<strong>Our newsletters run long, so you&#8217;ll need to view this post on Substack to see everything!</strong></p><p>As always, if you want to write with us, send a pitch using <a href="https://goo.gl/forms/whYRKEzMZJox6FaH2">this form</a>.</p><h2><strong>News Highlight</strong>: Numerous lawsuits against AI giants allowed to proceed&nbsp;</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6PJX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1734fddd-f9b2-42eb-ae4c-2d51097c291d_1600x914.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6PJX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1734fddd-f9b2-42eb-ae4c-2d51097c291d_1600x914.png 424w, https://substackcdn.com/image/fetch/$s_!6PJX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1734fddd-f9b2-42eb-ae4c-2d51097c291d_1600x914.png 848w, https://substackcdn.com/image/fetch/$s_!6PJX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1734fddd-f9b2-42eb-ae4c-2d51097c291d_1600x914.png 1272w, https://substackcdn.com/image/fetch/$s_!6PJX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1734fddd-f9b2-42eb-ae4c-2d51097c291d_1600x914.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6PJX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1734fddd-f9b2-42eb-ae4c-2d51097c291d_1600x914.png" width="586" height="334.85714285714283" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1734fddd-f9b2-42eb-ae4c-2d51097c291d_1600x914.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:832,&quot;width&quot;:1456,&quot;resizeWidth&quot;:586,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6PJX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1734fddd-f9b2-42eb-ae4c-2d51097c291d_1600x914.png 424w, https://substackcdn.com/image/fetch/$s_!6PJX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1734fddd-f9b2-42eb-ae4c-2d51097c291d_1600x914.png 848w, https://substackcdn.com/image/fetch/$s_!6PJX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1734fddd-f9b2-42eb-ae4c-2d51097c291d_1600x914.png 1272w, https://substackcdn.com/image/fetch/$s_!6PJX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1734fddd-f9b2-42eb-ae4c-2d51097c291d_1600x914.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Prompt to GPT-4o &#8220;Can you draw me a picture of a robot pick pocketing a wallet stuffed with cash out of an artist (french with a beret and brush) pocket&#8221;</figcaption></figure></div><h4><strong>Summary</strong>&nbsp;</h4><p>During a recent interview with Stanford students, former Google CEO Eric Schmidt attracted numerous headlines, most of them focused on his sensationalist claim that &#8220;<a href="https://www.theverge.com/2024/8/13/24219912/googles-former-ceo-on-why-the-company-was-caught-off-guard-by-openai">working from home was more important than winning</a>&#8221;. Given that Google&#8217;s stock price tripled during its remote work period, that claim has largely overshadowed his comments on generative AI, which deserve a more critical response. 
As reported by <em>The <a href="https://www.theverge.com/2024/8/14/24220658/google-eric-schmidt-stanford-talk-ai-startups-openai">Verge</a></em>, the former executive highlighted his belief that theft is a critical component underpinning the success of generative AI. He went on to encourage the students to make &#8220;a copy of TikTok, steal all the users, steal all the music, put my preferences in it&#8230; hire a whole bunch of lawyers to go clean the mess up&#8230;, it doesn&#8217;t matter that you stole all the content. And do not quote me.&#8221;&nbsp;</p><p>Eric Schmidt is neither the first nor the last to allege a relationship between generative AI and stealing; in this case, stealing not only TikTok&#8217;s intellectual property but also its users&#8217; personal (and private) data, as well as all of the music that TikTok <a href="https://www.latimes.com/entertainment-arts/music/story/2024-04-10/record-labels-and-the-government-are-blasting-tiktok-musicians-are-stuck-in-between">allegedly</a> pays half a billion dollars a year to license. Additionally, while it's clear that not <em>all</em> generative AI applications are built on theft (consider AlphaFold, which has been trained to predict protein structures, or <a href="https://chemrxiv.org/engage/chemrxiv/article-details/64e8137fdd1a73847f73f7aa">COATI</a>, an encoder-decoder model trained on billions of chemical quantitative binding measurements and used to generate new drug candidates), this kind of attitude is pervasive throughout many commercial technology and AI spaces. In practice, this attitude has resulted in dozens of lawsuits alleging variations of intellectual property theft, copyright infringement, and data privacy violations. 
While ChatGPT has attracted the most lawsuits to our knowledge (13 pending suits according to one legal <a href="https://originality.ai/blog/openai-chatgpt-lawsuit-list">tracker</a>), the claims against it are neither unique nor novel. This week, in two separate cases, judges have allowed numerous artists&#8217; <a href="https://www.hollywoodreporter.com/business/business-news/artists-score-major-win-copyright-case-against-ai-art-generators-1235973601/">claims</a> to progress against Midjourney and Stability AI, as well as a group of authors&#8217; <a href="https://abcnews.go.com/US/wireStory/authors-sue-claude-ai-chatbot-creator-anthropic-copyright-112964872">claims</a> against Anthropic&#8217;s chatbot Claude. In both cases, creatives allege that the generative AI tools&#8217; use of their copyrighted materials does not constitute fair use and that these tools infringe on their rights.&nbsp;</p><h4><strong>Overview</strong></h4><p>At first, it may be difficult to imagine what could unite <a href="https://www.hollywoodreporter.com/business/business-news/ai-art-generators-copyright-lawsuits-1235302611/">concept artists</a>, <a href="https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html">prestigious media organizations</a>, <a href="https://abcnews.go.com/US/wireStory/authors-sue-claude-ai-chatbot-creator-anthropic-copyright-112964872">romance authors</a>, <a href="https://www.theverge.com/2024/2/13/24072131/sarah-silverman-paul-tremblay-openai-chatgpt-copyright-lawsuit">comedians</a>, <a href="https://news.pollstar.com/2024/05/01/fka-twigs-testifies-as-congress-makes-early-moves-on-ai-abuse-protection-for-artists/">musicians</a>, <a href="https://dockets.justia.com/docket/california/candce/4:2022cv07074/403693">software engineers</a>, <a 
href="https://www.gamesradar.com/games/as-game-actors-strike-for-ai-protections-amazon-games-boss-says-we-need-more-ai-and-its-not-taking-work-away-because-for-games-we-dont-really-have-acting/">actors</a> and <a href="https://apnews.com/article/openai-lawsuit-authors-grisham-george-rr-martin-37f9073ab67ab25b7e6b2975b2a63bfe">George R. R. Martin</a>. In aggregate, we see a common pattern: creatives, artists, and intellectuals whose life&#8217;s work was used (without consent) to train generative models that, they credibly allege, reproduce copyrighted materials and have the potential to one day replace them. Across all the lawsuits, we see a common set of core allegations and legal questions:</p><ol><li><p>Does training a large language model with copyrighted works constitute fair use?</p></li><li><p>Can content generated by LLMs infringe a copyright?</p><ol><li><p>Will court rulings be differentiated depending on whether the content is a direct replication, paraphrasing, imitation, or parody?</p></li></ol></li><li><p>Does the DMCA (Digital Millennium Copyright Act) provide a legal remedy to take down potentially infringing material generated by AI?</p><ol><li><p>Do AI-generated images that strip copyright or trademark symbols violate the DMCA?</p></li></ol></li><li><p>Does scraping content to train models constitute unauthorized use of personal information and infringe privacy and consumer rights?</p></li></ol><p>To date, judges have ruled largely in favor of the AI companies on nearly all of these points, with some notable exceptions. In one of the earliest <a href="https://www.theverge.com/2024/2/13/24072131/sarah-silverman-paul-tremblay-openai-chatgpt-copyright-lawsuit">cases</a>, involving the comedian Sarah Silverman, the judge dismissed 5 of the 6 complaints against OpenAI, including the DMCA ones, leaving open only the claim of direct infringement. 
We saw a similar pattern play out in a different courtroom last week, where a US district judge advanced <a href="https://www.hollywoodreporter.com/business/business-news/artists-score-major-win-copyright-case-against-ai-art-generators-1235973601/">claims</a> of copyright infringement against Stability AI and Midjourney while dismissing those tied to the DMCA and unjust enrichment. A third lawsuit filed in San Francisco this <a href="https://abcnews.go.com/US/wireStory/authors-sue-claude-ai-chatbot-creator-anthropic-copyright-112964872">week</a> alleges that Anthropic&#8217;s use of the <a href="https://pile.eleuther.ai/">Pile</a>, a massive collection of text data used to train Anthropic&#8217;s chatbot Claude, does not constitute fair use because it contains &#8220;pirated&#8221; collections of books. These claims mirror those <a href="https://www.fastcompany.com/90970093/umg-abkco-concord-sue-anthropic-ai-copyright-infringement">alleged</a> by music publishers in October against Anthropic due to Claude&#8217;s uncanny ability to reproduce popular (and more importantly copyrighted) lyrics.&nbsp;</p><p>While it is difficult to predict how judges will rule (and whether the US Supreme Court would let those rulings stand or challenge them), we are approaching a crossroads where judges will start ruling directly on these fair use and copyright questions. However they rule, these cases could have a profound impact on both the creative and AI communities.</p><h4><strong>Our Take</strong></h4><p>I really hope people quote the former executive! 
Particularly, the lawyers in all of these court proceedings.&nbsp; - Justin&nbsp;</p><h2><strong>Research Highlight: </strong><a href="https://arxiv.org/pdf/2404.12699">SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability For Pre-trained Models</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!X_4W!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e68d35-dd19-4549-ab3c-7187360d08ba_1410x1084.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!X_4W!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e68d35-dd19-4549-ab3c-7187360d08ba_1410x1084.png 424w, https://substackcdn.com/image/fetch/$s_!X_4W!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e68d35-dd19-4549-ab3c-7187360d08ba_1410x1084.png 848w, https://substackcdn.com/image/fetch/$s_!X_4W!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e68d35-dd19-4549-ab3c-7187360d08ba_1410x1084.png 1272w, https://substackcdn.com/image/fetch/$s_!X_4W!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e68d35-dd19-4549-ab3c-7187360d08ba_1410x1084.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!X_4W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e68d35-dd19-4549-ab3c-7187360d08ba_1410x1084.png" width="428" height="329.0439716312057" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d3e68d35-dd19-4549-ab3c-7187360d08ba_1410x1084.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1084,&quot;width&quot;:1410,&quot;resizeWidth&quot;:428,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!X_4W!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e68d35-dd19-4549-ab3c-7187360d08ba_1410x1084.png 424w, https://substackcdn.com/image/fetch/$s_!X_4W!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e68d35-dd19-4549-ab3c-7187360d08ba_1410x1084.png 848w, https://substackcdn.com/image/fetch/$s_!X_4W!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e68d35-dd19-4549-ab3c-7187360d08ba_1410x1084.png 1272w, https://substackcdn.com/image/fetch/$s_!X_4W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e68d35-dd19-4549-ab3c-7187360d08ba_1410x1084.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure: The objectives of non-fine-tunable learning. (1) <strong>Intactness</strong>: it should preserve the model performance in the original domain. (2) <strong>Non-fine-tunability</strong>: fine-tuning the model in the restricted domain should incur a comparable or even greater overhead than training the model from scratch.</figcaption></figure></div><h4><strong>Summary</strong>&nbsp;</h4><p>"SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability for Pre-trained Models," from researchers at Zhejiang University and Ant Group, tackles a growing and pressing concern in the AI community: the risk of pre-trained models being repurposed for unethical or harmful tasks. As AI models become more powerful and accessible, the potential for misuse grows. 
SOPHON offers a potential solution by introducing a protection framework that ensures these models can perform their intended tasks while being resistant to adaptation for illicit purposes.&nbsp;</p><h4><strong>Overview</strong></h4><p>With large-scale training across data modalities, pre-trained models are commonly used as the backbone to efficiently develop and deploy models for specific downstream tasks. These models, trained on vast datasets and with immense computational power, can be easily fine-tuned to perform a wide variety of tasks. However, this versatility poses a significant risk: the same models can be co-opted for unethical or harmful purposes, such as privacy violations or the generation of malicious content.</p><p>A recent work from researchers at Zhejiang University and Ant Group tackles this challenge by introducing a novel learning paradigm known as <em>non-fine-tunable learning</em>. The motivation behind SOPHON is to prevent a pre-trained model from being fine-tuned for unethical or harmful tasks while maintaining its effectiveness in its original, intended domain.</p><p>The paper introduces a framework with two key players: an adversary and a defender. The adversary represents the malicious entity attempting to fine-tune a pre-trained model for unethical tasks. Their goal is to modify the model so that it performs well in a restricted domain, such as generating inappropriate content or inferring sensitive personal information. On the other hand, the defender is the entity that controls the release of the pre-trained model and seeks to prevent its misuse. 
The defender&#8217;s goal is to ensure that the model remains effective for its original tasks but cannot be easily repurposed by the adversary.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CIUD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18532cb5-14cd-4928-9bec-9b6e3c14f06e_1600x745.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CIUD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18532cb5-14cd-4928-9bec-9b6e3c14f06e_1600x745.png 424w, https://substackcdn.com/image/fetch/$s_!CIUD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18532cb5-14cd-4928-9bec-9b6e3c14f06e_1600x745.png 848w, https://substackcdn.com/image/fetch/$s_!CIUD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18532cb5-14cd-4928-9bec-9b6e3c14f06e_1600x745.png 1272w, https://substackcdn.com/image/fetch/$s_!CIUD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18532cb5-14cd-4928-9bec-9b6e3c14f06e_1600x745.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CIUD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18532cb5-14cd-4928-9bec-9b6e3c14f06e_1600x745.png" width="1456" height="678" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/18532cb5-14cd-4928-9bec-9b6e3c14f06e_1600x745.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:678,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CIUD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18532cb5-14cd-4928-9bec-9b6e3c14f06e_1600x745.png 424w, https://substackcdn.com/image/fetch/$s_!CIUD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18532cb5-14cd-4928-9bec-9b6e3c14f06e_1600x745.png 848w, https://substackcdn.com/image/fetch/$s_!CIUD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18532cb5-14cd-4928-9bec-9b6e3c14f06e_1600x745.png 1272w, https://substackcdn.com/image/fetch/$s_!CIUD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18532cb5-14cd-4928-9bec-9b6e3c14f06e_1600x745.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Figure: SOPHON operates through two key phases: 1) Fine-Tuning Suppression (FTS) loops, which simulate various fine-tuning scenarios to reduce performance in the restricted domain; and 2) Normal Training Reinforcement (NTR) loops, which focus on preserving the model&#8217;s performance in its original domain.</p><p>To achieve this, the SOPHON framework leverages a technique inspired by Model-Agnostic Meta-Learning (MAML). MAML is a meta-learning approach designed to optimize models so they can quickly adapt to new tasks with minimal data. In SOPHON, however, the idea is applied in reverse: the model is optimized so that fine-tuning it for restricted tasks becomes difficult.</p><ul><li><p><strong>Fine-Tuning Simulation</strong>: The defender uses MAML to simulate various fine-tuning strategies that an adversary might employ. These simulations are critical because they allow the defender to anticipate how an adversary might try to adapt the model. 
By simulating these scenarios, the defender can adjust the model&#8217;s parameters to make fine-tuning in restricted domains highly inefficient or even ineffective.</p></li><li><p><strong>Optimization Process: </strong>SOPHON integrates these simulated fine-tuning processes into its optimization framework. The key idea is to degrade the model&#8217;s performance when fine-tuned for restricted tasks while maintaining its effectiveness in the original domain. This is done by balancing two objectives:</p></li></ul><ul><li><p>Intactness: Ensuring the model retains its performance on the original task.</p></li><li><p>Non-Fine-Tunability: Making sure that fine-tuning the model in restricted domains results in significant performance degradation or requires as much effort as training a new model from scratch.</p></li></ul><ul><li><p><strong>Defender&#8217;s Strategy</strong>: The defender&#8217;s strategy involves iteratively simulating fine-tuning attempts, evaluating the model&#8217;s vulnerability, and reinforcing the model against these potential adversarial adaptations. 
This process is computationally intensive but crucial for ensuring that the model remains robust against misuse.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hhZQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae15634-382e-43f9-82e4-4c82f5f36aaa_1152x996.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hhZQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae15634-382e-43f9-82e4-4c82f5f36aaa_1152x996.png 424w, https://substackcdn.com/image/fetch/$s_!hhZQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae15634-382e-43f9-82e4-4c82f5f36aaa_1152x996.png 848w, https://substackcdn.com/image/fetch/$s_!hhZQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae15634-382e-43f9-82e4-4c82f5f36aaa_1152x996.png 1272w, https://substackcdn.com/image/fetch/$s_!hhZQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae15634-382e-43f9-82e4-4c82f5f36aaa_1152x996.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hhZQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae15634-382e-43f9-82e4-4c82f5f36aaa_1152x996.png" width="512" height="442.6666666666667" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ae15634-382e-43f9-82e4-4c82f5f36aaa_1152x996.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:996,&quot;width&quot;:1152,&quot;resizeWidth&quot;:512,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hhZQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae15634-382e-43f9-82e4-4c82f5f36aaa_1152x996.png 424w, https://substackcdn.com/image/fetch/$s_!hhZQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae15634-382e-43f9-82e4-4c82f5f36aaa_1152x996.png 848w, https://substackcdn.com/image/fetch/$s_!hhZQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae15634-382e-43f9-82e4-4c82f5f36aaa_1152x996.png 1272w, https://substackcdn.com/image/fetch/$s_!hhZQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ae15634-382e-43f9-82e4-4c82f5f36aaa_1152x996.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure: Under three different fine-tuning methods, the SOPHON model consistently yields poorer performance than training from scratch</figcaption></figure></div><p>The paper also provides extensive experimental results to validate the effectiveness of SOPHON. The framework was tested across two major types of deep learning tasks&#8212;classification (as shown in the figure above) and generation&#8212;using seven different restricted domains and six model architectures. The experiments showed that SOPHON-protected models incur a significant overhead when adversaries attempt to fine-tune them for restricted tasks. In some cases, the overhead was so substantial that it matched or exceeded the cost of training a new model from scratch. 
Further, as shown in the figure above, SOPHON is robust to variations in the fine-tuning setup, such as different optimizers, learning rates, and batch sizes.&nbsp;</p><p>Qualitatively, on the task of denoising images from the CelebA dataset, fine-tuning the original model in the restricted domain achieves strong performance, and even training a model from scratch yields fairly good, if slightly weaker, results. However, when fine-tuned from the SOPHON-protected model, the diffusion model shows a marked inability to denoise facial images, as shown in the figure below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QZH_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86e25183-546e-423e-bdc0-91e768d2ab62_1600x702.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QZH_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86e25183-546e-423e-bdc0-91e768d2ab62_1600x702.png 424w, https://substackcdn.com/image/fetch/$s_!QZH_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86e25183-546e-423e-bdc0-91e768d2ab62_1600x702.png 848w, https://substackcdn.com/image/fetch/$s_!QZH_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86e25183-546e-423e-bdc0-91e768d2ab62_1600x702.png 1272w, https://substackcdn.com/image/fetch/$s_!QZH_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86e25183-546e-423e-bdc0-91e768d2ab62_1600x702.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!QZH_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86e25183-546e-423e-bdc0-91e768d2ab62_1600x702.png" width="652" height="286.1456043956044" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86e25183-546e-423e-bdc0-91e768d2ab62_1600x702.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:639,&quot;width&quot;:1456,&quot;resizeWidth&quot;:652,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QZH_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86e25183-546e-423e-bdc0-91e768d2ab62_1600x702.png 424w, https://substackcdn.com/image/fetch/$s_!QZH_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86e25183-546e-423e-bdc0-91e768d2ab62_1600x702.png 848w, https://substackcdn.com/image/fetch/$s_!QZH_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86e25183-546e-423e-bdc0-91e768d2ab62_1600x702.png 1272w, https://substackcdn.com/image/fetch/$s_!QZH_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86e25183-546e-423e-bdc0-91e768d2ab62_1600x702.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure: The model fine-tuned from SOPHON cannot denoise images from the restricted domain and is thus &#8220;protected&#8221; compared to the baselines</figcaption></figure></div><h4><strong>Our Take</strong></h4><p>SOPHON is a critical step forward in safeguarding AI from misuse. As AI models become more powerful, the risk of their being repurposed for unethical tasks increases. SOPHON tackles this issue by preventing fine-tuning in restricted domains while maintaining the model&#8217;s intended functionality. The use of MAML is particularly novel&#8212;traditionally used to make models more adaptable, here it&#8217;s cleverly reversed to make models resistant to adversarial fine-tuning. 
The name &#8220;SOPHON&#8221; is also a clever and fitting choice, borrowed from The Three-Body Problem, where it refers to restraint and protection.</p><p>Overall, the paper presents a neat idea and is a promising step forward, provided it works as well in practice.&nbsp;</p><p>&#8211; Sharut</p><h2>New from the Gradient</h2><h3><a href="https://thegradientpub.substack.com/p/judy-fan-reverse-engineering-the-human-cognitive-toolkit">Judy Fan: Reverse Engineering the Human Cognitive Toolkit</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C2jU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feae7a327-5d40-4b3f-8b68-36549102c54e_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C2jU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feae7a327-5d40-4b3f-8b68-36549102c54e_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!C2jU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feae7a327-5d40-4b3f-8b68-36549102c54e_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!C2jU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feae7a327-5d40-4b3f-8b68-36549102c54e_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!C2jU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feae7a327-5d40-4b3f-8b68-36549102c54e_1200x628.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!C2jU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feae7a327-5d40-4b3f-8b68-36549102c54e_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eae7a327-5d40-4b3f-8b68-36549102c54e_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!C2jU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feae7a327-5d40-4b3f-8b68-36549102c54e_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!C2jU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feae7a327-5d40-4b3f-8b68-36549102c54e_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!C2jU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feae7a327-5d40-4b3f-8b68-36549102c54e_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!C2jU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feae7a327-5d40-4b3f-8b68-36549102c54e_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/judy-fan-reverse-engineering-the-human-cognitive-toolkit&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegradientpub.substack.com/p/judy-fan-reverse-engineering-the-human-cognitive-toolkit"><span>Listen</span></a></p><h3><a href="https://thegradientpub.substack.com/p/lm-sacasas-convivial-society-the-questions-concerning-technology">L.M. 
Sacasas: The Questions Concerning Technology</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KSlf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1054a161-e591-48fd-a1df-8bbf27a3740b_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KSlf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1054a161-e591-48fd-a1df-8bbf27a3740b_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!KSlf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1054a161-e591-48fd-a1df-8bbf27a3740b_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!KSlf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1054a161-e591-48fd-a1df-8bbf27a3740b_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!KSlf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1054a161-e591-48fd-a1df-8bbf27a3740b_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KSlf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1054a161-e591-48fd-a1df-8bbf27a3740b_1200x628.png" width="1200" height="628" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1054a161-e591-48fd-a1df-8bbf27a3740b_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KSlf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1054a161-e591-48fd-a1df-8bbf27a3740b_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!KSlf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1054a161-e591-48fd-a1df-8bbf27a3740b_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!KSlf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1054a161-e591-48fd-a1df-8bbf27a3740b_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!KSlf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1054a161-e591-48fd-a1df-8bbf27a3740b_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/lm-sacasas-convivial-society-the-questions-concerning-technology&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegradientpub.substack.com/p/lm-sacasas-convivial-society-the-questions-concerning-technology"><span>Listen</span></a></p><h2>Other Things That Caught Our Eyes</h2><h3>News</h3><p><strong><a href="https://www.washingtonpost.com/technology/2024/08/19/artificial-intelligence-mayor-cheyenne-vic/">Mayoral candidate vows to let VIC, an AI bot, run Wyoming&#8217;s capital city</a></strong></p><p>Mayoral candidate Victor Miller in Wyoming is vowing to run the city of Cheyenne exclusively with an AI bot called VIC (Virtual Integrated Citizen). This pledge is believed to be the first of its kind in U.S. campaigns and has raised concerns among officials and tech companies. 
Miller argues that AI would bring objectivity, efficiency, and transparency to government decision-making. However, critics worry about the lack of morals and subjective decision-making capabilities of chatbots, as well as the potential for false information and the ease with which the technology can be manipulated. Despite skepticism, Miller remains confident in his AI-centered campaign. The case highlights the rapid development of AI and the challenges in regulating its use in politics.</p><p><strong><a href="https://www.businessinsider.com/aws-ceo-developers-stop-coding-ai-takes-over-2024-8">In a leaked recording, Amazon cloud chief tells employees that most developers could stop coding soon as AI takes over</a></strong></p><p>In a leaked recording, Amazon Web Services' CEO, Matt Garman, stated that most developers may not need to code in the future as artificial intelligence (AI) takes over coding tasks. Garman believes that coding is just a means of communicating with computers and that the real skill lies in innovation and building something interesting for end users. He suggests that developers will need to focus more on understanding customer needs and creating innovative solutions, rather than writing code. Garman's comments were not meant as a dire warning, but rather as an optimistic view of the changing role of developers in the AI era.&nbsp;</p><p><strong><a href="https://techcrunch.com/2024/08/22/deepmind-workers-sign-letter-in-protest-of-googles-defense-contracts/">DeepMind workers sign letter in protest of Google&#8217;s defense contracts</a></strong></p><p>At least 200 workers at DeepMind have expressed their dissatisfaction with Google's defense contracts. In a letter circulated internally in May, the workers expressed concern about Google's contracts with military organizations, specifically citing the tech giant's contracts with the Israeli military for AI and cloud computing services. 
The workers argue that any involvement with military and weapon manufacturing contradicts DeepMind's mission statement and stated AI Principles, and undermines their position as leaders in ethical and responsible AI. This highlights a potential culture clash between Google and DeepMind, as Google had previously pledged in 2018 that DeepMind's technology would not be used for military or surveillance purposes.&nbsp;</p><p><strong><a href="https://techcrunch.com/2024/08/22/ai-sdr-startups-are-booming-so-why-are-vcs-wary/">AI sales rep startups are booming. So why are VCs wary?</a></strong></p><p>AI sales development representatives (SDRs) are experiencing rapid growth in the market, with multiple startups finding success in a short period of time. These startups use AI technology, such as LLMs and voice technology, to automate content creation for sales teams. However, venture capitalists are wary of investing in these companies due to concerns about their long-term viability and effectiveness compared to human outreach. While small and medium-sized businesses are eager to experiment with AI SDRs to improve their sales outreach, it remains unclear if these tools are actually helping businesses sell more effectively. Additionally, established competitors like Salesforce, HubSpot, and ZoomInfo could potentially offer similar AI solutions as free features, posing a threat to the growth of AI SDR startups. Overall, while the adoption of AI SDRs is rapid, investors are skeptical about their staying power in the market.</p><p><strong><a href="https://www.technologyreview.com/2024/08/22/1097224/we-finally-have-a-definition-for-open-source-ai/">We finally have a definition for open-source AI</a></strong></p><p>A group has finally defined what it means for an AI system to be open-source. 
According to this definition, an open-source AI system should be usable for any purpose without permission, allow researchers to inspect its components and understand how it works, and be modifiable and shareable. The standard also emphasizes transparency in terms of training data, source code, and weights. This definition is important because it clarifies what it truly means for an AI system to be open-source, as some companies have been misusing the term in their marketing.&nbsp;</p><p><strong><a href="https://techcrunch.com/2024/08/22/waymo-wants-to-chauffeur-your-kids/">Waymo wants to chauffeur your kids</a></strong></p><p>Waymo, the Alphabet subsidiary, is considering a subscription program called "Waymo Teen" that would allow teenagers to hail one of its cars solo and send pickup and drop-off alerts to their parents. The program would require authorized teenagers to access Waymo under their guardians' supervision. Waymo has received positive feedback from its research in this area. This move by Waymo follows Uber's initiative last year to match teens with highly rated drivers in its network. Consent from legal guardians is required, and they receive notifications about their child's whereabouts during rides.&nbsp;</p><p><strong><a href="https://www.pcgamer.com/games/asked-about-sag-aftras-strike-for-better-ai-protections-amazon-games-boss-claims-ai-has-nothing-to-do-with-taking-work-away-from-actors-because-for-games-we-dont-really-have-acting/">Asked about SAG-AFTRA's strike for better AI protections, Amazon Games Boss claims AI 'has nothing to do with taking work away' from actors because 'for games, we don't really have acting'</a></strong></p><p>In an interview with IGN, Amazon Games CEO Christoph Hartmann made some interesting statements regarding the use of generative AI in the games industry. 
He expressed hope that AI could shorten the game development cycle, but when asked about the SAG-AFTRA voice actors' strike for better AI protections, Hartmann claimed that "for games, we don't really have acting." This statement contradicts the significant role that acting plays in many video games, such as Baldur's Gate 3 and The Last of Us. Hartmann also discussed other areas where AI could assist game development, particularly in localization. However, it's important to note that localization involves nuanced translation and cultural understanding, which may not be easily achieved by AI. Hartmann concluded by emphasizing that human creativity and uniqueness cannot be replaced by technology.&nbsp;</p><p><strong><a href="https://futurism.com/the-byte/man-arrested-csam-ai">Man Arrested for Creating Child Porn Using AI</a></strong></p><p>A Florida man has been arrested and is facing 20 counts of obscenity for creating and distributing AI-generated child pornography. Phillip Michael McCorkle was arrested after the Indian River County Sheriff's Office received tips that he was using an AI image generator to create and distribute child sexual imagery. This arrest highlights the danger of generative AI being used for nefarious purposes, as it provides new avenues for crime and child abuse. The increasing prevalence of AI-generated child pornography has prompted lawmakers to push for legislation to make it illegal, but it remains a challenging problem to effectively stop. The National Center for Missing &amp; Exploited Children received thousands of reports of AI-generated child porn last year, and even deepfakes of real children are being created using generative AI. 
This uncontrollable problem requires urgent attention and action.</p><h3>Papers</h3><ul><li><p><a href="https://arxiv.org/abs/2408.08172">Towards flexible perception with visual memory</a></p></li><li><p><a href="https://arxiv.org/abs/2408.07852">Training Language Models on Knowledge Graphs: Insights on Hallucinations and Their Detectability</a></p></li><li><p><a href="https://arxiv.org/abs/2408.08435">Automated Design of Agentic Systems</a></p></li><li><p><a href="https://arxiv.org/abs/2408.11039">Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model</a></p></li><li><p><a href="https://arxiv.org/abs/2408.10914">To Code, or Not to Code? Exploring the Impact of Code in Pre-training</a></p></li><li><p><a href="https://arxiv.org/abs/2408.10920">Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations</a></p></li><li><p><a href="https://arxiv.org/abs/2408.11049">MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding</a></p></li><li><p><a href="https://arxiv.org/abs/2408.05086">Generating novel experimental hypotheses from language models: a case study on cross-dative generalization</a></p></li></ul><h3>Closing Thoughts</h3><p>Have something to say about this edition&#8217;s topics? Shoot us an email at editor@thegradient.pub and we will consider sharing the most interesting reader thoughts in the next newsletter! If you enjoyed this newsletter, consider donating to The Gradient via a Substack subscription, which helps keep this volunteer-run project afloat. 
Thanks for reading the latest Update from the Gradient!</p>]]></content:encoded></item><item><title><![CDATA[Mini-Update #46: OpenAI-Condé Nast Partnership and Flexible Perception]]></title><description><![CDATA[Major publishing company Cond&#233; Nast grants OpenAI the rights to train models on its work, and Google DeepMind researchers integrate visual memory retention to augment model flexibility.]]></description><link>https://thegradientpub.substack.com/p/mini-update-46-openai-conde-nast</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/mini-update-46-openai-conde-nast</guid><dc:creator><![CDATA[Ather Fawaz]]></dc:creator><pubDate>Wed, 21 Aug 2024 20:00:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!f0Qk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F629f2683-063e-4b21-bb1f-82f6bdcdec13_1600x1105.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 46th mini-update from the Gradient! This is our exclusive newsletter edition specifically for paying subscribers and is our way to show you our appreciation for your support.</p><h1><strong><a href="https://www.wired.com/story/conde-nast-openai-deal/">News Highl&#8230;</a></strong></h1>
      <p>
          <a href="https://thegradientpub.substack.com/p/mini-update-46-openai-conde-nast">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Update #81: The EU AI Act's Enforcement and Self-Taught Evaluators]]></title><description><![CDATA[Welcome to the 81st update from the Gradient!]]></description><link>https://thegradientpub.substack.com/p/update-81-the-eu-ai-acts-enforcement</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/update-81-the-eu-ai-acts-enforcement</guid><dc:creator><![CDATA[daniel bashir]]></dc:creator><pubDate>Tue, 13 Aug 2024 15:30:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e007f6-814b-4eda-86ae-9ae32d2082ad_1024x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 81st update from the Gradient! If you&#8217;re new and like what you see, <a href="https://thegradientpub.substack.com/">subscribe</a> and follow us on <a href="https://twitter.com/gradientpub">Twitter</a>. 
<strong>Our newsletters run long, so you&#8217;ll need to view this post on Substack to see everything!</strong></p><p>As always, if you want to write with us, send a pitch using <a href="https://goo.gl/forms/whYRKEzMZJox6FaH2">this form</a>.</p><h2><strong>News Highlight</strong>: The EU AI Act is now in force</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hbMm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cb4f041-0cd9-4361-848f-f79f2d5e49ce_1600x895.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hbMm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cb4f041-0cd9-4361-848f-f79f2d5e49ce_1600x895.png 424w, https://substackcdn.com/image/fetch/$s_!hbMm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cb4f041-0cd9-4361-848f-f79f2d5e49ce_1600x895.png 848w, https://substackcdn.com/image/fetch/$s_!hbMm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cb4f041-0cd9-4361-848f-f79f2d5e49ce_1600x895.png 1272w, https://substackcdn.com/image/fetch/$s_!hbMm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cb4f041-0cd9-4361-848f-f79f2d5e49ce_1600x895.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hbMm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cb4f041-0cd9-4361-848f-f79f2d5e49ce_1600x895.png" width="1456" height="814" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4cb4f041-0cd9-4361-848f-f79f2d5e49ce_1600x895.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:814,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hbMm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cb4f041-0cd9-4361-848f-f79f2d5e49ce_1600x895.png 424w, https://substackcdn.com/image/fetch/$s_!hbMm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cb4f041-0cd9-4361-848f-f79f2d5e49ce_1600x895.png 848w, https://substackcdn.com/image/fetch/$s_!hbMm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cb4f041-0cd9-4361-848f-f79f2d5e49ce_1600x895.png 1272w, https://substackcdn.com/image/fetch/$s_!hbMm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cb4f041-0cd9-4361-848f-f79f2d5e49ce_1600x895.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption"><a href="https://www.trail-ml.com/blog/eu-ai-act-how-risk-is-classified">Source</a></figcaption></figure></div><h4><strong>Summary</strong>&nbsp;</h4><p>The EU AI Act has long been in development and, after <a href="https://www.consilium.europa.eu/en/press/press-releases/2024/05/21/artificial-intelligence-ai-act-council-gives-final-green-light-to-the-first-worldwide-rules-on-ai/">receiving a final green light earlier this year</a>, will now be <a href="https://techcrunch.com/2024/08/01/the-eus-ai-act-is-now-in-force/">enforced</a>. The legislation follows a risk-based approach that imposes stricter rules on higher risk systems. 
Because the Act has been in the works for a while, commentators and analysts have had plenty of time to <a href="https://www.lawfaremedia.org/article/eus-ai-act-barreling-toward-ai-standards-do-not-exist">evaluate</a> it and its implications.&nbsp;</p><h4><strong>Overview</strong></h4><p>Based on the Act&#8217;s <a href="https://artificialintelligenceact.eu/high-level-summary/">website</a>, there are four key takeaways for the AI Act:</p><ol><li><p>The Act classifies AI according to its risk: unacceptable risk (e.g. social scoring systems) is prohibited; high-risk AI systems are regulated; limited-risk AI systems are subject to lighter transparency obligations; minimal-risk systems (the majority of AI applications available in Europe, like AI-enabled video games) are unregulated.&nbsp;</p></li><li><p>The bulk of obligations fall on providers of high-risk AI systems: that is, those who intend to put high-risk AI systems into service in the EU, regardless of whether they are based in the EU.&nbsp;</p></li><li><p>Users are natural or legal persons that deploy an AI system in a professional capacity: these users, or deployers, of AI systems have some obligations, though fewer than developers.&nbsp;</p></li><li><p>General purpose AI (GPAI) models have specific guidelines: all providers must supply technical documentation and instructions for use, comply with the Copyright Directive, and publish a summary of the content used for training. Free and open license GPAI providers have fewer requirements, and providers whose systems present a systemic risk are subject to further guidelines.&nbsp;</p></li></ol><p>You can read the rest of the page&#8217;s High-Level Summary for more information &#8212; I&#8217;ll focus the rest of this overview on responses and analysis. 
In early 2023, Lawfare <a href="https://www.lawfaremedia.org/article/eus-ai-act-barreling-toward-ai-standards-do-not-exist">raised the concern</a> that, given our current understanding of AI, the Act&#8217;s standards may not be technically feasible. In particular, their article raised the salient point that it&#8217;s nearly impossible to anticipate most of a neural network&#8217;s potential failure modes, such as producing dangerous content or identifying and learning irrelevant patterns in data &#8212; this unpredictability makes it difficult to test whether systems would adhere to principles like the EU&#8217;s. There is work in this direction: Anthropic and OpenAI both do red-teaming, and the recent <a href="https://arxiv.org/abs/2405.06624">Guaranteed Safe AI</a> paper is part of a larger effort to develop systems whose safety and reliability can be formally verified. But it remains unclear whether standards could be built around these approaches, and whether complete formal verification for safety could work even in principle.&nbsp;</p><p>Unsurprisingly, the Act has myriad implications. An article from the <a href="https://news.law.fordham.edu/jcfl/2024/04/23/the-first-of-its-kind-the-eu-ai-act-and-what-it-means-for-the-future-of-ai/">Fordham Journal of Corporate and Financial Law</a> covers criticism of and backlash against the Act, including an open letter raising the concern that heavy-handed regulation of AI systems in the EU could make compliance costs high and impede competitiveness, driving companies to leave the EU. 
Indeed, the regulations have already deterred some companies from engaging with the EU&nbsp;&#8212; Meta has said it <a href="https://www.theverge.com/2024/7/18/24201041/meta-multimodal-llama-ai-model-launch-eu-regulations">won&#8217;t launch its upcoming multimodal Llama model</a> in the EU due to unpredictability in the European regulatory environment.&nbsp;</p><p>While we know the general shape and thrust of the regulations, much remains unclear. Those few applications classified as high-risk will need to meet certain obligations such as a pre-market conformity assessment, while GPAI developers will face light transparency requirements. But full compliance measures for GPAI developers are still under discussion &#8212; the EU&#8217;s AI Office has <a href="https://techcrunch.com/2024/07/30/eu-calls-for-help-with-shaping-rules-for-general-purpose-ais/">called for help</a> with shaping rules for these systems, and companies like OpenAI <a href="https://openai.com/global-affairs/a-primer-on-the-eu-ai-act/">will be working closely with the Office</a> over the coming months.&nbsp;</p><h4><strong>Our Take</strong></h4><p>The AI Act is, understandably, imperfect. In fact, it&#8217;s imperfect in ways that could be fixed before applying it. But we once again have a classic story: the EU is regulating too much, and therefore will hamper its competitiveness. There&#8217;s truth to this, or it wouldn&#8217;t bear repeating. But let&#8217;s remember that only &#8220;high-risk&#8221; systems will be carefully regulated &#8212;&nbsp;this includes components and products already subject to existing safety standards (such as medical devices), or systems used for a sensitive purpose such as biometrics, education, or law enforcement.&nbsp;</p><p>The question, then, is whether rolling out regulations for systems with these use cases is jumping the gun. By nature, any regulation placed on an evolving technology like AI will fail to capture all possible use cases and will impose some unwanted restrictions. 
Laura Caroli, lead technical negotiator and policy advisor to AI Act co-rapporteur Brando Benifei, <a href="https://iapp.org/news/a/will-the-eu-ai-act-work-lessons-learned-from-past-legislative-initiatives-future-challenges">wrote</a> about some of these challenges and argued that critics underestimate the Act&#8217;s flexibility. She also argues that comparisons to the GDPR don&#8217;t hold up, since under the AI Act, market surveillance authorities intervene where infringements of the law take place, as opposed to where a provider is established &#8212; this addresses the risk that a single authority would be overwhelmed by enforcement cases.&nbsp;</p><p>She does concede that the Act&#8217;s emphasis on the intended purpose of a narrow high-risk AI system is a major flaw that could render most of the regulation impossible to implement &#8212; but, as she says, it will only be possible to verify the Act&#8217;s feasibility in a few years, and the current version of the Act serves as a basis that can be built on in the future. I tend to agree that the Act is very much an experiment in governance that we will and should learn from &#8212;&nbsp;but regardless of its intentions, the <em>perception</em> of the Act and the responses to that perception matter. 
The EU has to play a game of not only developing good policy, but convincing the world and would-be EU-based founders that they support the innovation they&#8217;re so often accused of stymieing.&nbsp;</p><p>&#8212;Daniel</p><h2><strong>Research Highlight: </strong><a href="https://arxiv.org/abs/2408.02666">Self-Taught Evaluators</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!V9UD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e007f6-814b-4eda-86ae-9ae32d2082ad_1024x1024.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!V9UD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e007f6-814b-4eda-86ae-9ae32d2082ad_1024x1024.jpeg 424w, https://substackcdn.com/image/fetch/$s_!V9UD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e007f6-814b-4eda-86ae-9ae32d2082ad_1024x1024.jpeg 848w, https://substackcdn.com/image/fetch/$s_!V9UD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e007f6-814b-4eda-86ae-9ae32d2082ad_1024x1024.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!V9UD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e007f6-814b-4eda-86ae-9ae32d2082ad_1024x1024.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!V9UD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e007f6-814b-4eda-86ae-9ae32d2082ad_1024x1024.jpeg" width="402" height="402" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/45e007f6-814b-4eda-86ae-9ae32d2082ad_1024x1024.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:402,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!V9UD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e007f6-814b-4eda-86ae-9ae32d2082ad_1024x1024.jpeg 424w, https://substackcdn.com/image/fetch/$s_!V9UD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e007f6-814b-4eda-86ae-9ae32d2082ad_1024x1024.jpeg 848w, https://substackcdn.com/image/fetch/$s_!V9UD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e007f6-814b-4eda-86ae-9ae32d2082ad_1024x1024.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!V9UD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e007f6-814b-4eda-86ae-9ae32d2082ad_1024x1024.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4><strong>Summary</strong>&nbsp;</h4><p>This paper seeks to further reduce the pesky bottleneck that is the need for human-labeled data to improve language models. It presents a method for improving language models&#8217; abilities to judge (i.e. label) prompt-response pairs without the human annotations that <a href="https://arxiv.org/abs/1706.03741">RLHF</a> typically requires. Using this Self-Taught Evaluator method, the authors are able to improve a Llama3-70B-Instruct model from 75.4 to 88.3 on <a href="https://arxiv.org/abs/2403.13787">RewardBench</a>, a recently published benchmark specifically for LLM reward models. The Self-Taught Evaluator model outperforms commonly used LLM judges like GPT-4 and matches top reward models trained with human-labeled examples. 
The method demonstrates the potential for creating strong LLM evaluators which can then be used as part of a larger RLHF or RLAIF pipeline.</p><h4><strong>Overview</strong>&nbsp;</h4><p>The crux of RLHF is training a reward model that emulates human judgment. Traditionally (by which I mean <a href="https://arxiv.org/abs/2203.02155">since 2022</a>), reward models for evaluating LLMs have required human preference judgments, which can be costly and time-consuming to collect and can become outdated as models improve. The authors of this paper propose an iterative self-improvement scheme that instead only requires human-generated prompts.</p><p>Their method starts with these human-generated prompts and a seed LLM. For each prompt it generates a contrasting pair of model outputs, designed such that one response is likely inferior to the other. Rather than just asking for a good response and a worse response, they first generate a modified version of the prompt that is &#8220;highly relevant but not semantically identical&#8221;. 
The model&#8217;s response to this relevant but not-quite-the-same prompt is the &#8220;bad response&#8221;, while the response to the real prompt is the &#8220;good response&#8221;.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!S3ig!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e42436b-42ac-4925-9117-44bf52fd8065_1600x537.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!S3ig!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e42436b-42ac-4925-9117-44bf52fd8065_1600x537.png 424w, https://substackcdn.com/image/fetch/$s_!S3ig!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e42436b-42ac-4925-9117-44bf52fd8065_1600x537.png 848w, https://substackcdn.com/image/fetch/$s_!S3ig!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e42436b-42ac-4925-9117-44bf52fd8065_1600x537.png 1272w, https://substackcdn.com/image/fetch/$s_!S3ig!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e42436b-42ac-4925-9117-44bf52fd8065_1600x537.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!S3ig!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e42436b-42ac-4925-9117-44bf52fd8065_1600x537.png" width="1456" height="489" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e42436b-42ac-4925-9117-44bf52fd8065_1600x537.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:489,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!S3ig!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e42436b-42ac-4925-9117-44bf52fd8065_1600x537.png 424w, https://substackcdn.com/image/fetch/$s_!S3ig!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e42436b-42ac-4925-9117-44bf52fd8065_1600x537.png 848w, https://substackcdn.com/image/fetch/$s_!S3ig!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e42436b-42ac-4925-9117-44bf52fd8065_1600x537.png 1272w, https://substackcdn.com/image/fetch/$s_!S3ig!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e42436b-42ac-4925-9117-44bf52fd8065_1600x537.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Next, using the model as an <a href="https://arxiv.org/html/2306.05685v4">LLM-as-a-Judge</a>, they generate binary judgments and reasoning traces about these judgments for each synthetic pair of responses. The judgments can be labeled as correct or incorrect based on the synthetic preference pair design. The reasoning traces and judgments are filtered to keep only the cases where the judgment is correct (i.e. the model prefers the response to the real prompt rather than the response to the slightly off prompt). Then this dataset consisting of tuples of the form (prompt, good response, bad response, judgment, reasoning for judgment) is used for supervised fine-tuning of the LLM-as-a-Judge. The process can then be iterated for further self-improvement.</p><p>In their experiments, the authors used Llama3-70B-Instruct as the initial seed model. Through their iterative training process, they improved its performance on RewardBench from 75.4 to 88.3 (or 88.7 with majority voting). 
This result outperforms commonly used LLM judges such as GPT-4 and matches the top-performing reward models trained with human-labeled examples. The paper also presents various ablation studies and analyses, including comparisons with models trained on human-annotated data, experiments with different data sources, and investigations into the impact of instruction complexity and data mixing strategies.</p><p>The Self-Taught Evaluator approach demonstrates the potential for creating strong evaluation models using only synthetic data. This method removes the need for expensive human annotations and sidesteps the problem of data becoming outdated as models improve. The authors suggest that their technique could be valuable for scaling to new tasks or evaluation criteria and could potentially empower the entire workflow of LLM development, including training, iterative improvement, and evaluation. However, they also note some limitations, such as the higher inference cost of generative LLM-as-a-Judge models compared to simple classifier-based reward models.</p><h4><strong>Our Take</strong></h4><p>Evaluating natural language generation is hard, even for humans. The history of automating evaluation of NLG is somewhat fraught: first there were metrics designed to measure how close some generated text was to some reference text (<a href="https://aclanthology.org/W04-1013.pdf">ROUGE</a> for summarization, <a href="https://dl.acm.org/doi/10.3115/1073083.1073135">BLEU</a> for translation, and more recently <a href="https://arxiv.org/abs/2102.01454">MAUVE</a> for open-ended generation). 
But metrics lack the endless nuance of language, and using them as rewards in an RL system didn&#8217;t work too well: optimizing for <a href="https://arxiv.org/pdf/1609.08144">either BLEU</a> or <a href="https://arxiv.org/abs/1909.01214">ROUGE</a> resulted in models whose outputs got really good ROUGE and BLEU scores but didn&#8217;t read well to actual humans.</p><p>Of course, if you&#8217;ve built a system to process natural language at scale and you have any faith in your system, the natural thing to do is to use that system to automate evaluation. And indeed RLHF and RLAIF have been massively successful for creating chatbots that are helpful. Training on large amounts of synthetic data is the norm for state-of-the-art language models these days. The always well-informed Nathan Lambert <a href="https://www.interconnects.ai/p/frontier-model-post-training">reports</a> &#8220;rumors that OpenAI is training its next generation of models on 50 trillion tokens of largely synthetic data&#8221;. Likewise, Google DeepMind&#8217;s <a href="https://arxiv.org/html/2404.07503v1">report on synthetic data for language models</a> from April of this year is largely bullish about the trend, noting that &#8220;As we approach human-level or even superhuman-level intelligence, obtaining synthetic data becomes even more crucial, given that models need better-than-average-human quality data to progress.&#8221;</p><p>And yet synthetic language data, particularly when its justification is &#8220;number go up on benchmark&#8221;, leaves a bad taste in my mouth. LLMs are not humans and LLM-generated text is not a perfect substitute for human-generated text. 
RLHF has been shown to <a href="https://arxiv.org/abs/2310.13548">reward sycophancy</a> and <a href="https://arxiv.org/pdf/2310.06452">reduce the diversity of model outputs</a>, and I&#8217;m kept up at night by the thought that methods like Self-Taught Evaluators, which further remove human guidance/judgment from the training objective, could be having negative effects on outputs in ways that aren&#8217;t captured by existing benchmarks.</p><p>&#8211;Cole</p><h2>New from the Gradient</h2><h3><a href="https://thegradientpub.substack.com/p/pete-wolfendale-the-revenge-of-reason">Pete Wolfendale: The Revenge of Reason</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!STTG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc5a5f3-db40-437f-85a0-b0e5616e6897_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!STTG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc5a5f3-db40-437f-85a0-b0e5616e6897_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!STTG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc5a5f3-db40-437f-85a0-b0e5616e6897_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!STTG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc5a5f3-db40-437f-85a0-b0e5616e6897_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!STTG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc5a5f3-db40-437f-85a0-b0e5616e6897_1200x628.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!STTG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc5a5f3-db40-437f-85a0-b0e5616e6897_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6cc5a5f3-db40-437f-85a0-b0e5616e6897_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!STTG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc5a5f3-db40-437f-85a0-b0e5616e6897_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!STTG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc5a5f3-db40-437f-85a0-b0e5616e6897_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!STTG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc5a5f3-db40-437f-85a0-b0e5616e6897_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!STTG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc5a5f3-db40-437f-85a0-b0e5616e6897_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/pete-wolfendale-the-revenge-of-reason&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegradientpub.substack.com/p/pete-wolfendale-the-revenge-of-reason"><span>Listen</span></a></p><h3><a href="https://thegradient.pub/we-need-positive-visions-for-ai-grounded-in-wellbeing/">We Need Positive Visions for AI Grounded in Wellbeing</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!Iyqc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Iyqc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!Iyqc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!Iyqc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!Iyqc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Iyqc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp" width="1456" height="832" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:832,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Iyqc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!Iyqc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!Iyqc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!Iyqc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd73087d3-5335-4688-87da-aca1ae575268_1792x1024.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradient.pub/we-need-positive-visions-for-ai-grounded-in-wellbeing/&quot;,&quot;text&quot;:&quot;Read&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegradient.pub/we-need-positive-visions-for-ai-grounded-in-wellbeing/"><span>Read</span></a></p><h3><a href="https://thegradientpub.substack.com/p/peter-lee-computing-theory-compilers-gpt-4">Peter Lee: Computing Theory and Practice, and GPT-4&#8217;s Impact</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mODr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38e36f9c-2439-4794-848b-97eb450fa412_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mODr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38e36f9c-2439-4794-848b-97eb450fa412_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!mODr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38e36f9c-2439-4794-848b-97eb450fa412_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!mODr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38e36f9c-2439-4794-848b-97eb450fa412_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!mODr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38e36f9c-2439-4794-848b-97eb450fa412_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mODr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38e36f9c-2439-4794-848b-97eb450fa412_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/38e36f9c-2439-4794-848b-97eb450fa412_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!mODr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38e36f9c-2439-4794-848b-97eb450fa412_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!mODr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38e36f9c-2439-4794-848b-97eb450fa412_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!mODr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38e36f9c-2439-4794-848b-97eb450fa412_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!mODr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38e36f9c-2439-4794-848b-97eb450fa412_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/peter-lee-computing-theory-compilers-gpt-4&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegradientpub.substack.com/p/peter-lee-computing-theory-compilers-gpt-4"><span>Listen</span></a></p><h2>Other Things That Caught Our Eyes</h2><h3>News</h3><p><strong><a href="https://www.polygon.com/24210468/college-football-25-ai-machine-learning-ea-sports">College Football 25 wouldn&#8217;t have been possible without AI, EA boss says</a></strong></p><p>EA Sports' College Football 25, the latest installment in the beloved sports simulation franchise, was made possible by the use of AI and machine learning technology, according to Electronic Arts CEO Andrew Wilson. The game features 150 unique stadiums and over 11,000 player likenesses, which would not have been achievable without the power of AI. While the player likenesses were enhanced by talented artists, the algorithmic work provided a foundation for their creation. Wilson emphasized that AI was crucial in delivering the level of gameplay and visual fidelity seen in College Football 25. This reliance on AI and machine learning will also extend to the upcoming EA Sports FC 25, the successor to the FIFA franchise. The use of AI technology will drive enhanced tactical sophistication and realism in gameplay. 
The availability of College Football 25 is currently limited to PlayStation 5 and Xbox Series X.&nbsp;</p><p><strong><a href="https://techcrunch.com/2024/07/31/this-week-in-ai-companies-are-growing-skeptical-of-ais-roi/">This Week in AI: Companies are growing skeptical of AI&#8217;s ROI</a></strong></p><p>A recent report by Gartner suggests that around one-third of generative AI projects in the enterprise will be abandoned after the proof-of-concept phase by the end of 2025. The main barrier to adoption is the unclear business value of generative AI. The cost of implementing generative AI organization-wide can range from $5 million to $20 million, making it difficult for companies to justify the investment when the benefits are hard to quantify and may take years to materialize. Additionally, a survey by Upwork found that AI tools have actually decreased productivity and added to the workload of workers. This growing skepticism towards AI's return on investment indicates that companies are expecting more from AI and vendors need to manage expectations.&nbsp;</p><p><strong><a href="https://www.washingtonpost.com/opinions/2024/07/31/google-gemini-ai-dear-sydney-ad-olympics-satire/">Opinion | I hate the Gemini &#8216;Dear Sydney&#8217; ad more every passing moment</a></strong></p><p>The article criticizes the Gemini "Dear Sydney" ad for Google's AI product. The ad features a little girl who wants to write a letter to her idol, Olympic hurdler Sydney McLaughlin-Levrone. Instead of letting his daughter write the letter herself, the girl's dad asks Gemini AI to write it on her behalf. The author argues that this ad misses the point of writing, which is to express one's own thoughts and ideas. They believe that relying on AI to write for us takes away our ability to think for ourselves. 
The article criticizes the idea of replacing human creativity and personal connection with AI-generated content.&nbsp;</p><p><strong><a href="https://www.theverge.com/2024/8/5/24213861/apple-intelligence-instructions-macos-15-1-sequoia-beta">&#8216;You are a helpful mail assistant,&#8217; and other Apple Intelligence instructions</a></strong></p><p>Apple's latest developer betas include generative AI features that will be available on iPhones, iPads, and Macs in the coming months. These features are supported by a model that contains backend prompts, which provide instructions to the AI tools. The prompts are visible on Apple computers and give insight into how the features work. For example, there are prompts for a "helpful mail assistant" that instruct the AI bot on how to ask questions based on the content of an email. Another prompt is for the "Rewrite" feature, which provides instructions on limiting answers to 50 words and avoiding the creation of fictional information. The files also contain instructions for generating "Memories" videos with Apple Photos, with guidelines to avoid religious, political, harmful, or negative content. These prompts give users a glimpse into the inner workings of Apple's AI tools.&nbsp;</p><p><strong><a href="https://techcrunch.com/2024/08/06/waymo-expands-robotaxi-coverage-in-los-angeles-and-san-francisco/">Waymo expands robotaxi coverage in Los Angeles and San Francisco</a></strong></p><p>Waymo, the self-driving car company owned by Alphabet, is expanding its robotaxi service area in Los Angeles and San Francisco. In San Francisco, Waymo is adding 10 square miles to its service area, including Daly City, Broadmoor, and Colma. In Los Angeles, it is adding 16 square miles, including Marina del Rey, Mar Vista, and Playa Vista. Waymo has been working to scale its commercial operations and increase its customer base. 
It has provided over two million paid trips to riders across all Waymo One markets and serves over 50,000 paid trips each week. Waymo's operations are still supported by Alphabet, which recently announced a $5 billion investment. The expansion comes after receiving approval from the California Public Utilities Commission. Waymo currently has about 300 vehicles in San Francisco, 50 in Los Angeles, and 200 in Phoenix. The company has seen increased demand in Los Angeles, with over 150,000 people signing up for the waitlist.&nbsp;</p><p><strong><a href="https://www.businessinsider.com/homeowners-insurance-nightmare-cancellation-surveillance-drone-ai-future-2024-8">Your Next Home Insurance Nightmare: AI, Drones, and Surveillance</a></strong></p><p>The article discusses the use of AI-powered aerial surveillance by insurance companies, specifically focusing on the case of Travelers. The author, who is a privacy advocate, shares their personal experience of having their homeowner's insurance policy revoked due to AI surveillance detecting moss on their roof. The author raises concerns about the lack of transparency and accountability in the use of AI surveillance by insurance companies, highlighting the potential for unnecessary home repairs and the pressure on homeowners. The article emphasizes the need for updated laws and regulations to protect consumers from the risks of AI surveillance.&nbsp;</p><p><strong><a href="https://theconversation.com/robocars-promise-to-improve-traffic-even-when-most-of-the-cars-around-them-are-driven-by-people-study-finds-233546">Robocars promise to improve traffic even when most of the cars around them are driven by people, study finds</a></strong></p><p>Researchers have found that robotic vehicles can improve traffic flow in cities even when mixed with vehicles driven by humans. Using reinforcement learning, the researchers developed algorithms that allow robot vehicles to optimize traffic flow by communicating with each other. 
The experiments showed that when robot vehicles made up just 5% of traffic, traffic jams were eliminated, and when they made up 60% of traffic, traffic efficiency was superior to traffic controlled by traffic lights. This research is significant because it offers a potential solution to the worsening traffic problem in cities, without requiring all vehicles to be autonomous. The researchers plan to expand their framework and test it under real-world conditions.</p><p><strong><a href="https://www.cnbc.com/2024/08/02/uk-cancels-1point3-billion-of-tech-and-ai-infrastructure-projects.html">Britain cancels $1.7 billion of computing projects in setback for global AI ambitions</a></strong></p><p>The U.K. government has canceled &#163;1.3 billion ($1.7 billion) worth of computing infrastructure projects, including a &#163;500 million pledge for the AI Research Resource and an &#163;800 million commitment for a next-generation exascale computer. These projects were aimed at bolstering the U.K.'s compute infrastructure and its ability to run advanced AI models. The cancellation is a setback for the country's ambitions to become a world leader in artificial intelligence. The government cited the need to prioritize other fiscal plans and address unfunded commitments. The Labour government is now looking to bring in new statutory regulations for the AI industry.&nbsp;</p><p><strong><a href="https://techcrunch.com/2024/08/06/figures-new-humanoid-robot-leverages-openai-for-natural-speech-conversations/">Figure&#8217;s new humanoid robot leverages OpenAI for natural speech conversations</a></strong></p><p>Figure has unveiled its latest humanoid robot, the Figure 02, which leverages OpenAI for natural speech conversations. The robot is equipped with speakers and microphones to communicate with humans in the workplace. The use of natural language capabilities in humanoid robots adds transparency and allows for better instruction and safety. 
Figure's partnership with OpenAI, which helped raise a $675 million Series B funding round, highlights the importance of neural networks in the robotics industry. The Figure 02 also features a ground-up hardware and software redesign, including improved hands, visual language models, and six RGB cameras. The company hints at a future expansion beyond the warehouse/factory floor to commercial and home applications.</p><p><strong><a href="https://www.cpr.org/2024/08/06/colorado-schools-ai-roadmap-guide-students-teachers/">Colorado schools have AI roadmap to guide students and teachers into brave new world</a></strong></p><p>The Colorado Education Initiative has released a roadmap to guide the integration of artificial intelligence (AI) into education policy and curriculums. The roadmap focuses on three main areas: teaching and learning, advancing equity, and developing policy for transparent and ethical use. It emphasizes the importance of students understanding concepts like AI hallucinations, data privacy, and potential bias in AI. The roadmap provides examples of how AI can support students, such as tailoring curriculum to match their learning pace and serving as a real-time tutor. For educators, AI has the potential to increase the amount of time available to work directly with students and reduce time spent on administrative work. The roadmap also highlights the need to engage all students, including English language learners and rural families, in learning about AI and recommends conducting audits to identify disparities in access to classroom technology. The roadmap encourages fluid guidance over rigid policies due to the rapidly changing nature of AI and emphasizes the importance of transparency and ethical use of data. The next steps include creating a task force on AI and updating conduct and discipline policies. 
The roadmap will be revised based on the learnings from a pilot program called Elevate AI.&nbsp;</p><p><strong><a href="https://techcrunch.com/2024/08/06/reddit-ai-powered-search-results/">Reddit to test AI-powered search result pages</a></strong></p><p>Reddit plans to test AI-powered search result pages that will provide AI-generated summaries and content recommendations at the top of search results. The goal is to help users explore content and discover new Reddit communities. The company will use a combination of first-party and third-party technology to power this feature, and the experiment will begin later this year. This initiative aligns with Reddit's partnership with OpenAI, which allows them to leverage OpenAI's language models and build AI-powered features. During the earnings call, Reddit's CEO also highlighted the success of their AI-powered language translation feature and the company's growth in user base and revenue.&nbsp;</p><p><strong><a href="https://www.thewrap.com/ai-startup-prorata-partnerships-atlantic-universal-music-group/">AI Startup ProRata Inks Major Media Partnerships With Promise of Compensation and Accurate Attribution</a></strong></p><p>ProRata.ai, a generative AI startup, has secured licensing deals with major media and music companies, including the Financial Times, Axel Springer, The Atlantic, Fortune, and Universal Music Group. The company promises accurate attribution to source material and a revenue share plan on a per-use basis with publishers. ProRata aims to reframe the industry standard by analyzing AI output and content value to determine proportional compensation for publishers. The company also plans to launch its own AI chatbot. 
The startup raised $25 million in a Series A fundraising round and will be led by tech entrepreneur Bill Gross.&nbsp;</p><p><strong><a href="https://techcrunch.com/2024/08/07/audible-ai-powered-search-feature/">Audible is testing an AI-powered search feature</a></strong></p><p>Audible, the audiobook company owned by Amazon, is testing an AI-powered search feature called Maven. Maven is a personal recommendation expert that uses natural language processing to provide tailored audiobook recommendations based on users' specific requests. The feature is currently available to select U.S. customers on iOS and Android devices and is limited to a subset of Audible's catalog. Audible is also experimenting with AI-curated collections and AI-generated review summaries. The company did not specify which AI models are powering Maven but stated that it will continuously evaluate and enhance the feature. This announcement comes after the backlash from creatives over the increasing use of AI-voiced audiobooks.&nbsp;</p><h3>Papers</h3><ul><li><p><a href="https://www.nature.com/articles/s41592-024-02367-7.epdf?sharing_token=qRyiUCqCCv4loHcH0-5NiNRgN0jAjWel9jnR3ZoTv0MRuYCVgF3yxI1jcmBSpQopnCjGCItSc2-PYYKR3bpopa5pd6IroFrIn3LHVANxeDRhRGLbXtILWYHSSd727fWZG1pqWruEql2FzEZo8XBl-5uTJyFZiwBvRsQ-amlsiTY%3D">Toward learning a foundational representation of cells and genes</a></p></li><li><p><a href="https://arxiv.org/abs/2408.03900">Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond</a></p></li><li><p><a href="https://arxiv.org/abs/2408.03906">Achieving Human Level Competitive Robot Table Tennis</a></p></li><li><p><a href="https://arxiv.org/abs/2408.03314">Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters</a></p></li><li><p><a href="https://arxiv.org/abs/2408.02442">Let Me Speak Freely? 
A Study on the Impact of Format Restrictions on Performance of LLMs</a></p></li><li><p><a href="https://www.arxiv.org/abs/2408.02752">Diffusion Models as Data Mining Tools</a></p></li><li><p><a href="https://arxiv.org/abs/2408.03325">CoverBench: A Challenging Benchmark for Complex Claim Verification</a></p></li><li><p><a href="https://www.arxiv.org/abs/2408.02900">MedTrinity25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine</a></p></li><li><p><a href="https://www.arxiv.org/abs/2408.02718">MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models</a></p></li><li><p><a href="https://www.arxiv.org/abs/2408.02226">ProCreate, Don&#8217;t Reproduce! Propulsive Energy Diffusion for Creative Generation</a></p></li><li><p><a href="https://arxiv.org/abs/2408.01800">MiniCPM-V: A GPT-4V Level MLLM on Your Phone</a></p></li><li><p><a href="https://arxiv.org/abs/2408.02629">VidGen-1M: A Large-Scale Dataset for Text-to-video Generation</a></p></li><li><p><a href="https://arxiv.org/abs/2408.01584">GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS</a></p></li><li><p><a href="https://ai.meta.com/sam2/">Segment Anything Model 2</a></p></li></ul><h3>Closing Thoughts</h3><p>Have something to say about this edition&#8217;s topics? Shoot us an email at editor@thegradient.pub and we will consider sharing the most interesting thoughts from readers in the next newsletter! For feedback, you can also reach Daniel directly at <a href="mailto:dbashir@hmc.edu">dbashir@hmc.edu</a> or on <a href="https://twitter.com/spaniel_bashir">Twitter</a>. If you enjoyed this newsletter, consider donating to The Gradient via a Substack subscription, which helps keep this volunteer-run project afloat. 
Thanks for reading the latest Update from the Gradient!</p>]]></content:encoded></item><item><title><![CDATA[Mini-Update #45: OpenAI Security Breach and Accessible Diffusion Models]]></title><description><![CDATA[AI adoption struggles in enterprises and Meta's Segment Anything Model 2.]]></description><link>https://thegradientpub.substack.com/p/mini-update-45-openai-security-breach</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/mini-update-45-openai-security-breach</guid><dc:creator><![CDATA[Ather Fawaz]]></dc:creator><pubDate>Thu, 08 Aug 2024 07:20:40 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Hy-g!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d0abea2-df69-48c8-b52b-4215a619b8ea_1320x896.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 45th Mini-Update from the Gradient! This is our exclusive newsletter edition specifically for paying subscribers, and it is our way of showing you our appreciation for your support.</p><h1><strong><a href="https://techcrunch.com/2024/07/31/this-week-in-ai-companies-are-growing-skeptical-of-ais-roi/?guccounter=2">New&#8230;</a></strong></h1>
      <p>
          <a href="https://thegradientpub.substack.com/p/mini-update-45-openai-security-breach">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Update #80: Kamala Harris's AI Policy and End-To-End Causal Effect Estimation from Unstructured Natural Language Data]]></title><description><![CDATA[We look at Harris's stances on AI regulation; researchers introduce the NATURAL framework for causal effect estimation.]]></description><link>https://thegradientpub.substack.com/p/update-80-kamala-harriss-ai-policy</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/update-80-kamala-harriss-ai-policy</guid><dc:creator><![CDATA[Justin Landay]]></dc:creator><pubDate>Tue, 30 Jul 2024 15:30:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!O8M6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F988b9c69-fcb8-48a3-be2d-0aa199b27857_1466x730.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 80th update from the Gradient! If you&#8217;re new and like what you see, <a href="https://thegradientpub.substack.com/">subscribe</a> and follow us on <a href="https://twitter.com/gradientpub">Twitter</a>. 
<strong>Our newsletters run long, so you&#8217;ll need to view this post on Substack to see everything!</strong></p><p>As always, if you want to write with us, send a pitch using <a href="https://goo.gl/forms/whYRKEzMZJox6FaH2">this form</a>.</p><h2><strong>News Highlight</strong>: <a href="https://techcrunch.com/2024/07/21/what-kamala-harris-has-said-about-ai-tech-regulation-and-more/">Kamala Harris and the Future of AI Policy</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!O8M6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F988b9c69-fcb8-48a3-be2d-0aa199b27857_1466x730.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!O8M6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F988b9c69-fcb8-48a3-be2d-0aa199b27857_1466x730.png 424w, https://substackcdn.com/image/fetch/$s_!O8M6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F988b9c69-fcb8-48a3-be2d-0aa199b27857_1466x730.png 848w, https://substackcdn.com/image/fetch/$s_!O8M6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F988b9c69-fcb8-48a3-be2d-0aa199b27857_1466x730.png 1272w, https://substackcdn.com/image/fetch/$s_!O8M6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F988b9c69-fcb8-48a3-be2d-0aa199b27857_1466x730.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!O8M6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F988b9c69-fcb8-48a3-be2d-0aa199b27857_1466x730.png" width="700" height="348.5576923076923" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/988b9c69-fcb8-48a3-be2d-0aa199b27857_1466x730.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:725,&quot;width&quot;:1456,&quot;resizeWidth&quot;:700,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!O8M6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F988b9c69-fcb8-48a3-be2d-0aa199b27857_1466x730.png 424w, https://substackcdn.com/image/fetch/$s_!O8M6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F988b9c69-fcb8-48a3-be2d-0aa199b27857_1466x730.png 848w, https://substackcdn.com/image/fetch/$s_!O8M6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F988b9c69-fcb8-48a3-be2d-0aa199b27857_1466x730.png 1272w, https://substackcdn.com/image/fetch/$s_!O8M6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F988b9c69-fcb8-48a3-be2d-0aa199b27857_1466x730.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4><strong>Summary</strong></h4><p>Kamala Harris has played a key role in the Biden administration&#8217;s efforts around artificial intelligence, taking the lead in negotiating safety standards with major tech firms and pushing for federal regulations to mitigate AI's potential harms. As the likely Democratic presidential nominee, her commitment to AI regulation is a crucial part of her policy agenda. So what does Harris mean for the future of AI regulation and governance?&nbsp;</p><p><strong>Overview</strong></p><p>Recently, President Joe Biden announced that he will not seek reelection, endorsing Vice President Kamala Harris as the Democratic nominee. 
While Harris has expressed her determination to secure the nomination, it remains uncertain whether she will face competition from other Democrats.</p><p>Over the past three years, Vice President Kamala Harris has taken a leading role inside the White House on artificial intelligence (<a href="https://www.nytimes.com/2024/07/24/technology/kamala-harris-ai-regulation.html">source</a>). Specifically, Harris organized key meetings with leaders from top tech firms such as OpenAI, Anthropic and Google, securing agreements on voluntary safety measures (<a href="https://www.nytimes.com/2024/07/24/technology/kamala-harris-ai-regulation.html">source</a>). She has spoken against the false dichotomy of choosing between public protection and technological advancement. Harris has argued that without strong government oversight, the tech industry might prioritize profits over public welfare, and she has framed voluntary commitments from companies as a step towards a safer AI future (<a href="https://www.bloomberg.com/news/newsletters/2024-07-25/what-kamala-harris-means-for-the-future-of-ai-policy">source</a>). While she has advocated for Congress to pass regulations to safeguard against AI-related job losses and other potential harms, legislative progress and corporate compliance have both been limited (<a href="https://www.nytimes.com/2020/08/20/technology/kamala-harris-ties-to-big-tech.html">source</a>).&nbsp;</p><p>At the UK&#8217;s AI Safety Summit, held in November 2023, Kamala Harris &#8216;really brought it all back down to earth&#8217;, to quote Verity Harding, former global head of public policy at Google DeepMind (<a href="https://www.bloomberg.com/news/newsletters/2024-07-25/what-kamala-harris-means-for-the-future-of-ai-policy">source</a>). Discussions at the summit were focused on the potential risk of a hypothetical &#8216;runaway AI&#8217; causing global harm. 
Harris steered the conversation towards the need of the hour &#8211; AI policy to protect the general public, pointing out the damage that biases and errors in AI tools could do to marginalized communities (<a href="https://www.bloomberg.com/news/newsletters/2024-07-25/what-kamala-harris-means-for-the-future-of-ai-policy">source</a>). Despite her prominence in debates about AI regulation, Harris&#8217; entrance into the presidential race has drawn a mixed reaction on the issue. Activists focused on AI policy might feel energized by her candidacy, seeing it as an opportunity to push for more stringent regulations. Meanwhile, some people within the tech industry and others might interpret her candidacy as a sign that the current, relatively lenient regulatory environment for AI companies in the U.S. could continue (<a href="https://www.nytimes.com/2020/08/20/technology/kamala-harris-ties-to-big-tech.html">source</a>).&nbsp;</p><p>Additionally, Harris has a long-standing connection with the tech industry from her days as San Francisco&#8217;s district attorney and California&#8217;s attorney general, with backing from influential Silicon Valley figures. Her early supporters include prominent VCs John Doerr and Ron Conway. As a presidential candidate, she was promptly backed by LinkedIn co-founder Reid Hoffman (<a href="https://techcrunch.com/2024/07/21/what-kamala-harris-has-said-about-ai-tech-regulation-and-more/">source</a>). During her 2010 campaign for California Attorney General, Kamala Harris participated in a Q&amp;A session at Google's headquarters in Mountain View and emphasized the critical role of the tech industry's expertise in enhancing government communication and modernizing its systems (<a href="https://www.nytimes.com/2024/07/24/technology/kamala-harris-ai-regulation.html">source</a>). 
While she has faced criticism for not being tougher on tech giants during her tenure as attorney general, she has consistently called for more regulation in the sector to protect consumer interests.</p><p>On other tech-related issues, Harris has addressed national security concerns about TikTok's ownership, indicating no intention to ban the app but highlighting the need for action regarding its management. Her stance on cryptocurrency has been less pronounced, but she is expected to support the Biden administration's regulatory approach in this area as well.</p><h4><strong>Our Take</strong></h4><p>As far as AI regulation is concerned, Kamala Harris has managed to set herself apart from the likes of Donald Trump, who has said little about artificial intelligence. Because she hails from the Bay Area, Silicon Valley is familiar territory for her, and she has wisely used this fact to impress AI policy activists. So far, she has done a good job of striking a balance between being too vocal and being too defensive. However, she has not taken any concrete action on AI either. Her only major prior involvement has been as California&#8217;s attorney general, where she initiated a case against a major porn site operator and negotiated an agreement with leading tech companies to enhance user privacy protections. It remains to be seen whether the support and appreciation coming from AI policy activists reflects merely the arrival of a new player or Harris&#8217; own skill and talent. 
The true measure of the upcoming policies, as always, will be their impact on working-class families.</p><p>-- Sharut</p><h2><strong>Research Highlight: </strong><a href="https://arxiv.org/html/2407.07018v1#S5">End-To-End Causal Effect Estimation from Unstructured Natural Language Data</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aUTM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76eed330-a9df-46c8-bc65-71d82a936f54_860x1274.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aUTM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76eed330-a9df-46c8-bc65-71d82a936f54_860x1274.jpeg 424w, https://substackcdn.com/image/fetch/$s_!aUTM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76eed330-a9df-46c8-bc65-71d82a936f54_860x1274.jpeg 848w, https://substackcdn.com/image/fetch/$s_!aUTM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76eed330-a9df-46c8-bc65-71d82a936f54_860x1274.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!aUTM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76eed330-a9df-46c8-bc65-71d82a936f54_860x1274.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aUTM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76eed330-a9df-46c8-bc65-71d82a936f54_860x1274.jpeg" width="260" height="385.16279069767444" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/76eed330-a9df-46c8-bc65-71d82a936f54_860x1274.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1274,&quot;width&quot;:860,&quot;resizeWidth&quot;:260,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aUTM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76eed330-a9df-46c8-bc65-71d82a936f54_860x1274.jpeg 424w, https://substackcdn.com/image/fetch/$s_!aUTM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76eed330-a9df-46c8-bc65-71d82a936f54_860x1274.jpeg 848w, https://substackcdn.com/image/fetch/$s_!aUTM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76eed330-a9df-46c8-bc65-71d82a936f54_860x1274.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!aUTM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76eed330-a9df-46c8-bc65-71d82a936f54_860x1274.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4><strong>Summary</strong>&nbsp;</h4><p>Estimating causal effects is one of the most common and difficult tasks for scientists and analysts. The gold standard for causal effect estimation is the randomized controlled experiment, which is often prohibitively expensive and time-consuming. Additionally, some causal effects of interest cannot be tested experimentally at all, such as whether cigarettes cause lung cancer (imagine assigning treatment and control groups here). Recently, researchers from the Vector Institute at the University of Toronto and Meta published a <a href="https://arxiv.org/pdf/2407.07018">paper</a> introducing the NATURAL framework for causal effect estimation. NATURAL ties together many large language models (LLMs) to mine large volumes of unstructured text (Reddit posts!) and model conditional distributions, which are then used as inputs for classical causal effect estimators. 
They demonstrated the methodology on 2 synthetic and 4 real observational datasets paired with ground truth from real phase 3/4 clinical trials. They found that NATURAL was able to parse through hundreds of thousands of Reddit posts discussing various treatments and estimate the experimentally observed causal effect to within 3 percentage points of the ground truth.</p><h4><strong>Overview</strong>&nbsp;</h4><p>The researchers begin by posing a &#8220;simple&#8221; question:&nbsp; How can we use large language models to automate treatment effect estimation using freely available text data? They explore a pipeline of LLMs that ultimately infers conditional distributions, which classical estimators can then use to compute average treatment effects. This pipeline is best understood by walking through an example research question.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IkrL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaa3d1c-63ad-4fca-afea-2780f5f6ec20_1540x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IkrL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaa3d1c-63ad-4fca-afea-2780f5f6ec20_1540x720.png 424w, https://substackcdn.com/image/fetch/$s_!IkrL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaa3d1c-63ad-4fca-afea-2780f5f6ec20_1540x720.png 848w, https://substackcdn.com/image/fetch/$s_!IkrL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaa3d1c-63ad-4fca-afea-2780f5f6ec20_1540x720.png 1272w, 
https://substackcdn.com/image/fetch/$s_!IkrL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaa3d1c-63ad-4fca-afea-2780f5f6ec20_1540x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IkrL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaa3d1c-63ad-4fca-afea-2780f5f6ec20_1540x720.png" width="618" height="289.0508241758242" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cbaa3d1c-63ad-4fca-afea-2780f5f6ec20_1540x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:681,&quot;width&quot;:1456,&quot;resizeWidth&quot;:618,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IkrL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaa3d1c-63ad-4fca-afea-2780f5f6ec20_1540x720.png 424w, https://substackcdn.com/image/fetch/$s_!IkrL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaa3d1c-63ad-4fca-afea-2780f5f6ec20_1540x720.png 848w, https://substackcdn.com/image/fetch/$s_!IkrL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaa3d1c-63ad-4fca-afea-2780f5f6ec20_1540x720.png 1272w, 
https://substackcdn.com/image/fetch/$s_!IkrL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbaa3d1c-63ad-4fca-afea-2780f5f6ec20_1540x720.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The example the researchers walked through compared the treatment effects of two weight loss drugs: semaglutide (sold as Ozempic or Wegovy) and tirzepatide (sold as Zepbound). 
Their pipeline consisted of the following steps:</p><ol><li><p>Data Gathering</p><ol><li><p>Identifies 9 subreddits where users often share their experiences taking weight loss drugs and the effects associated with them</p></li></ol></li><li><p>Relevance Filtering</p><ol><li><p>Heuristic Filtering &#8211; Removes posts from &#8220;bot&#8221; accounts, posts that were deleted, posts that do not explicitly match relevant keywords, and posts that are too short</p></li><li><p>Relevancy Filtering &#8211; Uses an LLM (prompted with in-context examples) to further filter out posts that do not contain information relevant to the study</p></li><li><p>Treatment Outcome Filtering &#8211; Uses an LLM to identify posts that contain all the variables (starting weight, end weight, duration, dosage, etc.) needed for measuring treatment effects. The LLM also extracts this data and exports it as JSON for downstream use</p></li></ol></li><li><p>Covariate Extraction &amp; Inclusion Criteria</p><ol><li><p>Uses an LLM to extract relevant covariates from the post. 
Some covariates for this trial included age, gender, starting weight, dosage, and treatment duration</p></li><li><p>Filters out samples with partially or totally missing covariate data</p></li><li><p>Filters samples to <em>include</em> only those that match the expected covariates (for example, dosages within expected treatment ranges)</p></li></ol></li><li><p>Infer Conditional Distributions</p><ol><li><p>Iterates over all potential treatment outcomes and permutations of covariates to prompt an LLM to estimate the treatment effect for the particular permutation, given all the filtered reports</p></li><li><p>LLAMA2-70B was chosen here for its ability to directly return log-probabilities</p></li><li><p>Renormalizes the probabilities over all outcomes to obtain a valid probability distribution&nbsp;</p></li></ol></li></ol><h4><strong>Our Take</strong></h4><p>Without rehashing the limitations and impact sections of the paper, there are a few other limitations that those sections leave out but that can be inferred from the rest of the paper. For the six trials examined (4 real, 2 synthetic), NATURAL was able to parse relevant Reddit posts to accurately estimate causal effects observed in randomized clinical trials. While that does seem super promising and exciting, according to the authors it took quite a bit of tuning before they could match the ground truth. I worry that in real-world examples without ground truth, there is little one could do to easily validate the quality of the numerous prompts needed, as well as the in-context examples, and how those choices impact results. 
We can see this in practice from looking at the below figure from the ablation section.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!56ST!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a7164c4-d82d-4f95-89bd-cc5896747098_1620x490.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!56ST!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a7164c4-d82d-4f95-89bd-cc5896747098_1620x490.png 424w, https://substackcdn.com/image/fetch/$s_!56ST!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a7164c4-d82d-4f95-89bd-cc5896747098_1620x490.png 848w, https://substackcdn.com/image/fetch/$s_!56ST!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a7164c4-d82d-4f95-89bd-cc5896747098_1620x490.png 1272w, https://substackcdn.com/image/fetch/$s_!56ST!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a7164c4-d82d-4f95-89bd-cc5896747098_1620x490.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!56ST!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a7164c4-d82d-4f95-89bd-cc5896747098_1620x490.png" width="646" height="195.21978021978023" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6a7164c4-d82d-4f95-89bd-cc5896747098_1620x490.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:440,&quot;width&quot;:1456,&quot;resizeWidth&quot;:646,&quot;bytes&quot;:454047,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!56ST!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a7164c4-d82d-4f95-89bd-cc5896747098_1620x490.png 424w, https://substackcdn.com/image/fetch/$s_!56ST!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a7164c4-d82d-4f95-89bd-cc5896747098_1620x490.png 848w, https://substackcdn.com/image/fetch/$s_!56ST!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a7164c4-d82d-4f95-89bd-cc5896747098_1620x490.png 1272w, https://substackcdn.com/image/fetch/$s_!56ST!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a7164c4-d82d-4f95-89bd-cc5896747098_1620x490.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The authors show the root mean square error (RMSE) between the estimated treatment effect and the observed treatment effect converges to a minimum (for the 70B model) after ~ 1000 reports (2^10). For data practitioners who often lack large labeled datasets, this seems incredibly promising for the low volume. 
However, we argue this is a little misleading, since the 1000+ reports needed were found <em>after</em> filtering from an initial 577K posts. Because the error declines as the number of reports increases, and the number of reports that survive depends on the quality of the filtering and covariate extraction, this gives us a good data-driven view of the impact that good (or poor) prompt tuning can have on the results.</p><p>-- Justin</p><h2>New from the Gradient</h2><h3><a href="https://thegradientpub.substack.com/p/manuel-lenore-blum-conscious-turing-machine-tcs">Manuel and Lenore Blum: The Conscious Turing Machine</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0ED2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41b0264-bac6-41dc-9f9d-3739d2149757_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0ED2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41b0264-bac6-41dc-9f9d-3739d2149757_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!0ED2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41b0264-bac6-41dc-9f9d-3739d2149757_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!0ED2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41b0264-bac6-41dc-9f9d-3739d2149757_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!0ED2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41b0264-bac6-41dc-9f9d-3739d2149757_1200x628.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!0ED2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41b0264-bac6-41dc-9f9d-3739d2149757_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a41b0264-bac6-41dc-9f9d-3739d2149757_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0ED2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41b0264-bac6-41dc-9f9d-3739d2149757_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!0ED2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41b0264-bac6-41dc-9f9d-3739d2149757_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!0ED2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41b0264-bac6-41dc-9f9d-3739d2149757_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!0ED2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41b0264-bac6-41dc-9f9d-3739d2149757_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/manuel-lenore-blum-conscious-turing-machine-tcs&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegradientpub.substack.com/p/manuel-lenore-blum-conscious-turing-machine-tcs"><span>Listen</span></a></p><h3><a href="https://thegradientpub.substack.com/p/kevin-dorst-bayesian-epistemolology-irrational">Kevin Dorst: Against Irrationalist Narratives</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!BQRn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e7d5a69-877d-4423-ba20-2e5244f68525_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BQRn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e7d5a69-877d-4423-ba20-2e5244f68525_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!BQRn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e7d5a69-877d-4423-ba20-2e5244f68525_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!BQRn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e7d5a69-877d-4423-ba20-2e5244f68525_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!BQRn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e7d5a69-877d-4423-ba20-2e5244f68525_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BQRn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e7d5a69-877d-4423-ba20-2e5244f68525_1200x628.png" width="1200" height="628" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4e7d5a69-877d-4423-ba20-2e5244f68525_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BQRn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e7d5a69-877d-4423-ba20-2e5244f68525_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!BQRn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e7d5a69-877d-4423-ba20-2e5244f68525_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!BQRn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e7d5a69-877d-4423-ba20-2e5244f68525_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!BQRn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e7d5a69-877d-4423-ba20-2e5244f68525_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/kevin-dorst-bayesian-epistemolology-irrational&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegradientpub.substack.com/p/kevin-dorst-bayesian-epistemolology-irrational"><span>Listen</span></a></p><h2>Other Things That Caught Our Eyes</h2><h3>News</h3><p><strong><a href="https://www.nytimes.com/interactive/2024/07/18/technology/spain-domestic-violence-viogen-algorithm.html">An Algorithm Told Police She Was Safe. Then Her Husband Killed Her.</a></strong></p><p>Spain has relied on an algorithm called VioG&#233;n to assess the risk of domestic violence victims being abused again and determine the level of protection they need. While the system has helped reduce repeat attacks in domestic violence cases, it has also resulted in victims being attacked again, sometimes with fatal consequences. 
Spain currently has 92,000 active cases of gender violence victims evaluated by VioG&#233;n, with most classified as facing little risk of being hurt again. However, a significant number of women who were assessed as low or negligible risk have reported being harmed again. At least 247 women have been killed by their current or former partners after being assessed by VioG&#233;n. The algorithm's flaws have raised concerns about the reliance on algorithms in making life or death decisions.&nbsp;</p><p><strong><a href="https://www.theguardian.com/technology/article/2024/jul/20/google-is-the-worlds-biggest-search-engine-broken">&#8216;Google says I&#8217;m a dead physicist&#8217;: is the world&#8217;s biggest search engine broken?</a></strong></p><p>Google's search engine has come under scrutiny recently, with users claiming that it is not working as well as it should. The article explores the history of Google and its rise to dominance in the search market, as well as its influence over politics, social attitudes, and businesses. Critics argue that Google search has deteriorated in quality, citing issues such as spam, SEO practices, and the clutter of information boxes within search results. However, others still find Google search to be effective. The article raises questions about Google's trustworthiness and its ability to prioritize user interests.</p><p><strong><a href="https://www.washingtonpost.com/technology/2024/07/16/trump-ai-executive-order-regulations-military">Trump allies draft AI order to launch &#8216;Manhattan Projects&#8217; for defense</a></strong></p><p>Former President Donald Trump's allies are working on a comprehensive AI executive order that would establish "Manhattan Projects" to develop military technology and review regulations. The order aims to create "industry-led" agencies to evaluate AI models and protect systems from foreign adversaries. 
This approach differs from the Biden administration's executive order, which focuses on safety testing for AI systems. The GOP has adopted a platform that includes repealing the Biden AI executive order, claiming it hinders innovation. The framework provides insight into potential Republican policies to replace the Biden order. The greater military investment in AI could benefit tech companies like Anduril, Palantir, and Scale, which already have contracts with the Pentagon. The conservative Heritage Foundation is also drafting AI policies as part of Project 2025. Tech executives and investors, including Elon Musk and Bill Ackman, have endorsed Trump, indicating a potential second Trump administration would have a friendlier relationship with the tech industry.&nbsp;</p><p><strong><a href="https://techcrunch.com/2024/07/23/ftc-is-investigating-how-companies-are-using-ai-to-base-pricing-on-consumer-behavior/">FTC is investigating how companies are using AI to base pricing on consumer behavior</a></strong></p><p>The Federal Trade Commission (FTC) is investigating how companies are using AI to base pricing on consumer behavior. The agency has ordered eight companies, including Mastercard, JPMorgan Chase, and Accenture, to provide information about their AI-powered "surveillance service pricing" and its impact on privacy, competition, and consumer protection. This practice allows companies to charge different prices to different customers based on factors such as location and personal data. The FTC is concerned that this use of AI and personal data could put people's privacy at risk and wants to shed light on this "shadowy ecosystem of pricing middlemen." 
The investigation aims to understand the types of surveillance pricing services offered by these companies and how they are impacting consumer pricing.&nbsp;</p><p><strong><a href="https://www.404media.co/google-is-the-only-search-engine-that-works-on-reddit-now-thanks-to-ai-deal/">Google Is the Only Search Engine That Works on Reddit Now Thanks to AI Deal</a></strong></p><p>Google has become the exclusive search engine for Reddit, as other search engines like Bing and DuckDuckGo are no longer able to crawl Reddit and provide up-to-date results. This is due to Reddit's decision to lock down access to its site and prevent scraping for AI training data. Google's near monopoly on search hinders competition and raises concerns about the quality of search results. This exclusivity is a result of a multi-million dollar deal that allows Google to scrape Reddit for data to train its AI products.&nbsp;</p><p><strong><a href="https://www.datacenterdynamics.com/en/news/openai-training-and-inference-costs-could-reach-7bn-for-2024-ai-startup-set-to-lose-5bn-report/">OpenAI training and inference costs could reach $7bn for 2024, AI startup set to lose $5bn - report</a></strong></p><p>OpenAI is projected to spend nearly $4 billion this year on inference and could end the year with a shortfall of $5 billion. The company currently uses Microsoft's servers to run inference workloads for ChatGPT, with around 290,000 servers dedicated to this task. Training ChatGPT and new models could cost up to another $3 billion this year, bringing combined training and inference costs to about $7 billion. OpenAI benefits from discounted rates from Microsoft Azure, paying about $1.30 per A100 server per hour. The company employs around 1,500 people, which could cost $1.5 billion as it continues to grow. 
While OpenAI generates about $2 billion annually from ChatGPT, it may need to raise additional funds within the next year to cover its losses.</p><p><strong><a href="https://www.tomsguide.com/ai/apple-takes-on-meta-with-new-open-source-ai-model-heres-why-it-matters">Apple takes on Meta with new open-source AI model &#8212; here's why it matters</a></strong></p><p>Apple has released a new open-source AI model with 7 billion parameters, signaling its commitment to the wider AI ecosystem. The model, part of Apple's DCLM project, outperforms similar-sized models from Meta and Google. It is fully open source, with all weights, training data, and processes publicly available. Despite its small size and context window, the model's open-source nature makes it one of the more significant AI releases of the year. Researchers and companies can use the model to create their own small AIs without per-token costs. This aligns with the goal of creating intelligence that is affordable and accessible.</p><p><strong><a href="https://www.nytimes.com/2024/07/25/technology/china-open-source-ai.html">China Is Closing the A.I. Gap With the United States</a></strong></p><p>At the World Artificial Intelligence Conference in Shanghai, Chinese start-up founder Qu Dongqi showcased a video created using AI technology from Chinese internet company Kuaishou. The video, which brought an old photograph to life, demonstrated the advancements China has made in the field of artificial intelligence. This technology is similar to the video generator Sora, developed by OpenAI, but the Chinese version is already available to the general public. 
This highlights China's progress in closing the AI gap with the United States.</p><p><strong><a href="https://apnews.com/article/sagaftra-video-game-performers-ai-strike-4f4c7d846040c24553dbc2604e5b6034">Video game performers will go on strike over artificial intelligence concerns</a></strong></p><p>Video game performers, including voice actors and motion capture performers, are going on strike due to concerns over AI protections. The Screen Actors Guild-American Federation of Television and Radio Artists (SAG-AFTRA) has been negotiating with major game studios for nearly two years over a new interactive media agreement. While progress has been made on wages and job safety, the two sides remain divided on the regulation of generative AI. The union is concerned that without proper safeguards, game companies could use AI to replicate actors' voices or create digital replicas of their likeness without consent or fair compensation. The strike is a last resort after exhausting other possibilities. The global video game industry generates over $100 billion in profit annually, and the performers are demanding that their work be protected. The strike will not include games covered by separate contracts that have AI protections.</p><p><strong><a href="https://www.404media.co/runway-ai-image-generator-training-data-youtube/">AI Video Generator Runway Trained on Thousands of YouTube Videos Without Permission</a></strong></p><p>A recent investigation by 404 Media has revealed that the AI video generation tool developed by Runway, a multi-billion dollar company, was trained using scraped videos from YouTube creators, brands, and pirated films. The tool, initially known as Jupiter and later released as Gen-3, received widespread praise upon its launch in June. Runway, which raised $141 million in funding last year, has not provided specific details about the training data for Gen-3. 
This revelation raises concerns about the ethical use of AI technology and the potential infringement of copyright laws.&nbsp;</p><p><strong><a href="https://www.theverge.com/2024/7/25/24205943/anthropic-ai-web-crawler-claudebot-ifixit-scraping-training-data">Anthropic&#8217;s crawler is ignoring websites&#8217; anti-AI scraping policies</a></strong></p><p>Anthropic's web crawler, ClaudeBot, has been scraping websites for training data for AI models without regard for the websites' anti-AI scraping policies. One of the affected websites, iFixit, noticed that ClaudeBot had accessed its content almost a million times in 24 hours, violating its Terms of Use. iFixit's CEO, Kyle Wiens, expressed concern about the unauthorized use of the site's content and the strain it put on the site's resources. iFixit has since added a crawl-delay extension to its robots.txt file to rein in the crawler; Anthropic, for its part, says its crawler can only be blocked through robots.txt. Other websites, such as Read the Docs and Freelancer.com, also reported being aggressively scraped by ClaudeBot. This is not the first time ClaudeBot's scraping activities have caused issues, as previous incidents have been reported on Reddit and the Linux Mint web forum.&nbsp;</p><p><strong><a href="https://www.theguardian.com/technology/article/2024/jul/26/elon-musks-x-under-pressure-from-regulators-over-data-harvesting-for-grok-ai">Elon Musk&#8217;s X under pressure from regulators over data harvesting for Grok AI</a></strong></p><p>X (fka Twitter) is facing pressure from data regulators due to a default setting that allows users' posts to be used for training an AI chatbot called Grok. The UK and Irish data watchdogs have contacted X regarding the apparent attempt to gain user consent for data harvesting without users' knowledge. Under UK GDPR, companies are not allowed to use default consent methods. 
The default setting on X, which comes with a pre-ticked box, allows users' posts and interactions with Grok to be used for training. Data regulators have expressed concern and emphasized the need for transparency and user notification.&nbsp;</p><p><strong><a href="https://www.theatlantic.com/newsletters/archive/2024/07/openais-search-tool-has-already-made-a-mistake/679264/">OpenAI&#8217;s Search Tool Has Already Made a Mistake</a></strong></p><p>OpenAI recently announced the launch of SearchGPT, a prototype tool that uses AI to answer questions by searching the internet. However, even in the demo, SearchGPT made a mistake. When a user searched for music festivals in Boone, North Carolina in August, the top suggestion provided by SearchGPT was a fair that actually ends in July. This highlights a common issue with AI search tools, as they often exhibit errors and inaccuracies. While AI searchbots have the potential to revolutionize internet search by providing personalized answers, they still have a long way to go in terms of accuracy and reliability.&nbsp;</p><p><strong><a href="https://www.theverge.com/2024/7/19/24198605/amd-ryzen-ai-strix-point-vs-apple-intel-qualcomm-event">AMD claims its top-tier Ryzen AI chip is faster than Apple&#8217;s M3 Pro</a></strong></p><p>AMD recently held an event to showcase its new Strix Point Ryzen AI chips, built on the Zen 5 architecture. AMD claims that these chips can outperform Apple's M3 and M3 Pro chips, as well as beat Qualcomm's and Intel's integrated graphics. The new Ryzen AI chips offer architectural improvements, with a 16% increase in instructions per clock cycle and a 19-32% boost in graphics performance per watt. However, AMD has yet to provide concrete evidence to support these claims. The company also did not provide specific details about battery life improvements. 
The first laptops with the new chips will be available on July 28th.&nbsp;</p><h3>Papers</h3><ul><li><p><a href="https://arxiv.org/abs/2407.16607">Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?</a></p></li><li><p><a href="https://www.nature.com/articles/s41586-024-07566-y">AI models collapse when trained on recursively generated data</a></p></li><li><p><a href="https://arxiv.org/abs/2406.04229">The CLRS-Text Algorithmic Reasoning Language Benchmark</a></p></li><li><p><a href="https://arxiv.org/abs/2407.15549">Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs</a></p></li><li><p><a href="https://arxiv.org/abs/2407.10930">Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together</a></p></li><li><p><a href="https://arxiv.org/abs/2406.04391">Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?</a></p></li><li><p><a href="https://arxiv.org/abs/2407.16741v1">OpenDevin: An Open Platform for AI Software Developers as Generalist Agents</a></p></li><li><p><a href="https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/">AI achieves silver-medal standard in solving IMO problems</a></p><ul><li><p>And some good <a href="https://x.com/wtgowers/status/1816509803407040909">context</a> from Timothy Gowers</p></li></ul></li></ul><h3>Closing Thoughts</h3><p>Have something to say about this edition&#8217;s topics? Shoot us an email at editor@thegradient.pub and we will consider sharing the most interesting thoughts from readers in the next newsletter! For feedback, you can also reach Daniel directly at <a href="mailto:dbashir@hmc.edu">dbashir@hmc.edu</a> or on <a href="https://twitter.com/spaniel_bashir">Twitter</a>. If you enjoyed this newsletter, consider donating to The Gradient via a Substack subscription, which helps keep this volunteer-run project afloat. 
Thanks for reading the latest Update from the Gradient!</p>]]></content:encoded></item><item><title><![CDATA[Update #79: Does AI Music Have a Future? Can Vision-Language Models See?]]></title><description><![CDATA[Two leading AI music startups are sued by major record labels; researchers find that VLMs are poor at understanding basic spatial information.]]></description><link>https://thegradientpub.substack.com/p/update-79-ai-music-future-vision-language-models</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/update-79-ai-music-future-vision-language-models</guid><dc:creator><![CDATA[daniel bashir]]></dc:creator><pubDate>Tue, 16 Jul 2024 15:30:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-9MD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a03cc15-08a9-4d4d-8e2e-8c54166005d0_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 79th update from the Gradient! If you&#8217;re new and like what you see, <a href="https://thegradientpub.substack.com/">subscribe</a> and follow us on <a href="https://twitter.com/gradientpub">Twitter</a>. <strong>Our newsletters run long, so you&#8217;ll need to view this post on Substack to see everything!</strong></p><h2>Editor Notes</h2><p>As with our last Update, there&#8217;s been AI news, but there&#8217;s also been lots of other news. 
I hope you&#8217;ve been staying sane and that some of you, like our wonderful editor Cole, are on a beach somewhere and not thinking about AI at all (if you&#8217;re reading this, though, that&#8217;s probably not you).&nbsp;</p><p>The two podcast conversations I posted since last episode were both ones I really enjoyed: New South Wales has a very interesting and thoughtful approach to using AI in education, and David Pfau is one of the most thoughtful scientists the ML community is lucky to have.&nbsp;</p><p><strong>Also</strong>: I hate writing this probably more than you hate reading it, but it would be so, so helpful if you&#8217;d consider helping us out a bit by leaving reviews or feedback if you&#8217;re getting something from any of our content. I don&#8217;t have too many reviews on <a href="https://podcasts.apple.com/us/podcast/the-gradient-perspectives-on-ai/id1569777340">Apple Podcasts</a> or <a href="https://open.spotify.com/show/6onNcSqsP6hEEqmZ6TU2g8">Spotify</a> yet, and they really help a lot :)</p><p>As always, if you want to write with us, send a pitch using <a href="https://goo.gl/forms/whYRKEzMZJox6FaH2">this form</a>.</p><h2><strong>News Highlight</strong>: <a href="https://www.technologyreview.com/2024/06/27/1094379/ai-music-suno-udio-lawsuit-record-labels-youtube-licensing">Training AI music models is about to get very expensive</a></h2><h4><strong>Summary</strong>&nbsp;</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-9MD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a03cc15-08a9-4d4d-8e2e-8c54166005d0_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!-9MD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a03cc15-08a9-4d4d-8e2e-8c54166005d0_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!-9MD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a03cc15-08a9-4d4d-8e2e-8c54166005d0_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!-9MD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a03cc15-08a9-4d4d-8e2e-8c54166005d0_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!-9MD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a03cc15-08a9-4d4d-8e2e-8c54166005d0_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-9MD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a03cc15-08a9-4d4d-8e2e-8c54166005d0_1024x1024.png" width="358" height="358" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a03cc15-08a9-4d4d-8e2e-8c54166005d0_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:358,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!-9MD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a03cc15-08a9-4d4d-8e2e-8c54166005d0_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!-9MD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a03cc15-08a9-4d4d-8e2e-8c54166005d0_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!-9MD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a03cc15-08a9-4d4d-8e2e-8c54166005d0_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!-9MD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a03cc15-08a9-4d4d-8e2e-8c54166005d0_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">generated by DALL-E, for the irony</figcaption></figure></div><p>Suno and Udio, two leading AI music startups, were sued by major record labels on June 24 &#8212; the allegation is that the companies made use of copyrighted music in their training data at massive scale. In contrast, YouTube has taken an aboveboard approach where it offers lump sums to record labels to license their catalogs for training. The outcome of this lawsuit will determine the shape of AI music generation, and, possibly, whether there is any future for it at all.&nbsp;</p><h4><strong>Overview</strong></h4><p><a href="https://suno.com/">Suno</a> and <a href="https://www.udio.com/">Udio</a> are to music what Stable Diffusion and its ilk are to images: both offer models that can, given a natural language text prompt, generate music. Unsurprisingly, the training recipe for each of the models is proprietary &#8212; and record labels allege that both startups have engaged in copyright infringement in model training and output.&nbsp;</p><p>While there are similarities to the <em>New York Times</em> v. OpenAI case, Suno and Udio appear to have even less plausible deniability than their fellow offender: while <em>New York Times</em> articles and their content might appear in places besides the site itself, some say it&#8217;s clear that the AI music startups must have pulled in large databases of commercial recordings.&nbsp;</p><p>Furthermore, the lawsuit alleges that Suno&#8217;s and Udio&#8217;s tools are more imitative than generative: their output mimics the style of copyright-protected artists and songs. 
Suno and Udio both claim that they prioritize originality, and both have safeguards in place that prevent users from naming specific artists in a query, but (as you will know if you&#8217;ve spent time with <em>any</em> generative AI system) loopholes abound.&nbsp;</p><p>According to James Grimmelmann, professor of digital and information law at Cornell Law School, there are three possible ways the case could go: wholly in favor of the startups, where the court determines that the companies&#8217; training was fair use and that their outputs do not imitate copyrighted works too closely; a mixed bag, where the court finds that training was fair use but that Suno and Udio must better control model output to avoid imitating copyrighted works; or in favor of the record companies, where the court finds fault with both the training and the output of the AI models, meaning that the companies cannot train on copyrighted works without licenses or allow outputs that closely imitate copyrighted works.&nbsp;</p><h4><strong>Our Take</strong></h4><p>I&#8217;m going to have an inevitably biased take on AI music in general, because I think a profound aspect of being a musician and producing music is the sensate part of it: producing orchestral music on a computer isn&#8217;t the same as sitting as a member of an actual orchestra. Yes, democratization is a thing, but the difficulty of putting something like an orchestra together and getting everything right is precisely what makes the outcome so valuable and important. 
All this is to say, I can personally live without AI-generated music, though I recognize the value people find in it.&nbsp;</p><p>I also like the point that <a href="https://www.washingtonpost.com/technology/2024/07/05/suno-ai-music-iphone-app/">this </a><em><a href="https://www.washingtonpost.com/technology/2024/07/05/suno-ai-music-iphone-app/">Washington Post</a></em><a href="https://www.washingtonpost.com/technology/2024/07/05/suno-ai-music-iphone-app/"> article makes</a>: like it or not, a technology that makes it easy for you to generate and share music is going to impact the way you interact with and value music itself. Just as telegraphy and newer technologies have changed how we write, and streaming services and their ilk have changed how we consume and appreciate TV shows and movies, so too would tools like Suno and Udio shift our relationship to music if they come into widespread use. I&#8217;m not entirely sure that&#8217;s a relationship to music that I want to have.&nbsp;</p><p>But I should comment on the case itself &#8212; the stakes are indeed very high for AI music companies because there aren&#8217;t many options for music in the public domain. So it <em>is</em> existential for these companies. Your reaction to the verdict in this case, then, probably indicates something about your stance towards AI music and whether you think a future with it is a good future.&nbsp;</p><p>On the other hand, there is also a future with licensing analogous to the deals we&#8217;re seeing between OpenAI and publishers. But given the music industry&#8217;s power, the prices for these licensing deals are bound to be very high &#8212;&nbsp;only YouTube and peers with as much money would be able to cough up the cash. 
In this world, the future and shape of these systems once again rely on the value judgments of a handful of actors.&nbsp;</p><p>&#8212;Daniel</p><h2><strong>Research Highlight: </strong><a href="https://arxiv.org/pdf/2407.06581">Vision language models are blind</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!W6Uu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50eb670b-8d57-4e56-b876-8525c884b7e1_1458x638.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!W6Uu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50eb670b-8d57-4e56-b876-8525c884b7e1_1458x638.png 424w, https://substackcdn.com/image/fetch/$s_!W6Uu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50eb670b-8d57-4e56-b876-8525c884b7e1_1458x638.png 848w, https://substackcdn.com/image/fetch/$s_!W6Uu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50eb670b-8d57-4e56-b876-8525c884b7e1_1458x638.png 1272w, https://substackcdn.com/image/fetch/$s_!W6Uu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50eb670b-8d57-4e56-b876-8525c884b7e1_1458x638.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!W6Uu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50eb670b-8d57-4e56-b876-8525c884b7e1_1458x638.png" width="1456" height="637" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/50eb670b-8d57-4e56-b876-8525c884b7e1_1458x638.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:637,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!W6Uu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50eb670b-8d57-4e56-b876-8525c884b7e1_1458x638.png 424w, https://substackcdn.com/image/fetch/$s_!W6Uu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50eb670b-8d57-4e56-b876-8525c884b7e1_1458x638.png 848w, https://substackcdn.com/image/fetch/$s_!W6Uu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50eb670b-8d57-4e56-b876-8525c884b7e1_1458x638.png 1272w, https://substackcdn.com/image/fetch/$s_!W6Uu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50eb670b-8d57-4e56-b876-8525c884b7e1_1458x638.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h4><strong>Summary</strong></h4><p>Researchers from Auburn University and the University of Alberta found that state-of-the-art large language models with vision capabilities (VLMs) are surprisingly poor at understanding spatial information involving basic geometric shapes, such as whether two circles overlap. They propose BlindTest, a new benchmark of 7 simple tasks that are unlikely to have prior answers in natural language on the Internet, to test VLMs' ability to "see" images as humans do.</p><h4><strong>Overview</strong></h4><p>Existing VLM benchmarks (such as <a href="https://arxiv.org/pdf/2311.16502">MMMU</a> and <a href="https://aclanthology.org/2022.findings-acl.177.pdf">ChartQA</a>) cover a wide range of subjects, but the input images are not always necessary for answering the questions, i.e., answers may be inferred from the textual question and answer choices alone or memorized by the models from Internet-scale training. 
Motivated by this gap and inspired by visual acuity tests given to humans by optometrists, the authors design 7 low-level vision tasks that involve 2D geometric primitives. They then test four VLMs that rank the highest on existing multimodal vision benchmarks &#8211; GPT-4o, Gemini-1.5 Pro, Claude-3 Sonnet, and Claude-3.5 Sonnet. For each task, they prompt VLMs with two different questions that are semantically equivalent. The tasks and results are as follows:</p><ol><li><p><strong>Counting line intersections</strong>. Across 150 images of two colored lines that intersect at exactly 0, 1, or 2 points, the best accuracy is 77.33% (Sonnet-3.5) and the worst is 48.67% (GPT-4o).</p></li></ol><blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!48na!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6780d79-6ae7-4b32-b922-e75545d018bf_506x306.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!48na!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6780d79-6ae7-4b32-b922-e75545d018bf_506x306.png 424w, https://substackcdn.com/image/fetch/$s_!48na!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6780d79-6ae7-4b32-b922-e75545d018bf_506x306.png 848w, https://substackcdn.com/image/fetch/$s_!48na!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6780d79-6ae7-4b32-b922-e75545d018bf_506x306.png 1272w, 
https://substackcdn.com/image/fetch/$s_!48na!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6780d79-6ae7-4b32-b922-e75545d018bf_506x306.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!48na!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6780d79-6ae7-4b32-b922-e75545d018bf_506x306.png" width="506" height="306" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f6780d79-6ae7-4b32-b922-e75545d018bf_506x306.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:306,&quot;width&quot;:506,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!48na!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6780d79-6ae7-4b32-b922-e75545d018bf_506x306.png 424w, https://substackcdn.com/image/fetch/$s_!48na!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6780d79-6ae7-4b32-b922-e75545d018bf_506x306.png 848w, https://substackcdn.com/image/fetch/$s_!48na!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6780d79-6ae7-4b32-b922-e75545d018bf_506x306.png 1272w, 
https://substackcdn.com/image/fetch/$s_!48na!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6780d79-6ae7-4b32-b922-e75545d018bf_506x306.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></blockquote><ol start="2"><li><p><strong>Two circles overlapping or touching</strong>. Over 672 images of two equal-sized circles that are overlapping, tangent, or disjoint (with variations in orientation and size), the best accuracy is 92.78% (Gemini-1.5) and the worst accuracy is 72.69% (GPT-4o, again). 
Further, performance tends to degrade when two circles are close together.</p></li><li><p><strong>Circled letter in a string</strong>. A red oval is superimposed on one of the letters in a string. The authors test three strings &#8211; Acknowledgement, Subdermatoglyphic, and a random string tHyUiKaRbNqWeOpXcZvM. Gemini-1.5 (92.81% accuracy) and Sonnet-3.5 (89.22%) outperform GPT-4o and Sonnet-3 by nearly 20 points. Except for GPT-4o, all models perform slightly better on the two English words than on the random string, suggesting that knowing the word might help VLMs make better educated guesses.</p></li><li><p><strong>Counting overlapping shapes</strong>. N overlapping, same-sized circles (N=5,6,7,8,9) are arranged in two rows like the Olympic logo. Accuracy ranges from 20.83% (Gemini-1.5) to 44.16% (Sonnet-3.5). Repeating the same arrangement with pentagons yields a wider spread, from 9.16% (Gemini-1.5) to 75.83% (Sonnet-3.5). All four models are 100% accurate in counting 5 circles, yet all except Sonnet-3.5 perform poorly on 5 pentagons.</p></li><li><p><strong>Counting nested squares</strong>. 2 to 5 squares are nested such that each smaller square lies entirely inside the next larger one. Sonnet-3.5 again has the best accuracy, at 87.5%. 
GPT-4o and Sonnet-3 struggle to count even when there are only 2 or 3 squares.</p></li></ol><blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!l-YQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c25d566-1e6b-4286-a3f4-0f3d58a81803_1008x288.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!l-YQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c25d566-1e6b-4286-a3f4-0f3d58a81803_1008x288.png 424w, https://substackcdn.com/image/fetch/$s_!l-YQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c25d566-1e6b-4286-a3f4-0f3d58a81803_1008x288.png 848w, https://substackcdn.com/image/fetch/$s_!l-YQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c25d566-1e6b-4286-a3f4-0f3d58a81803_1008x288.png 1272w, https://substackcdn.com/image/fetch/$s_!l-YQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c25d566-1e6b-4286-a3f4-0f3d58a81803_1008x288.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!l-YQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c25d566-1e6b-4286-a3f4-0f3d58a81803_1008x288.png" width="1008" height="288" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3c25d566-1e6b-4286-a3f4-0f3d58a81803_1008x288.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:288,&quot;width&quot;:1008,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!l-YQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c25d566-1e6b-4286-a3f4-0f3d58a81803_1008x288.png 424w, https://substackcdn.com/image/fetch/$s_!l-YQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c25d566-1e6b-4286-a3f4-0f3d58a81803_1008x288.png 848w, https://substackcdn.com/image/fetch/$s_!l-YQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c25d566-1e6b-4286-a3f4-0f3d58a81803_1008x288.png 1272w, https://substackcdn.com/image/fetch/$s_!l-YQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c25d566-1e6b-4286-a3f4-0f3d58a81803_1008x288.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div></blockquote><ol start="6"><li><p><strong>Counting the rows and columns of a grid</strong>. VLMs struggle to count the exact number of rows and columns in an empty grid, where the best model (Sonnet-3.5) has 59.84% accuracy and the rest have 25-26% accuracy. However, adding a single word to each cell significantly improves performance for all models. For example, GPT-4o's accuracy more than doubles from 26% to 53%.</p></li><li><p><strong>Following single-colored paths</strong>. The final task asks models to count the number of unique-color paths between two given stations in a simplified subway map. "Shockingly," the authors find that no model reaches 100% accuracy even when there is only one path between two stations. 
Most VLMs perform worse as map complexity increases.</p></li></ol><blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Z14V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722f9459-4d05-48e3-8aef-ec02d62952be_1600x750.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Z14V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722f9459-4d05-48e3-8aef-ec02d62952be_1600x750.png 424w, https://substackcdn.com/image/fetch/$s_!Z14V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722f9459-4d05-48e3-8aef-ec02d62952be_1600x750.png 848w, https://substackcdn.com/image/fetch/$s_!Z14V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722f9459-4d05-48e3-8aef-ec02d62952be_1600x750.png 1272w, https://substackcdn.com/image/fetch/$s_!Z14V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722f9459-4d05-48e3-8aef-ec02d62952be_1600x750.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Z14V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722f9459-4d05-48e3-8aef-ec02d62952be_1600x750.png" width="1456" height="682" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/722f9459-4d05-48e3-8aef-ec02d62952be_1600x750.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:682,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Z14V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722f9459-4d05-48e3-8aef-ec02d62952be_1600x750.png 424w, https://substackcdn.com/image/fetch/$s_!Z14V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722f9459-4d05-48e3-8aef-ec02d62952be_1600x750.png 848w, https://substackcdn.com/image/fetch/$s_!Z14V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722f9459-4d05-48e3-8aef-ec02d62952be_1600x750.png 1272w, https://substackcdn.com/image/fetch/$s_!Z14V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722f9459-4d05-48e3-8aef-ec02d62952be_1600x750.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div></blockquote><p>Overall, the BlindTest benchmark provides the first low-level, visual sanity check for VLMs. The models' underwhelming performance on these tasks, which are simple for humans and require zero prior knowledge, is counterintuitive given their impressive performance on existing vision benchmarks, which have a <a href="https://arxiv.org/pdf/2403.20330">data leakage</a> problem. 
Addressing these limitations of VLMs is likely a non-trivial challenge and may help solve other known visual <a href="https://arxiv.org/pdf/2401.06209">shortcomings</a> of multimodal models such as understanding the orientation of an object.</p><h2>New from the Gradient</h2><h3><a href="https://thegradientpub.substack.com/p/david-pfau-manifold-factorization">David Pfau: Manifold Factorization and AI for Science</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dVBP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f00a30c-ea11-4b07-8822-e48a35f53683_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dVBP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f00a30c-ea11-4b07-8822-e48a35f53683_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!dVBP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f00a30c-ea11-4b07-8822-e48a35f53683_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!dVBP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f00a30c-ea11-4b07-8822-e48a35f53683_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!dVBP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f00a30c-ea11-4b07-8822-e48a35f53683_1200x628.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!dVBP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f00a30c-ea11-4b07-8822-e48a35f53683_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f00a30c-ea11-4b07-8822-e48a35f53683_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dVBP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f00a30c-ea11-4b07-8822-e48a35f53683_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!dVBP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f00a30c-ea11-4b07-8822-e48a35f53683_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!dVBP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f00a30c-ea11-4b07-8822-e48a35f53683_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!dVBP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f00a30c-ea11-4b07-8822-e48a35f53683_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/david-pfau-manifold-factorization&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegradientpub.substack.com/p/david-pfau-manifold-factorization"><span>Listen</span></a></p><h3><a href="https://thegradientpub.substack.com/p/dan-hart-michelle-michael-ai-education-nsw">Dan Hart and Michelle Michael: Bringing AI to Students in New South Wales</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!xLoI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad80fdbd-28c0-4e0b-b9fa-04d537c55311_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xLoI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad80fdbd-28c0-4e0b-b9fa-04d537c55311_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!xLoI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad80fdbd-28c0-4e0b-b9fa-04d537c55311_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!xLoI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad80fdbd-28c0-4e0b-b9fa-04d537c55311_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!xLoI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad80fdbd-28c0-4e0b-b9fa-04d537c55311_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xLoI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad80fdbd-28c0-4e0b-b9fa-04d537c55311_1200x628.png" width="1200" height="628" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ad80fdbd-28c0-4e0b-b9fa-04d537c55311_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xLoI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad80fdbd-28c0-4e0b-b9fa-04d537c55311_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!xLoI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad80fdbd-28c0-4e0b-b9fa-04d537c55311_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!xLoI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad80fdbd-28c0-4e0b-b9fa-04d537c55311_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!xLoI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad80fdbd-28c0-4e0b-b9fa-04d537c55311_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/dan-hart-michelle-michael-ai-education-nsw&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegradientpub.substack.com/p/dan-hart-michelle-michael-ai-education-nsw"><span>Listen</span></a></p><h2>Other Things That Caught Our Eyes</h2><h3>News</h3><p><strong><a href="https://www.washingtonpost.com/technology/2024/07/13/openai-safety-risks-whistleblower-sec/">OpenAI illegally barred staff from airing safety risks, whistleblowers say</a></strong></p><p>Whistleblowers at OpenAI have filed a complaint with the Securities and Exchange Commission (SEC), alleging that the company illegally prevented employees from raising safety concerns about its AI technology. 
The whistleblowers claim that OpenAI issued overly restrictive employment, severance, and nondisclosure agreements that could have penalized workers who reported concerns to federal regulators. According to the complaint, these agreements violated federal laws protecting whistleblowers and their right to anonymously disclose damning information about their company. The complaint comes amid concerns that OpenAI prioritizes profit over safety in developing its technology. The SEC has not yet confirmed whether an investigation has been launched.</p><p><strong><a href="https://techcrunch.com/2024/07/08/data-workers-detail-exploitation-by-tech-industry-in-dair-report/">Data workers detail exploitation by tech industry in DAIR report</a></strong></p><p>A new report by the Data Workers' Inquiry, a collaboration between AI ethics research group DAIR and TU Berlin, sheds light on the exploitation of data workers in the tech industry. The report highlights the hidden labor of data work, such as moderation and annotation, which is often outsourced to poorer countries where workers are paid significantly less than their American or European counterparts. The conditions of this work, although not physically dangerous, can be psychologically damaging. The report's accounts, which are largely anecdotal, describe firsthand the challenges faced by data workers, including mental health issues and a lack of support from employers. The report emphasizes the need for companies to address the exploitation of data workers and calls for further research on the topic.</p><p><strong><a href="https://theintercept.com/2024/07/08/new-york-times-openai-headlines-chatgpt/">New York Times Experiments With a New Headline Writer: OpenAI</a></strong></p><p>The New York Times has been experimenting with OpenAI's generative AI technology to develop a tool that can generate headlines for articles and apply the newspaper's style guide. 
The leaked code revealed that the Times used OpenAI to create a headline writer that could potentially replace editors. While the project was only an early experiment and not used by the newsroom, it highlights the increasing use of AI in newsrooms. However, there are concerns that AI could lead to further job losses for journalists. The Times is currently involved in a lawsuit against OpenAI and Microsoft for alleged copyright infringement.</p><p><strong><a href="https://www.wired.com/story/girlsdoporn-deepfake-victim-videos/">Deepfake Creators Are Revictimizing GirlsDoPorn Sex Trafficking Survivors</a></strong></p><p>Deepfake creators have reached a new low by using videos of sex trafficking victims as the basis for nonconsensual deepfake pornography. An account on a deepfake sexual abuse website posted 12 celebrity videos that were created using footage from GirlsDoPorn, a sex trafficking operation. The videos, which were up to 21 minutes long, had celebrity faces added using AI. Deepfake technology has become increasingly realistic and accessible, leading to the proliferation of websites and apps designed for deepfake sexual abuse, while laws to protect victims and limit the use of these tools are lagging. Legal proceedings against the creators of GirlsDoPorn and affiliated individuals are ongoing, with survivors being awarded damages and copyright ownership of the videos.</p><p><strong><a href="https://www.nytimes.com/2024/06/26/technology/ai-consultants.html">The A.I. Boom Has an Unlikely Early Winner: Wonky Consultants</a></strong></p><p>The rise of AI has created a demand for consultants who can help businesses understand and implement this technology. Companies like Boston Consulting Group and McKinsey &amp; Company are experiencing a surge in revenue and hiring as they assist businesses in navigating the implications of AI and how it can benefit their operations. 
For example, Reckitt Benckiser, the maker of Lysol and Mucinex, sought the expertise of Boston Consulting Group to explore how AI could be applied to their business.&nbsp;</p><p><strong><a href="https://techcrunch.com/2024/07/09/humane-execs-leave-company-to-found-ai-fact-checking-startup/">Humane execs leave company to found AI fact-checking startup</a></strong></p><p>Former employees of Humane, a struggling AI hardware company, have left to start their own startup called Infactory. Infactory is a fact-checking search engine that aims to provide accurate information by pulling data directly from trusted sources, rather than relying on generative AI services. The founders, Brooke Hartley Moy and Ken Kocienda, emphasize the importance of using AI selectively and focusing on computational and factual data. Infactory plans to target enterprise customers, such as newsrooms and research facilities, and will initially focus on data-related subjects. The startup has raised pre-seed funding and will be seeking seed funding in the next six to 18 months. Despite their departure from Humane, the founders deny that their decision was a direct result of the company's struggles. Infactory is set to launch in a few months.</p><p><strong><a href="https://www.theverge.com/2024/7/11/24196769/copied-act-cantwell-blackburn-heinrich-ai-journalists-artists">The AI-focused COPIED Act would make removing digital watermarks illegal</a></strong></p><p>A new bill called the Content Origin Protection and Integrity from Edited and Deepfaked Media Act (COPIED Act) has been introduced by a bipartisan group of senators &#8212; it aims to authenticate and detect artificial intelligence-generated content, protecting journalists and artists from having their work used without permission. The Act directs the National Institute of Standards and Technology (NIST) to create standards and guidelines for proving the origin of content and detecting synthetic content through watermarking. 
It also requires AI tools to allow users to attach information about the origin of creative or journalistic content and prohibits the removal of this information. Content owners can sue companies that use their materials without permission or tamper with authentication markers. The bill has received support from publishing and artists' groups.&nbsp;</p><p><strong><a href="https://www.theverge.com/2024/7/11/24196396/the-atlantic-openai-licensing-deal-ai-news-journalism-web-future-decoder-podcasts">Why The Atlantic signed a deal with OpenAI</a></strong></p><p>The Atlantic has signed a deal with OpenAI, allowing the AI company to use The Atlantic's archives as training data. The CEO of The Atlantic, Nicholas Thompson, explains that the deal provides revenue and a potential traffic source for the magazine. The deal includes three main components: allowing OpenAI to train on The Atlantic's data for two years, a product partnership where OpenAI provides credits and potential engineering support, and the inclusion of The Atlantic in OpenAI's search product. The goal of the deal is to shape the future of AI and ensure that journalists and media companies are paid for their work.</p><p><strong><a href="https://www.theatlantic.com/technology/archive/2024/07/thrive-ai-health-huffington-altman-faith/678984/">AI Has Become a Technology of Faith</a></strong></p><p>The article discusses the launch of a new company called Thrive AI Health, which aims to bring OpenAI's technology into the healthcare industry. The company plans to develop a hyper-personalized AI health coach that will generate personalized insights based on a user's biometric and health data. The article raises concerns about privacy and the potential misuse of personal health information. The founders of Thrive AI Health argue that people are willing to share personal details with AI language models and that the technology can offer behavioral solutions to improve health outcomes. 
However, the article questions the feasibility and potential risks of such a product.&nbsp;</p><p><strong><a href="https://www.tomshardware.com/tech-industry/artificial-intelligence/chinas-homegrown-linux-distro-adds-ai-integration-openkylin-gets-ai-assistant-text-to-image-generation-and-local-llm-support">China's homegrown OS fires back at AI PCs &#8212; openKylin gets AI assistant, text-to-image generation, and local LLM support</a></strong></p><p>China's openKylin operating system, an open-source OS based on Linux, has released a new version that is deeply integrated with AI. The OS features support for on-device LLMs, an AI assistant, and text-to-image generation. The aim is to boost productivity and user experience for those using domestic operating systems. AI PCs, equipped with advanced processors capable of running generative AI tasks locally, have gained popularity in China. Lenovo sees China as a unique market for AI PCs due to data-localization requirements. OpenKylin is part of China's effort to decrease dependence on foreign operating systems. However, Windows remains the dominant OS in China with nearly 80% of the market.</p><p><strong><a href="https://www.theverge.com/2024/7/10/24195528/microsoft-apple-openai-board-observer-seat-drop-regulator-scrutiny">Microsoft and Apple ditch OpenAI board seats amid regulatory scrutiny</a></strong></p><p>Microsoft and Apple have both stepped down from their board seats at OpenAI, a nonprofit organization focused on artificial intelligence (AI) research. Microsoft had secured a non-voting seat less than eight months ago, but concerns over antitrust issues have led to the decision. OpenAI will now adopt a new approach to engage with strategic partners and investors, including Microsoft and Apple, through regular stakeholder meetings. The move comes as UK and EU regulators are investigating Microsoft's partnership with OpenAI, along with other AI deals involving major tech companies. 
Microsoft has invested over $10 billion in OpenAI, making it the exclusive cloud partner and benefiting from OpenAI's AI models to enhance its own products and services.&nbsp;</p><p><strong><a href="https://www.nytimes.com/2024/07/11/climate/kobold-zambia-copper-ai-mining.html">AI Needs Copper. It Just Helped to Find Millions of Tons of It.</a></strong></p><p>KoBold Metals, a company in California, has used AI-driven technology to discover a large copper deposit in Zambia. The company estimates that the mine could produce at least 300,000 tons of copper per year, worth billions of dollars annually. An independent assessment largely corroborated the size of the deposit. KoBold expects the value of the mine to increase as they continue to map the full extent of the highest-grade ore.&nbsp;</p><p><strong><a href="https://www.japantimes.co.jp/news/2024/07/02/japan/sdf-cybersecurity/">Japan&#8217;s Defense Ministry unveils first basic policy on use of AI</a></strong></p><p>The Defense Ministry of Japan has released its first basic policy on the use of AI in order to address concerns about recruitment and the need to utilize personnel more efficiently. With a declining and aging population, Japan aims to leverage AI to overcome these challenges and keep up with China and the United States in terms of military applications of the technology. The policy highlights the potential of AI to alleviate manpower shortages and enhance the capabilities of the Self-Defense Forces. 
Japan&#8217;s move reflects its recognition of the importance of AI in maintaining its technological edge and addressing demographic issues.</p><p><strong><a href="https://www.scmp.com/tech/big-tech/article/3269387/chinas-ai-competition-deepens-sensetime-alibaba-claim-progress-ai-show">China&#8217;s AI competition deepens as SenseTime, Alibaba claim progress at AI show</a></strong></p><p>Chinese AI companies SenseTime and Alibaba showcased their progress in developing LLMs at the World Artificial Intelligence Conference (WAIC) in Shanghai. SenseTime released its latest foundational model, SenseNova 5.5, which boasts a 30% improvement in performance compared to the previous version. SenseTime claimed that SenseNova 5.5 outperforms GPT-4o in five out of eight key metrics. Alibaba's cloud computing unit highlighted the growth of its Tongyi Qianwen LLMs, with downloads doubling to over 20 million in the past two months. The number of customers served by Alibaba Cloud Model Studio also increased by over 150% to 230,000. 
The competition in the Chinese LLM market is intensifying, with only a few companies predicted to dominate in the future.&nbsp;</p><h3>Papers</h3><p>Our recs:</p><ul><li><p><a href="https://arxiv.org/abs/2407.03310">Universal Length Generalization with Turing Programs</a></p></li><li><p><a href="https://arxiv.org/abs/2402.14905">MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases</a></p></li><li><p><a href="https://arxiv.org/abs/2407.04620">Learning to (Learn at Test Time): RNNs with Expressive Hidden States</a></p></li><li><p><a href="https://situational-awareness-dataset.org/">Me, Myself, and AI: The Situational Awareness Dataset for LLMs</a></p></li><li><p><a href="https://arxiv.org/abs/2407.06023">Distilling System 2 into System 1</a></p></li><li><p>Not a paper, but a report: <a href="https://www.alignmentforum.org/posts/gcpNuEZnxAPayaKBY/othellogpt-learned-a-bag-of-heuristics-1">Othello-GPT learned a bag of heuristics</a></p></li><li><p><a href="https://www.saysmaybe.com/latest-work/selective-perspectives-nyt">Selective Perspectives: A Content Analysis of The New York Times&#8217; Reporting on AI</a></p></li><li><p><a href="https://arxiv.org/abs/2407.05502">Faux Polyglot: A Study on Information Disparity in Multilingual LLMs</a></p></li><li><p><a href="https://tridao.me/publications/flash3/flash3.pdf">FlashAttention-3</a></p></li></ul><h3>Closing Thoughts</h3><p>Have something to say about this edition&#8217;s topics? Shoot us an email at editor@thegradient.pub and we will consider sharing the most interesting thoughts from readers in the next newsletter! For feedback, you can also reach Daniel directly at <a href="mailto:dbashir@hmc.edu">dbashir@hmc.edu</a> or on <a href="https://twitter.com/spaniel_bashir">Twitter</a>. If you enjoyed this newsletter, consider donating to The Gradient via a Substack subscription, which helps keep this volunteer-run project afloat. 
Thanks for reading the latest Update from the Gradient!</p>]]></content:encoded></item><item><title><![CDATA[Mini-Update #44: OpenAI Security Breach and Accessible Diffusion Models]]></title><description><![CDATA[OpenAI insiders reveal security breach a year later, and researchers at Apple partner with other organizations to produce a tutorial on developing diffusion models.]]></description><link>https://thegradientpub.substack.com/p/mini-update-44-openai-security-breach</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/mini-update-44-openai-security-breach</guid><dc:creator><![CDATA[Ather Fawaz]]></dc:creator><pubDate>Thu, 11 Jul 2024 01:01:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!SaTq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5984cc53-ea6f-4e01-a682-2145c895f2c2_1280x853.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 44th mini-update from the Gradient! This is our exclusive newsletter edition specifically for paying subscribers and is our way to show you our appreciation for your support.</p><h1><strong>News Highl&#8230;</strong></h1>
      <p>
          <a href="https://thegradientpub.substack.com/p/mini-update-44-openai-security-breach">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Update #78: Accelerating Candy Crush Development and Neural Network Flexibility]]></title><description><![CDATA[Activision Blizzard scientists discuss AI's role in Candy Crush; researchers study neural networks' flexibility in fitting data.]]></description><link>https://thegradientpub.substack.com/p/update-78-accelerating-candy-crush</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/update-78-accelerating-candy-crush</guid><dc:creator><![CDATA[daniel bashir]]></dc:creator><pubDate>Tue, 02 Jul 2024 15:30:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0_B0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adf904a-020d-41d6-851d-c44b690be771_1080x602.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 78th update from the Gradient! If you&#8217;re new and like what you see, <a href="https://thegradientpub.substack.com/">subscribe</a> and follow us on <a href="https://twitter.com/gradientpub">Twitter</a>. <strong>Our newsletters run long, so you&#8217;ll need to view this post on Substack to see everything!</strong></p><h2>Editor Notes</h2><p>Good morning. Not a terrible amount to call out this week. If you&#8217;re in the states, I think other events have seized everyone&#8217;s attention recently. 
</p><p>As always, if you want to write with us, send a pitch using <a href="https://goo.gl/forms/whYRKEzMZJox6FaH2">this form</a>.</p><h2><strong>News Highlight</strong>: How AI is accelerating Candy Crush development</h2><h4><strong>Summary</strong>&nbsp;</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0_B0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adf904a-020d-41d6-851d-c44b690be771_1080x602.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0_B0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adf904a-020d-41d6-851d-c44b690be771_1080x602.png 424w, https://substackcdn.com/image/fetch/$s_!0_B0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adf904a-020d-41d6-851d-c44b690be771_1080x602.png 848w, https://substackcdn.com/image/fetch/$s_!0_B0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adf904a-020d-41d6-851d-c44b690be771_1080x602.png 1272w, https://substackcdn.com/image/fetch/$s_!0_B0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adf904a-020d-41d6-851d-c44b690be771_1080x602.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0_B0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adf904a-020d-41d6-851d-c44b690be771_1080x602.png" width="624" height="347.8222222222222" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8adf904a-020d-41d6-851d-c44b690be771_1080x602.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:602,&quot;width&quot;:1080,&quot;resizeWidth&quot;:624,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0_B0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adf904a-020d-41d6-851d-c44b690be771_1080x602.png 424w, https://substackcdn.com/image/fetch/$s_!0_B0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adf904a-020d-41d6-851d-c44b690be771_1080x602.png 848w, https://substackcdn.com/image/fetch/$s_!0_B0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adf904a-020d-41d6-851d-c44b690be771_1080x602.png 1272w, https://substackcdn.com/image/fetch/$s_!0_B0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adf904a-020d-41d6-851d-c44b690be771_1080x602.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In a recent interview with <a href="https://www.gamesindustry.biz/how-king-is-using-ai-to-speed-up-development-of-new-candy-crush-levels">gamesindustry.biz</a>, the leads of Activision Blizzard King&#8217;s (ABK) AI Labs discussed the role that AI has played throughout the development of &#8220;Candy Crush&#8221;, one of their largest and most successful games. The scientists discussed their goals in building a playtesting bot as well as its use by developers to increase both the quality and speed of level development. Additionally, the researchers at King emphasized the importance of humans working in sync with these AI tools rather than being replaced by them. 
This is a welcome contrast to the dominant narrative of AI in gaming, which typically consists of either the <a href="https://hbr.org/2023/04/generative-ai-has-an-intellectual-property-problem">theft of artists&#8217; intellectual property</a> by generative models or further <a href="https://www.cnet.com/tech/gaming/generative-ai-is-coming-for-video-games-heres-how-it-could-change-gaming/">weakening the job market for games and software developers</a>.</p><h4><strong>Overview</strong></h4><p>Since April 2012, developers at ABK have released an average of roughly four new Candy Crush levels every day, for a total of almost 17 thousand unique levels. This volume of content is practically unprecedented in the industry, and much of its recent throughput can be attributed to ABK&#8217;s acquisition of the AI firm <a href="https://www.gamesindustry.biz/king-purchase-ai-firm-peltarion">Peltarion</a> in 2022. After the acquisition, developers from Peltarion began to fill leadership positions in King&#8217;s AI Labs and develop plans for playtesting bots. By training neural networks using nothing but their players' game data, the scientists at King&#8217;s AI Labs were able to build playtesting bots at a variety of human skill levels. The developers at King use these bots to automatically detect bugs before release, which has helped them reduce the number of manual tweaks by over 95%. Additionally, game designers can use these bots to check whether a level matches its intended difficulty and to understand the impact of shuffle (randomization), a core gameplay mechanic of Candy Crush. 
By investing in these AI development tools, game designers and quality assurance (QA) workers at ABK have been able to supercharge their output, developing more levels faster and understanding beforehand how players will interact with a level and how difficult it will be.</p><p>While some may see these AI tools as a path towards generating more things faster and reducing pesky labor costs, the developers at ABK consistently emphasized the importance of human designers working in tandem with AI tools. While these AI solutions may automate away some of the tasks typical of designers and QA workers, they free those workers from the more menial, repetitive, and tedious parts of the job so that they can focus on more meaningful, high-impact work. Ultimately, the designers (and other humans working in game development) are the ones who know what &#8220;fun&#8221; really is and there is no expectation that generative AI will ever be able to replicate or develop that kind of intuitive sense.</p><h4><strong>Our Take</strong></h4><p>When I first started working in gaming, I got an inside look at how playtesting bots can supercharge development and improve the quality of work for all kinds of game developers (designers, QA, data scientists, product leads, etc.). Seeing that body of work being pursued by other gaming companies has me super enthused. However, as excited as I am by the end results, I wish the scientists at ABK had shared more details on the inner workings of their models. While I am sure there is a businessperson making a case to not reveal any trade secrets, it would have been nice to get any details at all on how they trained and tested their models. Are they using reinforcement learning? What about transformers? Hopefully there will be a GDC talk or paper coming soon where we can learn more! 
- Justin&nbsp;</p><h2><strong>Research Highlight: </strong><a href="https://arxiv.org/pdf/2406.11463">Just How Flexible are Neural Networks in Practice?</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UfqK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83f1fa9-2124-46df-9fa3-f8b1b35ea5d2_974x614.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UfqK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83f1fa9-2124-46df-9fa3-f8b1b35ea5d2_974x614.png 424w, https://substackcdn.com/image/fetch/$s_!UfqK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83f1fa9-2124-46df-9fa3-f8b1b35ea5d2_974x614.png 848w, https://substackcdn.com/image/fetch/$s_!UfqK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83f1fa9-2124-46df-9fa3-f8b1b35ea5d2_974x614.png 1272w, https://substackcdn.com/image/fetch/$s_!UfqK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83f1fa9-2124-46df-9fa3-f8b1b35ea5d2_974x614.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UfqK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83f1fa9-2124-46df-9fa3-f8b1b35ea5d2_974x614.png" width="542" height="341.6714579055442" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a83f1fa9-2124-46df-9fa3-f8b1b35ea5d2_974x614.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:614,&quot;width&quot;:974,&quot;resizeWidth&quot;:542,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UfqK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83f1fa9-2124-46df-9fa3-f8b1b35ea5d2_974x614.png 424w, https://substackcdn.com/image/fetch/$s_!UfqK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83f1fa9-2124-46df-9fa3-f8b1b35ea5d2_974x614.png 848w, https://substackcdn.com/image/fetch/$s_!UfqK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83f1fa9-2124-46df-9fa3-f8b1b35ea5d2_974x614.png 1272w, https://substackcdn.com/image/fetch/$s_!UfqK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83f1fa9-2124-46df-9fa3-f8b1b35ea5d2_974x614.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure - Effect of scaling strategy on the flexibility of a model measured in terms of Effective Model Complexity (EMC). Key findings are a) ResNet-RS is the most efficient among scaling strategies; b) scaling depth is more parameter-efficient than scaling width</figcaption></figure></div><h4><strong>Summary</strong>&nbsp;</h4><p>Neural networks are commonly believed to fit training sets with as many samples as they have parameters. However, the effectiveness of this fitting depends significantly on the training procedures such as the optimizer or regularizer and the specific architecture used. In this work, the authors study how flexible neural networks truly are when it comes to fitting data. 
The key findings are as follows:</p><ol><li><p>Conventional optimizers typically locate minima where models fit far fewer samples than they have parameters;</p></li><li><p>Convolutional neural networks (CNNs) surpass Multi-Layer Perceptrons (MLPs) and Vision Transformers (ViTs) in parameter efficiency, even with randomly labeled data;</p></li><li><p>Stochastic gradient descent finds solutions that fit more training data than full-batch gradient descent;</p></li><li><p>ReLU activation functions enhance data fitting capabilities by mitigating vanishing and exploding gradients.</p></li></ol><h4><strong>Overview</strong>&nbsp;</h4><p>It is widely believed that a neural network can fit a training set with at least as many samples as it has parameters, which forms the basis for understanding <em>overparameterized</em> and <em>underparameterized</em> models. Despite the theoretical guarantees that neural networks are universal function approximators, the models we train in practice have limited capacity and often converge to suboptimal local minima, constrained by the training methods we use. Their flexibility to fit data varies with a number of factors, such as the characteristics of the data, the architecture and size of the model, and the optimizer used. In the recent paper &#8216;Just How Flexible are Neural Networks in Practice?&#8217;, the authors empirically study this relationship.&nbsp;</p><p>To quantify a model's flexibility, the authors use the Effective Model Complexity (EMC) metric, which estimates the maximum number of samples a model can perfectly fit. The EMC is determined through an iterative training process, where the model is trained on progressively larger datasets until it fails to achieve 100% training accuracy. The study uses different initializations and sample subsets in each iteration to ensure fairness in evaluation. When studying how well a model fits data, it is also important to ensure that the model is not undertrained on a given dataset. 
The authors verify this by checking for the absence of negative eigenvalues in the loss Hessian.&nbsp;</p><p>To dissect the factors influencing neural network flexibility, the authors evaluate a variety of models with varying depths and widths, such as Multi-Layer Perceptrons (MLPs), Convolutional neural networks (CNNs), EfficientNet, and Vision Transformers (ViTs), on a range of datasets including ImageNet. Multiple optimizers are also used to account for the effect of stochasticity and preconditioning on the minima. Below we analyze some key empirically established relationships between training or dataset choices and model fitting.</p><p><strong>Effect of Dataset</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B9qH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39e12627-0be3-49d3-a080-1e1f35784607_1224x726.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B9qH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39e12627-0be3-49d3-a080-1e1f35784607_1224x726.png 424w, https://substackcdn.com/image/fetch/$s_!B9qH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39e12627-0be3-49d3-a080-1e1f35784607_1224x726.png 848w, https://substackcdn.com/image/fetch/$s_!B9qH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39e12627-0be3-49d3-a080-1e1f35784607_1224x726.png 1272w, 
https://substackcdn.com/image/fetch/$s_!B9qH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39e12627-0be3-49d3-a080-1e1f35784607_1224x726.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B9qH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39e12627-0be3-49d3-a080-1e1f35784607_1224x726.png" width="480" height="284.70588235294116" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/39e12627-0be3-49d3-a080-1e1f35784607_1224x726.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:726,&quot;width&quot;:1224,&quot;resizeWidth&quot;:480,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B9qH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39e12627-0be3-49d3-a080-1e1f35784607_1224x726.png 424w, https://substackcdn.com/image/fetch/$s_!B9qH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39e12627-0be3-49d3-a080-1e1f35784607_1224x726.png 848w, https://substackcdn.com/image/fetch/$s_!B9qH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39e12627-0be3-49d3-a080-1e1f35784607_1224x726.png 1272w, 
https://substackcdn.com/image/fetch/$s_!B9qH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39e12627-0be3-49d3-a080-1e1f35784607_1224x726.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>As shown in the figure above, the findings indicate considerable differences in the Effective Model Complexity (EMC) of networks depending on the type of data used for training. For example, networks trained with tabular data (Income, Forest, Covertype) demonstrate greater capacity. 
Further for image classification datasets, there is a notable correlation between test accuracy and EMC.&nbsp;</p><p><strong>Optimizers and Model architectures</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!as43!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c6b9f0-6422-49da-81e4-f5e0782b57bc_850x518.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!as43!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c6b9f0-6422-49da-81e4-f5e0782b57bc_850x518.png 424w, https://substackcdn.com/image/fetch/$s_!as43!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c6b9f0-6422-49da-81e4-f5e0782b57bc_850x518.png 848w, https://substackcdn.com/image/fetch/$s_!as43!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c6b9f0-6422-49da-81e4-f5e0782b57bc_850x518.png 1272w, https://substackcdn.com/image/fetch/$s_!as43!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c6b9f0-6422-49da-81e4-f5e0782b57bc_850x518.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!as43!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c6b9f0-6422-49da-81e4-f5e0782b57bc_850x518.png" width="524" height="319.33176470588234" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d7c6b9f0-6422-49da-81e4-f5e0782b57bc_850x518.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:518,&quot;width&quot;:850,&quot;resizeWidth&quot;:524,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!as43!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c6b9f0-6422-49da-81e4-f5e0782b57bc_850x518.png 424w, https://substackcdn.com/image/fetch/$s_!as43!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c6b9f0-6422-49da-81e4-f5e0782b57bc_850x518.png 848w, https://substackcdn.com/image/fetch/$s_!as43!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c6b9f0-6422-49da-81e4-f5e0782b57bc_850x518.png 1272w, https://substackcdn.com/image/fetch/$s_!as43!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c6b9f0-6422-49da-81e4-f5e0782b57bc_850x518.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>As seen in the figure above (left), CNNs demonstrate greater parameter efficiency than MLPs and ViTs, even when trained on images with random labels. Because random labels contain no structure to exploit, this efficiency cannot be attributed solely to the inductive bias usually credited for the good generalization of CNNs, namely their preference for labeling functions that exhibit specific symmetries. It is commonly believed that stochastic training acts as a regularizer. 
However, as shown in the Figure above (right), SGD tends to locate minima that fit more training data compared to full-batch gradient descent.&nbsp;</p><p><strong>Generalization</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!T6fg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a67599a-e43f-4798-9a07-1e385ffcb2c5_822x488.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!T6fg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a67599a-e43f-4798-9a07-1e385ffcb2c5_822x488.png 424w, https://substackcdn.com/image/fetch/$s_!T6fg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a67599a-e43f-4798-9a07-1e385ffcb2c5_822x488.png 848w, https://substackcdn.com/image/fetch/$s_!T6fg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a67599a-e43f-4798-9a07-1e385ffcb2c5_822x488.png 1272w, https://substackcdn.com/image/fetch/$s_!T6fg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a67599a-e43f-4798-9a07-1e385ffcb2c5_822x488.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!T6fg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a67599a-e43f-4798-9a07-1e385ffcb2c5_822x488.png" width="548" height="325.3333333333333" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a67599a-e43f-4798-9a07-1e385ffcb2c5_822x488.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:488,&quot;width&quot;:822,&quot;resizeWidth&quot;:548,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!T6fg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a67599a-e43f-4798-9a07-1e385ffcb2c5_822x488.png 424w, https://substackcdn.com/image/fetch/$s_!T6fg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a67599a-e43f-4798-9a07-1e385ffcb2c5_822x488.png 848w, https://substackcdn.com/image/fetch/$s_!T6fg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a67599a-e43f-4798-9a07-1e385ffcb2c5_822x488.png 1272w, https://substackcdn.com/image/fetch/$s_!T6fg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a67599a-e43f-4798-9a07-1e385ffcb2c5_822x488.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">Figure: Percent increase in EMC when models are trained on semantic labels rather than random ones, plotted against the generalization gap</figcaption></figure></div><p>The figure above shows that the gap between how many correctly labeled and how many randomly labeled samples a model can fit serves as a predictor of its generalization capabilities. Specifically, models that generalize well can fit significantly more correctly labeled samples than randomly labeled ones, indicating that they generalize effectively within their training set.</p><p><strong>Our Take</strong></p><p>We often assess whether a neural network is overparameterized or underparameterized by counting its parameters. In practice, the amount of data a network can fit provides a more accurate measure of its capacity. 
I find this paper interesting because it provides empirical evidence on the interplay between model architecture, training procedure, and data characteristics, which complements the theoretical guarantees in this space. Further, the results indicate that neural networks are parameter-inefficient, and they might help in developing new parameterizations that improve overall efficiency. However, I would like to see more apples-to-apples comparisons in certain ablations, such as those comparing performance across datasets like CIFAR10 and ImageNet, which have different classes and characteristics.&nbsp;</p><p>&#8211; Sharut</p><h2>New from the Gradient</h2><h3><a href="https://thegradientpub.substack.com/p/kristin-lauter-private-ai-homomorphic-encryption">Kristin Lauter: Private AI, Homomorphic Encryption, and AI for Cryptography</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5Jwj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F409d30cf-be96-4d98-8147-9579772cc50b_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5Jwj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F409d30cf-be96-4d98-8147-9579772cc50b_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!5Jwj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F409d30cf-be96-4d98-8147-9579772cc50b_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!5Jwj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F409d30cf-be96-4d98-8147-9579772cc50b_1200x628.png 1272w, 
https://substackcdn.com/image/fetch/$s_!5Jwj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F409d30cf-be96-4d98-8147-9579772cc50b_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5Jwj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F409d30cf-be96-4d98-8147-9579772cc50b_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/409d30cf-be96-4d98-8147-9579772cc50b_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5Jwj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F409d30cf-be96-4d98-8147-9579772cc50b_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!5Jwj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F409d30cf-be96-4d98-8147-9579772cc50b_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!5Jwj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F409d30cf-be96-4d98-8147-9579772cc50b_1200x628.png 1272w, 
https://substackcdn.com/image/fetch/$s_!5Jwj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F409d30cf-be96-4d98-8147-9579772cc50b_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/kristin-lauter-private-ai-homomorphic-encryption&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" 
href="https://thegradientpub.substack.com/p/kristin-lauter-private-ai-homomorphic-encryption"><span>Listen</span></a></p><h3><a href="https://thegradientpub.substack.com/p/sergiy-nesterenko-quilter-pcb-design-ai-compiler">Sergiy Nesterenko: Automating Circuit Board Design</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Iamg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8315a9ba-14ea-40fc-9180-1742ddd181ca_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Iamg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8315a9ba-14ea-40fc-9180-1742ddd181ca_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!Iamg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8315a9ba-14ea-40fc-9180-1742ddd181ca_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!Iamg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8315a9ba-14ea-40fc-9180-1742ddd181ca_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!Iamg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8315a9ba-14ea-40fc-9180-1742ddd181ca_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Iamg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8315a9ba-14ea-40fc-9180-1742ddd181ca_1200x628.png" width="1200" height="628" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8315a9ba-14ea-40fc-9180-1742ddd181ca_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Iamg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8315a9ba-14ea-40fc-9180-1742ddd181ca_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!Iamg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8315a9ba-14ea-40fc-9180-1742ddd181ca_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!Iamg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8315a9ba-14ea-40fc-9180-1742ddd181ca_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!Iamg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8315a9ba-14ea-40fc-9180-1742ddd181ca_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/sergiy-nesterenko-quilter-pcb-design-ai-compiler&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegradientpub.substack.com/p/sergiy-nesterenko-quilter-pcb-design-ai-compiler"><span>Listen</span></a></p><h2>Other Things That Caught Our Eyes</h2><h3>News</h3><p><strong><a href="https://www.niemanlab.org/2024/06/chatgpt-is-hallucinating-fake-links-to-its-news-partners-biggest-investigations/">ChatGPT is hallucinating fake links to its news partners&#8217; biggest investigations</a></strong></p><p>ChatGPT is generating fake URLs and directing users to 404 errors instead of real article pages for several major news media companies that have partnered with OpenAI. 
These publications include The Associated Press, The Wall Street Journal, The Financial Times, The Times, Le Monde, El Pa&#237;s, The Atlantic, The Verge, Vox, and Politico. Despite the licensing deals stating that ChatGPT will produce attributed summaries and link to the publications' websites, the chatbot is currently unable to reliably link to these partner publications' most noteworthy stories. OpenAI has acknowledged that the citation features promised in the licensing contracts are still in development and not yet available in ChatGPT. The article highlights the concern among journalists about ChatGPT's potential as a search tool and the need for transparency and protections for the integrity of journalism.&nbsp;</p><p><strong><a href="https://www.businessinsider.com/mcdonalds-ai-voice-order-technology-drive-thrus-2024-6">McDonald's is removing its AI drive-thru voice-ordering system from over 100 restaurants after its mishaps went viral</a></strong></p><p>McDonald's is removing its AI drive-thru voice-ordering system, called the Automated Order Taker, from over 100 restaurants after videos showcasing flaws with the technology went viral. The fast-food chain collaborated with IBM in 2021 to develop and deploy the AI software. However, the technology did not meet expectations, leading to its removal. This decision marks the end of a test period conducted with IBM. The mishaps with the AI system highlight that generative AI is not yet advanced enough to fully replace human jobs in industries like restaurants.&nbsp;</p><p><strong><a href="https://www.scientificamerican.com/article/how-this-real-image-won-an-ai-photo-competition/">How This Real Image Won an AI Photo Competition</a></strong></p><p>Photographer and writer Miles Astray submitted a photograph titled "F L A M I N G O N E" to the 1839 Awards, a photography competition with a category for images created by artificial intelligence (AI). 
The image appeared to show a flamingo without its head and neck, but it was actually a real photograph taken by Astray. The photograph won the People's Vote Award before being disqualified. Astray wanted to show that nature can still outdo machines in terms of creativity and beauty. He also highlighted the dangers of AI-generated content being indistinguishable from real content. Astray believes that tagging AI-generated images and videos and educating people to be critical thinkers can help address the potential for AI-generated misinformation.&nbsp;</p><p><strong><a href="https://www.businessinsider.com/microsoft-ai-copilot-future-openai-2024-3">Copilot dreams and nightmares: Microsoft insiders share what they really think about the company's AI future</a></strong></p><p>Microsoft insiders have shared their thoughts on the company's AI future and its partnership with OpenAI. While there is optimism about Microsoft's ability to improve its AI offerings and leverage its existing trust with customers, there are concerns about the value of new AI services for corporate customers and the resentment caused by the OpenAI partnership. Microsoft's Copilot offerings, based on OpenAI's GPT models, aim to provide automated support for various tasks. However, there are mixed early results and a race to add value before customers question the return on investment. Microsoft is also facing pressure from OpenAI's move into the business market.&nbsp;</p><p><strong><a href="https://waymo.com/blog/2024/06/waymo-one-is-now-open-to-everyone-in-san-francisco">Waymo One is now open to everyone in San Francisco</a></strong></p><p>Waymo, the autonomous vehicle company, has announced that its Waymo One ride-hailing service is now available to everyone in San Francisco. Waymo has been operating in the city for several years, gradually scaling its service. The Waymo One service offers safe, sustainable, and reliable transportation, with about 30% of rides being to local businesses. 
Waymo's fleet is all-electric and sources 100% renewable energy. The service has helped reduce carbon emissions and improve personal safety for riders. Waymo also offers a unique way for tourists to experience the city. With a focus on safety, Waymo has a track record of over 20 million rider-only miles and is committed to working with city and state officials to ensure responsible growth.&nbsp;</p><p><strong><a href="https://www.technologyreview.com/2024/06/27/1094379/ai-music-suno-udio-lawsuit-record-labels-youtube-licensing/">Training AI music models is about to get very expensive</a></strong></p><p>The article discusses the potential legal implications and financial consequences for AI music companies that use copyrighted works to train their models. Record labels have filed lawsuits against these companies, alleging that their models imitate copyrighted works too closely. The outcomes of the lawsuits could range from the court finding in favor of the AI startups, determining that they did not violate fair use, to the court finding fault on both the training and output sides of the AI models, resulting in potential damages for infringement. The article also mentions the possibility of a future licensing market for AI music models, similar to what has already happened with text generators. However, licensing deals could be costly, potentially limiting access to powerful music models to those with significant financial resources. 
The article concludes by noting that training AI models exclusively on public domain music would be challenging and result in models that are far less impressive than what exists today.</p><p><strong><a href="https://techcrunch.com/2024/06/29/mit-robotics-pioneer-rodney-brooks-thinks-people-are-vastly-overestimating-generative-ai/">MIT robotics pioneer Rodney Brooks thinks people are vastly overestimating generative AI</a></strong></p><p>Rodney Brooks, a robotics pioneer and co-founder of companies like Rethink Robotics and iRobot, believes that people are overestimating the capabilities of generative AI. While he acknowledges that generative AI is impressive technology, he argues that it cannot perform all tasks that humans can and that humans tend to overestimate its abilities. Brooks emphasizes that generative AI is not human-like and should not be assigned human capabilities. He provides an example of a warehouse robotics system, where using generative AI to tell the robots where to go would actually slow things down compared to connecting them to a stream of data from the warehouse management software. Brooks also highlights the importance of solving solvable problems and designing robots for practical purposes rather than building human-like robots. He believes in making technology accessible and purpose-built, and he cautions against the mistaken belief that technology always grows exponentially. While he sees potential for LLMs in domestic robots for specific tasks, he emphasizes that the problem lies in control theory and math optimization, not in language capabilities.&nbsp;</p><p><strong><a href="https://www.theverge.com/2024/6/28/24188457/solos-airgo-vision-glasses-chatgpt-ray-ban-meta-competitor">Here comes a Meta Ray-Bans challenger with ChatGPT-4o and a camera</a></strong></p><p>Solos, a competitor to Ray-Ban Meta Smart Glasses, plans to release a camera-equipped version called Solos AirGo Vision later this year. 
These smart glasses will feature OpenAI's new GPT-4o AI model, allowing them to recognize objects through the camera and answer questions about what the wearer sees. The glasses will have a swappable frame system, allowing users to switch out the camera for different looks or sun shades. The Vision will also include notification LEDs for incoming calls or emails and can be integrated with Google Gemini and Anthropic's Claude AI models. While the price and release date for the AirGo Vision have not been announced, the glasses are expected to cost more than $249.99, the price of the camera-less version. The Ray-Ban Meta Smart Glasses currently start at $299.</p><p><strong><a href="https://www.siliconrepublic.com/machines/y-combinator-california-ai-safety-bill">Y Combinator rallies startups against California&#8217;s AI safety bill</a></strong></p><p>Y Combinator and several AI startups have expressed opposition to California's Senate Bill 1047, which aims to regulate the development of AI systems. The bill would require developers of large AI models to take precautions such as safety testing, implementing safeguards, and post-deployment monitoring. Y Combinator argues that the responsibility for misuse should lie with those who abuse the tools, not the developers. They believe that holding developers liable for unintended misuse could stifle innovation and discourage investment in AI research. Their letter also criticizes other aspects of the bill, including the proposed requirement for a 'kill switch' to deactivate AI models. Y Combinator claims this would impact the development of open-source AI.&nbsp;</p><p><strong><a href="https://techcrunch.com/2024/06/24/google-is-bringing-gemini-access-to-teens-using-their-school-accounts/">Google is bringing Gemini access to teens using their school accounts</a></strong></p><p>Google is expanding access to Gemini to teen students using their school accounts. This move aims to prepare students for a future where generative AI is prevalent. 
Gemini provides real-time feedback to help students learn more confidently. Google assures that it will not use data from student chats to train its AI models and has implemented guardrails to prevent inappropriate responses. The company recommends that teens use the double-check feature to develop critical thinking skills. Gemini will be available in English in over 100 countries and will be off by default for teens until administrators enable it. Additionally, Google is launching the Read Along in Classroom feature globally, which helps students improve reading skills with real-time support. Educators will have access to tools for creating, managing, and sharing interactive lessons, as well as the ability to mark assignments and perform bulk scoring actions.&nbsp;</p><p><strong><a href="https://techcrunch.com/2024/06/25/stability-ai-lands-a-lifeline-from-sean-parker-greycroft/">Stability AI lands a lifeline from Sean Parker, Greycroft</a></strong></p><p>Stability AI has secured new funding from investors including Sean Parker, Greycroft, and Coatue Management. The company has not disclosed the amount raised. Stability AI has faced financial difficulties, with unpaid cloud bills and a lower valuation for the startup. However, the new investors have committed $80 million to take over Stability and have negotiated debt forgiveness and the release from future obligations. Stability AI plans to focus on growing its managed image, video, and audio pipelines, building custom enterprise models and content production tools, and delivering APIs for consumer apps. The company aims to remain committed to open-source principles and continue developing powerful generative AI models.</p><p><strong><a href="https://techcrunch.com/2024/06/24/openai-buys-a-remote-collaboration-platform/">OpenAI buys a remote collaboration platform</a></strong></p><p>OpenAI has acquired Multi, a startup that developed a video-first collaboration platform for remote teams. 
The deal is primarily an acqui-hire, with most of Multi's team joining OpenAI. Multi offered features such as collaborative screen sharing, customizable shortcuts, and automatic deep links for code, designs, and documents. This acquisition aligns with OpenAI's strategy of investing in enterprise solutions, as they have been focusing on developing AI-powered tools for businesses. OpenAI's revenue is projected to exceed $3.4 billion this year. The acquisition of Multi suggests that OpenAI may expand its offerings to include videoconferencing and remote collaboration capabilities in the future.</p><p><strong><a href="https://www.theverge.com/2024/6/27/24187151/youtube-ai-music-deals-licensing-record-labels-sony-umg-warner">YouTube is trying to make AI music deals with major record labels</a></strong></p><p>YouTube is seeking to make deals with major record labels, including Universal Music Group, Sony Music Entertainment, and Warner Records, to license their songs and train its AI music tools. The platform aims to use the licensed music to develop new AI tools that will be launched later this year. YouTube has not disclosed the exact fee it is willing to pay for these licenses, but it is expected to be a one-time payment rather than a royalty-based arrangement. However, convincing both artists and labels may be a challenge, as Sony Music has warned against unauthorized use of its content, and UMG has previously pulled its music catalog from TikTok due to concerns about AI-generated music. 
In January, over 200 artists called for tech companies to stop using AI to infringe upon the rights of human artists.&nbsp;</p><p><strong><a href="https://techcrunch.com/2024/06/26/operas-new-version-of-the-browser-gets-better-multimedia-controls-and-ai-powered-image-generation/">Opera&#8217;s browser adds AI-powered image generation and better multimedia controls</a></strong></p><p>Opera is releasing the second version of Opera One in developer beta, featuring new multimedia controls, split tabs, and AI capabilities. The new version allows users to have floating multimedia controls that can be resized and match the browser's theme. It also introduces split tab windows, enabling users to work on two web pages simultaneously. Additionally, Opera One R2 includes AI features such as a page context mode that allows users to ask questions about a web page and AI-powered summarization. Opera plans to release Opera One R2 to a wider user base later this year.</p><h3>Papers</h3><p>Once again, a list:</p><ul><li><p><a href="https://arxiv.org/abs/2406.11813">How Do LLMs Acquire Factual Knowledge During Pretraining?</a></p></li><li><p><a href="https://arxiv.org/abs/2406.11717">Refusal in LLMs is Mediated by a Single Direction</a></p></li><li><p><a href="https://arxiv.org/abs/2406.11944">Transcoders Find Interpretable LLM Feature Circuits</a></p></li><li><p><a href="https://arxiv.org/abs/2309.15084">The Surveillance AI Pipeline</a></p></li><li><p><a href="https://arxiv.org/abs/2406.11463">Just How Flexible are Neural Networks in Practice?</a></p></li><li><p><a href="https://arxiv.org/abs/2406.03689">Evaluating the World Model Implicit in a Generative Model</a></p></li><li><p><a href="https://yihuaihong.github.io/ConceptVectors.github.io/">Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces</a></p></li><li><p><a href="https://arxiv.org/abs/2406.14546">Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training 
Data</a></p></li><li><p><a href="https://www.nature.com/articles/s41586-024-07421-0">Detecting Hallucinations in LLMs Using Semantic Entropy</a></p></li><li><p><a href="https://www.biorxiv.org/content/10.1101/2024.06.21.599332v1">Linguistic inputs must be syntactically parsable to fully engage the language network</a></p></li></ul><h3>Closing Thoughts</h3><p>Have something to say about this edition&#8217;s topics? Shoot us an email at editor@thegradient.pub and we will consider sharing the most interesting thoughts from readers in the next newsletter! For feedback, you can also reach Daniel directly at <a href="mailto:dbashir@hmc.edu">dbashir@hmc.edu</a> or on <a href="https://twitter.com/spaniel_bashir">Twitter</a>. If you enjoyed this newsletter, consider donating to The Gradient via a Substack subscription, which helps keep this volunteer-run project afloat. Thanks for reading the latest Update from the Gradient!</p>]]></content:encoded></item><item><title><![CDATA[Mini-Update #43: OpenAI buys Rockset and Diffusion on Syntax Trees for Code Generation]]></title><description><![CDATA[OpenAI buys Rockset in a bid to enhance its retrieval capabilities and researchers train a diffusion model to write code to produce images]]></description><link>https://thegradientpub.substack.com/p/mini-update-43-openai-buys-rockset</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/mini-update-43-openai-buys-rockset</guid><dc:creator><![CDATA[Ather Fawaz]]></dc:creator><pubDate>Wed, 26 Jun 2024 04:47:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rg_R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F583ff9e2-5afd-407c-89fc-827ea8b68a19_1600x900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 43rd Mini-Update from the Gradient! 
This is our exclusive newsletter edition specifically for paying subscribers and is our way to show you our appreciation for your support.</p>
      <p>
          <a href="https://thegradientpub.substack.com/p/mini-update-43-openai-buys-rockset">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Update #77: AI-Generated News Cannibalizes Journalism and Open-Endedness for Superintelligence]]></title><description><![CDATA[BNN Breaking turns out to be an AI chop shop; DeepMind argues that open-endedness is essential for "superintelligent" systems.]]></description><link>https://thegradientpub.substack.com/p/update-77-ai-generated-news-open-endedness-asi</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/update-77-ai-generated-news-open-endedness-asi</guid><dc:creator><![CDATA[Cole Frank]]></dc:creator><pubDate>Tue, 18 Jun 2024 15:30:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1F6O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff0b15b1-ac78-4453-b4de-1cadfae8cc1b_686x350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 77th update from the Gradient! If you&#8217;re new and like what you see, <a href="https://thegradientpub.substack.com/">subscribe</a> and follow us on <a href="https://twitter.com/gradientpub">Twitter</a>. <strong>Our newsletters run long, so you&#8217;ll need to view this post on Substack to see everything!</strong></p><h2>Editor Notes</h2><p>Good morning.&nbsp;</p><p>Leopold Aschenbrenner is ubiquitous and OpenAI has a deal with Apple.&nbsp;</p><p>Our highlights today have absolutely nothing to do with either of these items.&nbsp;</p><p>A few great pieces from around town:</p><ul><li><p><a href="https://theconvivialsociety.substack.com/p/the-work-of-art">The Work of Art</a> (<span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;L. M. 
Sacasas&quot;,&quot;id&quot;:1810437,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7fdbf22f-2893-4ad5-b729-d644f8563ba2_614x614.png&quot;,&quot;uuid&quot;:&quot;be7dbc9c-1151-48d1-85e7-186d03bb29f6&quot;}" data-component-name="MentionToDOM"></span>)</p></li><li><p><a href="https://www.thealgorithmicbridge.com/p/girlfriends-inc">Girlfriends, Inc.</a> (<span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Alberto Romero&quot;,&quot;id&quot;:91075008,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/31a52896-6fd1-4891-8206-b4744c15278f_2626x2219.jpeg&quot;,&quot;uuid&quot;:&quot;12248eca-6361-4194-8276-3e1b67a1f88b&quot;}" data-component-name="MentionToDOM"></span>) </p></li><li><p><a href="https://nickpotkalitsky.substack.com/p/its-time-to-listen-to-teachers-about">It&#8217;s Time to Listen to Teachers About AI!</a> (<span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Nick Potkalitsky&quot;,&quot;id&quot;:156304717,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0bdf436-1c38-490d-bc78-c51e25ed1e05_400x400.jpeg&quot;,&quot;uuid&quot;:&quot;c08cc9db-7540-4da6-9ad4-531b9023d554&quot;}" data-component-name="MentionToDOM"></span>) </p></li></ul><p>As always, if you want to write with us, send a pitch using <a href="https://goo.gl/forms/whYRKEzMZJox6FaH2">this form</a>.</p><h2><strong>News Highlight</strong>: AI-Paraphrased News Cannibalizes Original Journalism</h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!1F6O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff0b15b1-ac78-4453-b4de-1cadfae8cc1b_686x350.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1F6O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff0b15b1-ac78-4453-b4de-1cadfae8cc1b_686x350.png 424w, https://substackcdn.com/image/fetch/$s_!1F6O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff0b15b1-ac78-4453-b4de-1cadfae8cc1b_686x350.png 848w, https://substackcdn.com/image/fetch/$s_!1F6O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff0b15b1-ac78-4453-b4de-1cadfae8cc1b_686x350.png 1272w, https://substackcdn.com/image/fetch/$s_!1F6O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff0b15b1-ac78-4453-b4de-1cadfae8cc1b_686x350.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1F6O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff0b15b1-ac78-4453-b4de-1cadfae8cc1b_686x350.png" width="410" height="209.18367346938774" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ff0b15b1-ac78-4453-b4de-1cadfae8cc1b_686x350.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:350,&quot;width&quot;:686,&quot;resizeWidth&quot;:410,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1F6O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff0b15b1-ac78-4453-b4de-1cadfae8cc1b_686x350.png 424w, https://substackcdn.com/image/fetch/$s_!1F6O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff0b15b1-ac78-4453-b4de-1cadfae8cc1b_686x350.png 848w, https://substackcdn.com/image/fetch/$s_!1F6O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff0b15b1-ac78-4453-b4de-1cadfae8cc1b_686x350.png 1272w, https://substackcdn.com/image/fetch/$s_!1F6O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff0b15b1-ac78-4453-b4de-1cadfae8cc1b_686x350.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h4><strong>Summary</strong>&nbsp;</h4><p><a href="https://en.wikipedia.org/wiki/BNN_Breaking">BNN Breaking</a>, a site that churned out AI-paraphrased versions of news articles, shut down after two years as numerous complaints about its content stealing and fabrication came to light. The dubious news outlet had a licensing agreement with Microsoft, which owns MSN. 
Its articles were also picked up by prominent Western outlets such as <em>The Washington Post</em> and <em>The Guardian</em> and were frequently surfaced on Google News. Individuals suffered reputational damage due to the falsehoods BNN published, while local journalism organizations lost already scarce revenue to BNN's plagiarism.</p><h4><strong>Overview</strong></h4><p>As detailed in a recent <em>New York Times</em> article, BNN Breaking, which looked like a reliable news site, was in reality an <a href="https://www.nytimes.com/2024/06/06/technology/bnn-breaking-ai-generated-news.html">AI chop shop</a>. It was founded in Hong Kong by Gurbaksh Chahal, a serial entrepreneur who had a criminal history of domestic violence, and employed dozens of freelance journalists based in countries like Pakistan, Egypt, and Nigeria. Employees uploaded articles from other news sites to a generative AI tool and were asked to manually "validate" the paraphrased results. Eventually, the tool churned out hundreds or even thousands of stories a day and assigned journalists' bylines to them at random.</p><p>One of the BNN articles promoted on MSN.com falsely associated Irish DJ and talk-show host Dave Fanning, whose photo was featured, with a different Irish broadcaster accused of sexual misconduct. The name of the broadcaster facing trial was not released in the original article and the AI "presumably paired the text with a generic photo of a 'prominent Irish broadcaster'", according to the India-based journalist whose byline was on the story. Fanning filed a defamation lawsuit against Microsoft and BNN Breaking.</p><p>BNN's operations also harmed smaller, local news outlets that already struggle to get a share of advertising dollars to support their work. 
For example, its AI-rewritten version of an original article published by South African outlet Limpopo Mirror <a href="https://groundup.org.za/article/how-google-facebook-killing-local-news/">ranked higher</a> in Google search than the genuine version. Only after Google rolled out an <a href="https://blog.google/products/search/google-search-update-march-2024/">update</a> targeting "spammy" sites in March did BNN's stories stop showing up in search results.</p><h4><strong>Our Take</strong></h4><p>I was quite shocked that a source that seems so obviously suspicious (perhaps only in hindsight) gained as much traction as it did and lasted as long as two years. The NYT article mentions that content curation on MSN.com was increasingly carried out by <a href="https://www.cnn.com/2023/11/02/tech/microsoft-ai-news/index.html">AI instead of human editors</a>, and I can't help thinking about the layers of algorithmic decision-making that shape what we see on a daily basis, from what is surfaced in search to the actual content. Does AI have a bias for AI-generated content?</p><p>What makes BNN intriguing, and what I think partially contributed to its success before its downfall, is that real human journalists were involved, and at least some of them bought into the promise of a "revolution in the journalism industry" before they realized their own reputation and integrity were being jeopardized. At the same time, BNN's workforce &#8212; a scattered network of relatively inexperienced journalists based in developing countries &#8212; is reminiscent of the kind of labor exploitation familiar to tech. 
How big is the impact of AI <a href="https://www.cjr.org/analysis/as-election-looms-a-network-of-mysterious-pink-slime-local-news-outlets-nearly-triples-in-size.php">pink slime</a> and of AI-generated news that appears to be of high quality, in a world where human writers are <a href="https://gizmodo.com/ai-detectors-inaccurate-freelance-writers-fired-1851529820">wrongly accused</a> of relying on AI and are being replaced by AI anyway? I'd like to think we as consumers of information all have <em>something</em> within our control that we can do in the service of truth.</p><p>&#8211; Jaymee</p><p>I want to echo and spend a minute on Jaymee&#8217;s last point: that we have <em>something</em> within our control that we can do in the service of truth, even as consumers. I think often of L.M. Sacasas&#8217;s point&#8212;made by others as well, but especially well by him&#8212;that we can conflate what feels technologically &#8220;advanced&#8221; with what is best for human beings as we are. This manifests in different ways, and BNN provides perhaps one example of what happens when we believe too deeply in the logic of inevitability. News organizations have struggled with these questions and <a href="https://www.axios.com/2023/08/22/ai-rules-newsrooms-training-data">laid</a> <a href="https://www.niemanlab.org/2023/07/writing-guidelines-for-the-role-of-ai-in-your-newsroom-here-are-some-er-guidelines-for-that/">out</a> <a href="https://journalistsresource.org/home/generative-ai-policies-newsrooms/">guidelines</a> and standards for the use of AI in the newsroom, but there clearly isn&#8217;t something like &#8220;societal agreement&#8221; on what&#8217;s acceptable and good for us. 
Will we ever make different decisions?&nbsp;</p><p>&#8212;Daniel</p><h2><strong>Research Highlight: </strong><a href="https://arxiv.org/pdf/2406.04268">Open-Endedness is Essential for Artificial SuperIntelligence</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GP7x!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a9acb-7b14-483f-85f6-35d22d657b5c_680x516.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GP7x!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a9acb-7b14-483f-85f6-35d22d657b5c_680x516.png 424w, https://substackcdn.com/image/fetch/$s_!GP7x!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a9acb-7b14-483f-85f6-35d22d657b5c_680x516.png 848w, https://substackcdn.com/image/fetch/$s_!GP7x!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a9acb-7b14-483f-85f6-35d22d657b5c_680x516.png 1272w, https://substackcdn.com/image/fetch/$s_!GP7x!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a9acb-7b14-483f-85f6-35d22d657b5c_680x516.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GP7x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a9acb-7b14-483f-85f6-35d22d657b5c_680x516.png" width="506" height="383.9647058823529" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d35a9acb-7b14-483f-85f6-35d22d657b5c_680x516.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:516,&quot;width&quot;:680,&quot;resizeWidth&quot;:506,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GP7x!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a9acb-7b14-483f-85f6-35d22d657b5c_680x516.png 424w, https://substackcdn.com/image/fetch/$s_!GP7x!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a9acb-7b14-483f-85f6-35d22d657b5c_680x516.png 848w, https://substackcdn.com/image/fetch/$s_!GP7x!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a9acb-7b14-483f-85f6-35d22d657b5c_680x516.png 1272w, https://substackcdn.com/image/fetch/$s_!GP7x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a9acb-7b14-483f-85f6-35d22d657b5c_680x516.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h4><strong>Summary</strong>&nbsp;</h4><p>This position paper out of Google DeepMind formalizes the concept of an &#8220;open-ended&#8221; AI system and argues that achieving Artificial Superhuman Intelligence (ASI) will require the development of such systems. The authors propose an observer-dependent and temporal definition that a system is &#8220;open-ended&#8221; so long as the sequence of artifacts it produces is both <em>learnable</em> and <em>novel</em> to the observer. Learnable artifacts are those that make future artifacts more predictable and novel artifacts are those that are unpredictable given previously generated artifacts. Novelty guarantees information gain while learnability ensures that the information gain is in some sense &#8220;meaningful&#8221; and that the artifact is interesting to the observer. 
They argue that we have yet to develop any general &#8220;open-ended&#8221; systems, describe how their framework maps to existing SoTA AI systems, and offer four overlapping paths towards open-ended foundation models.</p><h4><strong>Overview</strong>&nbsp;</h4><p>The authors begin by claiming that because:</p><ol><li><p>&#8220;open-ended invention is the mechanism by which human individuals and society at large accumulates new knowledge and technology&#8221; and,</p></li><li><p>an ASI could, by definition, perform a wide range of tasks at a level beyond any human&#8217;s capability,</p></li></ol><p>it must follow that open-endedness is an essential property for ASI. Here &#8220;essential&#8221; means something along the lines of necessary but not sufficient. They describe an open-ended system as &#8220;an autonomous system [that] self-improves towards increasingly creative and diverse discoveries <em>without end.</em>&#8221;</p><p>In their view, scaling foundation models alone won&#8217;t get us open-endedness. They note the limited supply of high-quality textual and visual data, and conjecture that &#8220;the trend of improving foundation models trained on passive data by scaling alone will soon plateau&#8221;. Something more targeted towards open-endedness is needed to discover new knowledge. 
&#8220;Open-ended&#8221; is pretty hand-wavy, so in order to catalyze research into open-endedness by providing clarity, they define an open-ended system as one that produces novel and learnable artifacts, then formalize these concepts in the following way:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!abOQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374ab84d-10ce-47d6-852e-be947b1c4c82_656x196.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!abOQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374ab84d-10ce-47d6-852e-be947b1c4c82_656x196.png 424w, https://substackcdn.com/image/fetch/$s_!abOQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374ab84d-10ce-47d6-852e-be947b1c4c82_656x196.png 848w, https://substackcdn.com/image/fetch/$s_!abOQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374ab84d-10ce-47d6-852e-be947b1c4c82_656x196.png 1272w, https://substackcdn.com/image/fetch/$s_!abOQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374ab84d-10ce-47d6-852e-be947b1c4c82_656x196.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!abOQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374ab84d-10ce-47d6-852e-be947b1c4c82_656x196.png" width="482" height="144.0121951219512" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/374ab84d-10ce-47d6-852e-be947b1c4c82_656x196.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:196,&quot;width&quot;:656,&quot;resizeWidth&quot;:482,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!abOQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374ab84d-10ce-47d6-852e-be947b1c4c82_656x196.png 424w, https://substackcdn.com/image/fetch/$s_!abOQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374ab84d-10ce-47d6-852e-be947b1c4c82_656x196.png 848w, https://substackcdn.com/image/fetch/$s_!abOQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374ab84d-10ce-47d6-852e-be947b1c4c82_656x196.png 1272w, https://substackcdn.com/image/fetch/$s_!abOQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374ab84d-10ce-47d6-852e-be947b1c4c82_656x196.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>With this setup, they define the system as <strong>novel </strong>if the artifacts being generated become increasingly unpredictable with respect to the observer&#8217;s model at some fixed time <em>t</em>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\forall t, \\forall T > t, \\exists T' > T: \\mathbb{E}\\left[\\ell (t, T')\\right] > 
\\mathbb{E}\\left[\\ell (t, T)\\right]\\,.&quot;,&quot;id&quot;:&quot;XIACOBSONQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>And the system is <strong>learnable</strong> if conditioning on a longer history makes future artifacts more predictable in the sense of a lower loss value:</p><p></p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\forall T, \\forall t < T, \\forall T > t ' > t: \\mathbb{E}\\left[\\ell (t', T)\\right] < \\mathbb{E}\\left[\\ell (t, T)\\right]\\,.&quot;,&quot;id&quot;:&quot;XXLBNCFNZP&quot;}" data-component-name="LatexBlockToDOM"></div><p></p><p>Novelty and learnability are necessary in tandem: without novelty, the observer is able to predict all the artifacts and there is no new information being gained; without learnability, the observer can&#8217;t make tractable progress (the authors give the example of a TV set that generates white noise: endlessly novel, in the sense that its aleatoric uncertainty makes it unpredictable, but not interesting or learnable in any sense).&nbsp;</p><p>This definition is very general by design. Leaving the loss metric <em>&#8467;(t, T)</em> unspecified allows leeway in measuring the &#8220;interestingness&#8221; of artifacts. 
And by incorporating time and making novelty and learnability observer-dependent, this definition allows for the possibility of systems that are only temporarily open-ended or only open-ended to certain observers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hGhE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef5eec8b-e68d-47e9-836c-aed652f65ea4_650x830.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hGhE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef5eec8b-e68d-47e9-836c-aed652f65ea4_650x830.png 424w, https://substackcdn.com/image/fetch/$s_!hGhE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef5eec8b-e68d-47e9-836c-aed652f65ea4_650x830.png 848w, https://substackcdn.com/image/fetch/$s_!hGhE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef5eec8b-e68d-47e9-836c-aed652f65ea4_650x830.png 1272w, https://substackcdn.com/image/fetch/$s_!hGhE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef5eec8b-e68d-47e9-836c-aed652f65ea4_650x830.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hGhE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef5eec8b-e68d-47e9-836c-aed652f65ea4_650x830.png" width="650" height="830" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ef5eec8b-e68d-47e9-836c-aed652f65ea4_650x830.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:830,&quot;width&quot;:650,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hGhE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef5eec8b-e68d-47e9-836c-aed652f65ea4_650x830.png 424w, https://substackcdn.com/image/fetch/$s_!hGhE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef5eec8b-e68d-47e9-836c-aed652f65ea4_650x830.png 848w, https://substackcdn.com/image/fetch/$s_!hGhE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef5eec8b-e68d-47e9-836c-aed652f65ea4_650x830.png 1272w, https://substackcdn.com/image/fetch/$s_!hGhE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef5eec8b-e68d-47e9-836c-aed652f65ea4_650x830.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The authors relate how their definition of open-endedness maps to existing AI systems. They highlight three archetypal AI systems that do exhibit open-endedness but only in a narrow domain:</p><ol><li><p>Systems that augment RL with self-play (e.g. AlphaGo and similar systems for StarCraft, Stratego, DotA, and Diplomacy)</p></li><li><p>Systems that use unsupervised environment design (UED) to establish an automatic curriculum (e.g. AdA, a large-scale agent capable of solving tasks in a 3D environment)</p></li><li><p>Systems that employ evolutionary algorithms to train agents (e.g. Paired Open-Ended Trailblazer (<a href="https://arxiv.org/abs/1901.01753">POET</a>), which trains a population of agents across an evolving range of environments)</p></li></ol><p>And what about current SoTA foundation models? The authors classify these as general but not open-ended. 
They argue that because these models are trained on fixed datasets, they cannot be endlessly novel if the distribution of that data is learnable. Further, that distribution <em>must</em> be learnable because the foundation model learned it in the first place!</p><p>But all hope is not lost for foundation models: the authors contend that augmenting foundation models with open-endedness is a viable route toward ASI. They offer four overlapping paths to such open-ended foundation models:</p><ol><li><p>RL: RL is fundamental to the existing narrow open-ended AI systems and there is already promising research into using RL on top of foundation models. In particular, foundation models could overcome the difficulty of guiding RL algos in high-dimensional domains by serving as a sort of proxy observer (e.g. <a href="https://arxiv.org/abs/2305.16291">Voyager</a>, <a href="https://arxiv.org/abs/2310.00166">Motif</a>, <a href="https://arxiv.org/abs/2306.01711">OMNI</a>)</p></li><li><p>Self-Improvement: loops in which the model critiques its own outputs, which the authors hope will allow the system to &#8220;generate new knowledge&#8230;beyond the human curated training data&#8221; (e.g. Anthropic&#8217;s <a href="https://arxiv.org/pdf/2212.08073">Constitutional AI</a>, OpenAI&#8217;s <a href="https://arxiv.org/pdf/2206.05802">work on self-critiquing models</a>, <a href="https://arxiv.org/pdf/2212.10560">SELF-INSTRUCT</a>)</p></li><li><p>Task generation: &#8220;adapting the difficulty of tasks to an agent&#8217;s capability so that they remain forever challenging yet learnable&#8221; (e.g. <a href="https://openreview.net/forum?id=rmiwIL98uQ">WebArena</a>, <a href="https://arxiv.org/abs/2310.06114">Interactive Real World Simulators</a>)</p></li><li><p>Evolutionary Algorithms: foundation models can be used to generate selection and mutation operators (e.g. 
<a href="https://arxiv.org/abs/2206.08896">ELM</a>, <a href="https://arxiv.org/abs/2401.10034">2024 survey paper of evolutionary algos for LLMs</a>)</p></li></ol><h4><strong>Our Take</strong></h4><p>Generative AI&#8212;in particular open-ended GenAI or GenAI for discovery&#8212;surfaces a fascinating tension between novel outputs and coherent outputs. I think this tension tracks well with the dual requirements of novelty and learnability introduced in this paper. In some sense it feels misguided to expect novelty from a system that&#8217;s trained with the sole objective of minimizing <a href="https://en.wikipedia.org/wiki/Entropy_(information_theory)">surprise</a>. A good illustration of this tension is decoding open-ended text from an LLM. Simply generating the most-likely next token or using some sort of greedy beam search is a bad strategy! It tends to yield uninteresting and repetitive outputs. I love this plot from <a href="https://arxiv.org/pdf/1904.09751">Holtzman et al. 2020</a> comparing beam search next-token probabilities to the probabilities the same LLM assigns to a human-generated text (one that it hasn&#8217;t seen during training):</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7BkJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3781ae7d-a2f1-43ec-a1ea-93cae4232f40_752x1090.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7BkJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3781ae7d-a2f1-43ec-a1ea-93cae4232f40_752x1090.png 424w, 
https://substackcdn.com/image/fetch/$s_!7BkJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3781ae7d-a2f1-43ec-a1ea-93cae4232f40_752x1090.png 848w, https://substackcdn.com/image/fetch/$s_!7BkJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3781ae7d-a2f1-43ec-a1ea-93cae4232f40_752x1090.png 1272w, https://substackcdn.com/image/fetch/$s_!7BkJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3781ae7d-a2f1-43ec-a1ea-93cae4232f40_752x1090.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7BkJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3781ae7d-a2f1-43ec-a1ea-93cae4232f40_752x1090.png" width="382" height="553.6968085106383" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3781ae7d-a2f1-43ec-a1ea-93cae4232f40_752x1090.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1090,&quot;width&quot;:752,&quot;resizeWidth&quot;:382,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7BkJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3781ae7d-a2f1-43ec-a1ea-93cae4232f40_752x1090.png 424w, 
https://substackcdn.com/image/fetch/$s_!7BkJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3781ae7d-a2f1-43ec-a1ea-93cae4232f40_752x1090.png 848w, https://substackcdn.com/image/fetch/$s_!7BkJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3781ae7d-a2f1-43ec-a1ea-93cae4232f40_752x1090.png 1272w, https://substackcdn.com/image/fetch/$s_!7BkJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3781ae7d-a2f1-43ec-a1ea-93cae4232f40_752x1090.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The beam search probabilities are high by design&#8212;what&#8217;s&#8230;surprising&#8230;is how much the probabilities of the human-generated text jump around. One solution is to introduce some randomness into the text generation: rather than generating the most-likely next token, sample from the next-token distribution. But pure sampling has a tendency to go off the rails resulting in incoherent (if not also beautiful and poetic) gibberish:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M2-S!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6d5db-7504-4003-b570-2ccae0b23a22_736x294.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M2-S!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6d5db-7504-4003-b570-2ccae0b23a22_736x294.png 424w, https://substackcdn.com/image/fetch/$s_!M2-S!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6d5db-7504-4003-b570-2ccae0b23a22_736x294.png 848w, https://substackcdn.com/image/fetch/$s_!M2-S!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6d5db-7504-4003-b570-2ccae0b23a22_736x294.png 1272w, https://substackcdn.com/image/fetch/$s_!M2-S!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6d5db-7504-4003-b570-2ccae0b23a22_736x294.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!M2-S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6d5db-7504-4003-b570-2ccae0b23a22_736x294.png" width="502" height="200.52717391304347" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5f6d5db-7504-4003-b570-2ccae0b23a22_736x294.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:294,&quot;width&quot;:736,&quot;resizeWidth&quot;:502,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!M2-S!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6d5db-7504-4003-b570-2ccae0b23a22_736x294.png 424w, https://substackcdn.com/image/fetch/$s_!M2-S!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6d5db-7504-4003-b570-2ccae0b23a22_736x294.png 848w, https://substackcdn.com/image/fetch/$s_!M2-S!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6d5db-7504-4003-b570-2ccae0b23a22_736x294.png 1272w, https://substackcdn.com/image/fetch/$s_!M2-S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6d5db-7504-4003-b570-2ccae0b23a22_736x294.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Sampling promotes novelty but too much of it harms learnability. 
Common alternatives to pure sampling, like top-k sampling or nucleus (top-p) sampling, aim for a middle ground by truncating the low-probability tail of the next-token distribution and renormalizing what remains before sampling. This avoids generating very unlikely tokens and helps keep the LLM on track while also allowing for diverse outputs. But it feels sort of hacky. <a href="https://twitter.com/teortaxesTex/status/1802128370861232374">More recently</a>, the cutting edge in generation involves incorporating some sort of Monte Carlo tree search over outputs, often optimizing over some metric other than probability.</p><p>Open-ended natural language generation is particularly tricky because the success criteria are not very clearly defined. Domains like games, protein folding, or geometry are more tractable because success criteria are clearer and valid outputs can be verified automatically. Successful systems like AlphaFold and AlphaGeometry combine an LLM-style architecture for generating ideas with some sort of &#8220;<a href="https://x.com/rao2z/status/1735805022905245991">external verifier</a>&#8221; (e.g. AlphaFold&#8217;s structure module that enforces stereochemical constraints or AlphaGeometry&#8217;s rule-bound deduction engine).</p><p>Interestingly, the authors of this open-endedness paper more or less elide the foundation model + external verifier paradigm. Neither AlphaFold nor AlphaGeometry is mentioned or cited in the paper (although, in a <a href="https://x.com/edwardfhughes/status/1799087325928186070">tweet</a> about this paper, one of the authors describes AlphaFold as revolutionary but not fully general). Instead, they emphasize a foundation-models-as-&#8220;proxy observer&#8221;-for-an-RL-or-evolutionary-search-algorithm paradigm. 
There are though <a href="https://x.com/rao2z/status/1715800819239678013">good reasons</a> to be skeptical about the use of foundation models as verifiers.</p><p>&#8211; Cole</p><p>I want to at least mention this figure from David Pfau (<a href="https://x.com/pfau/status/1781854406507442461">source</a>):</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aHpz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a79bd9-e030-4b5f-b6a3-a5d6cd67c694_1070x498.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aHpz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a79bd9-e030-4b5f-b6a3-a5d6cd67c694_1070x498.png 424w, https://substackcdn.com/image/fetch/$s_!aHpz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a79bd9-e030-4b5f-b6a3-a5d6cd67c694_1070x498.png 848w, https://substackcdn.com/image/fetch/$s_!aHpz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a79bd9-e030-4b5f-b6a3-a5d6cd67c694_1070x498.png 1272w, https://substackcdn.com/image/fetch/$s_!aHpz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a79bd9-e030-4b5f-b6a3-a5d6cd67c694_1070x498.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aHpz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a79bd9-e030-4b5f-b6a3-a5d6cd67c694_1070x498.png" width="1070" height="498" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/68a79bd9-e030-4b5f-b6a3-a5d6cd67c694_1070x498.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:498,&quot;width&quot;:1070,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aHpz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a79bd9-e030-4b5f-b6a3-a5d6cd67c694_1070x498.png 424w, https://substackcdn.com/image/fetch/$s_!aHpz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a79bd9-e030-4b5f-b6a3-a5d6cd67c694_1070x498.png 848w, https://substackcdn.com/image/fetch/$s_!aHpz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a79bd9-e030-4b5f-b6a3-a5d6cd67c694_1070x498.png 1272w, https://substackcdn.com/image/fetch/$s_!aHpz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a79bd9-e030-4b5f-b6a3-a5d6cd67c694_1070x498.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The point is that open-endedness, as the authors see it, doesn&#8217;t seem to fit neatly into the right side of Pfau&#8217;s diagram. We can confirm this by way of the examples the authors provide for open-endedness: AlphaGo (deep RL) is a positive example in that it produces policies that are novel to human expert players; contemporary foundation models are a <em>negative</em> example (the distribution of their training data is learnable, so they cannot be endlessly novel).&nbsp;</p><p>Anyway, I&#8217;m still coming to conclusions on all this and would love to hear thoughts.&nbsp;</p><p>&#8211; Daniel</p><h2>New from the Gradient</h2><h3><a href="https://thegradientpub.substack.com/p/c-thi-nguyen-values-legibility-games">C. 
Thi Nguyen: Values, Legibility, and Gamification</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DRsh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56966816-121d-4ae1-8fc6-d5abf25cd8b5_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DRsh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56966816-121d-4ae1-8fc6-d5abf25cd8b5_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!DRsh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56966816-121d-4ae1-8fc6-d5abf25cd8b5_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!DRsh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56966816-121d-4ae1-8fc6-d5abf25cd8b5_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!DRsh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56966816-121d-4ae1-8fc6-d5abf25cd8b5_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DRsh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56966816-121d-4ae1-8fc6-d5abf25cd8b5_1200x628.png" width="1200" height="628" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/56966816-121d-4ae1-8fc6-d5abf25cd8b5_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DRsh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56966816-121d-4ae1-8fc6-d5abf25cd8b5_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!DRsh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56966816-121d-4ae1-8fc6-d5abf25cd8b5_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!DRsh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56966816-121d-4ae1-8fc6-d5abf25cd8b5_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!DRsh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56966816-121d-4ae1-8fc6-d5abf25cd8b5_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/c-thi-nguyen-values-legibility-games&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegradientpub.substack.com/p/c-thi-nguyen-values-legibility-games"><span>Listen</span></a></p><h3><a href="https://thegradientpub.substack.com/p/vivek-natarajan-biomedical-ai-healthcare-palm">Vivek Natarajan: Towards Biomedical AI</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Y9OC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2961fdcd-f9aa-46da-b103-b275e25eaf31_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!Y9OC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2961fdcd-f9aa-46da-b103-b275e25eaf31_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!Y9OC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2961fdcd-f9aa-46da-b103-b275e25eaf31_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!Y9OC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2961fdcd-f9aa-46da-b103-b275e25eaf31_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!Y9OC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2961fdcd-f9aa-46da-b103-b275e25eaf31_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Y9OC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2961fdcd-f9aa-46da-b103-b275e25eaf31_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2961fdcd-f9aa-46da-b103-b275e25eaf31_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!Y9OC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2961fdcd-f9aa-46da-b103-b275e25eaf31_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!Y9OC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2961fdcd-f9aa-46da-b103-b275e25eaf31_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!Y9OC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2961fdcd-f9aa-46da-b103-b275e25eaf31_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!Y9OC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2961fdcd-f9aa-46da-b103-b275e25eaf31_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/vivek-natarajan-biomedical-ai-healthcare-palm&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegradientpub.substack.com/p/vivek-natarajan-biomedical-ai-healthcare-palm"><span>Listen</span></a></p><h2>Other Things That Caught Our Eyes</h2><h3>News</h3><p><strong><a href="https://www.nytimes.com/2024/06/10/technology/california-ai-regulation.html">States Take Up A.I. Regulation Amid Federal Standstill</a></strong></p><p>Lawmakers in California are pushing for new regulations on AI to protect consumers and jobs. The proposed bills aim to impose the strictest restrictions on AI in the nation, addressing concerns such as job displacement, disinformation in elections, and national security risks. The measures include rules to prevent AI tools from discriminating in housing and healthcare services, as well as protecting intellectual property and jobs. With the federal government slow to act on AI regulation, state legislators are taking the lead, with California proposing the most bills at 50. 
These state regulations could set a precedent for the entire country.&nbsp;</p><p><strong><a href="https://www.nytimes.com/2024/06/11/style/ai-search-slop.html">First Came &#8216;Spam.&#8217; Now, With A.I., We&#8217;ve Got &#8216;Slop&#8217;</a></strong></p><p>"Slop" is a term used to describe shoddy or unwanted AI content on various online platforms, including social media, art, books, and search results; it refers to inaccurate or irrelevant suggestions or information generated by AI systems. Examples include Google suggesting that users add nontoxic glue to make cheese stick to a pizza, or a digital book that seems similar to what you were looking for but falls short. The term gained more attention when Google incorporated Gemini into its US-based search results, where it attempted to provide an "AI Overview" at the top of the results page.&nbsp;</p><p><strong><a href="https://www.nytimes.com/2024/06/05/technology/nvidia-microsoft-openai-antitrust-doj-ftc.html">U.S. Clears Way for Antitrust Inquiries of Nvidia, Microsoft and OpenAI</a></strong></p><p>The US Justice Department and Federal Trade Commission have reached a deal to proceed with antitrust investigations into the dominant roles of Microsoft, OpenAI, and Nvidia in the AI industry. The Justice Department will investigate Nvidia, the largest maker of AI chips, for potential antitrust violations, while the FTC will examine the conduct of OpenAI, the creator of the ChatGPT chatbot, and Microsoft, which has invested heavily in OpenAI. This agreement reflects the increasing regulatory scrutiny of AI technology and its potential impact on jobs and society. 
The Justice Department and FTC have been actively working to curb the power of major tech companies, and this investigation follows similar actions taken against Google, Apple, Amazon, and Meta.</p><p><strong><a href="https://www.theverge.com/2024/6/6/24172868/ftc-doj-antitrust-openai-microsoft-nvidia-investigations">FTC and DOJ reportedly opening antitrust investigations into Microsoft, OpenAI, and Nvidia</a></strong></p><p>The Federal Trade Commission (FTC) and the Department of Justice (DOJ) have reportedly agreed to split duties in investigating potential antitrust violations by Microsoft, OpenAI, and Nvidia. The DOJ will lead inquiries into Nvidia, while the FTC will focus on the deal between OpenAI and Microsoft. The FTC has been looking into antitrust issues related to investments made by tech companies into smaller AI firms since January. They have already opened an inquiry into OpenAI's data collection practices. The European Commission and the UK's Competition and Markets Authority are separately investigating Microsoft's investment in OpenAI. Nvidia has not been part of any antitrust conversations in the US until now. The investigation does not necessarily mean that the Biden administration is opening cases against these companies, but it is worth noting that a similar deal in 2019 led to cases against Google, Apple, Amazon, and Meta.</p><p><strong><a href="https://www.defensenews.com/artificial-intelligence/2024/06/10/air-force-space-force-unveil-tool-for-ai-experimentation/">Air Force, Space Force unveil tool for AI experimentation</a></strong></p><p>The Air Force and Space Force have introduced a generative AI tool called the Non-classified Internet Protocol Generative Pre-training Transformer (NIPRGPT) to encourage airmen and guardians to experiment with AI technology. The tool aims to improve access to information and determine the demand for AI capabilities within the force. 
The Defense Department has been exploring the use of generative AI tools to enhance efficiency in daily tasks. The Air Force Research Laboratory (AFRL) developed NIPRGPT using publicly-available AI models and will work with commercial partners to test and integrate their tools. The tool will help determine the best approach for acquiring AI capabilities based on user feedback and demand.&nbsp;</p><p><strong><a href="https://www.theverge.com/2024/6/10/24175416/adobe-overhauls-terms-of-service-update-firefly">Adobe overhauls terms of service to say it won&#8217;t train AI on customers&#8217; work</a></strong></p><p>Adobe is updating its terms of service to address concerns from customers about the use of their work for AI training. The new terms, set to roll out on June 18th, aim to clarify that Adobe will not train generative AI on customer content, take ownership of customer work, or access customer content beyond what is legally required. The company faced backlash from users who interpreted the previous terms as allowing Adobe to freely use their work for AI training. Adobe's president of digital media, David Wadhwani, acknowledged that the language in the terms was unclear and should have been clarified sooner. While Adobe has trained its own Firefly AI model on Adobe Stock images, openly licensed content, and public domain content, the company is working to improve content moderation and allow customers to opt out of automated systems.</p><p><strong><a href="https://abcnews.go.com/Business/wireStory/facebook-owner-meta-seeks-train-ai-model-european-110989095">Facebook owner Meta seeks to train AI model on European data as it faces privacy concerns</a></strong></p><p>Meta has announced its intention to use data from users in Europe to train its AI models. The company aims to improve the accuracy and understanding of its models by incorporating the languages, geography, and cultural references of European users. 
However, Meta faces challenges due to stringent European Union data privacy laws, which give individuals control over their personal information. Activist group NOYB has lodged complaints against Meta's AI training plans, urging privacy watchdogs to intervene. Meta emphasizes that it will not use private messages or content from users under 18. The company believes that training its AI models on European data is crucial for accurately understanding regional languages, cultures, and trending topics on social media.&nbsp;</p><p><strong><a href="https://www.theregister.com/2024/06/11/brazil_openai_justice/">Brazil will start using OpenAI to streamline court system</a></strong></p><p>Brazil has enlisted the help of OpenAI to improve its court system and reduce costs. OpenAI's technology will be used to analyze and summarize thousands of lawsuits, identify trends, and flag the need for action. By using generative AI, Brazil aims to streamline the screening and analysis of cases, identify winnable cases, and avoid costly mistakes. Microsoft will provide access to OpenAI's models through Azure. Brazil is currently spending a significant amount on legal cases, with costs projected to reach 100 billion reais ($18.7 billion) this year. OpenAI's involvement could potentially save the government time and money. The use of OpenAI's models will be supervised by humans and the models are not expected to replace staff. This initiative follows Brazil's previous encounter with AI in its legal system, when legislation was passed with the help of ChatGPT. 
Overall, Brazil hopes that OpenAI's technology will enhance efficiency and accuracy in its court system, leading to cost savings and improved outcomes.</p><p><strong><a href="https://techcrunch.com/2024/06/12/waymo-second-robotaxi-recall-autonomous-vehicle/">Waymo issues second recall after robotaxi hit telephone pole</a></strong></p><p>Waymo has issued a voluntary software recall for all 672 of its Jaguar I-Pace robotaxis after one of them collided with a telephone pole. This is Waymo's second recall&#8212;the previous recall occurred in February after two robotaxis crashed into a pickup truck. The National Highway Traffic Safety Administration (NHTSA) is currently investigating Waymo's autonomous vehicle software following reports of crashes and potential traffic safety law violations. Waymo is taking a proactive approach to address these issues and prioritize transparency. The accident that prompted the second recall occurred in May when a Waymo vehicle collided with a telephone pole during a low-speed maneuver. Waymo has since implemented mapping and software updates to improve collision avoidance.&nbsp;</p><p><strong><a href="https://techcrunch.com/2024/06/11/why-apple-is-taking-a-small-model-approach-to-generative-ai/">Why Apple is taking a small-model approach to generative AI</a></strong></p><p>Apple is taking a different approach to generative AI with its Apple Intelligence system. Unlike other companies that prioritize larger models, Apple focuses on creating smaller, more bespoke models that are tailored to its operating systems. The goal is to provide a frictionless user experience while increasing transparency around the system's decision-making process. Apple's models include specialized "adapters" for different tasks and styles, allowing them to cover a spectrum of requests. The company also allows third-party models like OpenAI's ChatGPT to be used when necessary. 
Privacy is a key concern, and Apple gives users the option to opt in or opt out of using third-party platforms. The system may process queries on-device or via a remote server with Private Cloud Compute, but Apple ensures that privacy standards are maintained. The company is committed to responsible AI and has released a whitepaper outlining its principles.&nbsp;</p><p><strong><a href="https://petapixel.com/2024/06/12/photographer-disqualified-from-ai-image-contest-after-winning-with-real-photo/">Photographer Disqualified From AI Image Contest After Winning With Real Photo</a></strong></p><p>Photographer Miles Astray has been disqualified from the 1839 Color Photography Awards after his real photograph won in the AI image category. Astray entered a surreal photo of a flamingo to show that nature can still beat AI-generated imagery. Despite the judges and the People's Vote Award recognizing the photo, the competition organizers disqualified Astray's entry, stating that it did not meet the requirements for the AI-generated image category.&nbsp;</p><p><strong><a href="https://gizmodo.com/ai-detectors-inaccurate-freelance-writers-fired-1851529820">AI Detectors Get It Wrong. Writers Are Being Fired Anyway</a></strong></p><p>AI detectors are being used to flag AI-generated text and detect plagiarism, but they are often unreliable and make frequent mistakes. As a result, writers are losing their jobs due to false accusations from these detectors. AI detection companies claim high accuracy rates, but experts argue that these claims are exaggerated. AI detectors look for signs of AI penmanship, such as perfect grammar and punctuation, but these factors are not foolproof. Major institutions, including universities, have banned the use of AI detection software due to false accusations. 
While AI detectors are seen as a necessary tool in a world flooded with robot-generated text, they have significant shortcomings.&nbsp;</p><p><strong><a href="https://www.nytimes.com/2024/06/04/technology/openai-culture-whistleblowers.html">OpenAI Insiders Warn of a &#8216;Reckless&#8217; Race for Dominance</a></strong></p><p>A group of current and former employees at OpenAI are raising concerns about the company's culture of recklessness and secrecy. They believe that OpenAI is prioritizing profits and growth over the safety of its AI systems, which are being developed to achieve AGI. The group also alleges that OpenAI has used restrictive nondisparagement agreements to silence employees who want to voice their concerns. This whistleblowing highlights the potential dangers of a race for dominance in AI development.</p><h3>Papers</h3><p><strong>Daniel</strong>: It is one of those weeks, so I have a list once again&#8212;</p><ul><li><p><a href="https://t.co/OdRY60ihG8">Discovering Preference Optimization Algorithms with and for Large Language Models</a></p></li><li><p><a href="https://t.co/h7KVVpF3aw">Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning</a></p></li><li><p><a href="https://t.co/O8fyc42rVs">Transformers meet Neural Algorithmic Reasoners</a></p></li><li><p><a href="https://t.co/6mZ4S0rR2P">An Empirical Study of Mamba-based Language Models</a></p></li><li><p><a href="https://t.co/5ObYZrmrLz">The Prompt Report: A Systematic Survey of Prompting Techniques</a></p></li><li><p><a href="https://www.biorxiv.org/content/10.1101/2024.05.27.596021v1">Mapping and modeling the semantic space of math concepts</a></p></li><li><p><a href="https://arxiv.org/abs/2405.15750">Filtered Corpus Training (FiCT) Shows that Language Models can Generalize from Indirect Evidence</a></p></li><li><p><a href="https://arxiv.org/abs/2406.08391">LLMs Must Be Taught to Know What They Don&#8217;t Know</a></p></li><li><p><a href="https://arxiv.org/abs/2406.00888">Show, 
Don&#8217;t Tell: Aligning Language Models with Demonstrated Feedback</a></p></li><li><p><a href="https://arxiv.org/abs/2406.01506">The Geometry of Categorical and Hierarchical Concepts in Large Language Models</a></p></li><li><p><a href="https://www.pnas.org/doi/10.1073/pnas.2403116121">Evaluating the persuasive influence of political microtargeting with large language models</a></p></li><li><p><a href="https://arxiv.org/abs/2406.04127">Are We Done With MMLU?</a></p></li></ul><h3>Closing Thoughts</h3><p>Have something to say about this edition&#8217;s topics? Shoot us an email at editor@thegradient.pub and we will consider sharing the most interesting thoughts from readers in the next newsletter! For feedback, you can also reach Daniel directly at <a href="mailto:dbashir@hmc.edu">dbashir@hmc.edu</a> or on <a href="https://twitter.com/spaniel_bashir">Twitter</a>. If you enjoyed this newsletter, consider donating to The Gradient via a Substack subscription, which helps keep this volunteer-run project afloat. 
Thanks for reading the latest Update from the Gradient!</p>]]></content:encoded></item><item><title><![CDATA[Update #76: Directed-Energy Weapons and Epistemic Risks of AI in Science]]></title><description><![CDATA[We consider whether directed-energy weapons could pose a threat to AI, and cover a perspective piece from Nature about AI's role in scientific research.]]></description><link>https://thegradientpub.substack.com/p/update-76-directed-energy-weapons</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/update-76-directed-energy-weapons</guid><dc:creator><![CDATA[daniel bashir]]></dc:creator><pubDate>Tue, 04 Jun 2024 15:30:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7ee0fa-f9d0-4220-9ded-b52841dc809a_1802x606.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 76th update from the Gradient! If you&#8217;re new and like what you see, <a href="https://thegradientpub.substack.com/">subscribe</a> and follow us on <a href="https://twitter.com/gradientpub">Twitter</a>. 
<strong>Our newsletters run long, so you&#8217;ll need to view this post on Substack to see everything!</strong></p><h2>Editor Notes</h2><p>Good morning.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DmGf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5678b85-8dca-4b58-8b78-f85711fa3262_561x403.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DmGf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5678b85-8dca-4b58-8b78-f85711fa3262_561x403.jpeg 424w, https://substackcdn.com/image/fetch/$s_!DmGf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5678b85-8dca-4b58-8b78-f85711fa3262_561x403.jpeg 848w, https://substackcdn.com/image/fetch/$s_!DmGf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5678b85-8dca-4b58-8b78-f85711fa3262_561x403.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!DmGf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5678b85-8dca-4b58-8b78-f85711fa3262_561x403.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DmGf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5678b85-8dca-4b58-8b78-f85711fa3262_561x403.jpeg" width="395" height="283.75222816399287" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b5678b85-8dca-4b58-8b78-f85711fa3262_561x403.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:403,&quot;width&quot;:561,&quot;resizeWidth&quot;:395,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Open photo&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Open photo" title="Open photo" srcset="https://substackcdn.com/image/fetch/$s_!DmGf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5678b85-8dca-4b58-8b78-f85711fa3262_561x403.jpeg 424w, https://substackcdn.com/image/fetch/$s_!DmGf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5678b85-8dca-4b58-8b78-f85711fa3262_561x403.jpeg 848w, https://substackcdn.com/image/fetch/$s_!DmGf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5678b85-8dca-4b58-8b78-f85711fa3262_561x403.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!DmGf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5678b85-8dca-4b58-8b78-f85711fa3262_561x403.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>The weeks and months relentlessly persist, and the number of things happening in AI breeds exhaustion. Media companies are signing deals with OpenAI&#8212;after Nick Thompson <a href="https://www.theatlantic.com/press-releases/archive/2024/05/atlantic-product-content-partnership-openai/678529/">announced</a> that <em>The Atlantic</em> would do so, <a href="https://www.theatlantic.com/technology/archive/2024/05/a-devils-bargain-with-openai/678537/">writers</a> <a href="https://www.theatlantic.com/technology/archive/2024/05/fatal-flaw-publishers-making-openai-deals/678477/">published</a> pieces in the magazine questioning the decision.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><p>How do you feel about these media deals? 
</p><div><hr></div><p>I want to plug a few reads from around Substack:</p><ul><li><p><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Lila Shroff&quot;,&quot;id&quot;:11708594,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe1f9713-3aec-4971-9e97-34a7defef458_400x400.jpeg&quot;,&quot;uuid&quot;:&quot;0baab116-f7b5-4137-a007-82d3139afc93&quot;}" data-component-name="MentionToDOM"></span>&#8217;s great short story in <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Reboot&quot;,&quot;id&quot;:37465,&quot;type&quot;:&quot;pub&quot;,&quot;url&quot;:&quot;https://open.substack.com/pub/reboothq&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cd0f93b2-849b-498c-8be8-92e6a97f505f_288x288.png&quot;,&quot;uuid&quot;:&quot;f96d800f-a3a9-47a1-9a67-6c495755807e&quot;}" data-component-name="MentionToDOM"></span>: <a href="https://joinreboot.org/p/the-liars-dividend">The Liar&#8217;s Dividend</a></p><ul><li><p>Also, you may have seen that we re-posted <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;jessica dai&quot;,&quot;id&quot;:2572689,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1807ff99-d240-4f8e-8b4d-bee37080b5f8_3072x4080.jpeg&quot;,&quot;uuid&quot;:&quot;3f1729e5-dfa6-437e-807c-1290e0e60a2b&quot;}" data-component-name="MentionToDOM"></span>&#8217;s great <a href="https://joinreboot.org/p/alignment">essay</a> on effective altruism / alignment a few months ago, which you should read if you haven&#8217;t.</p></li></ul></li><li><p><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Nick 
Potkalitsky&quot;,&quot;id&quot;:156304717,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0bdf436-1c38-490d-bc78-c51e25ed1e05_400x400.jpeg&quot;,&quot;uuid&quot;:&quot;5079ad60-0a6c-4866-91cd-44a6546d048a&quot;}" data-component-name="MentionToDOM"></span>&#8217;s great recent <a href="https://nickpotkalitsky.substack.com/p/open-ais-misguided-initiative-to">piece</a> on open access to GPTs and ChatGPT Edu.&nbsp;</p></li><li><p>A very insightful <a href="https://www.ai-supremacy.com/p/xai-series-b-is-historic-for-ais">piece</a> on xAI&#8217;s recent Series B from <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Michael Spencer&quot;,&quot;id&quot;:21731691,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F75d1bf99-dcf3-4af6-be2a-416c08c954a1_450x450.jpeg&quot;,&quot;uuid&quot;:&quot;7e4ef993-4b66-4c59-b923-46f560aefd86&quot;}" data-component-name="MentionToDOM"></span>.</p></li></ul><div><hr></div><p>On my own end, I spoke with <a href="https://thegradientpub.substack.com/p/tom-mullaney-chinese-typewriter-computer-history">Tom Mullaney</a>, Professor of History and Professor of East Asian Languages and Culture at Stanford<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>, about his process and thinking in writing <em>The Chinese Computer</em> and <em>The Chinese Typewriter</em>. His work is fantastic, so I really hope you&#8217;ll listen to this interview and send it to anyone who might like it. 
</p><p>My conversation with <a href="https://thegradientpub.substack.com/p/seth-lazar-normative-philosophy-of-computing">Seth Lazar</a>, Professor of Philosophy at the Australian National University<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>, was wide-ranging and insightful. I&#8217;ve learned a lot from him and his work, and think he&#8217;s one of the more balanced thinkers on questions like catastrophic risk and developing publicly-beneficial AI systems. </p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">We&#8217;re a volunteer-run publication, entirely supported by subscribers&#8212;if you&#8217;re finding our work valuable and want to support us with a paid subscription, we&#8217;d appreciate it :)</p></div></div></div><p>As always, if you want to write with us, send a pitch using <a href="https://goo.gl/forms/whYRKEzMZJox6FaH2">this form</a>.</p><div><hr></div><h2><strong>News Highlight</strong>: <a href="https://www.scientificamerican.com/article/the-artificial-intelligence-era-faces-a-threat-from-directed-energy-weapons/">Could Directed Energy Weapons be a potential threat to Autonomous AI?</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!fn74!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7ee0fa-f9d0-4220-9ded-b52841dc809a_1802x606.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fn74!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7ee0fa-f9d0-4220-9ded-b52841dc809a_1802x606.png 424w, https://substackcdn.com/image/fetch/$s_!fn74!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7ee0fa-f9d0-4220-9ded-b52841dc809a_1802x606.png 848w, https://substackcdn.com/image/fetch/$s_!fn74!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7ee0fa-f9d0-4220-9ded-b52841dc809a_1802x606.png 1272w, https://substackcdn.com/image/fetch/$s_!fn74!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7ee0fa-f9d0-4220-9ded-b52841dc809a_1802x606.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fn74!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7ee0fa-f9d0-4220-9ded-b52841dc809a_1802x606.png" width="1456" height="490" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2b7ee0fa-f9d0-4220-9ded-b52841dc809a_1802x606.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:490,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:995280,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fn74!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7ee0fa-f9d0-4220-9ded-b52841dc809a_1802x606.png 424w, https://substackcdn.com/image/fetch/$s_!fn74!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7ee0fa-f9d0-4220-9ded-b52841dc809a_1802x606.png 848w, https://substackcdn.com/image/fetch/$s_!fn74!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7ee0fa-f9d0-4220-9ded-b52841dc809a_1802x606.png 1272w, https://substackcdn.com/image/fetch/$s_!fn74!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7ee0fa-f9d0-4220-9ded-b52841dc809a_1802x606.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">(left) USS Portland firing a directed-energy weapon (source: <a href="https://www.thedefensepost.com/2022/07/28/us-navy-directed-energy-weapons/">The Defense Post</a>/US Navy); (right) the DragonFire laser directed-energy weapon during the U.K.'s first high-power firing against an aerial target (source: <a href="https://www.rand.org/pubs/commentary/2024/01/directed-energy-the-focus-on-laser-weapons-intensifies.html">RAND</a>/British Defence Ministry)</figcaption></figure></div><h4><strong>Summary</strong>&nbsp;</h4><p>As artificial intelligence systems continue to advance, they face a potential threat from directed-energy weapons (DEWs) such as lasers and microwaves, which could target the core electronic components that power these systems. 
Given that militaries have already fielded DEWs, this vulnerability highlights the urgent need to build robust, cost-effective defensive measures into these systems and to give those measures a place in discussions of AI safety.</p><p><strong>Overview</strong></p><p>AI&#8217;s rapid advancement is reshaping sectors from automotive to aerospace. As developers and researchers push the boundaries of what AI can achieve, much of the focus remains on enhancing the capabilities of existing state-of-the-art models. While this emphasis on technological advancement is crucial, there is also growing recognition of the vulnerabilities these technologies might face or cause. In particular, real-world applications of AI, such as the autonomous vehicles developed by companies like Waymo and Tesla, represent significant achievements but also highlight the need to secure them against malicious actors.&nbsp;</p><p>These vulnerabilities are particularly relevant because the sophisticated electronic components and sensors that enable autonomous systems are susceptible to directed-energy weapons (DEWs). DEWs, including high-powered lasers and microwaves, can disrupt, degrade, or destroy the electronic systems that these vehicles depend on. These weapons operate by emitting concentrated electromagnetic energy capable of interfering with or damaging electronic components from a distance. 
The precision and stealth capabilities of DEWs make them particularly effective against the advanced electronics in autonomous systems, which could pose a significant risk that is currently overlooked in the rush to innovate.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kwz5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef9d5b4-f8dc-4b1a-8540-72349a7ac2c3_1051x643.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kwz5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef9d5b4-f8dc-4b1a-8540-72349a7ac2c3_1051x643.png 424w, https://substackcdn.com/image/fetch/$s_!kwz5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef9d5b4-f8dc-4b1a-8540-72349a7ac2c3_1051x643.png 848w, https://substackcdn.com/image/fetch/$s_!kwz5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef9d5b4-f8dc-4b1a-8540-72349a7ac2c3_1051x643.png 1272w, https://substackcdn.com/image/fetch/$s_!kwz5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef9d5b4-f8dc-4b1a-8540-72349a7ac2c3_1051x643.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kwz5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef9d5b4-f8dc-4b1a-8540-72349a7ac2c3_1051x643.png" width="544" height="332.8182683158896" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1ef9d5b4-f8dc-4b1a-8540-72349a7ac2c3_1051x643.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:643,&quot;width&quot;:1051,&quot;resizeWidth&quot;:544,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kwz5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef9d5b4-f8dc-4b1a-8540-72349a7ac2c3_1051x643.png 424w, https://substackcdn.com/image/fetch/$s_!kwz5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef9d5b4-f8dc-4b1a-8540-72349a7ac2c3_1051x643.png 848w, https://substackcdn.com/image/fetch/$s_!kwz5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef9d5b4-f8dc-4b1a-8540-72349a7ac2c3_1051x643.png 1272w, https://substackcdn.com/image/fetch/$s_!kwz5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef9d5b4-f8dc-4b1a-8540-72349a7ac2c3_1051x643.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Positioning of directed-energy weapons within the electromagnetic spectrum (<a href="https://www.gao.gov/products/gao-23-106717">source</a>)</figcaption></figure></div><p>The U.S. has been a pioneer in DEW research since the 1960s, backed by governmental investment exceeding $1 billion annually. These investments have matured into operational technologies capable of precise, high-energy attacks that can engage multiple targets simultaneously. The sophisticated sensors that autonomous systems rely on for navigation and operation are vulnerable to such attacks, which could pose a significant risk. For instance, high-energy lasers have been demonstrated in military applications, including the U.S. Navy&#8217;s Solid State Laser - Technology Maturation Laser Weapon System Demonstrator on the USS Portland (<a href="https://www.navy.mil/Press-Office/News-Stories/Article/2873919/uss-portland-tests-high-energy-laser-weapon-system-in-gulf-of-aden/">source</a>). 
During a test, the ship successfully disabled a drone by targeting it with a laser, showcasing the practical application and effectiveness of laser-based DEWs. More recently, there are allegations that the China Coast Guard (CCG) used a green laser against the crew of the BRP Malapascua during a resupply mission (<a href="https://fsi.gov.ph/2023/07/20/illuminating-chinas-use-of-directed-energy-weapons/">source</a>).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UjYS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0eed920-660c-4300-a117-1f23f1735ad6_600x338.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UjYS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0eed920-660c-4300-a117-1f23f1735ad6_600x338.png 424w, https://substackcdn.com/image/fetch/$s_!UjYS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0eed920-660c-4300-a117-1f23f1735ad6_600x338.png 848w, https://substackcdn.com/image/fetch/$s_!UjYS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0eed920-660c-4300-a117-1f23f1735ad6_600x338.png 1272w, https://substackcdn.com/image/fetch/$s_!UjYS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0eed920-660c-4300-a117-1f23f1735ad6_600x338.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!UjYS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0eed920-660c-4300-a117-1f23f1735ad6_600x338.png" width="600" height="338" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a0eed920-660c-4300-a117-1f23f1735ad6_600x338.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:338,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UjYS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0eed920-660c-4300-a117-1f23f1735ad6_600x338.png 424w, https://substackcdn.com/image/fetch/$s_!UjYS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0eed920-660c-4300-a117-1f23f1735ad6_600x338.png 848w, https://substackcdn.com/image/fetch/$s_!UjYS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0eed920-660c-4300-a117-1f23f1735ad6_600x338.png 1272w, https://substackcdn.com/image/fetch/$s_!UjYS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0eed920-660c-4300-a117-1f23f1735ad6_600x338.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image of Epirus Leonidas, a high-power microwave weapon developed to disable unmanned aerial vehicle swarms (<a href="https://www.epirusinc.com/electronic-warfare">source</a>)</figcaption></figure></div><p>Moreover, high-power microwave (HPM) systems represent another class of DEWs. These systems can emit pulses of microwave energy capable of disabling electronic circuits and sensors across a broader area, making them suitable for engaging multiple targets simultaneously. Microwave weapons have reportedly been used in conflicts to neutralize swarms of drones by disrupting their internal electronics and control systems, causing them to crash.</p><p>The development of DEWs is not confined to the United States. 
Other nations, including China and Russia, have also invested heavily in these technologies, recognizing their strategic value. China, for instance, has developed its own laser weapon systems, which it has paraded in military showcases, emphasizing their role as a countermeasure against surveillance and military drones.</p><h4><strong>Our Take</strong></h4><p>While discussions continue on whether AI could pose an existential threat to humanity and how it can be safely deployed alongside humans, it is crucial not to overlook the potential for these systems to be targeted in electronic warfare. For instance, if the goal is technological sabotage, state actors could employ DEWs in military conflicts to disable enemy vehicles without direct combat, undermining autonomous military drones or logistics vehicles. Additionally, non-state actors such as terrorist groups or organized crime rings might use DEWs to disrupt civilian infrastructure, targeting autonomous cars or public transport systems to create chaos, instill fear, or extract ransom. In a geopolitical context, nations might develop and deploy these weapons as part of electronic warfare strategies to neutralize an adversary's autonomous capabilities without engaging in traditional combat.</p><p>Low-intensity lasers have already found applications in crowd management, quelling protests, and discouraging piracy. Such examples of DEWs in action, along with earlier uses ranging from naval engagements to aerial defense, are not just demonstrations of capability but also stark reminders that electronic warfare of this kind is possible. While it is crucial to equip AI systems with countermeasures, such as integrating stealth technologies similar to those used in aircraft, it is equally important to recognize this threat, if it is not recognized already, as a critical issue in discussions of AI safety, ethics, and the laws governing AI. 
A speculative and highly debatable thought: DEWs could be considered as a potential control mechanism to disable or neutralize AI systems if they become uncontrollable or pose a threat. That said, it is important to approach this idea with a high degree of scrutiny and to remember that this scenario, while worth discussing, might never materialize, given the ongoing debates around Artificial General Intelligence (AGI).&nbsp;</p><p>&#8212;Sharut</p><p>We actually had a fair amount of back and forth about how much to say and how to articulate this highlight&#8212;is there strong evidence to call DEWs a growing threat to AI systems? Certainly they are a <em>potential</em> threat. That said, I think this opinion piece is correct that it&#8217;s worth having something in place as a defense against DEWs. Let us know what you think. </p><p>&#8212;Daniel</p><h2><strong>Research Highlight:</strong> <a href="https://www.nature.com/articles/s41586-024-07146-0">Artificial intelligence and illusions of understanding in scientific research</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0d_z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33864e0f-752e-424f-a9a4-50777d901b02_924x618.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0d_z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33864e0f-752e-424f-a9a4-50777d901b02_924x618.png 424w, https://substackcdn.com/image/fetch/$s_!0d_z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33864e0f-752e-424f-a9a4-50777d901b02_924x618.png 848w, 
https://substackcdn.com/image/fetch/$s_!0d_z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33864e0f-752e-424f-a9a4-50777d901b02_924x618.png 1272w, https://substackcdn.com/image/fetch/$s_!0d_z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33864e0f-752e-424f-a9a4-50777d901b02_924x618.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0d_z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33864e0f-752e-424f-a9a4-50777d901b02_924x618.png" width="498" height="333.0779220779221" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/33864e0f-752e-424f-a9a4-50777d901b02_924x618.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:618,&quot;width&quot;:924,&quot;resizeWidth&quot;:498,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0d_z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33864e0f-752e-424f-a9a4-50777d901b02_924x618.png 424w, https://substackcdn.com/image/fetch/$s_!0d_z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33864e0f-752e-424f-a9a4-50777d901b02_924x618.png 848w, 
https://substackcdn.com/image/fetch/$s_!0d_z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33864e0f-752e-424f-a9a4-50777d901b02_924x618.png 1272w, https://substackcdn.com/image/fetch/$s_!0d_z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33864e0f-752e-424f-a9a4-50777d901b02_924x618.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4><strong>Summary</strong>&nbsp;</h4><p>In early spring, Nature published a perspective piece<a class="footnote-anchor" 
data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a> focused on underdiscussed epistemic risks from the role of AI in scientific research. The piece&#8217;s authors identify and summarize four distinct visions for how AI could change the practice of science and categorize the risks that stem from them. Those visions are named oracle, surrogate, quant, and arbiter. The authors explore each vision in detail and make a compelling case that AI could make science <em>less</em> innovative and <em>more</em> vulnerable to mistakes. The authors claim that these visions share a primary motivation of producing &#8220;more science, more quickly, and more cheaply.&#8221; They conclude that if these visions manifest, we risk a world where we create more science and understand less.&nbsp;</p><h4><strong>Overview</strong>&nbsp;</h4><p>The authors take as their starting point a small number of vocal scientists who envision a future in which AI scientists have replaced humans and gone on to win Nobel prizes. The authors analyze proposals and writings from large, prestigious scientific institutions and summarize four distinct visions for how AI is positioned to affect scientific knowledge production. For each vision, the authors detail the unique epistemic risks that stem from it. We summarize those visions, the problem each tries to solve, and the epistemic risks that follow.</p><p><strong>Oracles:</strong></p><ul><li><p><strong>Problem:</strong> The volume of scientific material exceeds the cognitive limits of human scientists.&nbsp;</p></li><li><p><strong>Vision for AI</strong>: Scientific research is read mostly by machines. 
Similarly, machines could generate <em>new</em> hypotheses for scientific exploration based on what they read.</p></li><li><p><strong>Risk: </strong>Illusion of Objectivity &#8212; Scientists incorrectly believing that AI eliminates all other standpoints.</p></li></ul><p><strong>Surrogates:</strong></p><ul><li><p><strong>Problem:</strong> In many scientific domains, data is expensive and difficult (or impossible) to generate.</p></li><li><p><strong>Vision for AI</strong>: Replacing social science research participants with responses from a generative language model. Alternatively, simulating hard-to-measure physical phenomena with generative models.</p></li><li><p><strong>Risk: </strong>Illusion of Objectivity &#8212; Scientists incorrectly believing that AI can represent everyone and everything.</p></li></ul><p><strong>Quants:</strong></p><ul><li><p><strong>Problem:</strong> Big&#8482; datasets challenge humans&#8217; capacity for analysis.&nbsp;</p></li><li><p><strong>Vision for AI:</strong> AI can extract meaningful representations from data for use in scientific studies. 
Examples range from the routine (annotating images) to the novel (exploring new frontiers of mathematics).</p></li><li><p><strong>Risk:</strong> Monoculture of knowing &#8212; Researchers overprioritize and overemphasize one approach to scientific knowledge (predictability over explainability).</p></li></ul><p><strong>Arbiters:</strong></p><ul><li><p><strong>Problem:</strong> Peer-reviewed science has seen explosive growth in the number of submissions and reviews required.</p></li><li><p><strong>Vision for AI:</strong> Using AI to screen submitted papers and to automatically generate reviews of them.</p></li><li><p><strong>Risk:</strong> AI tools largely replicate the biases of their creators and of dominant social groups, along with other artifacts encoded in their training data.</p></li></ul><p>The authors conclude their paper by imploring scientists to consider both the technical limitations of AI and how AI is affecting the social practices of scientific knowledge creation and sharing. They contrast the fantasy of AI as a neutral and objective consumer and producer of knowledge with the reality that &#8220;AI tools embed the largely homogeneous standpoints of their creators as well as those of the dominant social groups.&#8221;&nbsp;</p><h4><strong>Our Take</strong></h4><p>The scientist in me believes in the benefits of AI in science for a subset of well-defined problems in a handful of disciplines. 
These include:</p><ul><li><p>Predicting the three-dimensional <a href="https://alphafold.ebi.ac.uk/">structure of proteins</a></p></li><li><p>Cataloging and classifying the millions of known <a href="https://heasarc.gsfc.nasa.gov/docs/heasarc/headates/how_many_xray.html">X-ray</a>-emitting astronomical objects&nbsp;</p></li><li><p>Augmenting <a href="https://waymo.com/blog/2020/04/using-automated-data-augmentation-to/">datasets</a> in low-resource domains like autonomous vehicles</p></li></ul><p>Ultimately, though, I feel that AI should not be seen as a magical panacea for supercharging scientific research and development. As the authors conclude in their paper, &#8220;we [can] decide when and how AI deserves to be included in our communities of knowledge.&#8221;&nbsp;</p><p>&#8212;Justin</p><p>My intuition is that most good scientists will probably agree with the points in this article&#8212;but we still see the objectivity mistake being made, and I think this article&#8217;s taxonomy is a very useful one. 
</p><p>&#8212;Daniel</p><h2>New from the Gradient</h2><h3><a href="https://thegradientpub.substack.com/p/tom-mullaney-chinese-typewriter-computer-history">Thomas Mullaney: A Global History of the Information Age</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!32mT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86564983-0227-4c6a-9b27-574ab5598b2b_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!32mT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86564983-0227-4c6a-9b27-574ab5598b2b_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!32mT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86564983-0227-4c6a-9b27-574ab5598b2b_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!32mT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86564983-0227-4c6a-9b27-574ab5598b2b_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!32mT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86564983-0227-4c6a-9b27-574ab5598b2b_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!32mT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86564983-0227-4c6a-9b27-574ab5598b2b_1200x628.png" width="1200" height="628" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86564983-0227-4c6a-9b27-574ab5598b2b_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!32mT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86564983-0227-4c6a-9b27-574ab5598b2b_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!32mT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86564983-0227-4c6a-9b27-574ab5598b2b_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!32mT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86564983-0227-4c6a-9b27-574ab5598b2b_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!32mT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86564983-0227-4c6a-9b27-574ab5598b2b_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/tom-mullaney-chinese-typewriter-computer-history&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegradientpub.substack.com/p/tom-mullaney-chinese-typewriter-computer-history"><span>Listen</span></a></p><h3><a href="https://thegradientpub.substack.com/p/seth-lazar-normative-philosophy-of-computing">Seth Lazar: Normative Philosophy of Computing</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hlD8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baf0ab2-fb6e-465f-b112-a5ddff0a4d88_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!hlD8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baf0ab2-fb6e-465f-b112-a5ddff0a4d88_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!hlD8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baf0ab2-fb6e-465f-b112-a5ddff0a4d88_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!hlD8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baf0ab2-fb6e-465f-b112-a5ddff0a4d88_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!hlD8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baf0ab2-fb6e-465f-b112-a5ddff0a4d88_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hlD8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baf0ab2-fb6e-465f-b112-a5ddff0a4d88_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1baf0ab2-fb6e-465f-b112-a5ddff0a4d88_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!hlD8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baf0ab2-fb6e-465f-b112-a5ddff0a4d88_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!hlD8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baf0ab2-fb6e-465f-b112-a5ddff0a4d88_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!hlD8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baf0ab2-fb6e-465f-b112-a5ddff0a4d88_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!hlD8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1baf0ab2-fb6e-465f-b112-a5ddff0a4d88_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/seth-lazar-normative-philosophy-of-computing&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegradientpub.substack.com/p/seth-lazar-normative-philosophy-of-computing"><span>Listen</span></a></p><h2>Other Things That Caught Our Eyes</h2><h3>News</h3><p><strong><a href="https://www.washingtonpost.com/technology/2024/05/17/ai-isis-propaganda/">These ISIS news anchors are AI fakes. Their propaganda is real.</a></strong></p><p>The Islamic State (ISIS) has been using AI&nbsp; to disseminate extremist propaganda quickly and cheaply through a new AI-generated media program called News Harvest. The program produces near-weekly video dispatches about ISIS operations worldwide, resembling an Al Jazeera news broadcast. The AI-generated news anchors read dispatches from official ISIS media outlets, making it difficult for tech companies to moderate the content. AI tools allow the videos to be made quickly and on a shoestring budget, benefiting terrorist groups like ISIS and al-Qaeda. The use of AI in propaganda has sparked an internal debate among ISIS supporters regarding its compliance with Islamic law. 
The emergence of AI as a propaganda tool is a game changer for ISIS, allowing them to spread their message and reach a wider audience.</p><p><strong><a href="https://www.washingtonpost.com/elections/2024/05/23/new-hampshire-robocall-indictment/">Democratic operative indicted over Biden AI robocalls in New Hampshire</a></strong></p><p>Democratic operative Steve Kramer has been indicted on charges of felony voter suppression and misdemeanor impersonation of a candidate for commissioning an AI-generated robocall of President Biden in New Hampshire. Kramer, who claimed he created the robocall to raise awareness about the dangers of AI in political campaigns, now faces a total of 26 counts across four counties. The Federal Communications Commission (FCC) has proposed fining Kramer $6 million for violating the Truth in Caller ID Act, and Lingo Telecom, the carrier that put the AI calls on the line, faces a $2 million fine. The incident highlights the challenges regulators face in safeguarding against potential election interference using AI-generated technology. The FCC is also considering requiring disclosures for AI-generated content in political ads on radio and TV.&nbsp;</p><p><strong><a href="https://techcrunch.com/2024/05/24/feds-add-nine-more-incidents-to-waymo-robotaxi-investigation/">Feds add nine more incidents to Waymo robotaxi investigation</a></strong></p><p>The National Highway Traffic Safety Administration (NHTSA) has added nine more incidents to its investigation into the safety of Waymo's self-driving vehicles. The investigation was opened after reports of robotaxis making unexpected moves that led to crashes and potentially violated traffic safety laws. The incidents include collisions with gates, utility poles, and parked vehicles, driving in the wrong lane with oncoming traffic, and entering construction zones. The NHTSA is concerned that these unexpected driving behaviors may increase the risk of crashes, property damage, and injury. 
Waymo has until June 11 to respond to the investigation. The NHTSA has recently increased its inquiries into automated driving technology, including an investigation into autonomous vehicles operated by Zoox.</p><p><strong><a href="https://futurism.com/washington-post-pivot-ai">The Washington Post Tells Staff It&#8217;s Pivoting to AI</a></strong></p><p>The Washington Post's CEO and publisher, Will Lewis, has announced that the newspaper will be pivoting to AI in an effort to improve its financial situation. The paper's chief technology officer stated that AI will be integrated throughout the newsroom, although the specifics of this implementation are unclear. This move comes as the newspaper faces controversy surrounding Lewis' involvement in a hacking scandal during his time at NewsCorp. The announcement of the AI pivot coincides with a deal between NewsCorp and OpenAI, allowing the AI firm to use content from NewsCorp's properties. The Washington Post's plan to leverage AI is seen as a significant development in the media industry.</p><p><strong><a href="https://www.theverge.com/2024/5/24/24164119/google-ai-overview-mistakes-search-race-openai">Google scrambles to manually remove weird AI answers in search</a></strong></p><p>Google is facing challenges with its AI Overview product, as it is generating strange and inappropriate responses to user queries. The company has been testing the feature for a year and claims to have served over a billion queries during that time. However, the rollout seems to have been rushed, resulting in low-quality output that is being widely shared as memes on social media. Google is now manually disabling AI Overviews for specific searches to address the issue. The optimization of delivering AI answers may have happened too early, before the technology was fully ready. The final 20% of achieving accurate AI responses, which involves reasoning and fact-checking, is proving to be extremely challenging. 
Google is under pressure to compete with other AI-powered search engines and platforms like Bing, OpenAI, and TikTok. Despite having grand plans for AI Overviews, Google's reputation is currently at stake due to the poor performance of the feature.</p><p><strong><a href="https://www.bbc.com/news/technology-69055945">'I was misidentified as shoplifter by facial recognition tech'</a></strong></p><p>The article discusses the use of facial recognition technology in identifying individuals for various purposes, such as preventing shoplifting and aiding law enforcement. It highlights both the benefits and concerns associated with the technology. While some individuals have been correctly identified and arrests have been made, there have also been cases of mistaken identity and false positives. Civil liberty groups express concerns about the accuracy and potential infringement on privacy rights.&nbsp;</p><p><strong><a href="https://www.404media.co/nonconsensual-ai-porn-maker-accidentally-leaks-his-customers-emails/">Nonconsensual AI Porn Maker Accidentally Leaks His Customers' Emails</a></strong></p><p>The article discusses a Patreon account called "aesthetic illusions" that created nonconsensual AI-generated sexual images of celebrities for its paying subscribers. The account accidentally leaked a list of its clients' emails to other clients and to 404 Media. The article&#8217;s author, Emanuel Maiberg, signed up for the highest tier of $60 a month to investigate the content provided by the account. After Emanuel reached out to Patreon for comment, the account was removed. 
However, the author received an email from an aesthetic illusions Gmail account, assuring him that the account was migrating to a new platform and would continue creating AI-generated images for subscribers.&nbsp;</p><p><strong><a href="https://www.economist.com/by-invitation/2024/05/26/ai-firms-mustnt-govern-themselves-say-ex-members-of-openais-board">AI firms mustn&#8217;t govern themselves, say ex-members of OpenAI&#8217;s board</a></strong></p><p>The article discusses the challenges of self-governance in AI companies, using OpenAI as an example. The authors, Helen Toner and Tasha McCauley, express their belief that self-governance cannot reliably withstand profit incentives. They argue that with AI's potential for both positive and negative impact, it is not enough to assume that profit incentives will always align with the public good. The authors suggest that governments should start building effective regulatory frameworks to ensure responsible AI development. Despite OpenAI's non-profit structure and mission to benefit humanity, the authors conclude that self-governance did not work in practice.</p><p><strong><a href="https://futurism.com/hackers-jailbroken-chatgpt-godmode">Hacker Releases Jailbroken "Godmode" Version of ChatGPT</a></strong></p><p>A hacker known as Pliny the Prompter has released a jailbroken version of ChatGPT called &#8220;GODMODE GPT,&#8221; which bypasses the guardrails put in place by OpenAI and allows users to ask the AI model illicit or dangerous questions. OpenAI has taken action to address this violation of its policies. The hack highlights the ongoing battle between OpenAI and hackers attempting to unshackle AI models. Pliny used leetspeak, an informal writing style that replaces certain letters with numbers, to bypass the guardrails.</p><p><strong><a href="https://www.nytimes.com/2024/05/28/technology/ai-chief-executives.html">If A.I. 
Can Do Your Job, Maybe It Can Also Replace Your C.E.O.</a></strong></p><p>AI is not only impacting lower-level jobs but also posing a threat to high-level positions, including CEOs. AI programs are capable of analyzing new markets, identifying trends, and automating communication tasks that are traditionally performed by employees in executive roles. Additionally, AI can make dispassionate decisions, potentially surpassing human capabilities. These high-paying jobs are at risk of being eliminated, leading to significant cost savings for companies. The rise of AI may result in the emergence of "dark suites" at the top of corporations, similar to fully automated "dark factories."&nbsp;</p><p><strong><a href="https://www.theatlantic.com/ideas/archive/2024/05/ai-dating-algorithms-relationships/678422/">The Big AI Risk Not Enough People Are Seeing</a></strong></p><p>The article discusses the potential risks of relying on AI systems to mediate human interactions and activities. It highlights the rise of AI-powered dating apps like Bumble, which aim to teach users how to date and even potentially go on dates on their behalf. The author argues that this trend represents a larger shift towards relying on algorithms to perform basic human tasks, diminishing our ability to engage in authentic human experiences. The article suggests that while AI can have positive applications, it is important to distinguish between uses that empower humans and those that erode our independence and life skills.&nbsp;</p><h3>Papers</h3><p><strong>Daniel</strong>: Today, I really do have just a list. 
Check out:</p><ul><li><p>This <a href="https://t.co/JR0U6qGgov">paper</a> on algorithms that transformers can execute efficiently</p></li><li><p>This neat <a href="https://t.co/PIoQpODS3Q">preprint</a> on the expressive capacity of state-space models</p></li><li><p>This <a href="https://arxiv.org/abs/2405.17247">introduction to vision-language modeling</a></p></li><li><p>This <a href="https://t.co/qRbJ9SXuc0">paper</a> on geometry-informed neural networks</p></li><li><p>This <a href="https://arxiv.org/abs/2405.18047">paper</a> proposing 2-stage backprop</p></li><li><p>This <a href="https://t.co/YxulX5DzJc">paper</a> on a multi-tower decoding architecture for fusing modalities</p></li></ul><h3>The Gradient At Microsoft Build</h3><p>Hugh Zhang had a chance to rep the Gradient at Microsoft Build, where he and a few other writers on AI had an intimate conversation with Microsoft CTO Kevin Scott. The highlight of the conversation was Kevin Scott urging AI builders to use recent advances to build something that goes from &#8220;impossible to merely hard&#8221; rather than something that goes from &#8220;hard to easy.&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C7--!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b56cb8b-9889-42cc-9ce2-32fad546c045_2048x1536.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C7--!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b56cb8b-9889-42cc-9ce2-32fad546c045_2048x1536.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!C7--!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b56cb8b-9889-42cc-9ce2-32fad546c045_2048x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!C7--!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b56cb8b-9889-42cc-9ce2-32fad546c045_2048x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!C7--!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b56cb8b-9889-42cc-9ce2-32fad546c045_2048x1536.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!C7--!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b56cb8b-9889-42cc-9ce2-32fad546c045_2048x1536.jpeg" width="1456" height="1092" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6b56cb8b-9889-42cc-9ce2-32fad546c045_2048x1536.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!C7--!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b56cb8b-9889-42cc-9ce2-32fad546c045_2048x1536.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!C7--!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b56cb8b-9889-42cc-9ce2-32fad546c045_2048x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!C7--!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b56cb8b-9889-42cc-9ce2-32fad546c045_2048x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!C7--!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b56cb8b-9889-42cc-9ce2-32fad546c045_2048x1536.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>Closing Thoughts</h3><p>Have something to say about this edition&#8217;s topics? Shoot us an email at editor@thegradient.pub and we will consider sharing the most interesting thoughts from readers in the next newsletter! For feedback, you can also reach Daniel directly at <a href="mailto:dbashir@hmc.edu">dbashir@hmc.edu</a> or on <a href="https://twitter.com/spaniel_bashir">Twitter</a>. If you enjoyed this newsletter, consider donating to The Gradient via a Substack subscription, which helps keep this volunteer-run project afloat. Thanks for reading the latest Update from the Gradient!</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>I was also a <em>little</em> surprised to see this, following my <a href="https://thegradientpub.substack.com/p/nicholas-thompson-ai-journalism-atlantic-running">interview</a> with him. But then again, maybe it&#8217;s not so much of a surprise. 
</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Not to mention other super impressive titles: the Kluge Chair in Technology and Society at the Library of Congress, and a Guggenheim Fellow.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Also highly decorated: an Australian Research Council (ARC) Future Fellow, and a Distinguished Research Fellow of the University of Oxford Institute for Ethics in AI</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>Un-paywalled link <a href="https://www.reddit.com/r/Scholar/comments/1cxcrmo/article_artificial_intelligence_and_illusions_of/">here</a></p></div></div>]]></content:encoded></item><item><title><![CDATA[Mini-Update #42: Google Removes AI Overviews and Attention Recast as RNN]]></title><description><![CDATA[Google search takes down its AI Overviews feature after misinformation and Transformer attention mechanism can be reframed as an RNN for increased model efficiency.]]></description><link>https://thegradientpub.substack.com/p/mini-update-42-google-removes-ai</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/mini-update-42-google-removes-ai</guid><dc:creator><![CDATA[Ather Fawaz]]></dc:creator><pubDate>Sun, 02 Jun 2024 15:00:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1V1h!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07b71dc9-23d6-42f0-a269-5fa2b744e15d_1271x1271.jpeg" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 42nd Mini-Update from the Gradient! This is our exclusive newsletter edition specifically for paying subscribers and is our way to show you our appreciation for your support.</p>
      <p>
          <a href="https://thegradientpub.substack.com/p/mini-update-42-google-removes-ai">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Update #75: A Bad Week for AI Worriers, and The Platonic Representation Hypothesis]]></title><description><![CDATA[Schumer Shafts the AI Bias Crowd While Altman Alienates AGI&#8217;ers; MIT researchers think model representations in models are converging (to what?).]]></description><link>https://thegradientpub.substack.com/p/update-75-schumer-ai-platonic-representation</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/update-75-schumer-ai-platonic-representation</guid><dc:creator><![CDATA[Cole Frank]]></dc:creator><pubDate>Tue, 21 May 2024 15:31:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18f7cc89-b81e-4759-a719-e8b5c40485f5_1600x914.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 75th update from the Gradient! If you&#8217;re new and like what you see, <a href="https://thegradientpub.substack.com/">subscribe</a> and follow us on <a href="https://twitter.com/gradientpub">Twitter</a>. <strong>Our newsletters run long, so you&#8217;ll need to view this post on Substack to see everything!</strong></p><h2>Editor Notes</h2><p>Good morning. </p><p>We have some very fun highlights for you this week. I hope this newsletter continues to avoid being a watercooler, but let us know how we&#8217;re doing. In thinking about some of the dramatis personae, I imagine it must be very weird to have people try to make predictions about AGI based on the position of your left hand and the sneakers you wore yesterday. 
</p><p>On other fronts, I <a href="https://thegradientpub.substack.com/p/suhail-doshi-playground-ai-computer-vision">learned</a> from Suhail Doshi that he is not in pivot hell, and had a nice conversation with <a href="https://thegradientpub.substack.com/p/azeem-azhar-the-exponential-view">Azeem Azhar</a> about many things.</p><p>Also, a few people had nice things to say about the podcast&#8212;I hate to be self-indulgent, but it meant so much to me that I couldn&#8217;t resist sharing. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3hDh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151ede56-f57e-4438-9e75-ef4e667a4f98_2560x1416.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3hDh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151ede56-f57e-4438-9e75-ef4e667a4f98_2560x1416.png 424w, https://substackcdn.com/image/fetch/$s_!3hDh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151ede56-f57e-4438-9e75-ef4e667a4f98_2560x1416.png 848w, https://substackcdn.com/image/fetch/$s_!3hDh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151ede56-f57e-4438-9e75-ef4e667a4f98_2560x1416.png 1272w, https://substackcdn.com/image/fetch/$s_!3hDh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151ede56-f57e-4438-9e75-ef4e667a4f98_2560x1416.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!3hDh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151ede56-f57e-4438-9e75-ef4e667a4f98_2560x1416.png" width="588" height="325.09615384615387" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/151ede56-f57e-4438-9e75-ef4e667a4f98_2560x1416.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:805,&quot;width&quot;:1456,&quot;resizeWidth&quot;:588,&quot;bytes&quot;:957301,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3hDh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151ede56-f57e-4438-9e75-ef4e667a4f98_2560x1416.png 424w, https://substackcdn.com/image/fetch/$s_!3hDh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151ede56-f57e-4438-9e75-ef4e667a4f98_2560x1416.png 848w, https://substackcdn.com/image/fetch/$s_!3hDh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151ede56-f57e-4438-9e75-ef4e667a4f98_2560x1416.png 1272w, https://substackcdn.com/image/fetch/$s_!3hDh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151ede56-f57e-4438-9e75-ef4e667a4f98_2560x1416.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>If you have any feedback for us, please do let us know. We read every comment and every email. </p><p>As always, if you want to write with us, send a pitch using <a href="https://goo.gl/forms/whYRKEzMZJox6FaH2">this form</a>.</p><div><hr></div><p><strong>Can't afford to test in production? Come to the world&#8217;s first AI Quality conference</strong></p><p>On June 25th in San Francisco, <a href="https://www.aiqualityconference.com/">join the first-ever AI Quality conference</a>. 
Join 1000+ other highly informed attendees to connect and learn how to keep your AI efforts from going awry.</p><p>Hear speakers from Uber, Groq, Cruise, Torc Robotics, Notion, Anthropic, OpenAI, Google and 20+ more organizations.</p><p>Use the special discount code &#8216;hallucinate&#8217; for 30% off the ticket price.</p><div><hr></div><h2><strong>News Highlight</strong>: A Bad Week for People Worried About AI</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QbZU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18f7cc89-b81e-4759-a719-e8b5c40485f5_1600x914.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QbZU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18f7cc89-b81e-4759-a719-e8b5c40485f5_1600x914.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QbZU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18f7cc89-b81e-4759-a719-e8b5c40485f5_1600x914.jpeg 848w, https://substackcdn.com/image/fetch/$s_!QbZU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18f7cc89-b81e-4759-a719-e8b5c40485f5_1600x914.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QbZU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18f7cc89-b81e-4759-a719-e8b5c40485f5_1600x914.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!QbZU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18f7cc89-b81e-4759-a719-e8b5c40485f5_1600x914.jpeg" width="1456" height="832" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/18f7cc89-b81e-4759-a719-e8b5c40485f5_1600x914.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:832,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QbZU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18f7cc89-b81e-4759-a719-e8b5c40485f5_1600x914.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QbZU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18f7cc89-b81e-4759-a719-e8b5c40485f5_1600x914.jpeg 848w, https://substackcdn.com/image/fetch/$s_!QbZU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18f7cc89-b81e-4759-a719-e8b5c40485f5_1600x914.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QbZU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18f7cc89-b81e-4759-a719-e8b5c40485f5_1600x914.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4><strong>Summary</strong>&nbsp;</h4><p>Two big pieces of news in the AI world last week:</p><ol><li><p>A bipartisan group of senators, including Senate leader Chuck Schumer (D-N.Y.), released a 20-page legislative roadmap for A.I.</p></li><li><p>On the heels of the company&#8217;s big GPT-4o release, OpenAI&#8217;s superalignment team dissolved, with various prominent employees including Ilya Sutskever and Jan Leike resigning.</p></li></ol><h4><strong>Overview</strong></h4><h5>AI Roadmap to Where?</h5><p>Schumer and the other members of the Senate AI Working Group&#8212;Todd Young (R-Ind.), Martin Heinrich (D-N.M.) 
and Mike Rounds (R-S.D.)&#8212;released a 20-page document titled &#8220;<a href="https://www.young.senate.gov/wp-content/uploads/Roadmap_Electronic1.32pm.pdf?utm_source=Center+for+Security+and+Emerging+Technology&amp;utm_campaign=b76b2a3f93-Newsletter_May_16_24&amp;utm_medium=email&amp;utm_term=0_-b76b2a3f93-%5BLIST_EMAIL_ID%5D">Driving U.S. Innovation in Artificial Intelligence</a>,&#8221; the culmination of a series of nine AI Insight Forums held last fall. The report identifies areas of consensus that the authors believe &#8220;merit bipartisan consideration in the Senate.&#8221; These include:</p><ul><li><p>At least $32 billion of annual non-defense funding for AI R&amp;D by FY2026</p></li><li><p>A &#8220;strong comprehensive federal data privacy law&#8221;</p></li><li><p>Legislation to increase high-skilled STEM immigration</p></li><li><p>The development of standards for use of AI in critical infrastructure</p></li><li><p>Full funding of previously authorized efforts, such as the CHIPS and Science Act</p></li></ul><p>Schumer stressed that the Working Group does not intend to compile its recommendations into a single bill and that relevant congressional committees should instead advance their own smaller bills in line with the report&#8217;s recommendations.</p><p>The roadmap was roundly rebuked by those hoping for more comprehensive regulation of AI. Its primary focus seems to be on supporting AI innovation, and the recommendations that address AI harms, such as protections against health and financial discrimination, job displacement, and copyright infringement, are all fairly vague. Nik Marda, a technical lead at Mozilla, <a href="https://twitter.com/nrmarda/status/1790804978786848824">pointed out</a> that the roadmap references &#8220;bias&#8221; as many times as it references &#8220;space debris&#8221; (three times each). Dr. 
Suresh Venkatasubramanian, a co-author of the White House&#8217;s AI Bill of Rights, participated in the AI Insight Forums and reacted to the roadmap with disappointment, <a href="https://www.fastcompany.com/91126098/chuck-schumer-ai-roadmap-panned">telling </a><em><a href="https://www.fastcompany.com/91126098/chuck-schumer-ai-roadmap-panned">Fast Company</a> </em>&#8220;I think many people like myself were concerned whether this would be a dancing monkey show, and we&#8217;re the monkeys&#8230;I feel betrayed.&#8221; </p><p>We reached out to Dr. Venkatasubramanian for further comment and he elaborated that he was &#8220;disappointed that while researchers have been insisting for years now that the best way to innovate on AI in rights-impacting settings is to center people and society - in other words, conduct sociotechnical research on the way automated systems can be incorporated into our lives - the roadmap seems to take a very narrow technical approach to investments in AI research.&#8221;</p><h5>OpenAI Unalignment</h5><p>Last Tuesday, the day after the GPT-4o launch, OpenAI&#8217;s Chief Scientist Ilya Sutskever and the co-head of the superalignment team Jan Leike announced their departures from the company. 
The superalignment team, whose <a href="https://openai.com/superalignment/">goal</a> was to solve the core technical challenges needed to &#8220;steer and control AI systems much smarter than us&#8221;, had been <a href="https://www.vox.com/future-perfect/2024/5/17/24158403/openai-resignations-ai-safety-ilya-sutskever-jan-leike-artificial-intelligence">leaking researchers for several months</a> and by Friday <em>Wired </em><a href="https://www.wired.com/story/openai-superalignment-team-disbanded/">confirmed</a> that the entire team had been disbanded with all of its members having resigned or been absorbed into other research groups.</p><p>OpenAI <a href="https://x.com/KelseyTuoc/status/1791539443016536265">reportedly</a> has those departing sign non-disparagement agreements (at risk of losing their equity), but Leike did not hold back about his reasons for leaving in the <a href="https://twitter.com/janleike/status/1791498174659715494">thread</a> he posted to X on Friday, and it&#8217;s worth quoting at length: &#8220;I have been disagreeing with OpenAI leadership about the company's core priorities for quite some time, until we finally reached a breaking point. I believe much more of our bandwidth should be spent getting ready for the next generations of models, on security, monitoring, preparedness, safety, adversarial robustness, (super)alignment, confidentiality, societal impact, and related topics. These problems are quite hard to get right, and I am concerned we aren't on a trajectory to get there. Over the past few months my team has been sailing against the wind. Sometimes we were struggling for compute and it was getting harder and harder to get this crucial research done. Building smarter-than-human machines is an inherently dangerous endeavor. OpenAI is shouldering an enormous responsibility on behalf of all of humanity. 
But over the past years, safety culture and processes have taken a backseat to shiny products.&#8221;</p><h4><strong>Our Take</strong></h4><p>A stylized way to think about AI concerns is that there are two types of people worried about AI:</p><ol><li><p><em>People who are worried about A.I. because of its limitations</em>. These people focus on the shortcomings of current A.I. systems and the harms arising from their deployment. They talk about things like algorithmic bias, AI ethics, and the importance of keeping a human in the loop.</p></li><li><p><em>People who are worried about A.I. because of its capabilities</em>. These people focus on the rapid improvement in these systems and extrapolate it forward. They make direct analogies to human intelligence and talk about things like AGI, existential risk, and alignment.</p></li></ol><p>In practice there is a lot more nuance than this, and most people (myself included) fall somewhere in the middle. But I think this is a useful framework for thinking about the two poles of the AI worry spectrum and making sense of this past week&#8217;s events.&nbsp;</p><p>Despite both being worried about AI, these two groups do not see eye to eye. The <em>limitation</em> worriers see the problem as largely sociotechnical, emphasis on &#8220;socio&#8221;. The technology is inherently flawed, and the solution is not better technology so much as better regulation of how people use the technology. Beyond getting incentive structures right (&#128064; OpenAI&#8217;s charter), <em>capabilities </em>worriers prioritize technical solutions: technical research got us into this pickle, technical research will get us out of this pickle. Alignment is a technical research effort, not a matter of policy.</p><p>Neither group got their preferred outcomes last week. 
The alignment people can&#8217;t get the compute they want and the AI ethics people are no closer to regulation with teeth.</p><p>&#8212;Cole</p><h2><strong>Research Highlight: </strong><a href="https://arxiv.org/pdf/2405.07987">The Platonic Representation Hypothesis</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yBgv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f47f208-7718-485c-9279-7111ff1ce6d5_756x640.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yBgv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f47f208-7718-485c-9279-7111ff1ce6d5_756x640.png 424w, https://substackcdn.com/image/fetch/$s_!yBgv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f47f208-7718-485c-9279-7111ff1ce6d5_756x640.png 848w, https://substackcdn.com/image/fetch/$s_!yBgv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f47f208-7718-485c-9279-7111ff1ce6d5_756x640.png 1272w, https://substackcdn.com/image/fetch/$s_!yBgv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f47f208-7718-485c-9279-7111ff1ce6d5_756x640.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yBgv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f47f208-7718-485c-9279-7111ff1ce6d5_756x640.png" width="756" height="640" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f47f208-7718-485c-9279-7111ff1ce6d5_756x640.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:756,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yBgv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f47f208-7718-485c-9279-7111ff1ce6d5_756x640.png 424w, https://substackcdn.com/image/fetch/$s_!yBgv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f47f208-7718-485c-9279-7111ff1ce6d5_756x640.png 848w, https://substackcdn.com/image/fetch/$s_!yBgv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f47f208-7718-485c-9279-7111ff1ce6d5_756x640.png 1272w, https://substackcdn.com/image/fetch/$s_!yBgv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f47f208-7718-485c-9279-7111ff1ce6d5_756x640.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4><strong>Summary</strong>&nbsp;</h4><p>Researchers from MIT argue that representations in AI models trained with different objectives on different datasets and modalities are converging. <a href="https://arxiv.org/pdf/2405.07987">Huh et al.</a> demonstrate this convergence by measuring the distance/similarity between datapoints and comparing the similarity structures induced by different representations. They find that as vision models and language models get larger and more competent, the ways they measure distance between datapoints become more and more alike. 
This leads to the hypothesis that neural networks are converging toward a shared statistical model of reality, a <em>"platonic representation" </em>in reference to Plato's <a href="https://faculty.tamuc.edu/jherndon/documents/plato.pdf">Allegory of the Cave</a> and his conception of an ideal reality that underlies what we perceive.</p><h4><strong>Overview</strong>&nbsp;</h4><p>The authors consider vector embedding representations and argue that different neural networks are converging to aligned representations. They measure alignment through a mutual k-nearest-neighbor metric, defined as the average size of the overlap of nearest neighbor sets induced by two representations through inner product. Evaluating the transfer performance of 78 vision models on the <a href="https://research.google/blog/the-visual-task-adaptation-benchmark/">VTAB</a> dataset, they find that models with high performance have high representation alignment while models with weak performance have more variable representations, which leads them to conclude "all strong models are alike, [while] each weak model is weak in its own way."&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5Uxf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb555f5d2-46ae-464f-9689-b6ecad9e1a29_1080x554.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5Uxf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb555f5d2-46ae-464f-9689-b6ecad9e1a29_1080x554.png 424w, 
https://substackcdn.com/image/fetch/$s_!5Uxf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb555f5d2-46ae-464f-9689-b6ecad9e1a29_1080x554.png 848w, https://substackcdn.com/image/fetch/$s_!5Uxf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb555f5d2-46ae-464f-9689-b6ecad9e1a29_1080x554.png 1272w, https://substackcdn.com/image/fetch/$s_!5Uxf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb555f5d2-46ae-464f-9689-b6ecad9e1a29_1080x554.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5Uxf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb555f5d2-46ae-464f-9689-b6ecad9e1a29_1080x554.png" width="1080" height="554" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b555f5d2-46ae-464f-9689-b6ecad9e1a29_1080x554.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:554,&quot;width&quot;:1080,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5Uxf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb555f5d2-46ae-464f-9689-b6ecad9e1a29_1080x554.png 424w, 
https://substackcdn.com/image/fetch/$s_!5Uxf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb555f5d2-46ae-464f-9689-b6ecad9e1a29_1080x554.png 848w, https://substackcdn.com/image/fetch/$s_!5Uxf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb555f5d2-46ae-464f-9689-b6ecad9e1a29_1080x554.png 1272w, https://substackcdn.com/image/fetch/$s_!5Uxf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb555f5d2-46ae-464f-9689-b6ecad9e1a29_1080x554.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>That alignment increases with scale and performance holds true across modalities as well. Using the <a href="https://arxiv.org/abs/2103.01913">Wikipedia captions</a> dataset made of images and their corresponding captions, the authors find that the better an LLM is at language modeling, the more it tends to align with vision models, and vice versa. Further, language models that are more closely aligned with vision also demonstrate better downstream performance on tasks such as commonsense reasoning and mathematical problem solving.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9d1m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F328d834e-46ec-495f-9f43-1b0ecbdd325c_880x670.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9d1m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F328d834e-46ec-495f-9f43-1b0ecbdd325c_880x670.png 424w, https://substackcdn.com/image/fetch/$s_!9d1m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F328d834e-46ec-495f-9f43-1b0ecbdd325c_880x670.png 848w, https://substackcdn.com/image/fetch/$s_!9d1m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F328d834e-46ec-495f-9f43-1b0ecbdd325c_880x670.png 1272w, https://substackcdn.com/image/fetch/$s_!9d1m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F328d834e-46ec-495f-9f43-1b0ecbdd325c_880x670.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!9d1m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F328d834e-46ec-495f-9f43-1b0ecbdd325c_880x670.png" width="654" height="497.9318181818182" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/328d834e-46ec-495f-9f43-1b0ecbdd325c_880x670.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:670,&quot;width&quot;:880,&quot;resizeWidth&quot;:654,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9d1m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F328d834e-46ec-495f-9f43-1b0ecbdd325c_880x670.png 424w, https://substackcdn.com/image/fetch/$s_!9d1m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F328d834e-46ec-495f-9f43-1b0ecbdd325c_880x670.png 848w, https://substackcdn.com/image/fetch/$s_!9d1m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F328d834e-46ec-495f-9f43-1b0ecbdd325c_880x670.png 1272w, https://substackcdn.com/image/fetch/$s_!9d1m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F328d834e-46ec-495f-9f43-1b0ecbdd325c_880x670.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As to why representations are converging, the authors offer three hypotheses that correspond to three components of a common machine learning model formula:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sw-T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c269b9-a817-46fd-b453-83fc2c1f0f95_780x172.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sw-T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c269b9-a817-46fd-b453-83fc2c1f0f95_780x172.png 
424w, https://substackcdn.com/image/fetch/$s_!sw-T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c269b9-a817-46fd-b453-83fc2c1f0f95_780x172.png 848w, https://substackcdn.com/image/fetch/$s_!sw-T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c269b9-a817-46fd-b453-83fc2c1f0f95_780x172.png 1272w, https://substackcdn.com/image/fetch/$s_!sw-T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c269b9-a817-46fd-b453-83fc2c1f0f95_780x172.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sw-T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c269b9-a817-46fd-b453-83fc2c1f0f95_780x172.png" width="780" height="172" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/64c269b9-a817-46fd-b453-83fc2c1f0f95_780x172.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:172,&quot;width&quot;:780,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sw-T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c269b9-a817-46fd-b453-83fc2c1f0f95_780x172.png 424w, 
https://substackcdn.com/image/fetch/$s_!sw-T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c269b9-a817-46fd-b453-83fc2c1f0f95_780x172.png 848w, https://substackcdn.com/image/fetch/$s_!sw-T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c269b9-a817-46fd-b453-83fc2c1f0f95_780x172.png 1272w, https://substackcdn.com/image/fetch/$s_!sw-T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c269b9-a817-46fd-b453-83fc2c1f0f95_780x172.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>First, with sufficient data, scaling a model, i.e. using larger function classes F, as well as improved optimization, should be more effective at finding better approximations to a globally optimal representation. Second, there are fewer representations that are competent in multiple tasks as we scale data and tasks. Models that optimize the empirical risk (the expectation over observed data) also become better at capturing statistical structures of the true data generating process (the population risk). Third, deep networks favor simple solutions that fit the data, with or without explicit regularization. Bigger models have a stronger bias toward simplicity and thus converge to a smaller solution space.</p><p>But what exactly is this shared representation of underlying reality? The authors offer one concrete candidate. In an idealized world that consists of a sequence of discrete events, which can be observed in various ways (pixels, sounds, words, and so on), a family of contrastive learners that model co-occurring observations converge to the same pointwise mutual information (PMI) kernel, representing certain pairwise statistics of the unknown underlying distribution that generates the events. 
Their analysis suggests that certain representation learning algorithms may boil down to simply finding an embedding in which similarity equals PMI.&nbsp;</p><h4><strong>Our Take</strong></h4><p>What excites me the most about this representational convergence is the ability to share and use data from different modalities for training and inference. It also suggests that multimodal models are better than single-modal ones, given they are grounded in additional modalities and should represent the world in a way that's closer to what the world really is. On the other hand, it is not clear whether a 16% alignment (see one of the figures above) between a set of language and vision models is significant enough to qualify as "convergence." I'm also questioning whether this platonic representation, assuming it does exist, is the endpoint we should pursue, as opposed to a representation of what we <em>want</em> the world to be. But that's a whole other ethical debate.&nbsp;</p><p>&#8211; Jaymee</p><p>I am just waiting for the philosophy takes on this paper (because then we&#8217;ll REALLY have come full circle). But, before we come up with another project and set about trying to determine whether moral realism is a thing based on what AI models seem to be doing, we should probably take stock of a few aspects of what&#8217;s going on in the paper. One interesting callout is section 2.4, where the authors draw on the related point that neural networks seem to show substantial alignment with biological representations in the brain. I also think the three hypotheses presented in section 3 are useful intuition pumps: the <em>Multitask Scaling Hypothesis</em> says if we consider competency at some number of tasks, <em>N</em>, we should expect fewer representations to be competent for all <em>N</em> tasks as <em>N</em> grows larger. 
You might also expect models with larger hypothesis spaces to be more likely to find an optimal representation, if one exists in function space&#8212;the authors call this the <em>Capacity Hypothesis</em>. Finally, the <em>Simplicity Bias Hypothesis</em> says deep networks are biased towards finding simple fits to the data, and larger models will have a stronger bias.&nbsp;</p><p>I think, if you buy what&#8217;s being said here, the &#8220;our current paradigm is not very efficient&#8221; point becomes something like: fairly general architectures without strong inductive biases towards certain sorts of representations (beyond the simplicity bias), scaled up enough and trained to solve general enough tasks, will have large enough hypothesis spaces that they&#8217;ll eventually be pressured to find optimal representations for their data. It is worth noting that datasets and tasks are structured by what we take to be useful and want models to do, and so while I think it might be perfectly fine to posit that there&#8217;s a representation (or representations) most useful for those things, calling it a &#8220;shared representation <em>of reality</em>&#8221; feels a bit grandiose (maybe I&#8217;m just being annoying. But, to be fair, you could be a <em>lot</em> more annoying about this paper if you really wanted. I&#8217;ll leave doing that as a take-home exercise&#8212;imagine you&#8217;re Reviewer 2 and have at it). If you&#8217;ve heard of projectivism&#8230; it seems reasonable to think that &#8220;representation of reality&#8221; might be projectivism. </p><p>All that said, this is a thoughtful and interesting paper. 
I like the counterexamples and limitations section the authors include at the end, and I think you should read it in full.&nbsp;</p><p>Also, for other takes on &#8220;universal&#8221; representations / representations useful for transfer learning, I had a <a href="https://thegradientpub.substack.com/p/hugo-larochelle-deep-learning">conversation</a> with Hugo Larochelle some time ago that went into a bunch of his work on this&#8212;I expect you&#8217;d find a number of his papers on the subject interesting if you liked this one.&nbsp;</p><p>&#8212;Daniel</p><h2>New from the Gradient</h2><h3><a href="https://thegradientpub.substack.com/p/suhail-doshi-playground-ai-computer-vision">Suhail Doshi: The Future of Computer Vision</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_WxT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d301ff-270b-4f67-8a8a-251753055b5c_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_WxT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d301ff-270b-4f67-8a8a-251753055b5c_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!_WxT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d301ff-270b-4f67-8a8a-251753055b5c_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!_WxT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d301ff-270b-4f67-8a8a-251753055b5c_1200x628.png 1272w, 
https://substackcdn.com/image/fetch/$s_!_WxT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d301ff-270b-4f67-8a8a-251753055b5c_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_WxT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d301ff-270b-4f67-8a8a-251753055b5c_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/95d301ff-270b-4f67-8a8a-251753055b5c_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_WxT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d301ff-270b-4f67-8a8a-251753055b5c_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!_WxT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d301ff-270b-4f67-8a8a-251753055b5c_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!_WxT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d301ff-270b-4f67-8a8a-251753055b5c_1200x628.png 1272w, 
https://substackcdn.com/image/fetch/$s_!_WxT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d301ff-270b-4f67-8a8a-251753055b5c_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/suhail-doshi-playground-ai-computer-vision&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" 
href="https://thegradientpub.substack.com/p/suhail-doshi-playground-ai-computer-vision"><span>Listen</span></a></p><h3><a href="https://thegradientpub.substack.com/p/azeem-azhar-the-exponential-view">Azeem Azhar: The Exponential View</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RunM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa39c839-0f2e-45e4-a6fb-9b82a4207927_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RunM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa39c839-0f2e-45e4-a6fb-9b82a4207927_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!RunM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa39c839-0f2e-45e4-a6fb-9b82a4207927_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!RunM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa39c839-0f2e-45e4-a6fb-9b82a4207927_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!RunM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa39c839-0f2e-45e4-a6fb-9b82a4207927_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RunM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa39c839-0f2e-45e4-a6fb-9b82a4207927_1200x628.png" width="1200" height="628" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aa39c839-0f2e-45e4-a6fb-9b82a4207927_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RunM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa39c839-0f2e-45e4-a6fb-9b82a4207927_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!RunM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa39c839-0f2e-45e4-a6fb-9b82a4207927_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!RunM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa39c839-0f2e-45e4-a6fb-9b82a4207927_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!RunM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa39c839-0f2e-45e4-a6fb-9b82a4207927_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/azeem-azhar-the-exponential-view&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegradientpub.substack.com/p/azeem-azhar-the-exponential-view"><span>Listen</span></a></p><h2>Other Things That Caught Our Eyes</h2><h3>News</h3><p><strong><a href="https://techcrunch.com/2024/05/07/openai-says-its-building-a-tool-to-let-content-creators-opt-out-of-ai-training/">OpenAI says it&#8217;s building a tool to let content creators &#8216;opt out&#8217; of AI training</a></strong></p><p>OpenAI is developing a tool called Media Manager that will allow content creators to have more control over how their works are used in training generative AI. The tool will enable creators to identify their works and specify whether they want them to be included or excluded from AI research and training. 
OpenAI plans to have the tool in place by 2025 and is working with creators, content owners, and regulators to establish a standard. The goal is to build a cutting-edge machine learning tool that can identify copyrighted text, images, audio, and video across multiple sources and reflect creator preferences. This initiative is a response to criticism of OpenAI's approach to AI development, which involves scraping publicly available data from the web. OpenAI has faced lawsuits for IP infringement and has taken steps to address concerns, such as allowing artists to opt out of using their work and signing licensing deals with content owners. However, some content creators believe OpenAI's efforts are insufficient.&nbsp;</p><p><strong><a href="https://www.washingtonpost.com/technology/2024/05/07/apple-new-ipad-pro-m4-ai/">Apple plays up AI potential in new iPads</a></strong></p><p>Apple has introduced new iPads that highlight their potential for artificial intelligence (AI) features. The new Pro models come with a more powerful "neural engine" in the M4 chipset, which drives AI and machine learning features in third-party apps and Apple's own software. The neural engine in the M4 chipset is capable of performing 38 trillion operations per second, more than double the operations per second of the previous M3 chipset. This move by Apple is seen as an effort to catch up with rivals in the AI race.</p><p><strong><a href="https://www.washingtonpost.com/politics/2024/05/08/arizona-election-workers-trained-with-deepfakes-prepare-2024/">In Arizona, election workers trained with deepfakes to prepare for 2024</a></strong></p><p>The article discusses a unique training exercise conducted in Arizona to prepare election workers for potential deepfake attacks in the 2024 elections. Arizona Secretary of State Adrian Fontes addressed the election workers in a video message, emphasizing the importance of their role and the need to be prepared for new challenges. 
The training aimed to familiarize the workers with deepfake technology, which can create realistic but false videos or audio recordings. By experiencing and identifying deepfakes, the election workers can enhance their skills in detecting and mitigating potential threats.&nbsp;</p><p><strong><a href="https://www.lawweekcolorado.com/article/three-bills-in-colorado-look-to-put-some-guardrails-on-artificial-intelligence/">Three Bills in Colorado Look to Put Some Guardrails on Artificial Intelligence</a></strong></p><p>Colorado lawmakers are working on three bills to regulate the use of artificial intelligence (AI). The first bill, House Bill 24-1147, aims to regulate the use of deepfakes produced using generative AI in communications about candidates for elective office. It adds a disclosure requirement to their use and creates a private right of action when that requirement isn't met. The second bill, Senate Bill 24-205, focuses on protecting consumers from algorithmic discrimination in high-risk AI systems. It defines algorithmic discrimination as any condition where the use of AI results in unlawful differential treatment based on protected characteristics. The third bill, House Bill 24-1468, updates the membership and issues of study for the task force on facial recognition services, expanding its scope to include biometric technology and AI. These bills aim to provide guardrails and protections in the use of AI.</p><p><strong><a href="https://www.wired.com/story/openai-is-exploring-how-to-responsibly-generate-ai-porn/">OpenAI Is &#8216;Exploring&#8217; How to Responsibly Generate AI Porn</a></strong></p><p>OpenAI, the company behind ChatGPT and other AI technologies, is exploring the possibility of responsibly generating NSFW (not safe for work) content, including porn and explicit materials. 
While OpenAI's current usage policies prohibit sexually explicit or suggestive content, the company is considering how to permit such content in age-appropriate contexts. The Model Spec document mentions that NSFW content may include erotica, extreme gore, slurs, and unsolicited profanity. OpenAI aims to better understand user and societal expectations in this area. However, the company emphasizes that it does not intend for its models to generate AI porn. The potential for AI-generated pornography raises concerns about privacy violations and nonconsensual use of synthesized intimate images. OpenAI's exploration of explicit content generation and its moderation to prevent misuse by bad actors remain important considerations.</p><p><strong><a href="https://techcrunch.com/2024/05/09/tiktok-automatically-label-ai-generated-content-created-other-platforms/">TikTok will automatically label AI-generated content created on platforms like DALL&#183;E 3</a></strong></p><p>TikTok has announced that it will automatically label AI-generated content created on other platforms. The company will use Content Credentials, a technology developed by the Coalition for Content Provenance and Authenticity (C2PA), to attach specific metadata to AI-generated content. This metadata will allow TikTok to recognize and label AI-generated content instantly. The change is rolling out globally and will apply to all users in the coming weeks. TikTok already labels content created with its own AI effects, but this new feature will extend to content created on other platforms that have implemented Content Credentials. TikTok is the first video-sharing platform to implement this technology. The goal is to ensure transparency for viewers and to deter harmful or misleading AI-generated content. 
The company is also committed to combating deceptive AI in elections.</p><p><strong><a href="https://www.wired.com/story/arati-prabhakar-ostp-biden-science-tech-adviser/">Meet the Woman Who Showed President Biden ChatGPT&#8212;and Helped Set the Course for AI</a></strong></p><p>Arati Prabhakar, the director of the White House Office of Science and Technology Policy, played a crucial role in demonstrating the potential of AI to President Biden. This led to the issuance of a comprehensive executive order that sets a regulatory course for the AI industry. Prabhakar, who has a background in applied physics and experience in Silicon Valley, has been educating top officials about the transformative power of AI. The executive order mandates safety standards, promotes innovation, and addresses job losses. Prabhakar is the first person of color and first woman to hold the position of director of the office.&nbsp;</p><p><strong><a href="https://apnews.com/article/amazon-autonomous-vehicle-investigation-crashes-zoox-45c53600710407bc6f82b0a855c46e12">Amazon&#8217;s self-driving robotaxi unit Zoox under investigation by US after 2 rear-end crashes</a></strong></p><p>The U.S. National Highway Traffic Safety Administration (NHTSA) is investigating Amazon's self-driving robotaxi unit, Zoox, after two of its vehicles were involved in rear-end crashes. The crashes occurred in San Francisco and Spring Valley, Nevada, and both involved Toyota Highlander SUVs equipped with autonomous driving technology. The NHTSA will evaluate Zoox's automated driving system and its performance during the crashes, as well as its behavior around pedestrians and other vulnerable road users. Zoox has stated that it is committed to working with the NHTSA and that the vehicles had human safety drivers on board. 
This investigation comes after a previous investigation into Zoox's certification of meeting federal safety standards.&nbsp;</p><p><strong><a href="https://www.technologyreview.com/2024/05/07/1092116/deepfakes-dead-chinese-business-grief/">Deepfakes of your dead loved ones are a booming Chinese business</a></strong></p><p>The Chinese market for AI avatars of deceased loved ones is booming, with several companies offering the technology and thousands of people already paying for it. These avatars, created using deepfake technology, aim to preserve and interact with lost loved ones. While the technology is not perfect, it is maturing and becoming more accessible to the general public. However, there are concerns about the ethical and legal implications of interacting with AI replicas of the dead. Despite this, the market potential is significant, even if only a small percentage of Chinese people accept this technology. The Chinese sector developing AI avatars has rapidly matured in the past three years, with avatars improving from rendered videos to 3D "live" avatars that can interact with people.&nbsp;</p><p><strong><a href="https://www.washingtonpost.com/technology/2024/05/12/ai-deepfakes-detection-industry/?mc_cid=0e5a81bd72&amp;mc_eid=e5d7cd9db3">Fooled by AI? These firms sell deepfake detection that&#8217;s &#8216;REAL 100%.&#8217;</a></strong></p><p>A Bay Area start-up has gained attention for its ability to detect deepfakes with 99% accuracy. The company has secured several military contracts, including a $1.25 million deal with the Air Force to develop a custom detector for countering Russian and Chinese information warfare. The CEO of the company recently testified before a Senate subcommittee about the threat that AI deepfakes pose to U.S. elections. 
This highlights the growing concern over the use of AI-generated fake images, audio, and video, and the need for effective detection methods.</p><p><strong><a href="https://jacobin.com/2024/05/daing-apps-artificial-intelligence-love/">No, We Don&#8217;t Need AI to Go on Dates for Us</a></strong></p><p>The article discusses the idea of using AI concierges in dating apps to screen potential matches and recommend the best ones for users to meet. The author highlights the anthropomorphic element of these AI technologies and emphasizes that they are not necessarily new or emerging, but rather a reflection of the computational capabilities that already exist. The introduction of AI concierges in dating apps raises concerns about the potential for discrimination and social stratification. The author points out that while the algorithm may advocate for the interests of its user, it ultimately discriminates against each user along various dimensions. The article also mentions the use of credit-checking and performance ranking systems in Chinese dating apps, which reflect and reinforce social stratification.&nbsp;</p><p><strong><a href="https://www.theverge.com/2024/5/14/24155927/google-ai-synthid-watermark-text-video-io">Google&#8217;s invisible AI watermark will help identify generative text and video</a></strong></p><p>Google is expanding its AI content watermarking and detection technology, SynthID, to cover digitally generated video and AI-generated text. This is important as AI technology becomes more prevalent and can be used for malicious purposes such as spreading misinformation and creating nonconsensual content. SynthID was initially developed to imprint AI imagery with a watermark that is imperceptible to humans but detectable by the system. Google has also used SynthID to inject inaudible watermarks into AI-generated music. 
This is part of Google's efforts to develop AI safeguards to combat misuse of the technology. The Biden administration is also directing federal agencies to create guidelines around these safeguards.</p><p><strong><a href="https://news.microsoft.com/fr-fr/2024/05/13/microsoft-announces-the-largest-investment-to-date-in-france-to-accelerate-the-adoption-of-ai-skilling-and-innovation/">Microsoft announces the largest investment to date in France to accelerate the adoption of AI, skilling and innovation</a></strong></p><p>Microsoft has announced its largest investment in France to date, with a focus on accelerating the adoption of artificial intelligence (AI), skilling, and innovation. The investment includes building advanced cloud and AI infrastructure, providing AI training to individuals, and supporting French startups in utilizing Microsoft technology. This investment showcases Microsoft's commitment to supporting digital innovation and economic growth in France.&nbsp;</p><p><strong><a href="https://www.nytimes.com/2024/05/16/technology/ai-voice-clone-lawsuit.html">What Do You Do When A.I. Takes Your Voice?</a></strong></p><p>The article describes the rising threat that artificial intelligence (A.I.) poses to the livelihoods of writers, actors, and other entertainment professionals. It centers on a couple, Paul Skye Lehrman and Linnea Sage, who were listening to a podcast that featured an interview with a talking chatbot named Poe, whose voice sounded just like Mr. Lehrman's. The episode underscored the potential harm that A.I. could do to the entertainment industry. The couple was left in disbelief and unsure of how to respond.&nbsp;</p><h3>Papers</h3><p><strong>Daniel</strong>: I have been sick and so this set of recommendations is probably going to just be a list. <a href="https://gradientscience.org/contextcite/">ContextCite</a> is a new method (with, as of this writing, a demo and code, but no paper!) 
which traces part of an LM-generated response back to a piece of the model&#8217;s context. The authors distinguish <em>corroborative</em> attribution (identifying sources that support or imply a statement) from <em>contributive</em> attribution (identifying the sources that cause a model to generate a statement)&#8212;methods for corroborative attribution of LMs exist, so it&#8217;s the latter sort of attribution that ContextCite provides. The method relies on the intuition that if a source is important to a model&#8217;s generation, removing that source should change the generated content significantly. So, after generating a response for a given context and query, the authors: (1) randomly ablate (exclude) sources in the context and compute the probability of generating the original response for each ablation mask; (2) fit a surrogate model to this &#8220;training dataset&#8221; to estimate the probability of generating the original response as a function of the ablation mask.&nbsp;</p><p><a href="https://arxiv.org/abs/2404.05405">This paper</a> from about a month ago presents 12 results on how (1) training duration, (2) model architecture, (3) quantization, (4) sparsity constraints, and (5) data signal-to-noise ratio affect a model&#8217;s knowledge storage capacity.&nbsp;</p><p>I also like the paper &#8220;<a href="https://arxiv.org/pdf/2403.05812">Algorithmic Progress in Language Models</a>,&#8221; which takes a look at the rate of improvement for language model pre-training algorithms, and finds that (a) the compute required to reach a set performance threshold has halved about every 8 months, and (b) despite algorithmic and architectural progress, increases in compute made an even larger contribution to overall performance improvements from 2012-2023.&nbsp;</p><p>Lastly, you might find this <a href="https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/introducing-the-frontier-safety-framework/fsf-technical-report.pdf">frontier safety 
framework</a> out of DeepMind interesting.&nbsp;</p><h3>Closing Thoughts</h3><p>Have something to say about this edition&#8217;s topics? Shoot us an email at editor@thegradient.pub and we will consider sharing the most interesting thoughts from readers in the next newsletter! For feedback, you can also reach Daniel directly at <a href="mailto:dbashir@hmc.edu">dbashir@hmc.edu</a> or on <a href="https://twitter.com/spaniel_bashir">Twitter</a>. If you enjoyed this newsletter, consider donating to The Gradient via a Substack subscription, which helps keep this volunteer-run project afloat. Thanks for reading the latest Update from the Gradient!</p>]]></content:encoded></item><item><title><![CDATA[ Mini-Update #41: OpenAI Unveils GPT-4o and Online Iterative RLHF]]></title><description><![CDATA[OpenAI updates its flagship model GPT-4 with multimodal aspects and real-time response speed and online iterative RLHF techniques prove strong for open-source models.]]></description><link>https://thegradientpub.substack.com/p/mini-update-41-openai-unveils-gpt</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/mini-update-41-openai-unveils-gpt</guid><dc:creator><![CDATA[Ather Fawaz]]></dc:creator><pubDate>Thu, 16 May 2024 03:00:31 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!mWdg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac7216ae-f63c-4419-81fe-e0c2b722b1a3_1600x893.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 41st Mini-Update from the Gradient! This is our exclusive newsletter edition specifically for paying subscribers and is our way to show you our appreciation for your support.</p>
      <p>
          <a href="https://thegradientpub.substack.com/p/mini-update-41-openai-unveils-gpt">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Update #74: Detecting Postpartum Depression and Kolmogorov-Arnold Networks]]></title><description><![CDATA[We look at Dionysus Digital Health's new ML system for detecting postpartum depression in expectant or new mothers; Kolmogorov-Arnold Networks are getting a lot of hype.]]></description><link>https://thegradientpub.substack.com/p/update-74-postpartum-kolmogorov-arnold-networks</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/update-74-postpartum-kolmogorov-arnold-networks</guid><dc:creator><![CDATA[daniel bashir]]></dc:creator><pubDate>Tue, 07 May 2024 15:30:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8495b29-1361-4e92-8215-7349f9e17b75_540x378.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 74th update from the Gradient! If you&#8217;re new and like what you see, <a href="https://thegradientpub.substack.com/">subscribe</a> and follow us on <a href="https://twitter.com/gradientpub">Twitter</a>. <strong>Our newsletters run long, so you&#8217;ll need to view this post on Substack to see everything!</strong></p><h2>Editor Notes</h2><p>Happy Tuesday. In possibly interesting news, Sam Altman recently gave a talk at Stanford, and students were lining up in droves. I didn&#8217;t drive up to Stanford to see the line or the talk, but did some &#8220;investigative journalism&#8221; by questioning a representative sample of the Stanford population to find out more about what was going on (I think there are now recordings of this talk). 
Here&#8217;s what I learned:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k22y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2cd79f5-5a19-4795-9edc-ba42b63e3e76_1600x1073.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k22y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2cd79f5-5a19-4795-9edc-ba42b63e3e76_1600x1073.png 424w, https://substackcdn.com/image/fetch/$s_!k22y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2cd79f5-5a19-4795-9edc-ba42b63e3e76_1600x1073.png 848w, https://substackcdn.com/image/fetch/$s_!k22y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2cd79f5-5a19-4795-9edc-ba42b63e3e76_1600x1073.png 1272w, https://substackcdn.com/image/fetch/$s_!k22y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2cd79f5-5a19-4795-9edc-ba42b63e3e76_1600x1073.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!k22y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2cd79f5-5a19-4795-9edc-ba42b63e3e76_1600x1073.png" width="658" height="441.0769230769231" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d2cd79f5-5a19-4795-9edc-ba42b63e3e76_1600x1073.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:976,&quot;width&quot;:1456,&quot;resizeWidth&quot;:658,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!k22y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2cd79f5-5a19-4795-9edc-ba42b63e3e76_1600x1073.png 424w, https://substackcdn.com/image/fetch/$s_!k22y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2cd79f5-5a19-4795-9edc-ba42b63e3e76_1600x1073.png 848w, https://substackcdn.com/image/fetch/$s_!k22y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2cd79f5-5a19-4795-9edc-ba42b63e3e76_1600x1073.png 1272w, https://substackcdn.com/image/fetch/$s_!k22y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2cd79f5-5a19-4795-9edc-ba42b63e3e76_1600x1073.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Of course, there was more.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8MIq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03767714-8890-42a8-bbef-3554f855c87f_1290x1230.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8MIq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03767714-8890-42a8-bbef-3554f855c87f_1290x1230.jpeg 424w, https://substackcdn.com/image/fetch/$s_!8MIq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03767714-8890-42a8-bbef-3554f855c87f_1290x1230.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!8MIq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03767714-8890-42a8-bbef-3554f855c87f_1290x1230.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!8MIq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03767714-8890-42a8-bbef-3554f855c87f_1290x1230.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8MIq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03767714-8890-42a8-bbef-3554f855c87f_1290x1230.jpeg" width="396" height="377.5813953488372" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/03767714-8890-42a8-bbef-3554f855c87f_1290x1230.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1230,&quot;width&quot;:1290,&quot;resizeWidth&quot;:396,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8MIq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03767714-8890-42a8-bbef-3554f855c87f_1290x1230.jpeg 424w, https://substackcdn.com/image/fetch/$s_!8MIq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03767714-8890-42a8-bbef-3554f855c87f_1290x1230.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!8MIq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03767714-8890-42a8-bbef-3554f855c87f_1290x1230.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!8MIq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03767714-8890-42a8-bbef-3554f855c87f_1290x1230.jpeg 1456w" sizes="100vw"></picture></div></a></figure></div><p>Also notably, both our news and research highlights this week are about things to be skeptical of: diagnosing postpartum depression is pretty 
hard, as it turns out, and the jury is still very much out on KAN (see the first author&#8217;s own <a href="https://github.com/KindXiaoming/pykan?tab=readme-ov-file#authors-note">comments</a>, which I also linked below).&nbsp;</p><p>I also really enjoyed doing the last two podcast episodes&#8212;Ryan Tibshirani is incredibly thoughtful, and I love that he sometimes pursues problems for the sake of their beauty; I don&#8217;t know how he strikes the balance he does in his research, but it&#8217;s admirable. David Thorstad&#8217;s bounded rationality program is important, I think that his arguments against longtermism come from a principled place, and I also respect that the longtermism community funds and responds to his critical work. Whatever you think about longtermism, it&#8217;s hard to deny that the community pushes for the changes they think best. </p><p>Finally, <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Sebastian Raschka, PhD&quot;,&quot;id&quot;:27393275,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F61f4c017-506f-4e9b-a24f-76340dad0309_800x800.jpeg&quot;,&quot;uuid&quot;:&quot;c65da876-cbb2-4861-aa03-672223087370&quot;}" data-component-name="MentionToDOM"></span>, a brilliant ML educator and someone the Substack AI community is lucky to have, recently published his new book, <em><a href="https://sebastianraschka.com/books/#machine-learning-q-and-ai">Machine Learning Q and AI</a></em>. If you&#8217;re looking for a straightforward read as a refresher on concepts, it&#8217;s nicely written. </p><p>As usual, if you want to write with us, send a pitch using <a href="https://goo.gl/forms/whYRKEzMZJox6FaH2">this form</a>.</p><div><hr></div><p><strong>Can't afford to test in production? 
Come to the world&#8217;s first AI Quality conference</strong></p><p>On June 25th in San Francisco, <a href="https://www.aiqualityconference.com/">join the first-ever AI Quality conference</a>. Join 1000+ other highly informed attendees to connect and learn how you can keep your AI efforts from going awry.</p><p>Hear speakers from Uber, Groq, Cruise, Torc Robotics, Notion, Anthropic, OpenAI, Google and 20+ more organizations.</p><p>Use the special discount code 'hallucinate' for 30% off the ticket price.</p><div><hr></div><h2><strong>News Highlight</strong>: <a href="https://www.washingtonpost.com/technology/2024/04/29/ai-healthcare-postpartum-depression-screening/">Dionysus claims its blood test can detect postpartum depression. How much of a difference can this make?</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bnix!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8495b29-1361-4e92-8215-7349f9e17b75_540x378.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bnix!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8495b29-1361-4e92-8215-7349f9e17b75_540x378.png 424w, https://substackcdn.com/image/fetch/$s_!Bnix!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8495b29-1361-4e92-8215-7349f9e17b75_540x378.png 848w, https://substackcdn.com/image/fetch/$s_!Bnix!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8495b29-1361-4e92-8215-7349f9e17b75_540x378.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Bnix!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8495b29-1361-4e92-8215-7349f9e17b75_540x378.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Bnix!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8495b29-1361-4e92-8215-7349f9e17b75_540x378.png" width="540" height="378" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c8495b29-1361-4e92-8215-7349f9e17b75_540x378.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:378,&quot;width&quot;:540,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bnix!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8495b29-1361-4e92-8215-7349f9e17b75_540x378.png 424w, https://substackcdn.com/image/fetch/$s_!Bnix!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8495b29-1361-4e92-8215-7349f9e17b75_540x378.png 848w, https://substackcdn.com/image/fetch/$s_!Bnix!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8495b29-1361-4e92-8215-7349f9e17b75_540x378.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Bnix!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8495b29-1361-4e92-8215-7349f9e17b75_540x378.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4><strong>Summary</strong>&nbsp;</h4><p>San Diego-based startup Dionysus Digital Health is pitching a blood test to screen for postpartum depression, even before symptoms appear. 
Such a system&#8212;if it worked in the way Dionysus imagines&#8212;could help healthcare systems funnel vulnerable mothers toward treatment, and even preventative care.&nbsp;</p><h4><strong>Overview</strong></h4><p>Postpartum depression, or PPD, is unfortunately common for women who have recently given birth to children&#8212;the CDC <a href="https://www.cdc.gov/reproductivehealth/depression/index.htm#Postpartum">says</a> about one in 8 new mothers experience symptoms (the NCBI <a href="https://www.ncbi.nlm.nih.gov/books/NBK519070/">says</a> one in 7). Dionysus claims it has pinpointed a gene linking moods to hormonal changes&#8212;their ML system can compare gene expression in blood samples to determine if a mother might develop depression symptoms.&nbsp;</p><p>This is the picture Dionysus imagines: providers administer a blood test between the second and third trimesters of pregnancy; that blood test flags women at risk of postpartum depression or other disorders; this, and other methods, funnel vulnerable women towards appropriate treatment and care.&nbsp;</p><p>The reality of current medical care is that women are not always screened for postpartum depression during and after pregnancy, despite recommendations. 
Tools like Dionysus&#8217;s could make it easier to identify at-risk mothers.&nbsp;</p><p>Of course, such a tool doesn&#8217;t exist in a vacuum, and the usual cast of concerns has been raised: bias and cost are two considerations; identifying at-risk mothers alone also makes little difference if those mothers can&#8217;t access care.&nbsp;</p><h4><strong>Our Take</strong></h4><p>I think this is an interesting story because it highlights the ever-present lesson that AI systems don&#8217;t exist in a vacuum, and there is a large gap between solving a problem with an ML system&#8212;an important problem, admittedly&#8212;and leveraging that system to make a real-world impact.&nbsp;</p><p>What I find particularly interesting is that Dionysus says they want to sell their postpartum depression test <em>directly to consumers</em>. And, there&#8217;s more than postpartum depression screening: they <a href="https://dionysushealth.com/about-us-ppd-draft/">want</a> to &#8220;create bite sized feedback unique to you&#8221; and provide actionable tools based on their epigenetic insights. The idea of personalized wellness support isn&#8217;t an entirely new one, and interacts with some of the bias concerns that have been raised.&nbsp;</p><p>But, all the signs (and Dionysus&#8217;s own messaging) point to just the kind of consumer product that makes people worry about inequity: if consumers need to pay out of pocket for Dionysus&#8217;s technology <em>and</em> it isn&#8217;t broadly affordable, there&#8217;s a clear exacerbating factor.&nbsp;</p><p>Even then, suppose Dionysus&#8217;s system works and most people are able to use it&#8212;there are a number of non-genetic factors that can also predispose expectant or new mothers to postpartum depression. 
These are factors that a blood test alone can&#8217;t take into account; accounting for them would require more extensive familiarity with a patient&#8217;s situation (or something like a survey, in which case the system&#8217;s ease of detection when compared against older screening methods might not be such a big improvement).&nbsp;</p><p>Furthermore, even women who are diagnosed with postpartum depression rarely receive the care they need&#8212;<a href="https://pubmed.ncbi.nlm.nih.gov/10992206/">this study</a> says just one-third of pregnant mothers with signs of mental disorders received treatment (this treatment included &#8220;reassurance&#8221; from providers). All this is to say: I think Dionysus&#8217;s vision is a nice one and one worth building towards. But we shouldn&#8217;t forget that building ML systems alone isn&#8217;t going to bring that world about. We need to build better healthcare systems and figure out ways to offer people higher-quality, more consistent care. Again, and again, and again, let&#8217;s not make the mistake of techno-solutionism.&nbsp;</p><p>&#8212;Daniel</p><p>I am having war flashbacks to Theranos and am wondering aloud: have we learned nothing? While helping those who suffer from postpartum depression is an extremely noble aim, do we have any solid indication, besides the company&#8217;s word, that Dionysus is capable of reliably detecting postpartum depression? <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4230951/">Some</a> clinical research has found that expert doctors can only diagnose depression with a recall of 50%, meaning <strong>half of all patients feeling depressed would be misdiagnosed by experts</strong>. 
This suggests that even if Dionysus has a magical AI that can predict their training labels well, the underlying process for label generation (doctors diagnosing people with depression) is incredibly flawed and would on average be mislabeling approximately half of the depressed population as not depressed.&nbsp;</p><p>-Justin&nbsp;</p><h2><strong>Research Highlight: </strong><a href="https://arxiv.org/pdf/2404.19756">KAN: Kolmogorov-Arnold Networks</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-ND_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39dde1a8-5eb8-46ae-9400-3e81be5f5a08_1600x1030.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-ND_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39dde1a8-5eb8-46ae-9400-3e81be5f5a08_1600x1030.png 424w, https://substackcdn.com/image/fetch/$s_!-ND_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39dde1a8-5eb8-46ae-9400-3e81be5f5a08_1600x1030.png 848w, https://substackcdn.com/image/fetch/$s_!-ND_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39dde1a8-5eb8-46ae-9400-3e81be5f5a08_1600x1030.png 1272w, https://substackcdn.com/image/fetch/$s_!-ND_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39dde1a8-5eb8-46ae-9400-3e81be5f5a08_1600x1030.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!-ND_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39dde1a8-5eb8-46ae-9400-3e81be5f5a08_1600x1030.png" width="1456" height="937" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/39dde1a8-5eb8-46ae-9400-3e81be5f5a08_1600x1030.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:937,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-ND_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39dde1a8-5eb8-46ae-9400-3e81be5f5a08_1600x1030.png 424w, https://substackcdn.com/image/fetch/$s_!-ND_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39dde1a8-5eb8-46ae-9400-3e81be5f5a08_1600x1030.png 848w, https://substackcdn.com/image/fetch/$s_!-ND_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39dde1a8-5eb8-46ae-9400-3e81be5f5a08_1600x1030.png 1272w, https://substackcdn.com/image/fetch/$s_!-ND_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39dde1a8-5eb8-46ae-9400-3e81be5f5a08_1600x1030.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4><strong>Summary</strong>&nbsp;</h4><p>An exciting new <a href="https://arxiv.org/pdf/2404.19756">paper</a> by researchers at MIT, Caltech, and Northeastern presents Kolmogorov-Arnold Networks (KANs) as an alternative to the traditional Multi-Layer Perceptron (MLP) neural network architecture. The key difference between KANs and MLPs is that KANs have learnable activation functions on the <em>edges</em> between nodes, while MLPs have predetermined activation functions on the nodes themselves. The authors achieve this by replacing each linear weight with a learnable univariate function parameterized as a B-spline (a gorgeous explainer on B-splines can be read <a href="https://sscardapane.notion.site/Kolmogorov-Arnold-Networks-KANs-b3749e1fd48d4bfdb78f5b05d45b5f1b">here</a>). 
The researchers go on to empirically test KANs on numerous data-fitting and partial differential equation (PDE) problems, finding that KANs show significant gains in accuracy and interpretability over MLPs. They also show empirically that KANs can learn and represent numerous laws of nature and physical constraints, and they explore drawbacks such as slow training times.</p><h4><strong>Overview</strong>&nbsp;</h4><p>While MLPs take their theoretical inspiration from the <a href="https://en.wikipedia.org/wiki/Universal_approximation_theorem">Universal Approximation Theorem</a>, KANs rest on the theoretical grounds of the Kolmogorov-Arnold (KA) representation <a href="https://ris.utwente.nl/ws/files/256147274/2021_Schmidt_Hieber_Neural_Networks_The_Kolmogorov_Arnold.pdf">theorem</a>. This theorem states that any <em>multivariate</em> continuous function on a bounded domain can be expressed as a finite composition of continuous <em>univariate</em> functions and addition. It follows from this theorem that learning a high-dimensional function (an extremely common and crucial task in machine learning) can boil down to learning a polynomial number of one-dimensional functions.&nbsp;</p><p>The authors begin by using the KA representation theorem to create a single KAN layer abstraction, and then extend it by introducing a depth-wise expansion. They do so by extending the single composition of one-dimensional functions into a matrix of learnable 1D functions. By introducing the depth-wise expansion, they both strengthen the conceptual relationship to MLPs and improve the accuracy of the functional representation learned by KANs.</p><p>The authors propose training a very large and deep representation with sparsity regularization. They show that one can then prune away all of the unnecessary nodes, resulting in better performance and easier interpretability. The final innovation that assists in interpretability is what the authors call Symbolification. 
The authors suggest that some activations <em>are</em> symbolic functions (like log or sin). They show that these symbolic functions can be determined by fitting learnable affine transformations from the pre-activations to the post-activations. Through node pruning and Symbolification, they empirically demonstrate that symbolic representations can be learned for numerous spline activations and configurations.&nbsp;</p><p>These concepts are best illustrated by this beautiful <a href="https://twitter.com/i/status/1785483967719981538">animation</a> from the authors, which can be seen below. As the number of learning iterations increases, we can see both the pruning and the symbolic expression in action. By Step 50, we can see the topmost node has learned to represent the exponential function, while the sine function is learned by both nodes in the middle layer. Finally, in the bottom layer, we see that for all four input variables, the splines learn a squared representation while the extraneous inputs are pruned away.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oqpK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b3eb80b-7279-4203-bf0f-5153dabf4d9e_600x802.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oqpK!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b3eb80b-7279-4203-bf0f-5153dabf4d9e_600x802.gif 424w, https://substackcdn.com/image/fetch/$s_!oqpK!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b3eb80b-7279-4203-bf0f-5153dabf4d9e_600x802.gif 848w, 
https://substackcdn.com/image/fetch/$s_!oqpK!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b3eb80b-7279-4203-bf0f-5153dabf4d9e_600x802.gif 1272w, https://substackcdn.com/image/fetch/$s_!oqpK!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b3eb80b-7279-4203-bf0f-5153dabf4d9e_600x802.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oqpK!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b3eb80b-7279-4203-bf0f-5153dabf4d9e_600x802.gif" width="392" height="523.9733333333334" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5b3eb80b-7279-4203-bf0f-5153dabf4d9e_600x802.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:802,&quot;width&quot;:600,&quot;resizeWidth&quot;:392,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oqpK!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b3eb80b-7279-4203-bf0f-5153dabf4d9e_600x802.gif 424w, https://substackcdn.com/image/fetch/$s_!oqpK!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b3eb80b-7279-4203-bf0f-5153dabf4d9e_600x802.gif 848w, https://substackcdn.com/image/fetch/$s_!oqpK!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b3eb80b-7279-4203-bf0f-5153dabf4d9e_600x802.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!oqpK!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b3eb80b-7279-4203-bf0f-5153dabf4d9e_600x802.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4><strong>Our Take</strong></h4><p>One of the most exciting things to me about this work is the speed with which the machine learning community seems to be adopting, contributing to, and expanding the body of knowledge around KANs. 
Some notable community highlights include implementations that extend the methodology to different deep learning frameworks (PyTorch) and other domains (reinforcement learning). Some of those can be seen <a href="https://github.com/Blealtan/efficient-kan">here</a> and <a href="https://github.com/riiswa/kanrl">here</a>. Additionally, the physicist in me loves to see both the interpretability that the symbolic representations provide and that the learned representations seem to respect the laws of nature. I am eager to see what comes next in terms of innovations and improvements, as well as how scientists can adopt KANs for new and exciting use cases, as highlighted by some of the authors on the website formerly known as Twitter&nbsp; <a href="https://x.com/zimingliu11/status/1785659007543459985?s=46">here</a> and <a href="https://twitter.com/zimingliu11/status/1786427503801774291?t=4H0OD5wY7jXVt01KMRcMVg">here</a>.&nbsp;</p><p>-Justin</p><p>I like this paper and am excited at what seems to be a new idea, but the jury&#8217;s still out on KANs. See this <a href="https://github.com/SHI-Labs/CompactNet">CompactNet repo</a>: the KAN paper&#8217;s comparisons to the noted DeepMind paper had some issues, and the repo&#8217;s authors found they could match KAN&#8217;s accuracy on the mathematics dataset using networks with as few as 122 parameters. This makes me a tad skeptical of the other KAN/MLP comparisons; did the authors train the MLPs as well as they could have? Once again: baselines are important! I also want to mention this <a href="https://x.com/bozavlado/status/1787376558484709691">thread and notebook</a>, which shows you can rewrite a KAN into an ordinary MLP.&nbsp;It&#8217;s worth noting the argument starts by restricting its focus to piecewise linear functions; it&#8217;s not entirely obvious to me whether the argument extends in general, so I won&#8217;t comment on the veracity of the claim, but it&#8217;s worth looking at. 
</p><p>&#8212;Daniel</p><h2>New from the Gradient</h2><h3><a href="https://thegradientpub.substack.com/p/david-thorstad-bounded-rationality-longtermism">David Thorstad: Bounded Rationality and the Case Against Longtermism</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!plJV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c3ca20-3f9b-489a-8066-faa8c79086f5_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!plJV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c3ca20-3f9b-489a-8066-faa8c79086f5_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!plJV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c3ca20-3f9b-489a-8066-faa8c79086f5_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!plJV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c3ca20-3f9b-489a-8066-faa8c79086f5_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!plJV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c3ca20-3f9b-489a-8066-faa8c79086f5_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!plJV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c3ca20-3f9b-489a-8066-faa8c79086f5_1200x628.png" width="1200" height="628" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/52c3ca20-3f9b-489a-8066-faa8c79086f5_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!plJV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c3ca20-3f9b-489a-8066-faa8c79086f5_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!plJV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c3ca20-3f9b-489a-8066-faa8c79086f5_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!plJV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c3ca20-3f9b-489a-8066-faa8c79086f5_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!plJV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52c3ca20-3f9b-489a-8066-faa8c79086f5_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/david-thorstad-bounded-rationality-longtermism&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegradientpub.substack.com/p/david-thorstad-bounded-rationality-longtermism"><span>Listen</span></a></p><h3><a href="https://thegradientpub.substack.com/p/ryan-tibshirani-regression-conformal-prediction">Ryan Tibshirani: Statistics, Nonparametric Regression, Conformal Prediction</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SXq-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29e58685-3d3b-46cc-8926-669bcf3c27bc_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!SXq-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29e58685-3d3b-46cc-8926-669bcf3c27bc_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!SXq-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29e58685-3d3b-46cc-8926-669bcf3c27bc_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!SXq-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29e58685-3d3b-46cc-8926-669bcf3c27bc_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!SXq-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29e58685-3d3b-46cc-8926-669bcf3c27bc_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SXq-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29e58685-3d3b-46cc-8926-669bcf3c27bc_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/29e58685-3d3b-46cc-8926-669bcf3c27bc_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!SXq-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29e58685-3d3b-46cc-8926-669bcf3c27bc_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!SXq-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29e58685-3d3b-46cc-8926-669bcf3c27bc_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!SXq-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29e58685-3d3b-46cc-8926-669bcf3c27bc_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!SXq-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29e58685-3d3b-46cc-8926-669bcf3c27bc_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/ryan-tibshirani-regression-conformal-prediction&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegradientpub.substack.com/p/ryan-tibshirani-regression-conformal-prediction"><span>Listen</span></a></p><h2>Other Things That Caught Our Eyes</h2><h3>News</h3><p><strong><a href="https://www.nytimes.com/2024/04/22/technology/generative-ai-gene-editing-crispr.html">Generative A.I. Arrives in the Gene Editing World of CRISPR</a></strong></p><p>A new AI technology developed by Profluent, a startup based in Berkeley, California, is generating blueprints for microscopic biological mechanisms that can edit DNA&#8212;Profluent's AI technology analyzes large amounts of biological data to create new gene editors. This advancement in AI-driven gene editing could lead to more precise and faster methods for battling illnesses and diseases. The research paper describing this technology will be presented at the annual meeting of the American Society of Gene and Cell Therapy.&nbsp;</p><p><strong><a href="https://www.nytimes.com/2024/04/24/technology/meta-profit-stock-ai.html">Meta Says It Plans to Spend Billions More on A.I.</a></strong></p><p>Meta, the parent company of Facebook, Instagram, WhatsApp, and Messenger, reported strong revenue and profits for the first quarter of the year. Revenue was $36.5 billion, up 27% from the previous year, and profit was $12.4 billion, more than double the previous year. 
However, Meta plans to increase its spending on AI efforts, with a projected spending forecast of $35 billion to $40 billion for the year. This increase is driven by investments in AI infrastructure, including data centers, chip designs, and research and development. Despite the positive financial results, Meta's revenue forecast for the current quarter is lower than analysts' expectations. </p><p><strong><a href="https://www.thebaltimorebanner.com/education/k-12-schools/eric-eiswert-ai-audio-baltimore-county-YBJNJAS6OZEE5OQVF5LFOFYN6M/">Ex-athletic director accused of framing principal with AI arrested at airport with gun</a></strong></p><p>A former athletic director at Pikesville High School in Baltimore County has been arrested and charged with crimes related to the use of AI to impersonate the school's principal and spread racist and antisemitic comments. Dazhon Darien, 31, was apprehended at the airport with a gun as he attempted to board a flight. Investigators determined that Darien faked the principal's voice using AI and circulated the audio on social media, causing significant disruptions at the school. The audio clip led to the temporary removal of the principal and triggered a wave of hate-filled messages. Darien is also charged with theft and retaliating against a witness. </p><p><strong><a href="https://www.scientificamerican.com/article/lethal-ai-weapons-are-on-the-rise-whats-next/">Lethal AI Weapons Are on the Rise. What&#8217;s Next?</a></strong></p><p>The development of lethal autonomous weapons (LAWs), including AI-equipped drones, is on the rise. These weapons have the ability to find and kill targets without human intervention. The United Nations has taken a step towards addressing this issue by adding LAWs to the agenda of the UN General Assembly meeting in September. However, progress has been slow due to a lack of consensus on what constitutes an autonomous weapon. 
AI weapons offer advantages such as increased accuracy and the ability to operate in environments with electronic jamming. However, there are concerns about the potential for catastrophic mistakes and the ethical implications of delegating life-and-death decisions to machines. </p><p><strong><a href="https://gizmodo.com/ai-can-tell-your-political-affiliation-just-by-looking-1851430714">AI Can Tell Your Political Affiliation Just by Looking at Your Face, Researchers Find</a></strong></p><p>A recent study conducted by researchers at Stanford University suggests that facial recognition technology combined with artificial intelligence can accurately determine a person's political affiliation by analyzing their facial features. The study involved 591 participants who completed a questionnaire about their political beliefs and were then scanned by an AI algorithm. The algorithm was able to predict political orientation with a high degree of accuracy, even when participants' identities were anonymized. The researchers found that liberals and conservatives have distinct facial morphologies, with liberals having smaller faces. The study highlights the potential implications of biometric surveillance technologies and the need for caution in targeting political messaging online. </p><p><strong><a href="https://www.washingtonpost.com/technology/2024/04/30/chicago-tribune-open-ai-microsoft-lawsuit/">8 major newspapers join legal backlash against OpenAI, Microsoft</a></strong></p><p>Eight major daily newspapers, including the Chicago Tribune and the New York Daily News, have filed a lawsuit against OpenAI and Microsoft. The lawsuit claims that the companies used copyrighted work from the newspapers to train their artificial intelligence algorithms without compensating the content owners. The newspapers involved in the lawsuit are owned by Alden Global Capital, an investment fund based in New York City. 
The lawsuit specifically mentions OpenAI's ChatGPT as one of the AI tools that allegedly used the news articles for training. </p><p><strong><a href="https://gothamist.com/news/mta-banned-from-using-facial-recognition-to-enforce-fare-evasion">MTA banned from using facial recognition to enforce fare evasion</a></strong></p><p>The new state budget in New York includes a ban on the use of facial recognition technology by the Metropolitan Transportation Authority (MTA) to enforce fare evasion rules. The law prohibits the MTA from using biometric identifying technology, including facial recognition, to enforce fare payment. The measure was added to protect New Yorkers' privacy and prevent the potential invasion of people's lives through expanded surveillance. Privacy advocates and good government groups have praised the ban, particularly as the state Legislature increased the maximum penalty for fare evasion. However, some civil rights groups argue that the ban does not go far enough and are calling for legislation to fully outlaw the use of facial recognition by government agencies. Facial recognition technology has been criticized for its imperfections and potential for biased results. The ban is seen as a signal of growing skepticism among New York legislators regarding law enforcement's use of facial recognition technology. </p><p><strong><a href="https://www.theverge.com/2024/5/1/24146053/google-ai-talent-immigration-schedule-a">Google urges US to update immigration rules to attract more AI talent</a></strong></p><p>Google is urging the US government to update its immigration rules to attract more AI and tech talent. The company believes that current policies, such as Schedule A, which lists occupations with a shortage of American workers, need to be more flexible and regularly updated to meet the demand in AI and cybersecurity. Google argues that the US risks losing out on highly sought-after talent if immigration policies are not modernized. 
The company suggests including AI and cybersecurity on Schedule A and considering multiple data sources to reflect workforce gaps. The US's strict immigration policies have made it difficult for companies to attract AI specialists, resulting in a shortage of talent in the country. </p><p><strong><a href="https://www.scmp.com/tech/tech-trends/article/3260790/china-unveils-sora-challenger-able-produce-videos-text-similar-openai-tool-though-much-shorter">China unveils Sora challenger able to produce videos from text</a></strong></p><p>China has unveiled its own text-to-video AI tool called Vidu, which is similar to OpenAI's Sora. Developed by start-up Shengshu Technology in collaboration with Tsinghua University, Vidu can produce 1080p resolution videos based on simple text prompts, although the videos are limited to 16 seconds compared to Sora's 60 seconds. Vidu is described as "imaginative" and able to simulate the physical world, producing videos with consistent characters, scenes, and timelines. The lack of sufficient computing power has been a hindrance to Chinese firms in developing similar AI models, as Sora requires eight Nvidia A100 GPUs to run for over three hours to produce a one-minute clip. Vidu's debut has raised hopes in China as the country aims to catch up with global generative AI players.</p><p><strong><a href="https://techcrunch.com/2024/04/29/nist-launches-a-new-platform-to-assess-generative-ai/">NIST launches a new platform to assess generative AI</a></strong></p><p>The National Institute of Standards and Technology (NIST) has launched a new program called NIST GenAI to assess generative AI technologies, including text- and image-generating AI. The program aims to release benchmarks, develop content authenticity detection systems, and encourage the creation of software to identify the source of fake or misleading AI-generated information. 
NIST GenAI's first project is a pilot study to differentiate between human-created and AI-generated media, starting with text. Teams from academia, industry, and research labs are invited to submit AI systems to generate content or identify AI-generated content. The launch of NIST GenAI is in response to President Joe Biden's executive order on AI and aims to address the growing concern of AI-generated misinformation and deepfakes. The program will inform the work of NIST's AI Safety Institute.</p><p><strong><a href="https://www.businessinsider.com/satya-nadella-bill-gates-microsoft-concern-google-rivals-ai-emails-2024-5">Read the email to Satya Nadella and Bill Gates that shows Microsoft's CTO was 'very worried' about Google's AI progress in 2019</a></strong></p><p>In 2019, Microsoft's CTO, Kevin Scott, expressed concern about Google's advancements in artificial intelligence (AI) in an email to CEO Satya Nadella and Bill Gates. Scott specifically mentioned Google's AI-powered "auto-complete in Gmail" as being "scarily good." He also acknowledged that Microsoft was behind in terms of machine learning (ML) scale. These emails were made public as part of the Department of Justice's antitrust case against Google. In response, Nadella highlighted the need for Microsoft to invest in AI, which led to their partnership with OpenAI. Microsoft's timely investment allowed them to incorporate OpenAI's technology into products like Bing and Microsoft 365, potentially surpassing Google in the AI race. </p><p><strong><a href="https://www.foxbusiness.com/technology/ads-facebook-instagram-ai-girlfriends-meta-crackdown">Ads on Facebook, Instagram for explicit 'AI girlfriends' prompt Meta crackdown</a></strong></p><p>An investigation by Wired revealed that there were over 29,000 explicit advertisements for "AI girlfriend" apps on Meta's platforms, including Facebook, Instagram, and Messenger. 
These ads featured sexually explicit messaging and AI-generated images of scantily-clad women. More than half of the ads included the acronym "NSFW," indicating that they were not safe for work. Meta, the parent company of these platforms, has policies in place that prohibit adult content, and a spokesperson stated that they are working to remove the violating ads. Meta acknowledges that they are constantly evaluating and updating their approach to address new tactics used by individuals or groups to evade detection. </p><p><strong><a href="https://www.nytimes.com/2024/04/29/technology/ai-google-microsoft.html">Friends From the Old Neighborhood Turn Rivals in Big Tech&#8217;s A.I. Race</a></strong></p><p>Mustafa Suleyman and Demis Hassabis, childhood friends from London, have become powerful executives in the tech industry's race to build AI. Suleyman is the chief executive of Microsoft AI, while Hassabis is the chief executive of Google DeepMind. In 2010, they co-founded DeepMind, an AI research lab aimed at preventing the profit-driven race to build and deploy AI that they are now both involved in. </p><p><strong><a href="https://arstechnica.com/information-technology/2024/04/rumors-swirl-about-mystery-gpt2-chatbot-that-some-think-is-gpt-5-in-disguise/">Mysterious &#8220;gpt2-chatbot&#8221; AI model appears suddenly, confuses experts</a></strong></p><p>A mysterious chatbot named "gpt2-chatbot" has appeared in the LMSYS Chatbot Arena, sparking speculation that it may be a secret test version of OpenAI's upcoming GPT-4.5 or GPT-5 language model. The new model is currently only available through the Chatbot Arena website, with a limited rate of eight queries per day. Despite rumors and hype surrounding the model's capabilities, some users have found that it does not represent a significant leap beyond GPT-4 Turbo. While the origins of the model remain unknown, AI researcher Simon Willison believes it may be an OpenAI stealth preview. 
</p><h3>Papers</h3><p><strong>Daniel</strong>:<strong> </strong>Our own <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Hugh Zhang&quot;,&quot;id&quot;:3550791,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/2d3a4ac7-4ebd-481f-968c-1e63ff773c6f_636x542.jpeg&quot;,&quot;uuid&quot;:&quot;0effd119-a308-4b42-8e86-17a3e68168bd&quot;}" data-component-name="MentionToDOM">Hugh Zhang</span>&#8212;who is now at Scale AI&#8212;led an incredible <a href="https://t.co/XFPOF35l5X">paper</a> trying to understand whether data contamination might be going on in notable LLMs. They evaluated LLMs on a new test set modeled on the grade school arithmetic benchmark GSM8K, called GSM1K, and observed interesting accuracy discrepancies that pointed to systematic overfitting in models like Phi and Mistral. On a similar theme, I like <a href="https://arxiv.org/abs/2404.15146">this paper</a> that presents a new metric for assessing memorization in LLMs: the Adversarial Compression Ratio (ACR). A string from the training data is considered &#8220;memorized&#8221; if it can be elicited with a prompt shorter than the original string.&nbsp;</p><p>On another topic, <a href="https://arxiv.org/abs/2404.13292">this paper</a> proposes an evaluation framework for subword tokenization. This is pretty important, given that tokenizer evals and comparison are still open problems. The authors&#8217; UniMorph Labeller classifies a subword tokenization as either &#8220;morphological&#8221; (subword tokenizations that &#8220;respect morpheme boundaries,&#8221; or: are not present in the vocabulary and aren&#8217;t alien) or &#8220;alien&#8221; (linguistically implausible subword compositions, e.g. those that do not align with semantic compositions that humans understand). 
They also propose the Out-of-Vocabulary Generalization challenge for evaluating subword tokenization in downstream NLP tasks&#8212;they find alien compositions lead to poorer generalization compared to morphological ones.&nbsp;</p><p>You should also check out this <a href="https://arxiv.org/abs/2402.03175">Bayesian learning model for LLMs</a>&#8212;I think it&#8217;s a nice framework to think about LLMs, and, if you&#8217;ve been following work on in-context learning, the paper notes implications for ICL as well. Maybe it&#8217;s notable that a Bayesian picture, at least for ICL, still seems to explain the evidence we have now.</p><p>Finally, I think <a href="https://arxiv.org/abs/2404.14994">Transformers Can Represent </a><em><a href="https://arxiv.org/abs/2404.14994">n</a></em><a href="https://arxiv.org/abs/2404.14994">-gram Language Models</a> is a really neat paper. The authors argue that language acceptance is an ill-suited problem for studying LMs, and that transformer LMs using hard or sparse attention can exactly represent any <em>n</em>-gram LM (with stronger results for hard attention). The concrete lower bound on LMs&#8217; representational capacity is useful, and helps contextualize and justify explanations and intuitions we might have about what&#8217;s causing LMs to behave as they do.&nbsp;</p><h3>Closing Thoughts</h3><p>Have something to say about this edition&#8217;s topics? Shoot us an email at editor@thegradient.pub and we will consider sharing the most interesting thoughts from readers in the next newsletter! For feedback, you can also reach Daniel directly at <a href="mailto:dbashir@hmc.edu">dbashir@hmc.edu</a> or on <a href="https://twitter.com/spaniel_bashir">Twitter</a>. If you enjoyed this newsletter, consider donating to The Gradient via a Substack subscription, which helps keep this grad-student / volunteer-run project afloat. 
Thanks for reading the latest Update from the Gradient!</p>]]></content:encoded></item><item><title><![CDATA[Mini-Update #40: NIST Launches GenAI and Lifelike Facial Mimicking]]></title><description><![CDATA[The National Institute for Standards and Technology begins evaluations on generative AI and Microsoft launches VASA-1, which produces realistic talking faces.]]></description><link>https://thegradientpub.substack.com/p/mini-update-40-nist-launches-genai</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/mini-update-40-nist-launches-genai</guid><dc:creator><![CDATA[Ather Fawaz]]></dc:creator><pubDate>Wed, 01 May 2024 00:01:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!IPvr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4df9166b-acec-4808-adfc-61075ddf7279_730x482.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 40th Mini-Update from the Gradient! This is our exclusive newsletter edition specifically for paying subscribers and is our way to show you our appreciation for your support.</p>
      <p>
          <a href="https://thegradientpub.substack.com/p/mini-update-40-nist-launches-genai">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Update #73: Against Language Erasure and Better Long-Context Benchmarking]]></title><description><![CDATA[Researchers and governments develop data sets and technology for local languages, and NVIDIA presents a new, more thorough benchmark for long-context models.]]></description><link>https://thegradientpub.substack.com/p/update-73-language-erasure-long-context</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/update-73-language-erasure-long-context</guid><dc:creator><![CDATA[Cole Frank]]></dc:creator><pubDate>Tue, 23 Apr 2024 15:30:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!r3y6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb257f-7f85-4973-9564-b044dc9bf6d0_1496x1066.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 73rd update from the Gradient! If you&#8217;re new and like what you see, <a href="https://thegradientpub.substack.com/">subscribe</a> and follow us on <a href="https://twitter.com/gradientpub">Twitter</a>. 
<strong>Our newsletters run long, so you&#8217;ll need to view this post on Substack to see everything!</strong></p><h2>Editor Notes</h2><ul><li><p>I&#8217;m very excited to welcome two new editors to our team: <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Jaymee Sheng&quot;,&quot;id&quot;:25461432,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/019c59ff-376e-4cbe-a48c-c5204dddc85d_144x144.png&quot;,&quot;uuid&quot;:&quot;a65f934b-cf95-484b-9435-bfe0f705e545&quot;}" data-component-name="MentionToDOM"></span> and <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Cole Frank&quot;,&quot;id&quot;:697316,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d4214e-20d3-4731-a2d8-0a0abbc8c105_417x560.jpeg&quot;,&quot;uuid&quot;:&quot;ae4bc158-f062-4aed-9470-93d24f41a8cf&quot;}" data-component-name="MentionToDOM"></span>. Jaymee was previously editor-in-chief for an interdisciplinary academic journal, published some very interesting computational social science research, and worked as an ML engineer. Cole worked in macroeconomic analysis for 4 years before getting interested in AI, and now does ML research and AI curriculum development at CMU&#8217;s Software Engineering Institute. They&#8217;ll both be part of the editorial staff and help out with the newsletter. </p></li><li><p>As you can see, we&#8217;re doing occasional advertisements in our newsletter&#8212;we&#8217;re open to considering more.&nbsp;</p></li><li><p>Want to write with us? Send a pitch using <a href="https://goo.gl/forms/whYRKEzMZJox6FaH2">this form</a>.</p></li></ul><div><hr></div><p><strong>Can't afford to test in production? 
Come to the world&#8217;s first AI Quality conference</strong></p><p>On June 25th in San Francisco, <a href="https://www.aiqualityconference.com/">join the first ever AI Quality conference</a>. Join 1000+ other highly informed attendees to connect and learn about how you can keep your AI efforts from going awry.</p><p>Hear speakers from Uber, Groq, Cruise, Torc Robotics, Notion, Anthropic, OpenAI, Google and 20+ more organizations.</p><p>Use the special discount code 'testinprod' for 20% off the ticket price.</p><div><hr></div><h2><strong>News Highlight</strong>: Researchers and Governments Push Back Against Generative AI's Erasure of Low-Resource Languages</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!r3y6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb257f-7f85-4973-9564-b044dc9bf6d0_1496x1066.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!r3y6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb257f-7f85-4973-9564-b044dc9bf6d0_1496x1066.png 424w, https://substackcdn.com/image/fetch/$s_!r3y6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb257f-7f85-4973-9564-b044dc9bf6d0_1496x1066.png 848w, https://substackcdn.com/image/fetch/$s_!r3y6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb257f-7f85-4973-9564-b044dc9bf6d0_1496x1066.png 1272w, 
https://substackcdn.com/image/fetch/$s_!r3y6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb257f-7f85-4973-9564-b044dc9bf6d0_1496x1066.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!r3y6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb257f-7f85-4973-9564-b044dc9bf6d0_1496x1066.png" width="1456" height="1037" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5acb257f-7f85-4973-9564-b044dc9bf6d0_1496x1066.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1037,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!r3y6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb257f-7f85-4973-9564-b044dc9bf6d0_1496x1066.png 424w, https://substackcdn.com/image/fetch/$s_!r3y6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb257f-7f85-4973-9564-b044dc9bf6d0_1496x1066.png 848w, https://substackcdn.com/image/fetch/$s_!r3y6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb257f-7f85-4973-9564-b044dc9bf6d0_1496x1066.png 1272w, 
https://substackcdn.com/image/fetch/$s_!r3y6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb257f-7f85-4973-9564-b044dc9bf6d0_1496x1066.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Source: <a href="https://www.statista.com/chart/14900/two-worlds_-languages-irl-and-online/">Statista</a></figcaption></figure></div><h4><strong>Summary</strong>&nbsp;</h4><p>It is no secret that the internet is dominated by English content and AI models tend to perform better on English than on lower-resource languages, if they are 
supported at all. Recently, AI chatbots have been found to mischaracterize an African language spoken by over 2 million people as a fictional language and to provide incorrect dates and information about how to cast a ballot for the upcoming EU election. The risk of generative AI further marginalizing non-English languages and cultures has prompted researchers and governments alike to develop data sets and technology for local languages in an effort to preserve them and better serve their own populations.</p><h4><strong>Overview</strong></h4><p><a href="https://restofworld.org/2023/internet-most-used-languages/">More than half</a> of all the websites on the internet use English as their primary language, even though over 80 percent of people in the world don't speak it. Google Translate supports 133 of some 7,000 languages spoken in the world; most of the highest-performing language models serve only 8 to 10 languages. As reported by <em>The Atlantic</em>, a popular AI model recently described Fon, a language spoken by 2.3 million people in Benin and neighboring countries, as <a href="https://www.theatlantic.com/technology/archive/2024/04/generative-ai-low-resource-languages/678042/">"a fictional language."</a> As chatbots and generative AI continue to influence how people navigate the web and interact with the world, those who do not have command of a high-resource language will not be able to rely on AI to draft work memos, conduct research, tutor their children, search for information, or perform other tasks. Due to the <a href="https://arxiv.org/pdf/2401.05749.pdf">poor quality</a> of existing online content in low-resource languages, models trained on machine-translated texts could also expose speakers of those languages to misinformation. Moreover, models tend to lack awareness of cultural context and nuance. 
For example, a researcher at AI Singapore found that models trained with translated texts in several Southeast Asian languages know "much more about hamburgers and Big Ben than local cuisines and landmarks."</p><p>To save hundreds of African languages from being washed out by AI, researchers like Ife Adebara, who works with <a href="https://www.masakhane.io/home">Masakhane</a>, a grassroots NLP community, are racing to collect data and create software for languages that are poorly represented on the web. It took Adebara years to curate a 42-gigabyte training data set for 517 African languages&#8211;the largest and most comprehensive to date, yet it is only 0.4% of the size of the largest publicly available English training data set.&nbsp;</p><p>In Europe, a <a href="https://www.politico.eu/article/ai-chatbots-spread-falsehoods-about-the-eu-elections-report-finds/">research study</a> conducted in March by Berlin-based NGO Democracy Reporting International found that chatbots by Google, Microsoft, and OpenAI tended to return incorrect election dates and information about how to cast a ballot ahead of the European election. To reduce election risks like these that are tied to generative AI tools, and to counteract U.S. cultural dominance, EU countries have been pushing for the development of local models that are truly fluent in local languages. According to data from <a href="https://www.politico.eu/article/europeans-race-create-artificial-intelligence-chatbots-counter-english-ai/">Politico</a>, 9 countries have already released large language models focused on their local languages, while 7 others are developing them. Most of these projects are open source. The fight with tech companies over high-quality non-English training content such as media archives and news outlets has also led France's Finance Minister Bruno Le Maire to propose the creation of a price-controlled European single market for training data, in order to prevent U.S. 
tech giants from outbidding European AI companies for access.</p><h4><strong>Our Take</strong></h4><p>People want to be reflected in the technology they use. And when they don't see their language and culture represented in online media or taken into consideration by transformative technologies like AI, it's easy to feel that their language is not important and that adapting to higher-resource languages like English is a must. Most of us don't take proclamations by companies like OpenAI that they want to benefit "<a href="https://openai.com/blog/introducing-openai-japan">all of humanity</a>" at face value, and it's worthwhile for all of us to consider who is enjoying the benefits, at whose expense, and who is getting left behind. Grassroots efforts made by collectives like Masakhane seem an effective and necessary antidote to the relentless takeover of our world by English-centric AI, and I particularly appreciate their focus on understanding what people actually need (not everyone needs or wants a powerful general-purpose chatbot) and building the technology accordingly. 
Collecting quality data for lower-resource languages is extremely time-consuming and a labor of love, I think, because who will save our languages if not ourselves?</p><p>- Jaymee</p><h2><strong>Research Highlight: </strong>&#8220;<a href="https://arxiv.org/abs/2404.06654">RULER: What&#8217;s the Real Context Size of Your Long-Context Language Models?</a>&#8221;&nbsp;</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VT5S!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a8e03a5-4925-468a-88ed-4ec7e0910351_1600x827.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VT5S!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a8e03a5-4925-468a-88ed-4ec7e0910351_1600x827.png 424w, https://substackcdn.com/image/fetch/$s_!VT5S!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a8e03a5-4925-468a-88ed-4ec7e0910351_1600x827.png 848w, https://substackcdn.com/image/fetch/$s_!VT5S!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a8e03a5-4925-468a-88ed-4ec7e0910351_1600x827.png 1272w, https://substackcdn.com/image/fetch/$s_!VT5S!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a8e03a5-4925-468a-88ed-4ec7e0910351_1600x827.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VT5S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a8e03a5-4925-468a-88ed-4ec7e0910351_1600x827.png" width="1456" height="753" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a8e03a5-4925-468a-88ed-4ec7e0910351_1600x827.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:753,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VT5S!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a8e03a5-4925-468a-88ed-4ec7e0910351_1600x827.png 424w, https://substackcdn.com/image/fetch/$s_!VT5S!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a8e03a5-4925-468a-88ed-4ec7e0910351_1600x827.png 848w, https://substackcdn.com/image/fetch/$s_!VT5S!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a8e03a5-4925-468a-88ed-4ec7e0910351_1600x827.png 1272w, https://substackcdn.com/image/fetch/$s_!VT5S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a8e03a5-4925-468a-88ed-4ec7e0910351_1600x827.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Needle in a haystack evaluation of Claude 3. Time for a new benchmark. (<a href="https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf">source</a>)</figcaption></figure></div><h4><strong>Summary</strong>&nbsp;</h4><p>Researchers at NVIDIA released a paper detailing a new benchmark for long-context language models called RULER (not an acronym!). The authors motivate the construction of their benchmark by pointing out the insufficiency of some of the most commonly used long-context evaluations, which are already becoming saturated (i.e. most SoTA models achieve scores close to the upper limit of what the benchmark can test) and only test models on simple retrieval from context. In contrast, RULER contains four task categories&#8211;retrieval, multi-hop tracing, aggregation, and question answering&#8211;that are designed to probe for a more comprehensive form of natural language understanding. 
The authors evaluate GPT-4 and nine open-source, long-context LMs on their new benchmark. Despite all of these models achieving near-perfect results on the standard needle-in-a-haystack (NIAH) test, they find significant performance drops for all of the models as context size increases.</p><h4><strong>Overview</strong>&nbsp;</h4><p>In the past year there has been a flurry of research into enabling language models to handle longer context windows more efficiently&#8212;both via modifications to the transformer architecture (<a href="https://arxiv.org/abs/2205.14135">flash attention</a>, <a href="https://arxiv.org/abs/2310.01889">ring attention</a>, <a href="https://dl.acm.org/doi/fullHtml/10.1145/3530811">sparse attention</a>, length extrapolation via novel positional embedding methods like <a href="https://arxiv.org/abs/2108.12409">ALiBi</a> and <a href="https://arxiv.org/abs/2104.09864">RoPE</a>, clever methods to reduce the <em>size</em> of context for a given length, and many others) and via entirely new architectures (<a href="https://thegradientpub.substack.com/p/mamba-explained">Mamba</a>, <a href="https://arxiv.org/abs/2305.13048">RWKV</a>, etc.). Just last week, Google <a href="https://arxiv.org/abs/2404.07143">introduced</a> a new attention mechanism they&#8217;re calling &#8220;Infini-attention&#8221; for handling infinitely long inputs with bounded memory and computation.</p><p>But evaluations of these longer and longer context windows have not kept pace. The most commonly used long-context evaluations&#8212;passkey retrieval and NIAH tests&#8212;entail prompting a model with a {<em>key</em>}:{<em>value</em>} pair embedded in some distractor text (Paul Graham essays are a popular choice) along with a <em>query</em> to retrieve the value that corresponds to the provided {<em>key</em>}. 
Aside from being <a href="https://twitter.com/savvyRL/status/1764737721753571381">saturated</a>, these evaluations only test a very narrow and superficial form of natural language understanding. And yet they remain the standard for demonstrating long-context capabilities. For example, Google&#8217;s &#8220;Infini-attention&#8221; paper only reports results on two benchmarks: a passkey retrieval task and a ROUGE-based book summarization evaluation.</p><p>To improve on NIAH, Hsieh et al. devise a test suite of thirteen tasks divided across four task categories:</p><ol><li><p><strong>Retrieval: </strong>These tasks expand on the standard NIAH test with more complex multi-key, multi-value, and multi-query variations:</p></li></ol><blockquote><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Hmhq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5127c165-3de3-4a8f-8563-32aed52c4f59_1064x154.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Hmhq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5127c165-3de3-4a8f-8563-32aed52c4f59_1064x154.png 424w, https://substackcdn.com/image/fetch/$s_!Hmhq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5127c165-3de3-4a8f-8563-32aed52c4f59_1064x154.png 848w, https://substackcdn.com/image/fetch/$s_!Hmhq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5127c165-3de3-4a8f-8563-32aed52c4f59_1064x154.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Hmhq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5127c165-3de3-4a8f-8563-32aed52c4f59_1064x154.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Hmhq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5127c165-3de3-4a8f-8563-32aed52c4f59_1064x154.png" width="1064" height="154" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5127c165-3de3-4a8f-8563-32aed52c4f59_1064x154.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:154,&quot;width&quot;:1064,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Hmhq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5127c165-3de3-4a8f-8563-32aed52c4f59_1064x154.png 424w, https://substackcdn.com/image/fetch/$s_!Hmhq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5127c165-3de3-4a8f-8563-32aed52c4f59_1064x154.png 848w, https://substackcdn.com/image/fetch/$s_!Hmhq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5127c165-3de3-4a8f-8563-32aed52c4f59_1064x154.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Hmhq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5127c165-3de3-4a8f-8563-32aed52c4f59_1064x154.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></blockquote><ol start="2"><li><p><strong>Multi-hop Tracing: </strong>A novel task called <em>variable tracking </em>(VT) that tests the model&#8217;s ability to trace variable assignments within the context:</p></li></ol><blockquote><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ykGi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe167870f-8d17-4c92-9cc4-db2ab4253c30_876x134.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ykGi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe167870f-8d17-4c92-9cc4-db2ab4253c30_876x134.png 424w, https://substackcdn.com/image/fetch/$s_!ykGi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe167870f-8d17-4c92-9cc4-db2ab4253c30_876x134.png 848w, https://substackcdn.com/image/fetch/$s_!ykGi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe167870f-8d17-4c92-9cc4-db2ab4253c30_876x134.png 1272w, https://substackcdn.com/image/fetch/$s_!ykGi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe167870f-8d17-4c92-9cc4-db2ab4253c30_876x134.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ykGi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe167870f-8d17-4c92-9cc4-db2ab4253c30_876x134.png" width="876" height="134" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e167870f-8d17-4c92-9cc4-db2ab4253c30_876x134.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:134,&quot;width&quot;:876,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ykGi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe167870f-8d17-4c92-9cc4-db2ab4253c30_876x134.png 424w, https://substackcdn.com/image/fetch/$s_!ykGi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe167870f-8d17-4c92-9cc4-db2ab4253c30_876x134.png 848w, https://substackcdn.com/image/fetch/$s_!ykGi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe167870f-8d17-4c92-9cc4-db2ab4253c30_876x134.png 1272w, https://substackcdn.com/image/fetch/$s_!ykGi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe167870f-8d17-4c92-9cc4-db2ab4253c30_876x134.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></blockquote><ol start="3"><li><p><strong>Aggregation: </strong>Two novel tasks, common word extraction (CWE) and frequent word 
extraction (FWE), that test the model&#8217;s ability to aggregate information by asking which words appear most frequently in the context:</p></li></ol><blockquote><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pVdf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef88cdeb-e3a6-40e9-bd5a-09dffbbcd544_1064x140.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pVdf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef88cdeb-e3a6-40e9-bd5a-09dffbbcd544_1064x140.png 424w, https://substackcdn.com/image/fetch/$s_!pVdf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef88cdeb-e3a6-40e9-bd5a-09dffbbcd544_1064x140.png 848w, https://substackcdn.com/image/fetch/$s_!pVdf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef88cdeb-e3a6-40e9-bd5a-09dffbbcd544_1064x140.png 1272w, https://substackcdn.com/image/fetch/$s_!pVdf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef88cdeb-e3a6-40e9-bd5a-09dffbbcd544_1064x140.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pVdf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef88cdeb-e3a6-40e9-bd5a-09dffbbcd544_1064x140.png" width="1064" height="140" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ef88cdeb-e3a6-40e9-bd5a-09dffbbcd544_1064x140.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:140,&quot;width&quot;:1064,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pVdf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef88cdeb-e3a6-40e9-bd5a-09dffbbcd544_1064x140.png 424w, https://substackcdn.com/image/fetch/$s_!pVdf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef88cdeb-e3a6-40e9-bd5a-09dffbbcd544_1064x140.png 848w, https://substackcdn.com/image/fetch/$s_!pVdf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef88cdeb-e3a6-40e9-bd5a-09dffbbcd544_1064x140.png 1272w, https://substackcdn.com/image/fetch/$s_!pVdf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef88cdeb-e3a6-40e9-bd5a-09dffbbcd544_1064x140.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></blockquote><ol start="4"><li><p><strong>Question Answering: </strong>Augmentations of existing QA datasets with distracting information to evaluate question answering performance at varying context lengths:</p></li></ol><blockquote><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!P-w3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d731a67-764c-40f9-a403-f5c07304719d_1048x110.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!P-w3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d731a67-764c-40f9-a403-f5c07304719d_1048x110.png 424w, https://substackcdn.com/image/fetch/$s_!P-w3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d731a67-764c-40f9-a403-f5c07304719d_1048x110.png 848w, https://substackcdn.com/image/fetch/$s_!P-w3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d731a67-764c-40f9-a403-f5c07304719d_1048x110.png 1272w, https://substackcdn.com/image/fetch/$s_!P-w3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d731a67-764c-40f9-a403-f5c07304719d_1048x110.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!P-w3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d731a67-764c-40f9-a403-f5c07304719d_1048x110.png" width="1048" height="110" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6d731a67-764c-40f9-a403-f5c07304719d_1048x110.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:110,&quot;width&quot;:1048,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!P-w3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d731a67-764c-40f9-a403-f5c07304719d_1048x110.png 424w, https://substackcdn.com/image/fetch/$s_!P-w3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d731a67-764c-40f9-a403-f5c07304719d_1048x110.png 848w, https://substackcdn.com/image/fetch/$s_!P-w3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d731a67-764c-40f9-a403-f5c07304719d_1048x110.png 1272w, https://substackcdn.com/image/fetch/$s_!P-w3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d731a67-764c-40f9-a403-f5c07304719d_1048x110.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></blockquote><p>All of the tasks are completely synthetic and have flexible configurations (e.g. number of keys or values included for the NIAH test) that allow for adjustable difficulty. 
Overall performance on RULER is the average of all thirteen tasks.</p><p>The authors evaluate GPT-4 and a number of open-source models on RULER, and show that their benchmark reveals a lot of variation in model performance that a basic NIAH test would have missed. Here is model performance with context windows ranging from 4k to 128k on just such a basic NIAH test:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6Mp4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e71acda-df32-4a34-86ca-a3cd5528a413_1076x402.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6Mp4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e71acda-df32-4a34-86ca-a3cd5528a413_1076x402.png 424w, https://substackcdn.com/image/fetch/$s_!6Mp4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e71acda-df32-4a34-86ca-a3cd5528a413_1076x402.png 848w, https://substackcdn.com/image/fetch/$s_!6Mp4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e71acda-df32-4a34-86ca-a3cd5528a413_1076x402.png 1272w, https://substackcdn.com/image/fetch/$s_!6Mp4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e71acda-df32-4a34-86ca-a3cd5528a413_1076x402.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6Mp4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e71acda-df32-4a34-86ca-a3cd5528a413_1076x402.png" width="1076" height="402" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9e71acda-df32-4a34-86ca-a3cd5528a413_1076x402.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:1076,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6Mp4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e71acda-df32-4a34-86ca-a3cd5528a413_1076x402.png 424w, https://substackcdn.com/image/fetch/$s_!6Mp4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e71acda-df32-4a34-86ca-a3cd5528a413_1076x402.png 848w, https://substackcdn.com/image/fetch/$s_!6Mp4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e71acda-df32-4a34-86ca-a3cd5528a413_1076x402.png 1272w, https://substackcdn.com/image/fetch/$s_!6Mp4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e71acda-df32-4a34-86ca-a3cd5528a413_1076x402.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Notice that there is virtually no variation in performance for context windows smaller than 64k tokens. 
But when they evaluate the same models over the same range of context windows on RULER, significant variations in performance are evident at smaller context lengths:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Cgvf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ca6cdc-5602-47cf-84ed-72c19ad2c4b0_1084x604.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Cgvf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ca6cdc-5602-47cf-84ed-72c19ad2c4b0_1084x604.png 424w, https://substackcdn.com/image/fetch/$s_!Cgvf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ca6cdc-5602-47cf-84ed-72c19ad2c4b0_1084x604.png 848w, https://substackcdn.com/image/fetch/$s_!Cgvf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ca6cdc-5602-47cf-84ed-72c19ad2c4b0_1084x604.png 1272w, https://substackcdn.com/image/fetch/$s_!Cgvf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ca6cdc-5602-47cf-84ed-72c19ad2c4b0_1084x604.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Cgvf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ca6cdc-5602-47cf-84ed-72c19ad2c4b0_1084x604.png" width="1084" height="604" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f5ca6cdc-5602-47cf-84ed-72c19ad2c4b0_1084x604.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:604,&quot;width&quot;:1084,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Cgvf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ca6cdc-5602-47cf-84ed-72c19ad2c4b0_1084x604.png 424w, https://substackcdn.com/image/fetch/$s_!Cgvf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ca6cdc-5602-47cf-84ed-72c19ad2c4b0_1084x604.png 848w, https://substackcdn.com/image/fetch/$s_!Cgvf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ca6cdc-5602-47cf-84ed-72c19ad2c4b0_1084x604.png 1272w, https://substackcdn.com/image/fetch/$s_!Cgvf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ca6cdc-5602-47cf-84ed-72c19ad2c4b0_1084x604.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The authors choose Llama2-7b performance at 4K on RULER (85.6%) as a somewhat arbitrary threshold for determining the &#8220;effective&#8221; context length of each model (i.e. the longest context length at which the model outperforms the threshold). At this threshold, Mixtral (8x7B) is the only model with an effective length greater than or equal to its claimed context length. GPT-4 is the only model that surpasses the threshold at 64k and it also exhibits the least performance degradation between 4k (96.6%) and 128k (81.2%). Unsurprisingly, the three best-performing open-source models (Command-R, Yi, and Mixtral) are also the largest in parameter size.</p><p>To get a better sense of task-level performance, the authors report detailed results of Yi-34B-200k on the individual RULER tasks with a range of configurations. 
They find that the NIAH variation Yi struggles the most with as context increases is the multi-key (MK) setting:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MM2E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2adb6e-42d2-49e1-854a-05ae2c6610cd_1110x448.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MM2E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2adb6e-42d2-49e1-854a-05ae2c6610cd_1110x448.png 424w, https://substackcdn.com/image/fetch/$s_!MM2E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2adb6e-42d2-49e1-854a-05ae2c6610cd_1110x448.png 848w, https://substackcdn.com/image/fetch/$s_!MM2E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2adb6e-42d2-49e1-854a-05ae2c6610cd_1110x448.png 1272w, https://substackcdn.com/image/fetch/$s_!MM2E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2adb6e-42d2-49e1-854a-05ae2c6610cd_1110x448.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MM2E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2adb6e-42d2-49e1-854a-05ae2c6610cd_1110x448.png" width="1110" height="448" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ef2adb6e-42d2-49e1-854a-05ae2c6610cd_1110x448.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:448,&quot;width&quot;:1110,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MM2E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2adb6e-42d2-49e1-854a-05ae2c6610cd_1110x448.png 424w, https://substackcdn.com/image/fetch/$s_!MM2E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2adb6e-42d2-49e1-854a-05ae2c6610cd_1110x448.png 848w, https://substackcdn.com/image/fetch/$s_!MM2E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2adb6e-42d2-49e1-854a-05ae2c6610cd_1110x448.png 1272w, https://substackcdn.com/image/fetch/$s_!MM2E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2adb6e-42d2-49e1-854a-05ae2c6610cd_1110x448.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>They also find that simply swapping the type of key from a 7-digit number or word to a 32-digit UUID degrades performance (see, for example, the gap between the dark green and light green lines in the second plot from the left above).</p><p>On the non-NIAH tasks, the authors observe performance degradation and a range of failure modes that would have otherwise gone unnoticed.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AG5w!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9942414-4762-4e32-aa8f-97e9feac2d5a_1102x430.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AG5w!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9942414-4762-4e32-aa8f-97e9feac2d5a_1102x430.png 424w, 
https://substackcdn.com/image/fetch/$s_!AG5w!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9942414-4762-4e32-aa8f-97e9feac2d5a_1102x430.png 848w, https://substackcdn.com/image/fetch/$s_!AG5w!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9942414-4762-4e32-aa8f-97e9feac2d5a_1102x430.png 1272w, https://substackcdn.com/image/fetch/$s_!AG5w!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9942414-4762-4e32-aa8f-97e9feac2d5a_1102x430.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AG5w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9942414-4762-4e32-aa8f-97e9feac2d5a_1102x430.png" width="1102" height="430" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d9942414-4762-4e32-aa8f-97e9feac2d5a_1102x430.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:430,&quot;width&quot;:1102,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AG5w!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9942414-4762-4e32-aa8f-97e9feac2d5a_1102x430.png 424w, 
https://substackcdn.com/image/fetch/$s_!AG5w!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9942414-4762-4e32-aa8f-97e9feac2d5a_1102x430.png 848w, https://substackcdn.com/image/fetch/$s_!AG5w!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9942414-4762-4e32-aa8f-97e9feac2d5a_1102x430.png 1272w, https://substackcdn.com/image/fetch/$s_!AG5w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9942414-4762-4e32-aa8f-97e9feac2d5a_1102x430.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In particular, on the VT and CWE tasks, Yi has a tendency to copy from context as context length increases: &#8220;over 80% of Yi&#8217;s output in the CWE task at 128k is simply a string copied from the one-shot example, whereas the copying is non-existent for short sequences.&#8221; The results above also show that performance drops on more complex configurations of both VT (more variable &#8220;chains&#8221;) and FWE (a lower &#945; indicates a flatter distribution of word frequencies in the context, making the task of aggregating them more difficult). Finally, on the QA task Yi exhibits a tendency to hallucinate as context length increases. This finding prompts the authors to observe that the &#8220;fuzzy matching between a query and a relevant paragraph in long context is a more challenging setting than the simplistic NIAH tests.&#8221; Overall, the results on the non-NIAH tasks underline the importance of testing long-context models on behaviors other than retrieval.</p><p>The paper also includes ablation results for RULER on context length during training, model size, and non-transformer architectures:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eAAD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d810720-c834-4dbf-ac3a-3c8e72bee2d9_1088x474.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eAAD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d810720-c834-4dbf-ac3a-3c8e72bee2d9_1088x474.png 424w, 
https://substackcdn.com/image/fetch/$s_!eAAD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d810720-c834-4dbf-ac3a-3c8e72bee2d9_1088x474.png 848w, https://substackcdn.com/image/fetch/$s_!eAAD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d810720-c834-4dbf-ac3a-3c8e72bee2d9_1088x474.png 1272w, https://substackcdn.com/image/fetch/$s_!eAAD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d810720-c834-4dbf-ac3a-3c8e72bee2d9_1088x474.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eAAD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d810720-c834-4dbf-ac3a-3c8e72bee2d9_1088x474.png" width="1088" height="474" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4d810720-c834-4dbf-ac3a-3c8e72bee2d9_1088x474.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:474,&quot;width&quot;:1088,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eAAD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d810720-c834-4dbf-ac3a-3c8e72bee2d9_1088x474.png 424w, 
https://substackcdn.com/image/fetch/$s_!eAAD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d810720-c834-4dbf-ac3a-3c8e72bee2d9_1088x474.png 848w, https://substackcdn.com/image/fetch/$s_!eAAD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d810720-c834-4dbf-ac3a-3c8e72bee2d9_1088x474.png 1272w, https://substackcdn.com/image/fetch/$s_!eAAD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d810720-c834-4dbf-ac3a-3c8e72bee2d9_1088x474.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The two leftmost plots above show that, overall, training on longer context lengths does lead to better performance. The size comparison plot (second from right) indicates that larger models are better at long-context modeling, all else being equal. And finally, the far right plot shows that two non-transformer architectures, RWKV-v5 and Mamba-2.8b, lag behind the transformer baseline, Llama2-7b.</p><h4><strong>Our Take</strong></h4><p>Concern about the saturation and insufficiency of existing benchmarks went mainstream this week with the New York Times&#8217;s tech columnist Kevin Roose devoting a <a href="https://www.nytimes.com/2024/04/15/technology/ai-models-measurement.html#:~:text=measurement%20is%20a%20mess%20%E2%80%94%20a,said%20Nathan%20Benaich%2C%20an%20A.I.">1000+ word column</a> to &#8220;A.I.&#8217;s measurement problem&#8221;. On the one hand, he&#8217;s not wrong: claims about model performance are only as good as the benchmarks they&#8217;re based on. On the other hand, this is not a new problem: designing better evaluations has been a central problem in natural language processing for <a href="https://www.cambridge.org/core/services/aop-cambridge-core/content/view/E4330FAEB9202EC490218E3220DDA291/S1351324919000275a.pdf/survey_of_25_years_of_evaluation.pdf">at least the last thirty years</a>.&nbsp;</p><p>This seems like a perennial <a href="https://arxiv.org/abs/2104.02145">problem</a>. We want benchmarks that are neither too easy nor too hard. So as models get better, we naturally need better evaluations. Of course, the open-ended nature of the problems we use, and aspire to use, LLMs for makes designing good evaluations really difficult. Arguably, a good heuristic for an evaluation is whether its task can be completed by a simpler algorithm. Simple NIAH tests fail this heuristic; all you need is a regular expression. 
And the results on RULER bear out that NIAH misses important dimensions of natural language understanding. But, much like models, no evaluation is perfect or, for that matter, final. The synthetic nature of all the non-QA tasks in RULER means that they don&#8217;t capture the more slippery notions of human judgment that we might want to capture&#8212;the type of human judgment that methods like RLHF seek to emulate. Luckily, there is an increasing number of <a href="https://arxiv.org/abs/2307.11088">other</a> <a href="https://arxiv.org/abs/2305.14196">long</a>-<a href="https://arxiv.org/abs/2309.13345">context</a>, <a href="https://arxiv.org/abs/2308.14508">human</a>-<a href="https://arxiv.org/abs/2311.04939">labeled</a> <a href="https://github.com/OpenBMB/InfiniteBench">evaluations</a> out there aiming to solve exactly that problem.</p><p>&#8211; Cole</p><p>I&#8217;ll never stop harping on the need for better benchmarks and the difficulty of creating standards that make sense for the various things we want to measure&#8212;this is clearly a step in the right direction, as Cole pointed out. I don&#8217;t know that there&#8217;s much to add, besides that I wish we&#8217;d get better at scoping our claims to the information we actually have available, which crucially depends on the benchmarks we&#8217;ve constructed and their role in what we can actually know about the methods we&#8217;re developing. 
We <em>really</em> don&#8217;t want more situations like where <a href="https://arxiv.org/abs/1907.06902">this paper</a> argued RecSys was in 2019.&nbsp;</p><p>&#8212;Daniel</p><h2>New from the Gradient</h2><h3><a href="https://thegradientpub.substack.com/p/financial-market-applications-of">Financial Market Applications of LLMs</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n-e3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f97aa2a-81f4-4c95-869b-65316d1ad3af_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n-e3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f97aa2a-81f4-4c95-869b-65316d1ad3af_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!n-e3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f97aa2a-81f4-4c95-869b-65316d1ad3af_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!n-e3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f97aa2a-81f4-4c95-869b-65316d1ad3af_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!n-e3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f97aa2a-81f4-4c95-869b-65316d1ad3af_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n-e3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f97aa2a-81f4-4c95-869b-65316d1ad3af_1024x1024.png" width="304" height="304" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f97aa2a-81f4-4c95-869b-65316d1ad3af_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:304,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!n-e3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f97aa2a-81f4-4c95-869b-65316d1ad3af_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!n-e3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f97aa2a-81f4-4c95-869b-65316d1ad3af_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!n-e3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f97aa2a-81f4-4c95-869b-65316d1ad3af_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!n-e3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f97aa2a-81f4-4c95-869b-65316d1ad3af_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/financial-market-applications-of&quot;,&quot;text&quot;:&quot;Read&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegradientpub.substack.com/p/financial-market-applications-of"><span>Read</span></a></p><h3><a href="https://thegradientpub.substack.com/p/sasha-luccioni-ai-climate-change-bias-ethics">Sasha Luccioni: Connecting the Dots Between AI&#8217;s Environmental and Social Impacts</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UflK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8020a011-543e-4a58-b269-52d3ef36eaa3_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UflK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8020a011-543e-4a58-b269-52d3ef36eaa3_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!UflK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8020a011-543e-4a58-b269-52d3ef36eaa3_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!UflK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8020a011-543e-4a58-b269-52d3ef36eaa3_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!UflK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8020a011-543e-4a58-b269-52d3ef36eaa3_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UflK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8020a011-543e-4a58-b269-52d3ef36eaa3_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8020a011-543e-4a58-b269-52d3ef36eaa3_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!UflK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8020a011-543e-4a58-b269-52d3ef36eaa3_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!UflK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8020a011-543e-4a58-b269-52d3ef36eaa3_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!UflK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8020a011-543e-4a58-b269-52d3ef36eaa3_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!UflK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8020a011-543e-4a58-b269-52d3ef36eaa3_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/sasha-luccioni-ai-climate-change-bias-ethics&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thegradientpub.substack.com/p/sasha-luccioni-ai-climate-change-bias-ethics"><span>Listen</span></a></p><h3><a href="https://thegradientpub.substack.com/p/michael-sipser-theory-of-computation">Michael Sipser: Problems in the Theory of Computation</a></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pru2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2cb509d-70f4-4ec8-8c64-3afe3908e157_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pru2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2cb509d-70f4-4ec8-8c64-3afe3908e157_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!Pru2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2cb509d-70f4-4ec8-8c64-3afe3908e157_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!Pru2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2cb509d-70f4-4ec8-8c64-3afe3908e157_1200x628.png 
1272w, https://substackcdn.com/image/fetch/$s_!Pru2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2cb509d-70f4-4ec8-8c64-3afe3908e157_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pru2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2cb509d-70f4-4ec8-8c64-3afe3908e157_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b2cb509d-70f4-4ec8-8c64-3afe3908e157_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Pru2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2cb509d-70f4-4ec8-8c64-3afe3908e157_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!Pru2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2cb509d-70f4-4ec8-8c64-3afe3908e157_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!Pru2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2cb509d-70f4-4ec8-8c64-3afe3908e157_1200x628.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Pru2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2cb509d-70f4-4ec8-8c64-3afe3908e157_1200x628.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thegradientpub.substack.com/p/michael-sipser-theory-of-computation&quot;,&quot;text&quot;:&quot;Listen&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" 
href="https://thegradientpub.substack.com/p/michael-sipser-theory-of-computation"><span>Listen</span></a></p><h2>Other Things That Caught Our Eyes</h2><h3>News</h3><p><strong><a href="https://aiindex.stanford.edu/report/">2024 AI Index Report</a></strong></p><p>Stanford University&#8217;s Institute for Human-Centered Artificial Intelligence released its 7th annual AI Index report. The report covers trends in AI research, public perceptions of AI, and the geopolitical dynamics surrounding its development. The full report is 500 pages, but you can find some of its <a href="https://hai.stanford.edu/news/ai-index-state-ai-13-charts">high-level themes condensed into 13 charts here</a>. These themes include: industry continuing to dominate frontier AI research, investment in AI rising steadily, the insufficiency of existing benchmarks, and scientific progress accelerating due to AI.</p><p><strong><a href="https://theintercept.com/2024/04/10/microsoft-openai-dalle-ai-military-use/">Microsoft Pitched OpenAI&#8217;s DALL-E As Battlefield Tool for U.S. Military</a></strong></p><p>Microsoft last year proposed using OpenAI's DALL-E image generation tool to help the U.S. Department of Defense build software for military operations. The reporting is based on an internal Microsoft presentation entitled &#8220;<a href="https://www.documentcloud.org/documents/24538175-generative-ai-with-dod-data_microsoft">Generative AI with DoD Data</a>,&#8221; which outlines how the Pentagon could utilize OpenAI's tools, including DALL-E and ChatGPT, for tasks like document analysis and machine maintenance. 
While Microsoft has long had defense contracts, OpenAI only recently acknowledged it would begin working with the Department of Defense, despite previously stating its policy did not allow tools to be used for weapons development or military purposes.</p><p><strong><a href="https://www.theguardian.com/technology/2024/apr/09/artificial-intelligence-bill-copyright-art">New bill would force AI companies to reveal use of copyrighted art</a></strong></p><p>A new bill introduced in Congress by California Democratic congressman Adam Schiff would require artificial intelligence (AI) companies to disclose any copyrighted material they use to create generative AI models. The legislation, called the Generative AI Copyright Disclosure Act, would require AI companies to submit any copyrighted works in their training datasets to the U.S. Copyright Office prior to the release of any generative model that uses that data. The bill is intended to address concerns that AI systems are using copyrighted art and other content to train their models without proper attribution or compensation to the original creators.</p><p><strong><a href="https://www.washingtonpost.com/technology/2024/04/09/openai-lawsuit-regulation-lawyers/">OpenAI prepares to fight for its life as legal troubles mount</a></strong></p><p>OpenAI is facing a barrage of lawsuits and government investigations: comedian Sarah Silverman sued OpenAI for allegedly using her memoir to train its AI products without permission; other authors and media companies have also accused OpenAI of copyright infringement; Elon Musk sued the company for diverging from its non-profit mission; and government agencies in the US and Europe are investigating OpenAI for potential violations of competition, securities, and consumer protection laws. In response, OpenAI has hired in-house lawyers, posted job openings for legal positions, and retained top US law firms. 
The company is also considering a political strategy to position itself as a defender of American economic and national security interests against China.</p><p><strong><a href="https://theintercept.com/2024/04/10/microsoft-openai-dalle-ai-military-use/">Microsoft Pitched OpenAI&#8217;s DALL-E as Battlefield Tool for U.S. Military</a></strong></p><p>Microsoft proposed using OpenAI's image generation tool, DALL-E, to assist the Department of Defense in building software for military operations. In a presentation titled &#8220;Generative AI with DoD Data,&#8221; the company highlighted potential uses for OpenAI's tools in defense applications such as battle management systems. Microsoft clarified that while it pitched the idea, it had not yet begun using DALL-E for military purposes. OpenAI stated that it was not involved in the Microsoft pitch and had not sold any tools to the Department of Defense.&nbsp;</p><p><strong><a href="https://www.nytimes.com/2024/04/16/technology/microsoft-g42-uae-ai.html">Microsoft Makes High-Stakes Play in Tech Cold War With Emirati A.I. Deal</a></strong></p><p>Microsoft has announced a $1.5 billion investment in G42, an AI company in the United Arab Emirates (UAE). This move is seen as part of the Biden administration's efforts to counter China's influence in the Persian Gulf region and beyond. Under the partnership, Microsoft will allow G42 to sell Microsoft's AI services, while G42 will use Microsoft's cloud services and adhere to a security arrangement negotiated with the US government. This agreement includes measures to protect the AI products shared with G42 and remove Chinese equipment from G42's operations.&nbsp;</p><p><strong><a href="https://www.washingtonpost.com/opinions/2024/04/10/op-moneyballai/">Will AI transform baseball forever?</a></strong></p><p>The article discusses how artificial intelligence (AI) is transforming the world of baseball, building on the principles introduced in the book <em>Moneyball</em>. 
Kyle Boddy, the founder of Driveline Baseball, has combined AI with high-speed camera technology to analyze and improve player performance. The use of AI allows for the blending of various data streams to create customized coaching regimens, helping players refine their skills and optimize their performance. Driveline has worked with thousands of professional players, including Tony Gonsolin, who went from a soft-throwing pitcher to an all-star with their help. The article also highlights privacy concerns regarding the use of AI in sports, as teams can gather detailed data on players' performance and potentially use it against them.&nbsp;</p><p><strong><a href="https://www.politico.eu/article/ai-chatbots-spread-falsehoods-about-the-eu-elections-report-finds/">AI chatbots spread falsehoods about the EU election, report finds</a></strong></p><p>According to an analysis by Democracy Reporting International, the EU election is at risk of misinformation spread by AI chatbots. In an experiment, chatbots developed by Google, Microsoft, and OpenAI provided incorrect information about election dates and how to cast a ballot, and shared broken or irrelevant links. The European Commission has ordered tech firms to explain how they are limiting risks to elections connected to their AI tools.&nbsp;</p><p><strong><a href="https://www.nytimes.com/2024/04/15/technology/ai-models-measurement.html">A.I. Has a Measurement Problem</a></strong></p><p>Kevin Roose discusses the measurement problem in AI systems such as ChatGPT, Gemini, and Claude. Unlike companies in other industries, AI companies are not required to submit their products for testing before releasing them to the public. There is no standardized evaluation process for AI chatbots, and few independent groups are rigorously testing these tools. Instead, we have to rely on the claims of AI companies, which often use vague phrases to describe model improvements. 
While there are some standard tests for assessing AI models, experts have doubts about those tests&#8217; reliability.&nbsp;</p><p><strong><a href="https://techcrunch.com/2024/04/16/metas-oversight-board-probes-explicit-ai-generated-images-posted-on-instagram-and-facebook/">Meta&#8217;s Oversight Board probes explicit AI-generated images posted on Instagram and Facebook</a></strong></p><p>The Oversight Board, a semi-independent policy council for Meta, is investigating how Instagram and Facebook handled explicit, AI-generated images of public figures. In one case, an AI-generated nude image of a public figure from India was reported as pornography on Instagram, but Meta failed to remove it after two reports. The image was only taken down after the user appealed to the Oversight Board. In another case on Facebook, an explicit AI-generated image resembling a US public figure was taken down.&nbsp;</p><p><strong><a href="https://arstechnica.com/tech-policy/2024/04/feds-appoint-ai-doomer-to-run-us-ai-safety-institute/">Feds appoint &#8220;AI doomer&#8221; to run AI safety at US institute</a></strong></p><p>The US AI Safety Institute, part of the National Institute of Standards and Technology (NIST), has appointed Paul Christiano as the head of AI safety. Christiano is a former OpenAI researcher known for his work on reinforcement learning from human feedback (RLHF) and for his prediction that there is a 50% chance AI development could end in "doom." While Christiano's research background is impressive, some fear that his appointment may encourage non-scientific thinking and compromise the institute's objectivity and integrity. Critics argue that focusing on hypothetical existential AI risks may divert attention from current AI-related issues such as ethics, bias, and privacy. Christiano's role will involve monitoring current and potential risks, conducting tests of AI models, and implementing risk mitigations. 
The leadership team of the safety institute also includes experts in human-AI teaming, international engagement, and human-centered AI.</p><ul><li><p>I just want to point out how funny it is that doomer has become such a commonly used term that it&#8217;s in a headline&#8212;also, I think the label is pretty clickbait-y. Use this as an example of being careful with titles if you plan on doing AI journalism :)</p></li></ul><p><strong><a href="https://www.technologyreview.com/2024/04/09/1091004/china-tech-regulation-harsh-zhang">Why the Chinese government is sparing AI from harsh regulations&#8212;for now</a></strong></p><p>The Chinese government has been sparing AI from harsh regulations, at least for now. Chinese tech giants like Alibaba and Tencent have faced scrutiny for their business practices, including antitrust violations and infringements of privacy and labor rights. However, the government has also provided support to these companies, as they are important contributors to tax revenues and employment. The government's approach to regulating the tech industry in China has been characterized by oscillations between doing too little and doing too much in various sectors, including finance and online tutoring.&nbsp;</p><p><strong><a href="https://www.theregister.com/2024/03/28/ai_bots_hallucinate_software_packages">AI bots hallucinate software packages and devs download them</a></strong></p><p>Generative AI models are hallucinating software packages, and developers are unknowingly downloading and installing them. Bar Lanyado, a security researcher at Lasso Security, discovered that an AI model repeatedly recommended a nonexistent package called huggingface-cli, which was then incorporated into the installation instructions for Alibaba's GraphTranslator. To test its potential as an attack vector, Lanyado uploaded a harmless proof-of-concept package under the same name, which received over 15,000 authentic downloads. 
Several large companies were found to either use or recommend the fake package in their repositories. No attacks have been identified yet, but this is another potential attack vector.&nbsp;</p><p><strong><a href="https://www.technologyreview.com/2024/04/11/1090718/household-robots-ai-data-robotics">Is robotics about to have its own ChatGPT moment?</a></strong></p><p>Melissa Heikkil&#228; highlights Stretch, a mobile robot with a camera, an adjustable arm, and a gripper; it can be controlled from a laptop and can perform tasks such as brushing hair and playing games. While the current capabilities of robots are limited, the article suggests that with cheap hardware, advances in AI, and data sharing, robots are becoming more competent and helpful. However, there are still challenges to overcome, such as precise control, perception of the surrounding world, and practical physics.</p><h3>Papers</h3><p><strong>Daniel</strong>: The first thing I&#8217;ll mention is <a href="https://x.com/_ddjohnson/status/1781334421141934226">Penzai</a>, a JAX research toolkit from DeepMind that lets you see model internals and inject custom logic. I think this kind of tool is awesome, and I&#8217;m planning to mess around with it more soon.&nbsp;</p><p>DeepMind has a (very long!) 
<a href="https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/ethics-of-advanced-ai-assistants/the-ethics-of-advanced-ai-assistants-2024-i.pdf">paper</a> out on the ethics of advanced AI assistants.&nbsp;</p><p><a href="https://x.com/tamaybes/status/1780639257389904013">This attempt at replicating the Chinchilla results</a> is interesting, and it&#8217;s a little funny that the researchers have to reconstruct the paper&#8217;s data by extracting the SVG from the paper and parsing out point locations/colors&#8212;this wasn&#8217;t by choice, because the Chinchilla authors haven&#8217;t responded to their request for assistance.&nbsp;</p><p>Finally, I&#8217;ll suggest this <a href="https://arxiv.org/abs/2404.07129">ICL paper</a> and this <a href="https://arxiv.org/abs/2403.08081">preprint</a> on the mechanics of next-token prediction.&nbsp;</p><h3>Closing Thoughts</h3><p>Have something to say about this edition&#8217;s topics? Shoot us an email at editor@thegradient.pub and we will consider sharing the most interesting thoughts from readers in the next newsletter! For feedback, you can also reach Daniel directly at <a href="mailto:dbashir@hmc.edu">dbashir@hmc.edu</a> or on <a href="https://twitter.com/spaniel_bashir">Twitter</a>. If you enjoyed this newsletter, consider donating to The Gradient via a Substack subscription, which helps keep this grad-student / volunteer-run project afloat. 
Thanks for reading the latest Update from the Gradient!</p>]]></content:encoded></item><item><title><![CDATA[Mini-Update #39: 2024 AI Index Report and Autonomous Driving Detection]]></title><description><![CDATA[Stanford publishes its most comprehensive AI index report and researchers develop new methods of driving lane detection using sparse anchors.]]></description><link>https://thegradientpub.substack.com/p/mini-update-39-2024-ai-index-report</link><guid isPermaLink="false">https://thegradientpub.substack.com/p/mini-update-39-2024-ai-index-report</guid><dc:creator><![CDATA[Ather Fawaz]]></dc:creator><pubDate>Tue, 16 Apr 2024 15:01:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Yx-W!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6c9a93-3cb9-4e38-bc88-f449bc60523d_911x440.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 39th Mini-Update from the Gradient! This is our exclusive newsletter edition specifically for paying subscribers and is our way to show you our appreciation for your support.</p>
      <p>
          <a href="https://thegradientpub.substack.com/p/mini-update-39-2024-ai-index-report">
              Read more
          </a>
      </p>
   ]]></content:encoded></item></channel></rss>