Update #57: AI Powers Accessibility Tech and Faith & Fate: Limits of Transformers on Compositionality
AI systems empower paralyzed people to speak and walk again; researchers probe fundamental limitations in transformer-based language models' ability to solve compositional problems.
Welcome to the 57th update from the Gradient! If you’re new and like what you see, subscribe and follow us on Twitter :) You’ll need to view this post on Substack to see the full newsletter!
Want to write with us? Send a pitch using this form.
News Highlight: How Artificial Intelligence Is Powering a Wave of Accessibility Technology
Summary
The New York Times recently highlighted some exciting research coming out of the University of California, San Francisco and UC Berkeley. Researchers have unveiled a new system that can decode brain signals into text at nearly 80 words per minute, empowering Ann Johnson to speak after nearly two decades of paralysis caused by a stroke. According to the researchers, the system is a “vast improvement over the 14 words per minute that her current communication device delivers.”
This system is just one of several we have seen in recent days that aim to use AI to power accessible technology. We have seen other researchers use auto-encoders to help a paralyzed man walk naturally again, as well as the app Be My Eyes, which connects people needing sighted support with volunteers and offers GPT-4-powered visual assistance designed to help people “better navigate physical environments, address everyday needs, and gain more independence.” These are just a few of the ways that researchers hope AI can help create accessible technological solutions.
Overview
According to Kaylo Littlejohn, one study’s lead author, he and his collaborators’ goal was to develop an algorithm that could decode brain activity into audio waveforms, producing vocalized speech. In developing their encoder-decoder system, the researchers began by implanting “a paper-thin rectangle of 253 electrodes onto the surface of [Ann’s] brain over areas they previously discovered were critical for speech.” They then built a training corpus by having Ann repeat different phrases from a fixed vocabulary. The researchers note the model was trained “to recognize not individual words, but phonemes, or sound units like ‘ow’ and ‘ah’ that can ultimately form any word.” This approach allowed the team to improve the system’s accuracy and make it three times faster. In addition to decoding text, the system also powers a digital avatar: it intercepts signals that would have gone to “muscles in Ann’s lips, tongue, jaw and larynx, as well as her face” to drive the avatar’s expressions as Ann speaks.
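For readers curious what phoneme-level decoding might look like in code, below is a deliberately simplified sketch, not the study’s actual architecture: a small recurrent network maps per-timestep electrode features to phoneme probabilities and is trained with a CTC-style loss, so the phoneme sequence need not be aligned frame by frame with the neural recording. The electrode count comes from the article; the phoneme inventory, model sizes, and all data here are assumptions for illustration.

```python
# Minimal sketch of phoneme decoding from neural features (our illustration,
# not the study's system). A GRU maps electrode features to phoneme
# probabilities; a CTC loss handles the unaligned phoneme sequences.
import torch
import torch.nn as nn

N_ELECTRODES = 253   # from the article: a 253-electrode array
N_PHONEMES = 39      # assumption: a standard 39-phoneme English inventory

class PhonemeDecoder(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(N_ELECTRODES, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, N_PHONEMES + 1)  # +1 for the CTC blank

    def forward(self, x):                  # x: (batch, time, electrodes)
        h, _ = self.rnn(x)
        return self.head(h).log_softmax(-1)

model = PhonemeDecoder()
ctc = nn.CTCLoss(blank=N_PHONEMES)
signals = torch.randn(8, 100, N_ELECTRODES)      # fake neural feature windows
log_probs = model(signals).transpose(0, 1)       # CTC expects (time, batch, classes)
targets = torch.randint(0, N_PHONEMES, (8, 12))  # fake phoneme label sequences
loss = ctc(log_probs, targets,
           input_lengths=torch.full((8,), 100),
           target_lengths=torch.full((8,), 12))
loss.backward()
```

Decoding phonemes rather than whole words keeps the output space small while still being able to compose any word, which is consistent with the accuracy and speed gains the researchers describe.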
While the above article focused on decoding brain signals to power speech technology, earlier this summer European researchers published in Nature their success in developing an encoder that helps the brain communicate with the spinal cord after an injury left a subject paralyzed and unable to walk. They revealed a brain-spine interface (BSI) that uses an algorithm to calculate the probability of the subject’s intention to move a specific joint, while a second, independent model predicts the amplitude and direction of the intended movement. The researchers use an online calibration session with the subject to map brain signals to intended motions. As a result, the BSI allowed the subject to regain natural control over their legs, enabling them to walk, climb, and navigate complex terrain. The implant has remained stable for over a year and can be recalibrated in minutes.
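To make the two-model design concrete, here is a toy sketch, our simplification rather than the Nature paper’s implementation: one classifier estimates the probability of intended movement and gates the system, while a separate regressor predicts the movement’s amplitude and direction, both fit on calibration data. All names, features, and data below are made up for illustration.

```python
# Toy two-model brain-spine interface sketch (our simplification).
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))           # fake neural features per time window
intends_move = rng.integers(0, 2, 500)   # calibration labels: move vs. rest
amp_dir = rng.normal(size=(500, 2))      # calibration labels: amplitude, direction

intent_model = LogisticRegression().fit(X, intends_move)  # P(intends to move joint)
movement_model = LinearRegression().fit(X, amp_dir)       # amplitude and direction

x_now = rng.normal(size=(1, 64))
if intent_model.predict_proba(x_now)[0, 1] > 0.5:         # gate on decoded intent
    amplitude, direction = movement_model.predict(x_now)[0]
    # ...a real BSI would now send a stimulation command to the spinal implant...
```

Separating “does the subject intend to move?” from “how should the joint move?” mirrors the paper’s description of two independent models, and the quick online calibration step corresponds to refitting these models on fresh labeled windows.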
Why does it matter?
Ann Johnson is not unique in her struggle to communicate or control motor functions after a stroke. Researchers currently estimate that 5 million Americans are living with some form of motor deficit due to a stroke, and that 6 million are paralyzed. The researchers working with her hope that these breakthroughs will lead to an “FDA-approved system that enables speech from brain signals in the near future”.
Additionally, not all AI-powered accessible technologies require implants and the encoding/decoding of brain signals. Some other great examples we’ve seen recently include:
Machine Translation: Helping doctors and patients communicate and bridge language barriers
Automatic Speech Recognition: Real time captioning for deaf people and AI-based digital humans as sign language interpreters
Digital Virtual Assistants: Providing people who are blind or have low vision with powerful new resources to better navigate physical environments, address everyday needs, and gain more independence
Research Highlight: Faith and Fate: Limits of Transformers on Compositionality
Summary
This paper, led by researchers from the Allen Institute for AI and the University of Washington, studies the limitations of Transformer-based language models in solving compositional tasks, which require multi-step reasoning. An analysis of the errors made by LLMs on such tasks suggests that these models may primarily be memorizing computations from their training data rather than learning multi-step algorithms for solving the problems.
Overview
Researchers have long been interested in equipping Transformer-based language models (LLMs) like GPT-3 and GPT-4 with the ability to "reason" through problems in a step-by-step manner, much as humans do. However, there seems to be a gap between what these models can do and how humans approach problems that require multi-step reasoning. To explore this, the authors study three example problems that have simple algorithmic solutions: multi-digit multiplication (which can be solved via standard algorithms taught to children), Einstein puzzles (which children can also be taught to solve), and a dynamic programming problem with a straightforward linear-time optimal solution. To analyze the properties of each task, the authors study the computation graph associated with an algorithm that solves it: a directed acyclic graph specifying the inputs, intermediate computations, and outputs needed to solve the task. The depth, width, and input size of this graph are measures of the complexity of the algorithm for the task.
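To make the computation-graph idea concrete, here is a toy construction, our illustration rather than the paper’s exact graphs, for long multiplication: digit pairs produce partial products, which are folded into the answer by a chain of additions, and depth and width fall out of the graph structure.

```python
# Toy computation graph for long multiplication (our illustration).
import networkx as nx
from collections import Counter

def multiplication_graph(x_digits: int, y_digits: int) -> nx.DiGraph:
    """Each pair of input digits yields a one-digit partial product; the
    partial products are then folded into the answer by a chain of additions."""
    g = nx.DiGraph()
    partials = []
    for i in range(x_digits):
        for j in range(y_digits):
            p = f"p{i}{j}"
            g.add_edge(f"x{i}", p)   # digit of x feeds the partial product
            g.add_edge(f"y{j}", p)   # digit of y feeds the partial product
            partials.append(p)
    acc = partials[0]
    for k, p in enumerate(partials[1:]):
        s = f"sum{k}"                # accumulate partial products one add at a time
        g.add_edge(acc, s)
        g.add_edge(p, s)
        acc = s
    return g

g = multiplication_graph(4, 2)
depth = nx.dag_longest_path_length(g)   # longest chain of dependent computations

# width: the largest number of nodes sitting at the same level of the DAG
level = {n: 0 for n in g.nodes}
for n in nx.topological_sort(g):
    for pred in g.predecessors(n):
        level[n] = max(level[n], level[pred] + 1)
width = max(Counter(level.values()).values())

print(f"depth={depth}, width={width}, inputs={4 + 2}")
```

Growing the number of digits grows the depth and width of this graph, which is exactly the axis along which the paper measures how LLM performance degrades.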
The authors make several interesting observations about the performance of GPT-3, ChatGPT, and GPT-4 on these tasks. In the figure above, we see that the LLMs perform almost perfectly on small problem instances, but then fail on nearly all instances with larger inputs. This holds both in the few-shot setting (where a few worked examples are given in the prompt) and for GPT-3 finetuned by the authors on many examples of the specific problems.
Next, the authors investigate the particular errors made more closely. Specifically, they study partial computations, which correspond to the intermediate steps or calculations that are part of the overall algorithm. In each task, the authors observed that the specific calculations LLMs got right were those that appeared more often in their training data. This suggests that LLMs might be recognizing and replicating patterns from the data they were trained on, rather than genuinely solving the problems. Also, on the Einstein puzzle and dynamic programming tasks, LLMs make many so-called restoration errors, where the final answer is correct but intermediate computations are incorrect. Likewise, over 80% of correct answers in the 4-digit by 2-digit multiplication task had an incorrect intermediate computation. This suggests that LLMs may be memorizing solutions rather than performing correct reasoning to arrive at them.
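As a concrete illustration of a restoration error, the hypothetical checker below (our example, not the paper’s evaluation code) compares a model’s claimed partial products against the true ones; the final answer can be right even when the intermediate work is wrong.

```python
# Flagging restoration errors in multi-digit multiplication (our illustration).
def check_multiplication_steps(x: int, y: int, steps: list[int], answer: int):
    """steps: the model's claimed partial products of x with each digit of y,
    least-significant digit first (e.g. for 1234 * 56: [1234*6, 1234*5])."""
    expected = [x * int(d) for d in str(y)[::-1]]
    final_correct = answer == x * y
    steps_correct = steps == expected
    return final_correct, steps_correct, final_correct and not steps_correct

# A model that "remembers" 1234 * 56 = 69104 but botches a partial product:
print(check_multiplication_steps(1234, 56, steps=[7404, 6070], answer=69104))
# -> (True, False, True): right answer, wrong intermediate work, i.e. a
#    restoration error suggesting recall of the answer rather than computation
```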
Why does it matter?
Current Transformer-based LLMs are very different from humans, in the sense that they are very good at certain tasks humans find difficult, yet can completely fail on some tasks humans find trivial. For instance, multi-digit multiplication and Einstein puzzles can be done by a child after being taught an algorithm, but the tested LLMs fail on these tasks when the problem size grows even slightly beyond that of the training problems. In fact, the authors estimate that this would remain the case even after millions of dollars’ worth of finetuning GPT-3 on small problem instances.
The ways in which LLMs succeed or fail on simple tasks could also give some insight into how they perform tasks that are difficult for humans. For instance, some of these hard tasks may not actually be that complex, as measured by the size and structure of their computation graphs. Or, these hard tasks may decompose into a few single-step subtasks that are well represented in the training data.
New from the Gradient
Terry Winograd: AI, HCI, Language, and Cognition
Gil Strang: Linear Algebra and Deep Learning
Other Things That Caught Our Eyes
News
How Nvidia Built a Competitive Moat Around A.I. Chips “Naveen Rao, a neuroscientist turned tech entrepreneur, once tried to compete with Nvidia, the world’s leading maker of chips tailored for artificial intelligence.”
UK to spend £100m in global race to produce AI chips “The government will spend £100m to try to win a toe-hold for the UK in the global race to produce computer chips used to power artificial intelligence.”
‘Very wonderful, very toxic’: how AI became the culture war’s new frontier “When Elon Musk introduced the team behind his new artificial intelligence company xAI last month, the billionaire entrepreneur took a question from the rightwing media activist Alex Lorusso.”
Snapchat is expanding further into generative AI with ‘Dreams’ “Snapchat is preparing to further expand into generative AI features, after earlier launching its AI-powered chatbot My AI which can now respond with a Snap back, not just text.”
Chinese firm launches WonderJourney satellite with AI-powered ‘brain’ “The satellite’s developer says it has an onboard intelligent processing unit that allows it to process data without sending it back to ground control.”
As Fight Over A.I. Artwork Unfolds, Judge Rejects Copyright Claim “The case was unique because an inventor named Stephen Thaler listed his computer system as the artwork’s creator, arguing that a copyright should be issued and transferred to him as the machine’s owner.”
AI unlikely to destroy most jobs, but clerical workers at risk, ILO says “Generative AI probably will not take over most people's jobs entirely but will instead automate a portion of their duties, freeing them up to do other tasks, a U.N. study said on Monday.”
Meta confirms AI ‘off-switch’ incoming to Facebook, Instagram in Europe “Meta has confirmed that non-personalized content feeds are incoming on Facebook and Instagram in the European Union ahead of the August 25 deadline for compliance with the bloc’s rebooted digital rulebook, the Digital Services Act (DSA).”
Despite Cheating Fears, Schools Repeal ChatGPT Bans “For decades, Walla Walla High School in the wheat basket of Washington State has maintained an old red wooden barn on campus where students learn a venerable farming skill: how to raise pigs and sheep.”
South Korea's Naver launches generative AI services “South Korean internet giant Naver (035420.KS) unveiled on Thursday its own generative artificial intelligence (AI) tool, joining the frenzy around the new technology initiated by OpenAI's ChatGPT chatbot late last year.”
Papers
Daniel: First, I’ll mention a really exciting paper on assessing the quality of molecule poses from 3D structure-based drug design models that generate molecules and their 3D poses. I’ll let you read the Twitter thread for PoseCheck, but tl;dr the analysis pipeline examines generated poses and protein-drug interactions, allowing direct inspection and judgment of what generative models produce and how well they leverage the provided structural context. Another recent article from Professor James Zou’s lab tackles the dearth of publicly available annotated medical images. They harness crowd platforms like Twitter for de-identified images and clinician knowledge to curate OpenPath, a large dataset of over 200,000 pathology images paired with natural language descriptions, and develop a text-image system called pathology language-image pretraining (PLIP). I’m pretty excited to see further forays into developing AI systems for medical and biological applications, and especially to see work like PoseCheck that can help improve AI-based drug design.
Closing Thoughts
Have something to say about this edition’s topics? Shoot us an email at editor@thegradient.pub and we will consider sharing the most interesting thoughts from readers in the next newsletter! For feedback, you can also reach Daniel directly at dbashir@hmc.edu or on Twitter. If you enjoyed this newsletter, consider donating to The Gradient via a Substack subscription, which helps keep this grad-student / volunteer-run project afloat. Thanks for reading the latest Update from the Gradient!