Update #54: Will Steam Publish Games Made with AI? and Long Transformers
Legal concerns about AI-generated art assets shed light on what "made with AI" means for games; researchers extend transformers' context length and analyze how transformers use long contexts.
Welcome to the 54th update from the Gradient! If you’re new and like what you see, subscribe and follow us on Twitter :) You’ll need to view this post on Substack to see the full newsletter!
Want to write with us? Send a pitch using this form.
News Highlight: Polygon Asks If Steam Will Publish Games Developed With AI
Summary
As AI-powered tools continue to explode in popularity, a debate has been growing in both technology and gaming circles about the role of artificial intelligence in game development. This debate often centers on classifying games as those made with AI and those made without. Polygon recently highlighted the issue when it covered the takedown of an AI-generated game from Valve's Steam, one of the world's most popular online marketplaces for games. In a statement to Polygon, a Valve spokesperson shared “concerns about the legal status of AI-generated art assets — considering the AI that made them may have been trained on data, including copyrighted art works, that doesn’t belong to the creator of the game.” Valve elaborated that they are “continuing to learn about AI, the ways it can be used in game development, and how to factor it into [their] process for reviewing games submitted for distribution on Steam.”
We will argue that this binary classification of games as “made with AI” is overly broad and misleading, as it ignores some of the industry’s biggest AI successes, which long predate content generators. These include successes in live game balance, real-time strategy recommendations, latency reduction, matchmaking fairness, and personalized and discounted shopping experiences.
Overview
To date, almost all of the debate around AI in gaming is framed around the moral and legal implications of content generation models (artwork, voice recordings, dialogue, etc.) and how they impact copyright holders, writers, artists, actors, and developers. While those debates are necessary, they do not represent the totality of the role of artificial intelligence and machine learning in gaming. When we classify games, and quite often reject and ridicule those that are “made with AI”, we are using a very narrow definition that ignores the many other successes AI and machine learning have had in game development. That’s not to say that developers and gamers should openly embrace games made with generative models; rather, we should acknowledge that there is much more to AI in gaming than generative models. Reducing games “made with AI” to those featuring AI-generated content is reductive, and it can mislead one into missing two decades of unparalleled success the gaming industry has had with “simple” Bayesian machine learning models and, more recently, with deep reinforcement learning.
One of the earliest success stories dates back to 2004 and the release of the genre-defining first-person shooter, Halo 2. Around the time of release, Microsoft researchers introduced the TrueSkill algorithm, a skill-based matchmaking system that used Bayesian machine learning methods to quickly pair similarly skilled players together in fair matchups. Microsoft has since invested nearly two decades in further improving the speed and fairness of its matchmaking algorithms, including the Bayesian successor TrueSkill 2, with players declaring Halo 2 “the greatest multiplayer [experience] of all time” and Microsoft growing its online services from an initial goal of $1.2B per year to almost $4.7B in 2023.
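To make this concrete, here is a minimal sketch of Bayesian skill rating and match-quality estimation using the open-source trueskill Python package, an independent implementation of the algorithm rather than Microsoft's production system; the players and ratings are illustrative:

```python
# Minimal TrueSkill sketch (pip install trueskill). Skill is a Gaussian
# belief: mu (estimated skill) and sigma (uncertainty about it).
import trueskill

alice = trueskill.Rating()               # new player: mu=25, sigma~8.33
bob = trueskill.Rating(mu=30, sigma=2)   # established player, low uncertainty

# Match quality approximates the chance of a close, fair game;
# matchmakers search for pairings that maximize this value.
print(f"match quality: {trueskill.quality_1vs1(alice, bob):.2f}")

# Alice beats Bob: the Bayesian update raises Alice's mean, lowers Bob's,
# and shrinks both sigmas, since the upset result is informative.
alice, bob = trueskill.rate_1vs1(alice, bob)
print(f"alice: mu={alice.mu:.1f} sigma={alice.sigma:.1f}")
print(f"bob:   mu={bob.mu:.1f} sigma={bob.sigma:.1f}")
```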
Why does it matter?
The success of machine learning in matchmaking is not unique to Microsoft or to the TrueSkill algorithm. Of the current top 10 games on Steam, 7 have skill-based matchmaking systems, and 6 of those 7 use a Bayesian machine learning method for matchmaking.
When one considers use cases outside of matchmaking, the influence of AI and machine learning on game development is even broader. Other notable examples of success include:
Deep reinforcement learning used to continuously balance and tune the genre-defining Teamfight Tactics (TFT) and Apple’s iPad Game of the Year, Legends of Runeterra
Nvidia’s AI-powered image upscaling (DLSS), which turns low-fidelity renders into high-definition frames
Deep neural nets for automated game evaluation and testing of first-person shooters
8 Frames in 16ms: Rollback Networking in Mortal Kombat and Injustice 2
Real-time, data-driven recommendations of strategic items in League of Legends
AI-powered esports coaches and tools, including SenpAI.gg, Omnic Data, and blitz.gg
Numerous AI-powered, gaming-focused content moderation tools, including Spectrum Labs, ggwp, and Hive AI, as well as products from traditional technology companies such as Google, Amazon, and Microsoft.
As AI research and tooling become more ubiquitous and easier to use, we can expect more frequent and louder debates about the role of AI in game development. As those debates progress, we want to highlight that not every use case for AI in game development is rooted in generative content models. There have been twenty years of tremendous progress in AI for gaming outside of generative models, and there’s no reason to think the next twenty will be any different.
Editor Comments
Justin: As someone who had the privilege of working alongside many of the scientists, engineers, developers, and quality assurance workers whose research and contributions were cited here, this really only begins to scratch the surface of all of the terrific AI, machine learning, and data opportunities in gaming.
Research Highlight: Loooooooooooong Transformers
Summary
Many recent works and systems have attempted to increase the context length of Transformers — the number of tokens that they can process at once. Here, we cover recent research advances in improving and analyzing Transformers with long context length.
Overview
The ability to scale Transformers to long sequence lengths is critical to developing powerful pretrained models, such as GPT-4 or Claude, that can process large documents like entire books or API documentation. Self-attention in standard Transformers scales quadratically in sequence length, so longer context lengths naively require much more compute. Other approaches like CNNs, RNNs, and state space models can scale linearly with context length, but they are currently outperformed by Transformers on many tasks, may require more sequential computation, or may be much less expressive.
One approach to scaling Transformers to larger context lengths is to modify the attention mechanism to be more efficient. The recent work ‘LongNet: Scaling Transformers to 1,000,000,000 Tokens’ proposes a dilated attention mechanism with approximately linear computational complexity, which the authors use to scale up to 1 billion tokens. Dilated attention splits the keys, queries, and values into segments of length w, then sparsifies each segment by selecting rows at an interval r. Intuitively, the segment size w trades the globality of attention for efficiency, while the dilation rate r reduces the computation cost by approximating the attention matrix. The authors employ a mixture of dilated attentions with different segment sizes and dilation rates to capture both local and long-range information.
As seen in the above figure, at context lengths above 2K, LongNet uses far fewer floating-point operations and achieves better perplexity than standard dense Transformers. Importantly, LongNet does not sacrifice performance at smaller context lengths.
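For intuition, here is a toy, single-head sketch of the dilated attention pattern described above. It assumes the sequence length is a multiple of the segment size and uses a single (w, r) pair, whereas LongNet mixes several configurations and uses a heavily optimized implementation:

```python
# Toy LongNet-style dilated attention: split the sequence into segments of
# length w, keep every r-th position in each segment, and attend only among
# the kept positions within that segment.
import torch
import torch.nn.functional as F

def dilated_attention(q, k, v, w=8, r=2):
    seq_len, d = q.shape                              # single head: (seq_len, d)
    out = torch.zeros_like(q)
    for start in range(0, seq_len, w):
        idx = torch.arange(start, start + w, r)       # dilated indices in segment
        qs, ks, vs = q[idx], k[idx], v[idx]
        attn = F.softmax(qs @ ks.T / d**0.5, dim=-1)  # (w/r, w/r) attention
        out[idx] = attn @ vs                          # scatter results back
    return out  # positions dropped by the dilation stay zero in this sketch

q = k = v = torch.randn(32, 16)
print(dilated_attention(q, k, v).shape)               # torch.Size([32, 16])
```

Each segment attends over only w/r positions, so the total cost is about (n/w) * (w/r)^2 = n * w / r^2 scaled dot products, linear in the sequence length n for fixed w and r.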
Another approach to developing long Transformers is to take a model trained on shorter sequences (e.g. 2k tokens) and modify it, often via finetuning, to work on longer sequences (e.g. 8k). This is the approach taken by the Position Interpolation method recently developed by Meta researchers, and by the concurrent SuperHOT model from an open-source contributor. Naively finetuning on longer sequences is slow, and naively extrapolating to longer sequence lengths often fails drastically. Instead, these methods interpolate positions: to increase the sequence length by 4x, for example, index 8 in the new, longer sequence is given the positional encoding of index 8/4 = 2.
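As a rough sketch of the idea applied to rotary position embeddings (RoPE), positions in the extended context can simply be rescaled into the range the model was trained on before computing rotation angles; the function and variable names below are illustrative, not taken from either paper's code:

```python
# Position interpolation sketch: extend a model trained on 2k tokens to 8k
# by compressing new positions into the trained range rather than
# extrapolating beyond it.
import torch

def rope_angles(positions, dim=64, base=10000.0):
    # Standard RoPE frequencies; cos/sin of the returned angles would
    # rotate the query/key vectors.
    inv_freq = 1.0 / base ** (torch.arange(0, dim, 2).float() / dim)
    return torch.outer(positions, inv_freq)        # (seq_len, dim // 2)

trained_len, new_len = 2048, 8192
scale = new_len / trained_len                      # 4x extension

positions = torch.arange(new_len).float() / scale  # index 8 -> position 2.0
angles = rope_angles(positions)
print(angles.shape)                                # torch.Size([8192, 32])
```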
Another work, ‘Lost in the Middle: How Language Models Use Long Contexts,’ from Stanford, empirically studies how language models use long contexts. Specifically, the authors focus on multi-document question answering, where models analyze multiple documents to locate relevant information and use it to answer questions. Their study reveals an important insight: model performance is consistently better when the relevant information is located at either the beginning or the end of the context, and much worse when models must reason over information in the middle of the input. This observation resembles the serial position effect: in human free recall of items from a list, individuals tend to remember the first and last elements best. While surprising, this paves the way for attention mechanisms inspired by how humans encode information, and potentially for more efficient attention approaches.
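A hedged sketch of this kind of evaluation setup: place a single relevant “gold” document at different positions among distractors and measure question-answering accuracy at each position. The documents, question, and the ask_llm placeholder below are illustrative, not the paper's actual data or code:

```python
# Sweep the gold document across positions in a multi-document QA prompt;
# the paper reports a U-shaped accuracy curve over gold positions.
def build_prompt(gold_doc, distractors, gold_position, question):
    docs = list(distractors)
    docs.insert(gold_position, gold_doc)       # place gold doc at chosen slot
    context = "\n\n".join(f"Document [{i+1}]: {d}" for i, d in enumerate(docs))
    return f"{context}\n\nQuestion: {question}\nAnswer:"

gold = "The Eiffel Tower is 330 metres tall."
distractors = [f"Unrelated filler fact number {i}." for i in range(9)]

prompts = [
    build_prompt(gold, distractors, pos, "How tall is the Eiffel Tower?")
    for pos in range(10)
]
# Each prompt would then be sent to the model under test, e.g.:
# accuracies = [score(ask_llm(p)) for p in prompts]  # ask_llm: placeholder
print(prompts[0].splitlines()[0])  # gold doc first: "Document [1]: ..."
```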
Why does it matter?
Large context lengths enable new capabilities. For instance, they allow LLMs to fit more in-context examples into their prompts, process large documents or chat histories, and generate longer sequences. Work on improving the abilities of Transformers for processing and effectively using long contexts can thus enable many applications.
Many commercial and open-source LLMs now ship with larger context lengths, such as GPT-4 (32k), Claude 2 (100k), and MPT-7B-StoryWriter-65k+ (65k). OpenAI and Anthropic have not released technical details on how they scaled their context lengths to these levels, but open-source communities, academics, and some industry researchers have been working toward scaling up the context length of open models and publishing their results.
Editor Comments
Sharut: I find this paper fascinating: amid the wave of long-context large language models (RMT, Hyena, LongNet), a key question has remained open: how well do these models actually use long contexts? It’s intriguing that LLMs follow the same patterns of attention as humans, with most emphasis at the beginning or the end. I would love to see follow-up work clarifying whether this is because the training data reflects how humans write, e.g. abstracts and conclusions carry the key messages of an article.
New from the Gradient
Shiv Rao: Enabling Better Patient Care with AI
Hugo Larochelle: Deep Learning as Science
Other Things That Caught Our Eyes
News
How an AI-written Star Wars story created chaos at Gizmodo “A Gizmodo story on Star Wars, generated by artificial intelligence, was riddled with errors. The irony that the problem happened at a tech publication was undeniable.”
US senators to get classified White House AI briefing Tuesday “The White House will brief senators Tuesday on artificial intelligence in a classified setting as lawmakers consider adopting legislative safeguards on the fast-moving technology.”
Google’s medical AI chatbot is already being tested in hospitals “Google’s Med-PaLM 2, an AI tool designed to answer questions about medical information, has been in testing at the Mayo Clinic research hospital, among others, since April, The Wall Street Journal reported this morning.”
New AI systems could speed up our ability to create weather forecasts “As climate change makes weather more unpredictable and extreme, we need more reliable forecasts to help us prepare and prevent disasters. Today, meteorologists use massive computer simulations to make their forecasts.”
Programs to detect AI discriminate against non-native English speakers, shows study “Computer programs that are used to detect essays, job applications and other work generated by artificial intelligence can discriminate against people who are non-native English speakers, researchers say.”
Inside the White-Hot Center of A.I. Doomerism “It’s a few weeks before the release of Claude, a new A.I. chatbot from the artificial intelligence start-up Anthropic, and the nervous energy inside the company’s San Francisco headquarters could power a rocket.”
US FTC opens investigation into OpenAI “The U.S. Federal Trade Commission (FTC) has opened an expansive investigation into OpenAI, the maker of viral chatbot ChatGPT, on claims that it has run afoul of consumer protection laws by putting personal reputations and data at risk, the Washington Post reported on Thursday.”
China's slow AI roll-out points to its tech sector's new regulatory reality “China has joined the global rush to generative artificial intelligence, boasting close to 80 AI models from firms like Baidu (9888.HK) and Alibaba (9988.HK) and startups attracting almost $14 billion of funding over the last six months.”
China finalizes first-of-its-kind rules governing generative A.I. services like ChatGPT “Chinese regulators on Thursday finalized first-of-its-kind rules governing generative artificial intelligence as the country looks to ramp up oversight of the rapidly-growing technology.”
What should the UK’s £100 million Foundation Model Taskforce do? “The UK government has recently established a ‘Foundation Model Taskforce’, appointed a savvy technologist named Ian Hogarth to run it, and allegedly allocated ~£100 million in funding to it.”
Papers
Daniel: I have a few papers I’m excited about this time. First, this paper from Georgia Tech, Meta, and Klagenfurt researchers examines whether LMs could help produce practice material illustrating unhelpful thought patterns. This is really interesting to me because the problem of how language models can assist in human flourishing and mental health seems to mostly veer into ideas like mental health chatbots, and I think this is a neat and creative direction. “Compositionality as Lexical Symmetry” explores a new approach to the problem of deep neural networks’ failure to generalize compositionally from small datasets: the authors formulate compositionality as a constraint on the symmetries of data distributions. Whenever a task can be solved by a compositional model, there exists a corresponding data augmentation scheme that imparts a compositional inductive bias on any model trained to solve the same task.
Sharut: This paper from Google Research introduces a foundation model for medicine and is intriguing for two major reasons: 1) it proposes an evaluation procedure along several axes of performance, such as reasoning, fairness, and potential harm; 2) it develops a simple technique to align LLMs to safety-critical medical domains. This presents both the exciting opportunities and the challenges of applying these technologies to medicine.
Another paper I like is HyperDreamBooth. It proposes an efficient method that can generate a person’s face across diverse styles from just one sample image in merely 20 seconds! The model is also 10,000x smaller than existing baselines, which paves the way for apps and products based on such generative models to be far cheaper to deploy.
Derek: Recently, I have liked the paper on “Any-dimensional equivariant neural networks.” This work theoretically studies models that have a fixed number of parameters but can process data of varying sizes, e.g. Transformers, DeepSets, and graph neural networks. They give a general process for developing such networks for different data symmetries, and give conditions under which one can expect such networks to generalize across data sizes. I like that they give a general treatment of these networks, as I have studied such networks independently in the past. Also, they present connections to a useful concept studied in mathematics, which gives interesting directions for future work.
Closing Thoughts
Have something to say about this edition’s topics? Shoot us an email at editor@thegradient.pub and we will consider sharing the most interesting thoughts from readers in the next newsletter! For feedback, you can also reach Daniel directly at dbashir@hmc.edu or on Twitter. If you enjoyed this newsletter, consider donating to The Gradient via a Substack subscription, which helps keep this grad-student / volunteer-run project afloat. Thanks for reading the latest Update from the Gradient!