The Gradient Update #17: OpenAI's InstructGPT is less toxic than GPT-3, Facebook introduces data2vec to tackle learning from speech, vision, and language
In which we cover OpenAI's new paper on fine-tuning GPT-3 with human feedback, Facebook's new paper that presents a single algorithm that can work on different modalities, and more!
Welcome to the 17th update from the Gradient! If you were referred by a friend, subscribe and follow us on Twitter!
News Highlight: OpenAI rolls out new text-generating models that it claims are less toxic
Summary
Researchers at OpenAI developed a new family of models, called InstructGPT, which they say are less likely to generate problematic content and more closely align with a user’s intent. InstructGPT goes further than previous efforts, using “reinforcement learning from human feedback” (RLHF) to make GPT-3 more accurate and less toxic.
Background
If you open up GPT-3’s API and ask it to generate text, you might see a message pop up that warns you about sensitive content. Given that it was trained on large swaths of text from the internet, it is no surprise that the original GPT-3 often echoes misinformation, conspiracies, racism, and other toxic text. For example, the model tends to place the word ‘Islam’ near words like ‘terrorism.’ A study from last year showed that GPT-3 disproportionately associates Muslims with violence. Indeed, when the researchers fed the model the same prompts but replaced “Muslims” with “Christians,” the AI provided violent associations only 20% of the time as opposed to 66%.
OpenAI had already tried mitigating these issues in GPT-3 by filtering content and even fine-tuning it on a “values-targeted” dataset. InstructGPT takes a slightly different direction with RLHF. The method uses a dataset of human-written demonstrations on prompts submitted to OpenAI’s API, along with prompts written by human labelers, to fine-tune GPT-3 models. A second dataset contains human-labeled comparisons between GPT-3 outputs on a larger set of prompts. A “reward model” is trained on this second dataset to predict which GPT-3 outputs the labelers would prefer, and is then used to fine-tune the GPT-3 models.
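The reward-model step can be illustrated with a toy example. The sketch below is not OpenAI’s implementation: it stands in for the learned reward model with a simple linear scorer over made-up two-dimensional features, and trains it on labeled comparison pairs with the pairwise logistic (Bradley–Terry-style) loss that RLHF-style pipelines typically use. All names and numbers are hypothetical.

```python
import math
import random

def reward(w, feats):
    # Linear stand-in for the learned reward model.
    return sum(wi * fi for wi, fi in zip(w, feats))

def train_reward_model(comparisons, dim, lr=0.1, steps=500):
    """Fit w so that reward(preferred) > reward(rejected) on each labeled
    comparison, by minimizing the pairwise loss -log sigmoid(r_pref - r_rej)."""
    rng = random.Random(0)
    w = [0.0] * dim
    for _ in range(steps):
        pref, rej = rng.choice(comparisons)
        margin = reward(w, pref) - reward(w, rej)
        # Gradient of -log sigmoid(margin) w.r.t. w is
        # -(1 - sigmoid(margin)) * (pref - rej), so we ascend that direction.
        g = 1.0 - 1.0 / (1.0 + math.exp(-margin))
        for i in range(dim):
            w[i] += lr * g * (pref[i] - rej[i])
    return w

# Toy "outputs" described by two features; the first of each pair is preferred.
comparisons = [
    ([1.0, 0.2], [0.1, 0.9]),
    ([0.8, 0.1], [0.2, 0.7]),
]
w = train_reward_model(comparisons, dim=2)
assert all(reward(w, p) > reward(w, r) for p, r in comparisons)
```

In the full pipeline, the trained reward model’s score (rather than a human label) then serves as the reinforcement-learning signal for fine-tuning the language model itself.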
The researchers had labelers rate the quality of outputs between the new model and GPT-3 and found that labelers “significantly” preferred InstructGPT to GPT-3. OpenAI claims InstructGPT more consistently wrote truthful information and obeyed instructions.
Why does it matter?
As models like GPT-3 come into broad use, ensuring they are aligned with human intentions will become more and more important. Reducing toxicity in LLM output is just one way in which techniques like RLHF might help ensure these models aren’t causing harm. Work remains to be done: VentureBeat points out that the researchers didn’t investigate how labelers could have introduced bias into InstructGPT, and that RLHF leaves the problem of mitigating toxicity in multi-modal models unsolved.
Editor Comments
Daniel: I don’t know that this is “the best alignment paper in the world so far” as Sam Altman has claimed on Twitter, but this is certainly very important work. The VentureBeat article points out that there is plenty of work left to be done in mitigating bias for multi-modal models and investigating potential limitations of RLHF. What I think is important, and exciting for alignment researchers in particular, is the integration of an alignment technique into a widely-used product. This doesn’t mean all alignment research will end up in user-facing products, but does present an interesting case of how it can address immediate concerns.
Andrey: I agree with Daniel, this is important work. As I said in my Last Week in AI summary - “This is very cool! While not especially novel in terms of its technical approach, the demonstration of this technique working on a model as large as GPT-3 is quite significant and likely to influence similar future work. A lot of work has demonstrated the potential negative ramifications of using models such as GPT-3, so it’s also significant that OpenAI itself undertook this research and has even replaced GPT-3 with InstructGPT in its commercial offerings.”
Paper Highlight: Data2vec: A General Framework for Self-Supervised Learning in Speech, Vision, and Language
Summary
Since the advent of models like CLIP and DALL-E, more and more research is being done on training massive multi-modal neural nets that can understand concepts in multiple modalities (such as images and text). As with massive language models such as GPT-3, a popular approach is training transformers via self-supervised learning so as to learn representations of data without resorting to massive amounts of labeling.
While the general idea of self-supervised learning (convert unlabeled data to labeled data by generating the labels automatically from the data) is indifferent to modality, as the authors of this paper point out, “the actual algorithms and objectives differ widely because they were developed with a single modality in mind.” So, this paper introduces a framework that uses the same learning method for speech, NLP, or computer vision, which they call data2vec. The paper summarizes how this works as follows:
“data2vec is trained by predicting the model representations of the full input data given a partial view of the input. We first encode a masked version of the training sample (model in student mode) and then construct training targets by encoding the unmasked version of the input sample with the same model but when parameterized as an exponential moving average of the model weights. Target representations encode all of the information in the training sample and the learning task is for the student to predict these representations given a partial view of the input.”
While data2vec introduces a modality-agnostic learning regime, it does use modality-specific features and masking strategies. The authors observe that modality impacts which masking strategies will make the learning task challenging. Future work might combine self-supervised learning with the data2vec mechanism and the Perceiver architecture, which “can directly operate on the raw data from different modalities without modality-specific feature encoders.”
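The student/teacher mechanics described above can be sketched in a few lines. This is a schematic, not the paper’s implementation: the “encoder” here is a hypothetical elementwise scaling, the masking is random zeroing, and the gradient step on the student is omitted; the point is the shape of the loop, where the teacher’s weights are an exponential moving average (EMA) of the student’s.

```python
import random

def ema_update(teacher, student, tau=0.999):
    """Teacher weights track an exponential moving average of the student
    weights, as in data2vec's target encoder."""
    return [tau * t + (1.0 - tau) * s for t, s in zip(teacher, student)]

def encode(weights, x):
    # Hypothetical one-layer "encoder": elementwise scaling.
    return [w * xi for w, xi in zip(weights, x)]

def mask(x, ratio=0.5, seed=0):
    # Zero out a random subset of the input (the student's partial view).
    rng = random.Random(seed)
    return [0.0 if rng.random() < ratio else xi for xi in x]

# One training step, schematically:
student = [0.5, 1.2, -0.3, 0.8]
teacher = list(student)  # teacher initialized from the student
x = [1.0, 2.0, 3.0, 4.0]

targets = encode(teacher, x)      # teacher encodes the full, unmasked input
preds = encode(student, mask(x))  # student encodes only a masked view
loss = sum((p - t) ** 2 for p, t in zip(preds, targets))

# After the student's gradient step (omitted here), the teacher is updated:
teacher = ema_update(teacher, student)
```

Because the targets are the teacher’s continuous representations rather than modality-specific tokens or pixels, the same objective applies across speech, vision, and text.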
Why does it matter?
Processing data in a more general way that admits information from multiple modalities seems like an important step towards developing AI systems that are able to “reason” more generally. The fact that data2vec achieves competitive or state of the art performance on benchmarks in all its modalities further underscores the method’s potential.
Editor Comments
Daniel: While we’re still far away from any sort of “general” AI, many would still like to develop AI systems that “experience” the world in the way humans do, being able to process information from visual, textual, and speech modalities. Besides being able to perform tasks in different domains, a system like data2vec hints at ways to allow learning algorithms to develop representations of more abstract concepts that are not tied to specific modalities. This is a promising step towards more general AI systems whose ability to process the world might more closely resemble our own. I think there’s still a lot of distance from here to there, and it’s hard to say precisely what direction AI “needs” to take to realize many of its aspirations, but I hope we’ll see more work on this front.
Andrey: I’m excited to see this! Going beyond single-modality models is essential to solving more complex tasks than we have thus far. DeepMind’s Perceiver and Perceiver IO models have shown that a single model can be adapted to various forms of data, and now this paper highlights a single training approach as well. I think we’ll be seeing a lot more of this sort of thing in the coming years.
From Qiantong Xu:
As Andrej Karpathy pointed out, the ongoing consolidation in AI is incredible. Decades ago, researchers focusing on different areas (e.g. vision, speech, natural language, reinforcement learning) could hardly understand papers from other areas. Then, about 10 years ago, all of these areas started to transition to machine learning, especially to deep neural nets, even though the architectures remained diverse. In recent years, even the neural net architectures across all areas are starting to look identical, with the Transformer tending to dominate everything.
Standing on the shoulders of promising research progress in self-supervised learning, we are trying to consolidate the training frameworks and algorithms so that they can work across different domains. Data2vec is a simple pretraining algorithm that reaches SOTA on several major benchmarks in speech, vision, and NLP. It enables the model to learn meaningful representations, regardless of the format of the input. We hope this work can benefit future multi-modal AI research and further the consolidation of AI infrastructure, which will in turn speed up the overall progress of AI.
New from the Gradient
Engaging with Disengagement
Percy Liang on Machine Learning Robustness, Foundation Models, and Reproducibility
Other Things That Caught Our Eyes
News
MIT robot could help people with limited mobility dress themselves "Robots have plenty of potential to help people with limited mobility, including models that could help the infirm put on clothes. That's a particularly challenging task, however, that requires dexterity, safety and speed."
U.S. ‘mass surveillance’ company challenges B.C. privacy watchdog order “A global 'mass surveillance' company ordered by B.C.'s privacy watchdog to stop collecting British Columbians' images is challenging that order in B.C. Supreme Court. Clearview AI claims B.C.”
AI2 shows off an open, Q&A-focused rival to GPT3 “OpenAI’s impressive AI language model GPT-3 has plenty of things going for it, but with 175 billion parameters no one would claim it’s particularly streamlined.”
Tesla AI Director: ‘I believe ‘Tesla Bot’ is on track to become the most powerful AI development platform’ “Tesla’s Director of Artificial Intelligence, Andrej Karpathy, says that he believes ‘Tesla Bot’ is “on track to become the most powerful AI development platform.”
Robot performs surgery without help from humans “The machine successfully performed keyhole surgery on a pig, attaching organs in a range of different animals. And it did so without the help of a human, for the first time.”
IRS Will Require Facial Recognition Scans to Access Your Taxes “Online tax filers will soon be required to submit a selfie to a third-party identity verification company using facial recognition tech in order to file their taxes or make IRS payments online. Starting this summer, users with an IRS.”
Papers
Third Time's the Charm? Image and Video Editing with StyleGAN3 “StyleGAN is arguably one of the most intriguing and well-studied generative models, demonstrating impressive performance in image generation, inversion, and manipulation. In this work, we explore the recent StyleGAN3 architecture, compare it to its predecessor, and investigate its unique advantages, as well as drawbacks. In particular, we demonstrate that while StyleGAN3 can be trained on unaligned data, one can still use aligned data for training, without hindering the ability to generate unaligned imagery.”
Few-shot Learning with Multilingual Language Models “Large-scale autoregressive language models such as GPT-3 are few-shot learners that can perform a wide range of language tasks without fine-tuning. While these models are known to be able to jointly represent many different languages, their training data is dominated by English, potentially limiting their cross-lingual generalization. In this work, we train multilingual autoregressive language models on a balanced corpus covering a diverse set of languages, and study their few- and zero-shot learning capabilities in a wide range of tasks. Our largest model with 7.5 billion parameters sets new state of the art in few-shot learning in more than 20 representative languages.”
POTATO: exPlainable infOrmation exTrAcTion framewOrk “We present POTATO, a task- and language independent framework for human-in-the-loop (HITL) learning of rule-based text classifiers using graph-based features. POTATO handles any type of directed graph and supports parsing text into Abstract Meaning Representations (AMR), Universal Dependencies (UD), and 4lang semantic graphs. A streamlit-based user interface allows users to build rule systems from graph patterns, provides real-time evaluation based on ground truth data, and suggests rules by ranking graph features using interpretable machine learning models. Users can also provide patterns over graphs using regular expressions, and POTATO can recommend refinements of such rules. POTATO is applied in projects across domains and languages, including classification tasks on German legal text and English social media data. All components of our system are written in Python, can be installed via pip, and are released under an MIT License on GitHub.”
Variational Neural Cellular Automata “In nature, the process of cellular growth and differentiation has lead to an amazing diversity of organisms -- algae, starfish, giant sequoia, tardigrades, and orcas are all created by the same generative process. Inspired by the incredible diversity of this biological generative process, we propose a generative model, the Variational Neural Cellular Automata (VNCA), which is loosely inspired by the biological processes of cellular growth and differentiation. Unlike previous related works, the VNCA is a proper probabilistic generative model, and we evaluate it according to best practices. We find that the VNCA learns to reconstruct samples well and that despite its relatively few parameters and simple local-only communication, the VNCA can learn to generate a large variety of output from information encoded in a common vector format. ”
Tweets
Closing Thoughts
Have something to say about this edition’s topics? Shoot us an email at gradientpub@gmail.com and we will consider sharing the most interesting thoughts from readers in the next newsletter! If you enjoyed this piece, consider donating to The Gradient via a Substack subscription, which helps keep this grad-student / volunteer-run project afloat. Thanks for reading the latest Update from the Gradient!