Gradient Update #5: AI generated art and AlphaFold

Hackers use VQ-GAN + CLIP to generate art. DeepMind solves a grand challenge in biology.

News Highlight: AI Generated Art

This edition’s news story is AI Generated Art Scene Explodes as Hackers Create Groundbreaking New Tools

Summary. A number of savvy “hackers” have combined VQ-GAN, a state-of-the-art image-generation technique, with CLIP, a model from OpenAI that can match images with text descriptions. This combination allows for the creation of images from simple text descriptions such as “a big red apple”, “an angel dancing on top of a church”, or just about anything you can think of. The quality of the generated images, along with the simplicity of generating them from nothing more than a text description, has led to an explosion of interest, with many people playing around with the model and posting their results online. Simple tutorials such as Katherine Crowson’s have made this even easier.
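For a rough intuition of how the two models fit together: the generator's latent code is nudged, step by step, so that the embedding of the generated image moves closer to the embedding of the text prompt. The toy sketch below illustrates only that loop. It assumes a small fixed linear map (TOY_WEIGHTS) standing in for "decode with VQ-GAN, then embed with CLIP" and a hand-picked vector standing in for the text embedding; every name in it is hypothetical and none of it comes from the real VQ-GAN or CLIP code.

```python
import math

# Hypothetical stand-in for "decode the latent into an image, then embed the
# image": a fixed 2x3 linear map. Real systems use deep networks for both steps.
TOY_WEIGHTS = [[1.0, 0.0, 0.5],
               [0.0, 1.0, 0.5]]


def embed(latent):
    """Map a latent vector to a toy 'image embedding'."""
    return [sum(w * z for w, z in zip(row, latent)) for row in TOY_WEIGHTS]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def similarity_gradient(latent, text_embedding):
    """Gradient of the cosine similarity with respect to the latent."""
    e = embed(latent)
    e_norm = math.hypot(*e)
    t_norm = math.hypot(*text_embedding)
    cos = cosine(e, text_embedding)
    # Derivative of the cosine similarity with respect to the image embedding.
    grad_e = [(t / t_norm - cos * x / e_norm) / e_norm
              for t, x in zip(text_embedding, e)]
    # Chain rule through the linear map: multiply by the transpose of the weights.
    return [sum(TOY_WEIGHTS[r][c] * grad_e[r] for r in range(len(TOY_WEIGHTS)))
            for c in range(len(latent))]


def optimize_latent(latent, text_embedding, steps=200, lr=0.1):
    """Gradient ascent on the similarity; returns the final latent and score history."""
    history = [cosine(embed(latent), text_embedding)]
    for _ in range(steps):
        grad = similarity_gradient(latent, text_embedding)
        latent = [z + lr * g for z, g in zip(latent, grad)]
        history.append(cosine(embed(latent), text_embedding))
    return latent, history


text_embedding = [0.0, 1.0]      # made-up "text" target
start_latent = [1.0, 0.0, 0.0]   # made-up starting latent
final_latent, history = optimize_latent(start_latent, text_embedding)
print(f"similarity: {history[0]:.3f} -> {history[-1]:.3f}")
```

In the real notebooks the gradient is computed by automatic differentiation through both networks rather than by hand, but the shape of the loop (score the image against the text, then step the latent) is the same idea.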

Background. As AI models have become better at dealing with images over the past decade, AI artists have also become more numerous. Early examples from 2015 include the Deep Dream creations of Alexander Mordvintsev and the development of neural style transfer. GANs, which can learn to generate images, have had a particularly large impact on the AI-generated art scene since their emergence. But the AI art scene extends beyond just these generative models, and includes artists like Sougwen Chung, who collaborates with robots trained on data from her previous paintings. A good list of various AI artists can be found here. As AI systems continue to improve, we can expect that artists will find more and more creative ways to use and work with them to produce artworks. Beyond artists, such AI tools can also enable people in other professions to dabble in creating images with AI. Artbreeder (the creator of which we recently interviewed) has enabled this for several years and has a large community, and it seems likely something similar will happen with VQ-GAN+CLIP.

Why does it matter? Democratizing AI has been in vogue as of late, and lowering the barrier to entry can help people with different ideas and use cases apply powerful technologies to their own problems. We’ve seen an explosion in the number of courses teaching deep learning to anyone with a programming background, and organizations such as Eleuther AI have put immense effort into producing open-source versions of models like OpenAI’s GPT-3. The VQ-GAN+CLIP combo makes it easy for anyone with a computer and a few spare minutes to create their own AI-generated art, which is just plain fun and is likely to encourage further development of similar techniques.

What do our editors think?

Daniel: I’m a huge fan of this. There’s no end to what creative folks will come up with, especially when they mess around with AI. The fact that there’s a Colab notebook that almost anyone could pick up and figure out how to use pretty quickly makes this even more exciting. I took some time to make some myself.

Andrey: I agree with Daniel! Seeing these Colab notebooks is delightful for me, as I have already had a lot of fun messing around with GAN-based art creation over at Artbreeder. Besides enjoying seeing what many other people have been creating on Twitter, I have also shared some of my creations:

Paper Highlight: AlphaFold2

This edition’s paper is Highly accurate protein structure prediction with AlphaFold.

Summary. DeepMind has released AlphaFold2, the most accurate algorithm for predicting protein structures. This means that “predicting a protein structure from sequence will be, for all practical purposes, a solved problem.”

Background. Proteins, the molecular building blocks of life, are made from sequences of molecules called amino acids. However, determining what a protein looks like in 3D, which is essential for many applications in biology, is extremely difficult, even if the exact sequence of amino acids that make up the protein is known. Previously, the best way to do so was via experimental methods, but that process is extremely tedious. Despite 60 years of effort, fewer than 0.1% of all known proteins have had their structures experimentally determined. An alternative approach, known as computational protein folding, attempts to determine protein structure computationally. Starting in 1994, a competition known as CASP brought together researchers around the world to predict the structures of new proteins that had recently been verified via experimentation. Unfortunately, until very recently, computational protein folding was not accurate enough to be a stand-in for actual experimentation.

Overview. In 2018, AlphaFold1 convincingly won the 13th CASP competition, bringing hope that computer scientists could use advances in AI to speed up research in other fields like biology. While it was a huge result at the time, there was still a sense that there was a long way to go. Last November, DeepMind announced AlphaFold2, a substantial improvement on their previous model, which again won the competition by a shockingly large margin.

Among other things, this version leverages recent advances in transformers which, as we’ve discussed in a previous newsletter edition, have radically transformed several areas of AI research. While substantially more end-to-end than most past approaches to computational protein folding, AlphaFold still uses a substantial amount of domain knowledge and custom tailoring in solving the protein folding challenge: it was not as simple as “throwing a big neural net” at the problem and hoping it works. For example, DeepMind’s neural network leverages multiple sequence alignments, a well-known technique for finding similar protein sequences that have already been identified. Also, because a protein’s structure does not change when the whole molecule is rotated or translated, the structure module uses several neat mathematical tricks to encode this property directly into the neural net, resulting in a more efficient use of training parameters.
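To make the invariance property concrete, here is a minimal sketch showing that the distances between points in a structure are unchanged by a rotation plus a translation. It is not AlphaFold's structure module (whose details are not covered here); the coordinates, the rotation angle, and all function names are made up for illustration.

```python
import math


def rotate_about_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def translate(point, offset):
    """Shift a 3D point by a fixed offset."""
    return tuple(p + o for p, o in zip(point, offset))


def pairwise_distances(points):
    """Matrix of Euclidean distances between every pair of points."""
    return [[math.dist(a, b) for b in points] for a in points]


# Made-up 3D coordinates for four "residues" (illustrative only).
residues = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 2.0)]

# Apply the same rigid motion (rotation, then translation) to every point.
moved = [translate(rotate_about_z(p, 0.7), (10.0, -3.0, 5.0)) for p in residues]

before = pairwise_distances(residues)
after = pairwise_distances(moved)
max_change = max(abs(b - a)
                 for row_b, row_a in zip(before, after)
                 for b, a in zip(row_b, row_a))
print(f"largest change in any pairwise distance: {max_change:.2e}")
```

The coordinates themselves change, but the distance matrix does not (up to floating-point error). A network that has this symmetry built in does not have to spend capacity learning that a rotated protein is the same protein.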

Why does it matter? The Oxford Protein Informatics Group blog commented that AlphaFold2’s release means that “predicting a protein structure from sequence will be, for all practical purposes, a solved problem.” This is because AlphaFold’s accuracy is comparable to that obtained by experimentally determining the structure of a protein. This, combined with the fact that experimentally determining the structure of a protein is expensive and time-consuming, means that AlphaFold will be tremendously useful in speeding up biology research. The fact that DeepMind made AlphaFold open source is a big win for reproducibility.

What do our editors think?

  • Hugh: A decade into deep learning’s emergence as a huge force in the world, I’m often surprised by the number of people who still don’t believe it actually works. Vision, language, games, art and now biology have been radically transformed by deep learning. This is not stopping anytime soon. AlphaFold is a huge achievement, but in my opinion not at all unexpected. It’s just another domain in a long list that AI is conquering, and it definitely won’t be the last.

  • Andrey: The open sourcing of AlphaFold2 is super exciting. I also found the news story DeepMind’s AI for protein structure is coming to the masses interesting, and it turns out there is a nearly-as-good system called RoseTTAFold that was also just published and open-sourced. The paper in question is Accurate prediction of protein structures and interactions using a three-track neural network, and it represents a second approach inspired by, but developed independently from, AlphaFold2. The fact that both systems represent huge advancements in performance for this problem and are both now open sourced bodes well for future progress in this field!

  • Justin: To echo some of the sentiment above, I can’t stress enough how amazing it is that the source code and models are readily available. In this era of vaccine apartheid and intellectual property warfare, it is a breath of fresh air to see the scientific community continue to rally around openness and other egalitarian principles.

  • Adi: This is a tremendous win for deep learning and the DeepMind team. Not only is the accuracy of AlphaFold2 approaching that of experimental techniques, but the system is open sourced, which will be a great resource for the research community. In my opinion, this is reason for great optimism about deep learning’s ability to solve complex problems when the required data is available. The CASP contest has existed since 1994, and AlphaFold has produced step-function-like performance improvements in one of the hardest problems in computer science and biology. I am excited to see the research advances that will build on AlphaFold2 and the similar RoseTTAFold system.

Guest thoughts

David Baker (senior author on the recent RoseTTAFold Science paper on protein folding): These methods [AlphaFold and RoseTTAFold] are huge progress towards solving the monomeric structure prediction problem. Even bigger challenges are modeling protein assemblies, which carry out most biological functions, and designing new proteins (what my group focuses on). We are currently making, for example, vaccines and therapeutics that are going into clinical trials by designing completely new proteins, and there is a huge amount to do in this area both in health and medicine, and beyond.

Recently from the Gradient

It’s All Training Data: Using Lessons from Machine Learning to Retrain Your Mind In the first year of my PhD, I started trauma recovery therapy to heal from domestic violence. It mostly consisted of something called “reprocessing sessions”; using a technique called EMDR we would revisit traumatic memories in my life, and try to figure out what beliefs I had that were linked to those events. All of us move through the world with beliefs about what will happen and who we are …

Justitia ex Machina: The Case for Automating Morals Machine Learning is a powerful technique to automatically learn models from data that have recently been the driving force behind several impressive technological leaps such as self-driving cars, robust speech recognition, and, arguably, better-than-human image recognition. We rely on these machine learning models daily; they influence our lives in ways we did not expect, and they are only going to become even more ubiquitous …

Other News That Caught Our Eyes

EleutherAI Researchers Open-Source GPT-J, A Six-Billion Parameter Natural Language Processing (NLP) AI Model Based On GPT-3 “GPT-J, a six-billion-parameter natural language processing (NLP) AI model based on GPT-3, has been open-sourced by a team of EleutherAI researchers. The model was trained on an open-source text dataset of 800GB and was comparable with a GPT-3 model of similar size.”

Elon Musk just now realizing that self-driving cars are a ‘hard problem’ “Tesla CEO Elon Musk is finally admitting that he underestimated how difficult it is to develop a safe and reliable self-driving car. To which the entire engineering community rose up as one to say, 'No duh.' Or at least that’s how it should have happened in a just world.”

OpenAI disbands its robotics research team "OpenAI has disbanded its robotics team after years of research into machines that can learn to perform tasks like solving a Rubik's Cube"


Deduplicating Training Data Makes Language Models Better We find that existing language modeling datasets contain many near-duplicate examples and long repetitive substrings. As a result, over 1% of the unprompted output of language models trained on these datasets is copied verbatim from the training data. … Deduplication allows us to train models that emit memorized text ten times less frequently and require fewer train steps to achieve the same or better accuracy. We can also reduce train-test overlap, which affects over 4% of the validation set of standard datasets, thus allowing for more accurate evaluation.

RMA: Rapid Motor Adaptation for Legged Robots Successful real-world deployment of legged robots would require them to adapt in real-time to unseen scenarios like changing terrains, changing payloads, wear and tear. This paper presents Rapid Motor Adaptation (RMA) algorithm to solve this problem of real-time online adaptation in quadruped robots. … RMA shows state-of-the-art performance across diverse real-world as well as simulation experiments (video of results here).

ViTGAN: Training GANs with Vision Transformers Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases. In this paper, we investigate if such observation can be extended to image generation. To this end, we integrate the ViT architecture into generative adversarial networks (GANs). … Empirically, our approach, named ViTGAN, achieves comparable performance to state-of-the-art CNN-based StyleGAN2 on CIFAR-10, CelebA, and LSUN bedroom datasets.


Closing Thoughts
If you enjoyed this piece, give us a shoutout on Twitter. Have something to say about AI generated art or AlphaFold? Shoot us an email and we’ll select the most interesting thoughts from readers to share in the next newsletter! Also, we’d love to hear your feedback on how the Gradient is doing (one lucky respondent will win an exclusive Gradient hoodie). Finally, the Gradient is an entirely volunteer-run nonprofit, so we would appreciate any way you can support us!