Gradient Update #6: Industry Robotics and DeepMind's XLand

In which we discuss the future potential of robotics in industry and the largest reinforcement learning model ever trained.

Welcome to the sixth update from the Gradient! If you were referred by a friend, subscribe and follow us on Twitter! We’d also love to hear your feedback on how the Gradient is doing - one lucky respondent will win an exclusive Gradient hoodie!

Recently From The Gradient

Yann LeCun on his Start in Research and Self-Supervised Learning
Anna Rogers on the Flaws of Peer Review in AI
Machine Learning Won't Solve Natural Language Understanding
Machine Translation Shifts Power

News Highlight: State of AI-Powered Robotics in Industry

This edition we have a collection of news stories touching on trends in commercializing robotics.

Summary This set of stories presents a nice picture of the state of commercializing AI-driven robots, both in terms of directions that are seeing rapid expansion and efforts that have largely stalled. On the one hand, several applications are attracting substantial investment and deployment. Examples include AI-enabled robotic manipulation for warehouses or factories, indoor assistance robots for hospitals or hotels, and last-mile delivery robots that take packages from vans to people’s doorsteps. Smart robotic manipulation is a particularly hot area, with companies such as Covariant, Ambi Robotics, Vicarious, Fetch Robotics, and many others working to create robots that can adapt to different contexts and needs instead of just performing pre-programmed motions, as is currently standard.

On the other hand, many applications that once seemed promising have stalled. This is especially obvious with self-driving cars: few companies yet offer products to the public, and the software that is available on consumer cars still does not have much functionality beyond self-driving on highways. Other once-promising applications such as social robots, autonomous delivery drones, and robots meant to assist in supermarkets have also stalled despite years of investment and development. However, it may only be a matter of time until companies do succeed in cracking these problems.

Lastly, there are cases that are in between: they are not rapidly taking off or promising to change entire industries, but they still have promise for aiding people in doing their work. Examples include robots for helping to clean beaches, a recently revealed AI- and robotics-powered smart beehive, and even some small robots used in the Tokyo Olympics.

Background With the deep learning revolution of the past decade, AI techniques are now powering many software products used every day by millions of people. Facebook and Google have both developed and commercialized techniques to improve search, translation, content monitoring, recommendation systems, and much more. But, despite decades of research on AI for robotics and much recent progress, there are still few similar cases when it comes to hardware products that are powered by AI. However, given these examples of emerging and maturing sectors, this appears to be likely to change over the coming decade, much as AI has become much more common in software over the past decade.

Why does it matter? With an ever-expanding number of warehouses to power e-commerce and millions of factories around the world, AI-enabled robots appear likely to have a huge impact on our world. The same is true for applications that have not yet reached maturity but most likely will in time, such as self-driving cars. Besides the effect these technologies will have on our everyday lives, their impact on the economic systems of the world will be profound. With jobs such as truck driving accounting for the livelihood of millions of people in the US, these economic transitions will also have a large effect on society, and active governmental measures will be needed to help those who have to transition to new occupations.

What Do Our Editors Think?

Andrey: As a person working on robotics and interested in their presence in society, it has been apparent to me for many years that most robotics in industry makes little use of AI. However, in recent years I’ve noticed a growing number of startups that leverage recent advances in AI to tackle a large variety of problems. Many of these startups are founded by professors who study robotics and are experts on recent advances in AI. I have a growing conviction that intelligent robots will be far more commonly seen in our everyday lives by the end of the 2020s, and am personally excited to see that happen.

Hugh: I am slightly less bullish than Andrey about AI and robotics. Recently, OpenAI shuttered their robotics lab, likely because the results produced were not as exciting as their successes in other domains like language modelling. DeepMind, Google Brain, and FAIR have much smaller investments in robotics than in other areas of AI. Successful demos like those from Boston Dynamics apparently do not use deep learning to train their robots and thus have not managed to scale outside their lab environments.

I suspect that robotics will need a breakthrough on the level of AlexNet, DQN, or AlphaGo to become viable for general industry, and while such a breakthrough is never impossible in a field moving as rapidly as machine learning, it is also far from an inevitability. This is in sharp contrast to my views on other areas of machine learning like language, vision, or reinforcement learning, where I believe that results show that existing research directions can push the field significantly further, even without major paradigm shifts.

Daniel: I also remain a bit skeptical, and along with Hugh feel that a breakthrough might be needed to help robotics on its path to potential everyday use. Despite the growing number of startups in the space and the expertise of their founders, deploying robotic systems in myriad settings poses a difficult challenge. We’ve already seen the difficulty in justifying investment in self-driving cars and how both Uber and Lyft got rid of their self-driving units. I don’t doubt there are some differences and we might see more use of robotic systems in restricted settings, but I think the inherent difficulty of the problem, along with the paradigm shift that increased use of robotics systems would represent, could present hard barriers.

Paper Highlight: DeepMind XLand

This edition’s paper is Generally Capable Agents Emerge From Open-ended Play.

Summary DeepMind proposes the game training environment XLand and trains the largest reinforcement learning agent yet, using over 200 billion training steps. For reference, AlphaZero’s chess agent was trained using fewer than a million training steps.

Overview XLand is a big step forward for multi-agent learning environments on several fronts. Firstly, XLand is a universe of procedurally generated environments instead of a singular static learning environment. This makes it much more suitable for teaching agents how to operate in open ended environments like the real world. Additionally, XLand incorporates the laws of physics into the game environment, making it much more realistic than typical generated game environments, though much more still has to be done before even this simulation approaches the physical dynamics of the real world.

Whereas for models like MuZero/AlphaZero, playing separate games required training separate agents independently on each game, DeepMind’s XLand agent trains simultaneously on a large class of games, producing agents that generalize even to games that they have not seen at train time. Several methods are critical for success in learning in such a large environment. As expected, they make use of the attention / transformer architecture, which has taken over all of deep learning. Additionally, in order to prevent the agent from fixating on games where reward is easier to obtain with less effort, they normalize the rewards across games. This ensures that a 20% improvement in score in a difficult game, where rewards are harder to obtain, is valued the same as a 20% improvement in a game where rewards are much easier to obtain.
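To make the reward normalization idea concrete, here is a minimal sketch in Python. It is not DeepMind's actual scheme, which may differ in its details; it simply assumes that each game's score is compared against that game's own previous score. The game names, the scores, and the helper functions are all hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, Dict, Tuple

# Hypothetical (old_score, new_score) pairs for two games.
# The numbers are invented: both games improve by 20%.
SCORES: Dict[str, Tuple[float, float]] = {
    "hard_game": (1.0, 1.2),      # rewards are scarce
    "easy_game": (100.0, 120.0),  # rewards are plentiful
}


def raw_gain(old_score: float, new_score: float) -> float:
    """Absolute improvement, which favors games with large raw scores."""
    return new_score - old_score


def normalized_gain(old_score: float, new_score: float) -> float:
    """Improvement relative to the game's own previous score (assumed scheme)."""
    if old_score <= 0:
        raise ValueError("old_score must be positive to normalize against it")
    return (new_score - old_score) / old_score


def gains(
    scores: Dict[str, Tuple[float, float]],
    gain_fn: Callable[[float, float], float],
) -> Dict[str, float]:
    """Apply a gain function to every game's (old, new) score pair."""
    return {game: gain_fn(old, new) for game, (old, new) in scores.items()}


if __name__ == "__main__":
    raw = gains(SCORES, raw_gain)
    normalized = gains(SCORES, normalized_gain)
    print("raw gains:", raw)
    print("normalized gains:", normalized)
    # After normalization the two games are valued (approximately) equally.
    print("equal after normalization:",
          math.isclose(normalized["hard_game"], normalized["easy_game"]))
```

Under raw gains, the easy game looks roughly a hundred times more rewarding, so an agent would have an incentive to fixate on it. Under the normalized gains, both games contribute the same signal for the same relative improvement.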

Why does it matter? DeepMind shows that reinforcement learning can continue to scale to increasingly large and more realistic environments. It is additional evidence for their Reward is Enough hypothesis, which we discussed in a previous newsletter, and it was undoubtedly on their minds when they published that paper. Additionally, this is one of DeepMind's first major successes of reinforcement learning in the general multi-agent setting. Previously, successes were restricted to either single-player environments (e.g. Atari) or two-player zero-sum games (e.g. Chess, Shogi, Go, or StarCraft).

What Do Our Editors Think?

Hugh: Like many, I am excited by yet another successful datapoint in the scaling hypothesis. That being said, since so many others have already pointed out the successes, I will instead discuss some potential shortcomings of the paper.

Firstly, I will reiterate my claim from a previous Update newsletter that we “need to rethink reinforcement learning from the ground up” to properly use it in the multi-agent setting. While XLand demonstrated the tremendous scaling potential of reinforcement learning (which I wholeheartedly believe in), I do not believe it provided evidence for the claim that current RL paradigms can learn complex multi-agent interactions. In this paper, DeepMind declared success not when the agent learned to play optimally, but rather when the agent achieved a nonzero reward (and sometimes the agent could not even do this). Part of this is undoubtedly due to the open-ended nature of the environment, but I imagine that part of it is because existing reinforcement learning algorithms often fail to converge in multi-agent settings.

Secondly, the semantic difference between a single game and a universe of generated games is highly unclear. For example, imagine that someone claimed that poker was a “universe” where each new starting hand was a separate minigame. After all, the optimal policy when you hold pocket aces is very different than when you hold a 2-7 offsuit. I also am unsure if the word “generalization” is appropriate in this setting. Continuing the poker analogy, one would not say that a poker agent generalizes to new situations simply because it can correctly play after encountering a sequence of bets that did not appear exactly in the training process.

Nevertheless, I acknowledge that in many reinforcement learning settings, the test set and the train set are literally identical, so nomenclature arguments aside, this result still represents an important step forward for reinforcement learning, especially given that, in my opinion, the success in scaling RL to never-before-seen heights is beyond criticism.

Andrey: I found this work quite exciting! Although it's largely scaling up existing ideas in works such as Uber's POET, this direction remains promising. Generalizing to novel tasks is still something RL is quite bad at, so this direction of research is definitely interesting. However, as with prior work, I remain unsure if it will ultimately be useful for real-world applications, given that the simulation environment and the range of possible tasks therein still have to be designed by humans, and this human bottleneck limits the complexity of the tasks that trained agents can achieve. Still, for settings like robotic manipulation or self-driving cars, where extensive work on learning in simulation is already the norm, I could see these ideas becoming very useful.


MIT robot could help people with limited mobility dress themselves "Robots have plenty of potential to help people with limited mobility, including models that could help the infirm put on clothes. That's a particularly challenging task, however, that requires dexterity, safety and speed."

YouTube’s recommender AI still a horrorshow, finds major crowdsourced study "For years YouTube’s video-recommending algorithm has stood accused of fuelling a grab-bag of societal ills by feeding users an AI-amplified diet of hate speech, political extremism and/or conspiracy junk/disinformation for the profiteering motive of trying to keep billions of eyeballs stuck… "

Mozilla Common Voice Adds 16 New Languages and 4,600 New Hours of Speech "The Mozilla Common Voice initiative has released a new, expanded data set featuring 16 new languages - like Basaa and Kazakh - and 4,622 new hours of speech. Mozilla Common Voice is an open-source initiative to make voice technology more inclusive."

AI Wrote Better Phishing Emails Than Humans in a Recent Test "Natural language processing continues to find its way into unexpected corners. This time, it's phishing emails."

Twitter's AI image-crop algo is biased towards people who look younger, skinnier, and whiter, bounty challenge at DEF CON reveals "Engineers at Twitter’s ML Ethics, Transparency and Accountability (META) team sponsored an algorithmic bias bounty competition hosted at this year’s DEF CON hacking conference in Las Vegas and organised by AI Village, a community of hackers and data scientists working at the intersection of machine learning and security. The top three results announced this week revealed Twitter’s saliency algorithm preferred people who appeared more conventionally attractive, English over Arabic, and was more likely to crop out people in wheelchairs."


Alias-Free Generative Adversarial Networks We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. …  Interpreting all signals in the network as continuous, we derive generally applicable, small architectural changes that guarantee that unwanted information cannot leak into the hierarchical synthesis process. The resulting networks match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales. Our results pave the way for generative models better suited for video and animation.

Beyond BatchNorm: Towards a General Understanding of Normalization in Deep Learning Inspired by BatchNorm, there has been an explosion of normalization layers for deep neural networks (DNNs). However, these alternative normalization layers have seen minimal use, partially due to a lack of guiding principles that can help identify when these layers can serve as a replacement for BatchNorm. To address this problem, we take a theoretical approach, generalizing the known beneficial mechanisms of BatchNorm to several recently proposed normalization techniques. .... Overall, our analysis reveals a unified set of mechanisms that underpin the success of normalization methods in deep learning, providing us with a compass to systematically explore the vast design space of DNN normalization layers.

YOLOX: Exceeding YOLO Series in 2021 In this report, we present some experienced improvements to YOLO series, forming a new high-performance detector -- YOLOX. ... We hope this report can provide useful experience for developers and researchers in practical scenes, and we also provide deploy versions with ONNX, TensorRT, NCNN, and Openvino supported.

Pointer Value Retrieval: A new benchmark for understanding the limits of neural network generalization The successes of deep learning critically rely on the ability of neural networks to output meaningful predictions on unseen data -- generalization. Yet despite its criticality, there remain fundamental open questions on how neural networks generalize... In this paper we introduce a novel benchmark, Pointer Value Retrieval (PVR) tasks, that explore the limits of neural network generalization. …. We demonstrate that this task structure provides a rich testbed for understanding generalization, with our empirical study showing large variations in neural network performance based on dataset size, task complexity and model architecture. 

How to avoid machine learning pitfalls: a guide for academic researchers This document gives a concise outline of some of the common mistakes that occur when using machine learning techniques, and what can be done to avoid them. It is intended primarily as a guide for research students, and focuses on issues that are of particular concern within academic research, such as the need to do rigorous comparisons and reach valid conclusions.


Closing Thoughts

If you enjoyed this piece, give us a shoutout on Twitter. Have something to say about this edition’s topics? Shoot us an email and we will consider sharing the most interesting thoughts from readers in the next newsletter! Finally, the Gradient is an entirely volunteer-run nonprofit, so we would appreciate any way you can support us!


Going forward, the Gradient has decided that the Update will remain free for all readers. This is possible only because of your generous support as readers. Instead of putting the Update behind a paywall, we are offering paid subscribers access to discussion threads where you can chat about the latest articles and request future content you would like to see. If you haven’t already, please consider supporting the Gradient to ensure we can continue covering the cutting edge in AI!