Update #2: Killer Robots and Diffusion Models

as well as other news, papers, memes, and tweets from the AI world!

Welcome to the second monthly Update from the Gradient! If you were referred by a friend, we’d love to see you subscribe and follow us on Twitter!

News Highlight: Killer Robots?

A recent UN report about the Libyan civil war suggested that an autonomous AI-piloted drone may have been used in fighting. Specifically, it stated: “Logistics convoys and retreating HAF were subsequently hunted down and remotely engaged by the unmanned combat aerial vehicles or the lethal autonomous weapons systems such as the STM Kargu-2 [a quadcopter] and other loitering munitions. The lethal autonomous weapons systems were programmed to attack targets without requiring data connectivity between the operator and the munition: in effect, a true ‘fire, forget and find’ capability.” This led to a variety of headlines that ranged from reasonable to sensationalist:

Drones may have attacked humans fully autonomously for the first time

A.I. Drone May Have Acted on Its Own in Attacking Fighters, U.N. Says

The Age of Autonomous Killer Robots May Already Be Here

Autonomous drone attacked soldiers in Libya all on its own

Killer drone ‘hunted down a human target’ without being told to

The Verge’s own report, “Have autonomous robots started killing in war?”, took a close look and rightly noted that this excerpt from the UN report leaves a lot of room for ambiguity; it does not specify that the Kargu-2 was actually operating autonomously. Further, “loitering munitions” (essentially devices that hover in the air and dive to detonate if a target is detected) have long been used in warfare. However, the report heavily implies that in this case machine learning algorithms embedded in the device were used to steer it toward its targets, as described in the drone’s marketing material.

It’s hard to say how big a deal this drone’s involvement was, even if it targeted people without being operated by a human. Still, it is a reminder that now is a good time to push for regulation that defines and limits the use of AI-controlled weapons.

Background: This story touches on the broader topic of “lethal autonomous weapon systems” (LAWS), which are forms of weaponry that can operate without human control or oversight. While such systems are easy to imagine, essentially none have been used in practice or even demonstrated. The closest thing to an existing, deployed instance of LAWS is the SGR-A1, a sentry gun that South Korea has used to guard its border with North Korea. In 2007 it was reported that this system can begin firing on its own if a human is detected and does not stop after receiving verbal commands, although it is not clear whether this functionality has been put to use.

But, it is only a matter of time until such technology is developed. In 2015 more than 3000 experts signed “Autonomous Weapons: an Open Letter from AI & Robotics Researchers”, which warned that “Artificial Intelligence (AI) technology has reached a point where the deployment of such systems is — practically if not legally — feasible within years, not decades, and the stakes are high: autonomous weapons have been described as the third revolution in warfare, after gunpowder and nuclear arms.” and that “in summary, we believe that AI has great potential to benefit humanity in many ways, and that the goal of the field should be to do so. Starting a military AI arms race is a bad idea, and should be prevented by a ban on offensive autonomous weapons beyond meaningful human control.” 

No laws regulate this technology at either the domestic level (for the USA) or the international level. Many countries and nonprofit organizations, such as the International Committee for Robot Arms Control, the Campaign to Stop Killer Robots, and Human Rights Watch, have called for regulation or an outright ban on LAWS.

What do our editors think?

Andrey: Cue the Terminator jokes! Jokes aside, if anything I am surprised that this sort of thing has not happened sooner. The marketing materials for this drone promote its AI-based capability to do exactly what was reported in these stories. While it does set a precedent, I am more worried about what happens with heavy investment in such technologies from national militaries in the near future. As outlined in The Rise of “Killer Robots” and the Race to Restrain Them, the militaries of the US, China, Russia, and others are already developing various more advanced LAWS, and it is only a matter of time until they are ready for use. Personally, I support pushing for regulation of such technology both domestically and internationally.

Hugh: I am very concerned about a world where lethal autonomous weapons are regularly used by militaries around the world and think that this story was actually underplayed relative to its true importance. Like many others in the AI community, I support international regulation of this technology, but I’m beginning to suspect that this is just the tip of the iceberg of what the AI community—and the world—needs to push for to ensure a safe future. Regulation is just the first step: it alone will not be enough without credible incentives to deter the usage of AI weapons. But nevertheless it is an important first step.

Paper Highlight: Diffusion Models

Background: Since their invention in 2014, Generative Adversarial Networks (GANs) have been one of the most prominent paradigms for generative modelling. Several other approaches have also been explored, including VAEs and flow models, but these generally underperform GANs in the quality of the images they generate.

In 2019, researchers from Stanford University proposed a new paradigm for generative modelling, now commonly referred to as diffusion models. These models learn to reverse the process of transforming data into noise and can thus create data from noise. Some say that diffusion models are the new contenders to GANs in image generation.

What are diffusion models? In a single sentence, diffusion models are trained by progressively adding more and more Gaussian noise to images from the training set and learning to “denoise” these noisy images. Adding enough noise removes all of the information encapsulated in an image, and if a model can reverse the process entirely, it can succeed in creating images from pure white noise—generating from nothing.
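To make the "adding more and more Gaussian noise" half concrete, here is a minimal NumPy sketch. It assumes one common choice of noise schedule (a linear variance schedule with a closed-form jump to any step, as popularized by the DDPM line of work); the function names, the 1000-step schedule, and the 8x8 toy "image" are our own illustrative assumptions, not anything prescribed by the papers discussed here. The learned denoising half is not shown.

```python
import numpy as np

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bars, rng):
    """Jump straight to noise level t: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bars[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps

rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()

# Hypothetical stand-in for a training image, with pixel values in [-1, 1].
image = rng.uniform(-1.0, 1.0, size=(8, 8))

slightly_noisy, _ = add_noise(image, 0, alpha_bars, rng)       # first step
almost_pure_noise, _ = add_noise(image, 999, alpha_bars, rng)  # last step

# Fraction of the original signal's scale that survives at each end of the schedule.
print("signal scale at first step:", np.sqrt(alpha_bars[0]))
print("signal scale at last step:", np.sqrt(alpha_bars[-1]))
```

At the last step almost none of the original image survives, which is why a model that can undo every step can start from pure noise.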

Behind the scenes, instead of learning to generate images via a minimax game (GANs) or directly learning the probability distribution of the data (VAEs and normalizing flows), diffusion models attempt to match the gradient of the log-density of the data (the “score”) at all points in the space, using a process called denoising score matching. This process minimizes the Fisher divergence between the model and the real data distribution: if two probability distributions have the same score everywhere, they are the same probability distribution.
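Denoising score matching can be tried out on a toy problem where the right answer is known. The sketch below is our own illustration, not the training code from any of the papers: it assumes 1-D Gaussian data, a single noise level, and a linear "score model" fit by least squares in place of a neural network. The regression target is the gradient of the log-density of the Gaussian noising kernel, and the fitted model should land close to the true gradient of the log-density of the noisy data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "dataset" (an assumption for illustration): 1-D Gaussian samples.
data_mean, data_std, noise_std = 2.0, 1.0, 0.5
x = rng.normal(data_mean, data_std, size=200_000)

# Perturb every sample with Gaussian noise.
x_noisy = x + noise_std * rng.standard_normal(x.shape)

# Denoising score matching target: the score of the noising kernel,
# d/dx_noisy log N(x_noisy; x, noise_std^2) = -(x_noisy - x) / noise_std^2.
target = -(x_noisy - x) / noise_std**2

# Linear score model s(x) = w * x + b. Minimizing the squared error against
# the target is an ordinary least-squares problem for this model.
features = np.stack([x_noisy, np.ones_like(x_noisy)], axis=1)
(w, b), *_ = np.linalg.lstsq(features, target, rcond=None)

# Ground truth: the noisy data is N(data_mean, data_std^2 + noise_std^2),
# whose score is -(x - data_mean) / total_var.
total_var = data_std**2 + noise_std**2
true_w, true_b = -1.0 / total_var, data_mean / total_var

print("learned score:", w, "* x +", b)
print("true score:   ", true_w, "* x +", true_b)
```

The model never sees the true score, only noisy samples paired with the clean ones, yet the two printed lines should agree closely. Real diffusion models repeat this idea across many noise levels with a neural network in place of the linear fit.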

Why does this matter? Despite their widespread success in generating high-quality images, GANs still suffer from diversity issues: they can fail to learn the entire data distribution. Additionally, their training process is notoriously unstable and suffers from both practical and theoretical convergence issues.

Unlike GANs, diffusion models probably learn to cover the whole distribution (measured in bits per dim for images). Unlike VAEs and flow models, they also generate pictures that are competitive in quality with GANs (this paper). The downsides are that they are considerably more complex to understand and implement and are (currently) very slow at generating images. However, as more research is done in this direction, this may change.

What do our editors think?

Hugh: GANs and VAEs have historically each had their merits in the generative modelling community, with GANs typically generating higher-quality images while VAEs have had better diversity of images generated. Having a model that can generate higher-quality images AND cover the entire distribution is a huge victory and very promising for future work. Given that a recent advance in training diffusion models was awarded an Outstanding Paper Award at ICLR 2021, it’s clear that the broader machine learning community thinks so as well.

Justin: Any approach that attempts to close the gap between in-domain training samples and out-of-domain reality is something that immediately warrants attention from me. Let’s hope folks can continue to build upon this work and speed things up considerably.

Guest Thoughts

Yang Song (one of the grandfathers of diffusion models): I’m excited about diffusion/score-based models because of:

  • 1) High sample quality. They are now able to outperform GANs on large-scale image generation (ImageNet, FFHQ, etc.) without requiring adversarial training, which is notoriously unstable to tune.

  • 2) Competitive likelihood. You can compute the exact log-likelihood with numerical ODE solvers for evaluation, or compute a lower bound of it that allows efficient training. Our recent work shows you can have much higher likelihoods than existing flow models with score-based diffusion models.

  • 3) Solving inverse problems. Because Bayes' rule is greatly simplified for score functions, you can specify or learn the score function of a posterior distribution easily. This facilitates inverse problem solving and can be applied to various applications, such as image inpainting, super-resolution, and colorization, without re-training a score-based model.

  • 4) Connections to other fields. For example, there exists an ODE formulation that turns a score-based model into a continuous normalizing flow model. The diffusion process is also strongly connected to regularized optimal transport (Schrödinger bridge).
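For readers wondering what point 3 looks like on paper, here is our own short derivation (the notation is ours, not Yang's): write x for the image to recover and y for the observation, such as a masked or low-resolution image. Taking the gradient with respect to x of the log of Bayes' rule makes the normalizing term vanish, because p(y) does not depend on x.

```latex
\nabla_x \log p(x \mid y)
  = \nabla_x \log p(x) + \nabla_x \log p(y \mid x) - \nabla_x \log p(y)
  = \nabla_x \log p(x) + \nabla_x \log p(y \mid x)
```

So the score of the posterior is the score of the prior, which an already trained score-based model estimates, plus a term that depends only on how y is produced from x.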

Alexia Jolicoeur-Martineau (author of The New Contender to GANs?): I think this is a very promising set of approaches as it does a better job at ensuring full-distribution coverage than other generative models, while producing higher quality data due to the iterative refinement aspect.

Other News That Caught Our Eye

The Costly Pursuit of Self-Driving Cars Continues On. And On. And On.
Many in Silicon Valley promised that self-driving cars would be a common sight by 2021. Now the industry is resetting expectations and settling in for years of more work.

Tesla announces transition to 'Tesla Vision' without radar, warns of limitations at first
Tesla today announced the official transition to “Tesla Vision” without radar on Model 3 and Model Y. In the process, the automaker warns of some limitations on Autopilot features at first.

Anthropic is the new AI research outfit from OpenAI’s Dario Amodei, and it has $124M to burn
As AI has grown from a menagerie of research projects to include a handful of titanic, industry-powering models like GPT-3, there is a need for the sector to evolve — or so thinks Dario Amodei, former VP of research at OpenAI, who struck out on his own to create a new company a few months ago.

Machine learning is booming in medicine. It's also facing a credibility crisis
The mad dash accelerated as quickly as the pandemic. Researchers sprinted to see whether artificial intelligence could unravel Covid-19's many secrets - and for good reason. There was a shortage of tests and treatments for a skyrocketing number of patients.

Google says it's committed to ethical AI research. Its ethical AI team isn't so sure.
Six months after Timnit Gebru left, Google's ethical artificial intelligence team is still in a state of upheaval.

Another Day on ArXiv

A Survey of Transformers
Up to the present, a great variety of Transformer variants (a.k.a. X-formers) have been proposed, however, a systematic and comprehensive literature review on these Transformer variants is still missing. In this survey, we provide a comprehensive review of various X-formers. We first briefly introduce the vanilla Transformer and then propose a new taxonomy of X-formers.

Tabular Data: Deep Learning is Not All You Need
In this paper, we explore whether these deep models should be a recommended option for tabular data, by rigorously comparing the new deep models to XGBoost on a variety of datasets. In addition to systematically comparing their accuracy, we consider the tuning and computation they require. Our study shows that XGBoost outperforms these deep models across the datasets, including datasets used in the papers that proposed the deep models. We also demonstrate that XGBoost requires much less tuning.

Decision Transformer: Reinforcement Learning via Sequence Modeling
In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling... By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return. Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.

Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
In this work, we aim to build a many-to-many translation system with an emphasis on the quality of non-English language directions. Our intuition is based on the hypothesis that a universal cross-language representation leads to better multilingual translation performance. To this end, we propose mRASP2, a training method to obtain a single unified multilingual translation model. mRASP2 is empowered by two techniques: a) a contrastive learning scheme to close the gap among representations of different languages, and b) data augmentation on both multiple parallel and monolingual data to further align token representations.

When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations
This paper investigates ViTs and MLP-Mixers from the lens of loss geometry, intending to improve the models' data efficiency at training and generalization at inference. Visualization and Hessian reveal extremely sharp local minima of converged models. By promoting smoothness with a recently proposed sharpness-aware optimizer, we substantially improve the accuracy and robustness of ViTs and MLP-Mixers... The resultant ViTs outperform ResNets of similar size and throughput when trained from scratch on ImageNet without large-scale pretraining or strong data augmentations.

Closing Thoughts

If you enjoyed this piece, give us a shoutout on Twitter!

Have something to say about killer robots or diffusion models? Shoot us an email at gradientpub@gmail.com, and we’ll select the most interesting thoughts from readers to share in next month’s newsletter!

Finally, the Gradient is an entirely volunteer-run nonprofit, so we would appreciate any way you can support us!


For the next few months, the Update will be free for everyone. Afterwards, we may consider posting these monthly updates as a perk for paid supporters of the Gradient! Of course, guest articles published on the Gradient will always be available for all readers, regardless of subscription status.