Update #32: Chips can't go to China and Cold Diffusion
In which we discuss the US banning sales of Nvidia and AMD chips to China and Cold Diffusion, a method for inverting arbitrary image transformations.
News Highlight: U.S. bans sale of Nvidia and AMD chips to China
On September 1, two major chip manufacturers, Nvidia and AMD, said that U.S. government officials had ordered them to stop selling certain chips to China. This move comes as tension continues to build over Taiwan, where chips for most major companies are manufactured. While Chinese officials have called the order an attempt to impose a “tech blockade,” the U.S. Department of Commerce maintains that the policy will enable advanced AI technologies to be kept out of the wrong hands and deter their use in military settings (Reuters).
AI-based technologies are expected to radically transform defense systems, from assisting agencies with mission planning to using drones for surveillance and monitoring (Deloitte). Since GPUs are a necessity for machine learning research today, embargoes on chips can significantly hamper the development of a country’s advanced defense capabilities. With exactly this aim, the U.S. government recently ordered Nvidia to stop selling its A100 and H100 chips to China. Both are Tensor Core GPUs designed specifically to accelerate machine learning workloads.
AMD also shared that it had received new licensing guidelines from U.S. officials that bar it from selling its MI250 chip to China. While AMD said the ban would have no material impact on its business, Nvidia could lose around $400 million worth of contracts it had sold in the first quarter of 2022. The company said that it is now working with its clients to appeal for exemptions to the order or to find alternate products that can meet their needs. The U.S. government order also bans sales of these chips to Russia, but neither Nvidia nor AMD currently sells any chips in that country.
A Reuters analysis of various publicly available tenders shows that many leading research institutes in China have a strong demand for Nvidia’s A100 chips. Tsinghua University spent more than $400,000 on two Nvidia supercomputers, each powered by four A100 chips. Additionally, the Institute of Computing Technology, part of the Chinese Academy of Sciences, spent around $250,000 on A100 chips. Consequently, the ban is expected to deliver a significant setback to AI researchers in China and could further strain relations between the two countries.
Why does it matter?
This move marks an important escalation in the ongoing clash between the US and China over Taiwan. It remains to be seen how China, which sees Taiwan as a breakaway province that should one day be “reunified” with the mainland, will retaliate against a ban involving chips manufactured in Taiwan itself. It should also be noted that the chips covered by the current order are among the most powerful that Nvidia and AMD sell, intended for data centers and for training trillion-parameter models (such as large language models). This means that while Chinese researchers are undoubtedly affected by the order, a wide array of high-performance GPUs remains accessible to them. It will also be interesting to see whether the ban lasts long enough for new players to emerge in the Chinese market to meet the strong demand for such chips.
Daniel: This is a very interesting story and not an entirely surprising one. Hardware systems designed to enable AI applications are increasingly important today, and preventing the exchange of these systems across borders is an intriguing geopolitical move. As we note, high-performance chips will remain available to China. Furthermore, Nvidia and AMD are not the only players, even in the US: Graphcore, Cerebras, SambaNova Systems, and the like all sell AI accelerators while China has its own suite of options, such as Kunlun. What interests me is whether this will leave an open market for other US-based AI accelerator companies, whether those companies will take advantage of this opening, and how this news will affect China’s own AI accelerator landscape.
Research Highlight: Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise
This work teases out what is truly necessary for diffusion models by replacing the standard Gaussian-noise image degradation with arbitrary degradations, including deterministic ones such as blurring and masking. Interestingly, the work shows that many types of degradation can be successfully inverted, and hence that generative models can be learned for many different kinds of degradation. This challenges popular assumptions about diffusion models and promises potentially endless new types of generative models based on inverting degradations other than Gaussian noise.
Many state-of-the-art generative models rely heavily on diffusion, including DALL·E 2, Stable Diffusion, and Imagen. Diffusion models generally consist of a fixed encoding process, in which an image is iteratively degraded with Gaussian noise until it becomes pure standard Gaussian noise, and a learned decoding process, in which Gaussian noise is iteratively decoded into an image. To generate an image, we sample standard Gaussian noise and then iteratively apply the learned decoder to denoise this sample into an image. Various theoretical justifications and analyses of diffusion models rely on the specific properties of the Gaussian noise degradation; further, they require that the degradation be random.
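To make the standard (Gaussian) forward process concrete, here is a minimal NumPy sketch of the closed-form noising step. The linear beta schedule and all names are illustrative, not taken from any particular implementation:

```python
import numpy as np

def make_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def degrade(x0, t, alpha_bars, rng):
    """Closed-form q(x_t | x_0): scale the image down, mix in Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bars[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps

rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
x0 = np.ones((8, 8, 3))                    # stand-in "image"
x_early = degrade(x0, 10, alpha_bars, rng)  # mostly signal
x_late = degrade(x0, 999, alpha_bars, rng)  # nearly pure Gaussian noise
```

By the final step, almost none of the original signal remains, which is why a fresh standard Gaussian sample is a valid starting point for the learned decoder.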
The current work challenges these assumptions and design choices. The authors consider other types of degradation besides Gaussian noise, including degradations that are completely deterministic. For instance, they consider degradations that at each time step gradually blur, mask out, or pixelate an image, or even linearly interpolate between an input image and a random picture of an animal (referred to as animorphosis). As in standard diffusion models, a neural network is trained to recover the input image given degraded versions of it. The authors further propose a simple change to the naive decoding of a degraded image, which they find very useful for inverting the new degradations they consider (though not necessary for standard Gaussian noise).
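The improved decoding rule can be sketched in a few lines: rather than jumping straight from the restored estimate to the next step, each update subtracts the current degradation of the estimate and adds back the slightly weaker one. The sketch below uses a toy deterministic blur for the degradation D, and cheats by letting the "restoration network" R return the true clean image (a real R would be learned); all names are illustrative:

```python
import numpy as np

def D(x0, t):
    """Toy deterministic degradation: t rounds of neighbor-average blurring."""
    x = x0
    for _ in range(t):
        x = 0.25 * (np.roll(x, 1, 0) + np.roll(x, -1, 0) +
                    np.roll(x, 1, 1) + np.roll(x, -1, 1))
    return x

def R(x_t, t, x0_true):
    """Stand-in for the trained restoration network; here it returns the
    ground truth so the sampling loop itself can be checked in isolation."""
    return x0_true

def improved_sampling(x_T, T, x0_true):
    x = x_T
    for t in range(T, 0, -1):
        x0_hat = R(x, t, x0_true)
        # Improved update: remove the degradation at level t, reapply at t-1.
        x = x - D(x0_hat, t) + D(x0_hat, t - 1)
    return x

x0 = np.random.default_rng(0).random((16, 16))
T = 5
x_T = D(x0, T)
recovered = improved_sampling(x_T, T, x0)
```

With a perfect restoration, each update maps D(x0, t) exactly to D(x0, t-1), so the loop walks the degradation back to the clean image; with an imperfect learned R, the paper reports this rule is far more stable than naive decoding for deterministic degradations.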
In standard diffusion models, the final, fully-degraded distribution is standard Gaussian noise; thus, new images can be generated by sampling standard Gaussian noise and then using the learned decoder to produce a clean image. However, the fully degraded images for other choices of degradation will in general not be distributed as a standard Gaussian. For instance, with more and more blurring, an image becomes a solid-colored image, where the color is a 3-dimensional vector given by the channel-wise mean of the input image’s pixel colors. The authors therefore fit a simple Gaussian mixture model, so that they can sample this 3-dimensional color vector and then use their learned decoder to generate new images. For the animorphosis transformation, they simply sample a random animal image and use the decoder to morph it back into a clean image from the input distribution.
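The blur case above can be sketched with scikit-learn: fit a small mixture model over the per-image mean colors, then sample a color and broadcast it into a solid image that plays the role of the fully degraded starting point. The dataset here is random stand-in data, and the component count is an arbitrary illustrative choice:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
dataset = rng.random((200, 32, 32, 3))   # stand-in for a real image dataset
mean_colors = dataset.mean(axis=(1, 2))  # one 3-d (RGB) mean color per image

# Fit a simple GMM over the 3-d colors, as the paper does for heavy blurring.
gmm = GaussianMixture(n_components=4, random_state=0).fit(mean_colors)

color, _ = gmm.sample(1)                          # sample a 3-d color vector
x_T = np.broadcast_to(color[0], (32, 32, 3))      # solid "fully degraded" image
# A trained cold-diffusion decoder would now iteratively sharpen x_T.
```

This replaces the "sample standard Gaussian noise" step of ordinary diffusion with a degradation-appropriate prior, which is all the generative pipeline needs to get started.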
Why does it matter?
This exploration of using different types of degradations opens up several possible degrees of freedom for different diffusion models. While inverting the process of Gaussian noise is well-understood, the authors show that learned networks can invert other interesting degradations as well. Further, they show that image generation using these different degradations can give reasonable results. As large-scale diffusion models with simple modifications have already found tremendous success, it is exciting to think of the capabilities of future diffusion models with currently unimaginable changes.
Derek: These new image degradations are pretty cool, and at times funny. I’m excited to see more works in this direction, especially with models trained on large-scale, diverse datasets. There are many parts of diffusion models that can probably be changed a bit; I'm also curious to see more work that modifies the prior / fully degraded distribution, as this work does.
New from the Gradient
Other Things That Caught Our Eyes
Doctors using AI catch breast cancer more often than either does alone “Radiologists assisted by an AI screen for breast cancer more successfully than they do when they work alone, according to new research. That same AI also produces more accurate results in the hands of a radiologist than it does when operating solo.”
Clearview: Glasses With Facial Recognition Are Here—And The Air Force Is Buying “Clearview AI, the facial recognition company backed by Facebook and Palantir investor Peter Thiel, has been contracted to research the use of augmented reality glasses combined with facial recognition for the U.S. Air Force.”
Transformers are Sample Efficient World Models Motivated by the success of Transformers in sequence modeling tasks, we introduce IRIS, a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer. With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games. Our approach sets a new state of the art for methods without lookahead search, and even surpasses MuZero. To foster future research on Transformers and world models for sample-efficient reinforcement learning, we release our codebase at this https URL.
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [W]e propose MotionDiffuse, the first diffusion model-based text-driven motion generation framework, which demonstrates several desired properties over existing methods. 1) Probabilistic Mapping. Instead of a deterministic language-motion mapping, MotionDiffuse generates motions through a series of denoising steps in which variations are injected. 2) Realistic Synthesis. MotionDiffuse excels at modeling complicated data distribution and generating vivid motion sequences. 3) Multi-Level Manipulation. MotionDiffuse responds to fine-grained instructions on body parts, and arbitrary-length motion synthesis with time-varied text prompts. Our experiments show MotionDiffuse outperforms existing SoTA methods by convincing margins on text-driven motion generation and action-conditioned motion generation.
Have something to say about this edition’s topics? Shoot us an email at firstname.lastname@example.org and we will consider sharing the most interesting thoughts from readers in the next newsletter! For feedback, you can also reach Daniel directly at email@example.com or on Twitter. If you enjoyed this piece, consider donating to The Gradient via a Substack subscription, which helps keep this grad-student / volunteer-run project afloat. Thanks for reading the latest Update from the Gradient!