Gradient Update #21: Russia's AI-powered Drones and RB2
In which we discuss Russia's use of AI-powered drones in Ukraine and a new benchmarking system for robotic manipulation.
Welcome to the 21st update from the Gradient!
If you were referred by a friend, subscribe and follow us on Twitter!
News Highlight: Russia's Killer Drone in Ukraine Raises Fears About AI in Warfare
Summary
Photographic evidence suggests that Russia is using AI-powered drones in its 2022 invasion of Ukraine. These drones are relatively new additions to the Russian military arsenal, and similar drones have been used in recent conflicts around the world. Although humans still make the important decisions in drone operations, more sophisticated AI systems may enable levels of autonomy that change what drone warfare looks like in the future.
Background
Drones have become increasingly common in warfare over the last few years. For instance, Russia has used drones in the Syrian war, and Azerbaijan has used drones in its conflict with Armenia over Nagorno-Karabakh. An autonomous drone with lethal capabilities was used in a 2020 battle in Libya. While unmanned aerial vehicles (UAVs) in general have been used in warfare for many years (a large proportion of the United States military’s airplanes were UAVs by 2012), recent advances in AI technology may significantly increase drones’ capabilities.
Indeed, the new drones from Russian aerospace company ZALA Aero purportedly use object detection and recognition. Given that the past decade of research in machine learning (and particularly deep learning) has produced tremendous advances in computer vision, it should be no surprise that drone capabilities have improved as well.
Still, this does not mean that we are already witnessing widespread use of fully autonomous AI-driven weapons. Thus far, it appears that Russia is not using these drones widely in the invasion of Ukraine. According to Wired, Professor Michael Horowitz of the University of Pennsylvania notes that oftentimes the autonomous capabilities of drones are used for “flight corrections and maneuvering…”, so this is “not autonomy in the way the international community would define an autonomous weapon.”
Why does it matter?
As AI advances, so too do technologies powered by AI. In its current state, AI can help with various aspects of weapons, including flight corrections and target identification. Still, the current use of AI in weapons is limited. For instance, the US military does not currently have lethal autonomous weapon systems, though “some senior military and defense leaders have stated that the United States may be compelled to” develop such systems if adversaries do. In the future, AI-powered weapons may continue to improve in capabilities unless some restrictions are developed and agreed upon. Various organizations such as the Campaign to Stop Killer Robots and the Future of Life Institute are working towards adoption of such restrictions.
Editor Comments
Derek: It definitely worries me that advances in AI could lead to advances in violent weapons, especially given that desired drone capabilities like object recognition are very actively studied by researchers around the world. Likewise, widely studied molecule-generation methods can simply be inverted from generating non-toxic molecules to generating highly toxic chemical weapons. I am not an expert in international law, but I would not be surprised if international standards formed around the use of AI in warfare, perhaps with a scope similar to the treaties governing nuclear or chemical weapons.
Daniel (now officially 24 and very much no longer able to relate to Taylor): While I agree with Derek and worry a lot about what AI-enabled warfare/weaponry looks like, I also haven’t made a conclusion about where I think that will take us. A year or two ago, I heard a really interesting perspective that claimed AI-enabled weaponry could lead to a state in which warfare is conducted with essentially no human participation. I’m not sure exactly what this picture looks like, and that doesn’t speak to questions of how autonomous weapons might be used otherwise, but I am curious if there’s any way in which the development of these systems could lead to fewer human lives being lost in war.
Andrey: The current state of AI-enabled weaponry, and especially lethal autonomous weapon systems (sometimes referred to as “killer robots”) is not as far along as many might imagine. Still, it’s not surprising that many countries are investing heavily into R&D to change this state of affairs. Efforts such as The Campaign to Stop Killer Robots have been pushing for regulation to be passed in advance of ‘killer robots’ being used in war, but have thus far been unsuccessful; hopefully they will achieve some progress in the coming years.
Paper Highlight: RB2: Robotic Manipulation Benchmarking with a Twist
Summary
Benchmarks are tools used to “compare algorithms using objective performance metrics,” with two defining features: they are widely useful across research groups, and they allow researchers to reproduce results reliably. However, robotics faces many challenges in developing reliable and scalable benchmarks. In this paper, researchers from NYU, CMU, WPI, UIUC, and Facebook AI introduce a novel methodology for benchmarking robotic manipulation tasks that attempts to address these challenges. The benchmark has a two-step structure: a lab first runs experiments to generate ‘local rankings’ of models, and then many such local rankings from different labs are pooled to create a ‘global leaderboard’. Owing to this structure, the benchmark is named the Ranking-Based Robotics Benchmark (RB2), and it pushes the envelope on establishing what ‘state-of-the-art’ for robotics can look like.
Overview
Research in AI today is fast-paced, as labs across the world attempt to push the limits of machine intelligence. Researchers have developed increasingly ingenious solutions to problems like object detection and robotic manipulation, proposing algorithms they claim are state of the art. A natural way to compare algorithms developed by different labs is to create a standard benchmark dataset and repeat a given experiment with different algorithms on the same data. In many sub-domains of AI, such as computer vision, benchmarks have seen great success by enabling rapid development of models for tasks such as semantic segmentation, object detection, and generative modeling (think ImageNet, CIFAR, KITTI, etc.). However, other domains such as robotics have not yet found a robust way to assess an algorithm’s performance. This stems primarily from the difficulty of creating an experimental setup for robots that can be replicated in any lab without severely limiting the task at hand.
The authors propose RB2, a benchmarking system for robotics that aims to resolve these issues. RB2 currently includes four tasks: pouring, scooping, insertion, and zipping. These tasks are sourced from the Southampton Hand Assessment Procedure (SHAP), a “standard test for assessing dexterity in occupational therapy via daily-living manipulation tasks.” The authors also provide modular implementations of five standard baselines (Open Loop Imitation, Closed Loop Imitation, Neural Dynamic Policies, Recurrent Neural Networks, and Offline Reinforcement Learning) that can be used with a variety of robots and sensors. With a general task specification and access to the baselines, each lab can generate “local rankings” specific to its hardware and experimental setup. Labs then submit their results to the benchmark, which aggregates results from many labs using the Plackett-Luce method to produce a set of “global rankings” of the baselines for each of the four tasks. This process lets different labs choose which robots and grippers to use, and how to set up their experiments, while still adhering to the benchmarking system and contributing to a global leaderboard.
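The aggregation step is worth making concrete. RB2 pools per-lab rankings with the Plackett-Luce model; the sketch below is an illustrative implementation of the standard MM fitting algorithm (Hunter, 2004), not the authors’ code, and the lab data shown is entirely hypothetical.

```python
import numpy as np

def plackett_luce_mm(rankings, n_items, n_iters=200, tol=1e-8):
    """Fit Plackett-Luce 'worth' parameters from full rankings (best first)
    using the MM algorithm. `rankings` is a list of lists of item indices;
    an item that never wins a stage ends up with worth 0."""
    gamma = np.ones(n_items) / n_items
    # w[i]: number of stages, across all rankings, at which item i was chosen
    # (i.e., how often i is ranked above last place).
    w = np.zeros(n_items)
    for r in rankings:
        for item in r[:-1]:
            w[item] += 1
    for _ in range(n_iters):
        denom = np.zeros(n_items)
        for r in rankings:
            r = np.asarray(r)
            for j in range(len(r) - 1):
                rest = r[j:]  # items still in contention at this stage
                denom[rest] += 1.0 / gamma[rest].sum()
        new_gamma = w / denom
        new_gamma /= new_gamma.sum()
        if np.max(np.abs(new_gamma - gamma)) < tol:
            return new_gamma
        gamma = new_gamma
    return gamma

# Hypothetical usage: three labs each rank the same four baselines (0..3), best first.
local_rankings = [[2, 0, 1, 3], [2, 1, 0, 3], [0, 2, 1, 3]]
worths = plackett_luce_mm(local_rankings, n_items=4)
global_ranking = np.argsort(-worths)  # baseline indices, best first
```

Sorting the fitted worths yields a single global ordering of the baselines, which is the spirit of RB2’s global leaderboard, even though each contributing lab used different hardware.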
Why does it matter?
Across various AI disciplines, researchers continually prioritize the development of benchmarks as the “standard candles” of our field. While there is sound criticism of benchmarks that focus solely on task-completion skill rather than intelligence, we nevertheless need good benchmarks as a starting point for measuring our progress. What makes RB2 so unique and insightful is that it acts as a benchmarking standard that robotics researchers can use to replicate baseline results and compare against their own models. This has historically been an immense challenge in robotics because of variations in hardware, baseline implementations, objects used for experimentation, and lighting conditions, to name a few factors.
The immense variety in experimental setups makes reproducing robot experiments in exactly the same environment nearly impossible. RB2 controls for some of these factors by providing implementations of the baselines and establishing guidelines for which kinds of objects go into the train and test sets. Other design choices, such as lighting and the choice of robot and gripper, are intentionally left open to allow flexibility. As a result, each lab develops ‘local rankings’ of models specific to its setup, which are nonetheless consistent enough to be aggregated with other labs’ results into a set of global rankings. In this way, researchers can reliably compare their models to the baselines while also allowing other researchers to easily replicate their results.
Editor Comments
Tanmay: I’m excited about the paper, but hesitant to hail it as robotics’ ImageNet just yet. Excited, because it opens up the possibility of comparing learning algorithms in robotics, which will undoubtedly help the research community. Hesitant, because the underlying assumption behind a global leaderboard is that an algorithm “better” than the baselines will perform best on most robots, in most conditions, and so on. To develop reliable global rankings in this way, we will need contributions from many labs so that trends can reliably emerge from the rankings. I’m interested in seeing whether the research community widely adopts the benchmark, or instead develops new ones building on this proposed methodology.
Daniel: I’m always excited to see movement towards more reproducibility and consistency across experiments in ML. I think standardized benchmarking and precise measurements are some of the key ingredients for making progress in any ML domain, and the field has unfortunately fallen short of measuring success well (see this critical analysis of progress in neural recommendation systems). For reasons highlighted in the article, robotics presents a number of unique challenges to consistent and precise measurement/comparison. I don’t know that RB2 will revolutionize this completely, but I’m excited to see a promising effort.
Andrey: As a robotics researcher, I have often been mystified by the almost total lack of non-simulated benchmarks in robotics. Of course, it’s easy to understand why this is difficult to achieve (different kinds of hardware, experimental conditions, etc.), so it’s really exciting to see that this research has found a smart way to make benchmarking viable. While it may not be as celebrated as hyped-up work from the likes of OpenAI or DeepMind, this could definitely be highly impactful and important research.
New from the Gradient
Nick Walton on AI Dungeon and the Future of AI in Games
Other Things That Caught Our Eyes
News
Doctors often turn to Google Translate to talk to patients. They want a better option “The patient had just undergone a cesarean section, and now was struggling to put words to her pain in her native Taiwanese.”
South Korea candidates woo young voters with ‘deepfakes,’ hair insurance “South Korean presidential candidate Yoon Suk-yeol got a boost on Thursday when a rival dropped out, but if the conservative former prosecutor wins next week, it may also be thanks to ‘deepfake’ avatars and viral short videos.”
Ukraine has started using Clearview AI’s facial recognition during war “Ukraine's defense ministry on Saturday began using Clearview AI’s facial recognition technology, the company's chief executive told Reuters, after the U.S. startup offered to uncover Russian assailants, combat misinformation and identify the dead.”
Nvidia unveils new technology to speed up AI, launches new supercomputer “Nvidia announced new chips and technologies that it said will boost the computing speed of increasingly complicated artificial intelligence algorithms, stepping up competition against rival chipmakers vying for lucrative data center business.”
Mercedes Drive Pilot Beats Tesla Autopilot By Taking Legal Responsibility “Mercedes will accept full legal responsibility for the vehicle whenever Drive Pilot is active. The automaker hopes to offer the system in the U.S. by the end of 2022.”
Software vendors are pushing "explainable A.I." that often isn't “In some cases, explanations are being used to engender trust in A.I. when it isn't warranted, researchers say.”
Invading Ukraine has upended Russia's A.I. ambitions—and not even China may be able to help “Russia President Vladimir Putin’s invasion of Ukraine will likely spell disaster for the country’s ambitions to be a leader in artificial intelligence.”
Papers
CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning Gameplay videos contain rich information about how players interact with the game and how the game responds. Sharing gameplay videos on social media platforms, such as Reddit, has become a common practice for many players. Often, players will share gameplay videos that showcase video game bugs. Such gameplay videos are software artifacts that can be utilized for game testing, as they provide insight for bug analysis. Although large repositories of gameplay videos exist, parsing and mining them in an effective and structured fashion has still remained a big challenge. In this paper, we propose a search method that accepts any English text query as input to retrieve relevant videos from large repositories of gameplay videos… (see the illustrative retrieval sketch after this list of papers)
Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation End-to-end speech-to-speech translation (S2ST) without relying on intermediate text representations is a rapidly emerging frontier of research. Recent works have demonstrated that the performance of such direct S2ST systems is approaching that of conventional cascade S2ST when trained on comparable datasets. However, in practice, the performance of direct S2ST is bounded by the availability of paired S2ST training data. In this work, we explore multiple approaches for leveraging much more widely available unsupervised and weakly-supervised speech and text data to improve the performance of direct S2ST based on Translatotron 2…
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors Recent text-to-image generation methods provide a simple yet exciting conversion capability between text and image domains… We propose a novel text-to-image method that addresses these gaps by (i) enabling a simple control mechanism complementary to text in the form of a scene, (ii) introducing elements that substantially improve the tokenization process by employing domain-specific knowledge over key image regions (faces and salient objects), and (iii) adapting classifier-free guidance for the transformer use case. Our model achieves state-of-the-art FID and human evaluation results… Through scene controllability, we introduce several new capabilities: (i) Scene editing, (ii) text editing with anchor scenes, (iii) overcoming out-of-distribution text prompts, and (iv) story illustration generation, as demonstrated in the story we wrote.
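To make the retrieval idea from the first paper above concrete: matching a free-form English query against gameplay footage can be done with off-the-shelf CLIP embeddings. The snippet below is an illustrative sketch using Hugging Face’s CLIP implementation, not the authors’ pipeline; the model checkpoint, frame files, and query are placeholders, and a real system would first extract frames from the videos.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Off-the-shelf CLIP checkpoint (placeholder choice for illustration).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def rank_frames(query: str, frame_paths: list[str], top_k: int = 5):
    """Return the video frames most similar to a text query, best first."""
    images = [Image.open(p).convert("RGB") for p in frame_paths]
    inputs = processor(text=[query], images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    scores = out.logits_per_image.squeeze(-1)  # one similarity score per frame
    best = scores.topk(min(top_k, len(frame_paths)))
    return [(frame_paths[i], scores[i].item()) for i in best.indices]

# Hypothetical usage with pre-extracted frames:
# rank_frames("a car stuck inside a wall", ["frame_000.png", "frame_001.png"])
```

In this zero-shot setup, no bug-specific training is needed: the text encoder and image encoder already live in a shared embedding space, so ranking frames by similarity to the query is enough to surface likely matches.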
Closing Thoughts
Have something to say about this edition’s topics? Shoot us an email at gradientpub@gmail.com and we will consider sharing the most interesting thoughts from readers in the next newsletter! If you enjoyed this piece, consider donating to The Gradient via a Substack subscription, which helps keep this grad-student / volunteer-run project afloat. Thanks for reading the latest Update from the Gradient!