Update #39: Uber's Faulty Facial Recognition in India and The First Learning on Graphs Conference
Uber's facial recognition system is harming Indian Uber drivers' livelihoods, plus a summary of the first Learning on Graphs conference
Welcome to the 39th update from the Gradient! If you were referred by a friend, subscribe and follow us on Twitter!
Want to write with us? Send a pitch using this form :)
News Highlight: Uber’s Facial Recognition Technology Locks Out Indian Drivers
Summary
In many countries around the world, Uber uses facial recognition technology as a two-factor authentication method to verify the identity of its drivers. However, the technology comes with its own limitations and biases, which, compounded with the lack of a robust system for addressing incorrect decisions, have led to many drivers in India being locked out of their accounts. While drivers claim that factors such as a different haircut, beard style, or lighting conditions can affect the algorithm's accuracy, Uber maintains that its model can handle these variations and that a human in the loop minimizes wrongful account suspensions.
Background
Microsoft’s Face API allows companies to use facial recognition technology in their products without prior knowledge of machine learning or image-matching methods. Many companies, including Uber, use this service to verify the identity of their users at periodic intervals in order to boost reliability and trust on their platforms. However, a recent investigation by Varsha Bansal, a Bangalore-based AI Fellow and Grantee at the Pulitzer Center, found that of 150 Indian Uber drivers surveyed, around half had been temporarily or permanently locked out of their accounts because of the facial recognition algorithm. Uber’s current system requires the driver to upload a selfie at different times during the day, which is then matched against a known photograph of the driver in the company’s database. Inevitably, factors such as camera quality and lighting conditions can degrade the quality of the selfie, and thereby the accuracy of the facial recognition algorithm.
Uber maintains that no driver can be locked out of their account on the basis of a single mismatch in the face recognition model. The company says that any images flagged by the algorithm are then shown to at least two human reviewers to confirm the algorithm’s prediction. Regardless, Bansal’s survey found dozens of drivers who claim to have been unjustly locked out of the system.
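To make the verification flow described above concrete, here is a minimal sketch of what a selfie check with a human-review fallback might look like. The function, threshold, and status labels are hypothetical illustrations for this newsletter, not Uber's or Microsoft's actual implementation.

```python
from typing import List

def verify_driver(
    similarity: float,            # score from a face-verification model, in [0, 1]
    reviewer_votes: List[bool],   # human reviewers' yes/no judgments if the match is flagged
    threshold: float = 0.8,       # hypothetical acceptance threshold
) -> str:
    """Decide a driver's status from a selfie check, with a human-review fallback."""
    if similarity >= threshold:
        return "verified"
    # Below the threshold: defer to human reviewers instead of auto-suspending.
    if len(reviewer_votes) >= 2 and all(reviewer_votes):
        return "verified"
    return "temporarily_locked"

# Example: a low-quality dusk selfie scores 0.55 and the two reviewers disagree.
print(verify_driver(0.55, [True, False]))  # -> "temporarily_locked"
```

The failure mode the investigation describes lives in exactly these two stages: a model score degraded by lighting or camera quality, and a review step that does not reliably catch the resulting false rejections.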
Adnan Taqi, an Uber driver in Mumbai, recounts an incident in which Uber’s app prompted him to take a selfie at dusk and failed to verify him, locking him out of the app for 48 hours. For drivers like Taqi, who work 18-24 hours a day to make ends meet, such lockouts can have a significant impact on their livelihoods. A few days later, Taqi was once again locked out of his account because of the ID verification, this time for a week. Additionally, a 2021 paper analyzing the performance of four popular facial recognition tools on Indian faces found that Microsoft’s algorithm failed to even find a face in 3% of the images tested, the worst performance of all the algorithms evaluated. The question is therefore twofold: whether Microsoft’s Face API fails under different lighting conditions, hairstyles, skin colors, or variations in image quality, and whether Uber’s human-in-the-loop process subsequently fails to catch mismatches when they occur.
The issues don’t end with algorithmic performance; they continue downstream into Uber’s support system for drivers. Many drivers interviewed in the survey shared that once they are locked out of the app, seeking help from Uber’s driver assistance program is extremely cumbersome. They report having to call the helpline incessantly before their accounts are unlocked, and are often initially told that the “server is down”. These problems are markers of the operational challenges that machine learning systems still face when deployed at scale and in diverse markets.
Why does it matter?
India has 600,000 drivers working for ride-hailing apps like Uber or Ola (India’s Lyft), or services such as Swiggy, Zomato, and Urban Company (think DoorDash, Postmates, etc.). For many, these jobs are the only source of income for their families, and being unjustly locked out of their accounts for days at a stretch can have severe consequences for their finances. Additionally, the absence of decent support infrastructure for drivers highlights gaps in how organizations treat their customers versus their gig workers. While many countries in Europe now have strong legal protections for gig workers, such protections are largely absent in India, shedding light on how large corporations take advantage of lagging public policy.
Editor Comments
Daniel: All I have to say is oof. It’s unfortunate to see a facial recognition system with known faults deployed in a way that lets those faults affect people’s livelihoods. What’s even worse, though, is that Uber’s support system isn’t offering drivers accessible recourse for the algorithm’s mistakes. The algorithms are going to be faulty, and with that knowledge they should be embedded in systems that work to mitigate those faults and offer recourse (here’s a great paper on algorithmic recourse, by the way; discussion with the author coming soon!).
Research Highlight: The First Learning on Graphs (LoG) Conference
Summary
Last weekend, the first Learning on Graphs (LoG) Conference was held virtually, with over 2000 registered attendees. A variety of papers on graphs and geometry in machine learning were presented, showing the latest developments in these rapidly growing subfields of AI.
Overview
As this was the first offering of a new conference, there were several differences from other machine learning conferences. The conference was virtual and free to attend, so the geographical and economic factors that limit participation in other conferences were not as stark for LoG. The review process was held on OpenReview, and 20 monetary awards of $1,500 were given to the best reviewers in an effort to increase reviewer participation and improve review quality. The conference was live-streamed on YouTube, and all recordings are available on the conference’s YouTube channel.
The 83 accepted papers at the conference covered diverse areas of graph learning; here, we summarize the 12 oral presentations:
Theory-advancing oral presentations covered non-asymptotic analysis of oversmoothing in GNNs, provably expressive shortest-path-based GNNs, and an analysis of the widely used but poorly understood virtual nodes in GNNs.
Scalable graph learning methods were developed for adversarially robust GNNs (by constructing cleaner graphs with spectral embeddings), for mini-batching graphs with many nodes (by choosing node subsets that are influential to the output), and for GNNs on temporal networks (using dictionaries of neighborhood representations instead of fixed-length vectors).
Dataset- and benchmark-focused papers provided a new taxonomy of graph learning benchmarks based on the sensitivity of downstream performance to perturbations in the input graph, and introduced a new benchmark curation platform that maintains previous versions, full attributions, and other important metadata for different benchmarks.
Other oral presentations included A Generalist Neural Algorithmic Learner, which can simulate diverse algorithms using a single GNN-based backbone; a new framework for few-shot fine-tuning of pretrained node embeddings; the discovery of very sparse untrained subnetworks of GNNs that perform well on downstream tasks; and generalizations of previously proposed permutation-invariant group-theoretic graph features.
LoG also featured keynotes from notable figures in the field. Soledad Villar spoke on improving and analyzing the theoretical power of graph neural networks. Taco Cohen presented connections between equivariance, causality, and category theory. Marinka Zitnik, Stephan Günnemann, and Djork-Arné Clevert spoke about applications of graph ML to precision medicine, molecular systems, and drug discovery, respectively. These topics give a glimpse into the directions that academia and industry find interesting in geometric machine learning.
Why does it matter?
The participation in this conference shows major interest in graphs and geometry in machine learning from both industry and academia. Several new initiatives were tested at LoG that could be adopted to improve other AI / ML venues as well. For instance, 20 reviewers were given monetary reviewer awards of $1,500 each. Several sources [1], [2], [3] stated that review quality at LoG was noticeably high, and reviewer participation was strong [4].
Author Q&A
We talked to Fabrizio Frasca, a PhD student at Imperial College London who won one of the reviewer awards for his service at LoG:
Q: What was notable about your reviewing experience at LoG?
A: Two aspects really stood out, in my opinion: the low load of papers to review and their relevance to the reviewer’s expertise. This is something I could really perceive myself while conducting my reviews, but I could also see on social media that this feeling was shared among colleagues. Because the papers were highly targeted to my area of expertise, I could contribute high-quality reviews and keep engaging in active conversations with the other reviewers and area chairs. On the other hand, the low load of papers took away the “deadline stress” and allowed enough time to keep the review process thorough.
Q: What do you think other venues could do to improve their review processes? (Lessons from LoG or otherwise are welcome).
A: By revolving around a very specific, yet extremely popular research field, LoG benefitted from having access to a large pool of expert reviewers. At the same time, awards in the form of financial compensation were allocated to the best reviewers. Whether this form of monetary incentive significantly and positively impacted the quality of the reviews is yet to be assessed, but it already hints at the interesting exploratory attitude of the LoG conference, whereby organisers question the expectations of typical peer-review systems while aiming at high-quality content. Other venues should consider taking inspiration from these efforts and rethinking aspects of their peer-review processes, e.g. by reflecting on new approaches for more effective award schemes.
Q: Any other notable parts of the LoG conference?
A: I believe LoG has shone not only for its high-quality review process, but also for its captivating line-up of invited speakers, as well as the comprehensive and varied set of tutorials it offered to the audience. By making the conference free to access and superbly structured, the organisers managed to create a great learning environment, accessible to researchers at any level of maturity. Ultimately, they were able both to showcase and to take stock of the hottest and most compelling current research directions in the field of deep learning on graphs.
New from the Gradient
Melanie Mitchell: Abstraction and Analogy in AI
Marc Bellemare: Distributional Reinforcement Learning
Other Things That Caught Our Eyes
News
What Would Plato Say About ChatGPT? “Plato mourned the invention of the alphabet, worried that the use of text would threaten traditional memory-based arts of rhetoric.”
New machine learning model could encode words to kill cancer cells “US researchers, using new machine learning techniques have developed a virtual molecular library of “words” that encode commands to kill cancer cells.”
As Google weighs in on ChatGPT, You.com enters the AI chat “One of the biggest topics underlying the hype bonanza since OpenAI’s release of ChatGPT two weeks ago has been: What does this mean for Google search? ”
Papers
Daniel: I felt Editing Models with Task Arithmetic was really neat: this work proposes a new paradigm for steering the behavior of pre-trained models. You get a “task vector” by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning: you can think of adding (or subtracting) a task vector to a model as saying “improve my performance on task X” (without impacting performance on other tasks by very much), and you can compose multiple task vectors to intervene on multiple facets of a model. I’m generally pretty excited about work that is able to intervene on model characteristics in a principled way, e.g. Git Re-Basin, this work, and Concept Bottleneck Models. The idea of “editing” a model to achieve desired characteristics like improved performance on a task, forgetting a concept, or mitigating undesirable behavior is really powerful. I also love the simplicity of the idea of a task vector (and the code)!
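As a rough illustration of the arithmetic (a sketch, not the paper's released code), a task vector can be written as a per-parameter difference between two checkpoints; the toy parameter names and weights below are placeholders:

```python
import torch

def task_vector(pretrained: dict, finetuned: dict) -> dict:
    """Task vector = fine-tuned weights minus pre-trained weights, per parameter."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_vectors(pretrained: dict, vectors: list, scale: float = 1.0) -> dict:
    """Add (or, with a negative scale, subtract) one or more task vectors."""
    edited = {k: v.clone() for k, v in pretrained.items()}
    for tv in vectors:
        for k in edited:
            edited[k] += scale * tv[k]
    return edited

# Toy example with a single 2x2 "layer"; real use would load actual model state dicts.
pre = {"layer.weight": torch.zeros(2, 2)}
ft  = {"layer.weight": torch.ones(2, 2)}
tv = task_vector(pre, ft)
boosted = apply_task_vectors(pre, [tv], scale=1.0)   # "improve on task X"
negated = apply_task_vectors(pre, [tv], scale=-1.0)  # "forget task X"
```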
Derek: The paper Transformers learn in-context by gradient descent caught my eye. They show on regression tasks that there exist weight instantiations of (simplified) Transformers that do in-context learning in a way that matches a gradient descent step on the in-context examples. Then they empirically show that trained (simplified) Transformers do have similar behavior to gradient descent on in-context examples in several scenarios. I have been quite interested in recent lines of work [1] [2] that show that Transformers can implement or even exceed certain learning algorithms with in-context learning. This work furthers the theoretical and empirical evidence that Transformers may actually implement these optimization algorithms to some extent.
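For the simplified linear-regression setting that line of work builds on, the core identity is easy to check numerically: one gradient-descent step from a zero-initialized linear model on the in-context examples produces the same query prediction as an unnormalized linear-attention readout over those examples. The snippet below is a minimal sketch of that identity under those assumptions, not the paper's full Transformer construction:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 20                      # input dim, number of in-context examples
X = rng.normal(size=(n, d))       # in-context inputs
w_true = rng.normal(size=d)
y = X @ w_true                    # in-context targets (noiseless linear regression)
x_q = rng.normal(size=d)          # query input
lr = 0.1

# One gradient-descent step from w = 0 on the in-context loss 0.5 * sum_i (w . x_i - y_i)^2
grad_at_zero = -(y @ X)           # gradient at w = 0 is -sum_i y_i x_i
w_after_step = -lr * grad_at_zero
pred_gd = w_after_step @ x_q

# Unnormalized linear-attention readout: sum_i (lr * y_i) * (x_i . x_q)
pred_attention = lr * np.sum(y * (X @ x_q))

print(np.isclose(pred_gd, pred_attention))  # True: the two computations coincide
```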
Tanmay: I think it would be unfair to not discuss Google’s recent Robotics Transformer paper, which scales up robot learning by training a large transformer on 130,000 demonstrations. The training data consists of teleoperated trajectories collected across 744 tasks on 13 robots. Note that all of these tasks generally fall into the “pick and place” regime, so no need to fret about robot overlords (just yet ;) ). The most impressive part of the paper is the model’s ability to learn from data collected on another robot. In this case, researchers used data collected on a Kuka robot from the QT-Opt paper, which trained the robot with a reinforcement learning strategy to grasp objects from a bin. The authors of the Robotics Transformer paper found that their model had a 17% boost in performance on the bin-picking task when trained on a mix of data from the current robot and the Kuka robot, as compared to training on data from the current robot alone. Other results include 97% accuracy at performing tasks seen in the training set, and 76% accuracy on unseen tasks. The model is compared against the BC-Z, BC-Z XL, and Gato baselines and outperforms all of them on all tasks by a significant margin.
Closing Thoughts
Have something to say about this edition’s topics? Shoot us an email at editor@thegradient.pub and we will consider sharing the most interesting thoughts from readers in the next newsletter! For feedback, you can also reach Daniel directly at dbashir@hmc.edu or on Twitter. If you enjoyed this newsletter, consider donating to The Gradient via a Substack subscription, which helps keep this grad-student / volunteer-run project afloat. Thanks for reading the latest Update from the Gradient!