Update #33: Meta Oversight Board Cases and Symmetries + Loss Basins in Neural Networks
In which we discuss two recent content moderation decisions from Meta's Oversight Board and new work on merging neural networks, including Git Re-Basin.
Welcome to the 33rd update from the Gradient! If you were referred by a friend, subscribe and follow us on Twitter!
News Highlight: Meta’s Oversight Board Overturns Content Moderation Decisions
Summary
Meta’s Oversight Board recently released its rulings on a new set of cases, highlighting gaps in Facebook’s content removal policy. In both cases, the Board overturned Facebook’s decision to remove the post in question from the platform. While an automated content removal system was found to be at fault in the first case, the second involved human error and bias in evaluating a news article about the Taliban. The Board’s decisions in both cases upheld users’ freedom of speech on the platform and demonstrated how the Board can work with Meta to deliver a better experience for Facebook users.
Background
Meta’s Oversight Board was first proposed in 2018 to act as a “Supreme Court” for Meta’s content moderation decisions. The Board shared its first set of rulings with the public in January 2021 and later received widespread attention when it upheld Facebook’s suspension of former President Donald Trump. The Board is independent from Meta and routinely selects cases for review that can have precedent-setting implications.
Recently, the Oversight Board released its decisions on cases from Colombia and India. The first involved a user who shared a cartoon on Facebook depicting police brutality in Colombia. Facebook removed the post 16 months after it was posted, when its automated content moderation system matched the cartoon to an image in a Media Matching Service bank. As the algorithm continued to flag posts containing this image, more than 215 users appealed the removal of their posts, and 98% of those appeals were successful. Yet because the cartoon remained in the Media Matching Service bank, the algorithm continued to remove users’ posts. The Oversight Board determined that the cartoon does not violate Facebook’s rules and that posts containing the image should never have been removed from the platform.
The Oversight Board also considered a case where an Indian newspaper published an article about the Taliban’s decision to re-open schools for girls in Afghanistan. The outlet shared the article on Facebook, but the post was taken down for violating Meta’s Dangerous Individuals and Organizations policy by “praising” the Taliban. The outlet had been trying to appeal the decision, but its request was not reviewed due to an absence of Urdu-speaking moderators at Facebook. The Oversight Board reversed this decision as well, stating that simply reporting on newsworthy events does not violate Facebook’s policies.
These rulings, in addition to the Oversight Board’s previous decisions, have demonstrated how an independent and external agency can make landmark decisions for social media platforms, especially when content removal impacts a large number of people.
Why does it matter?
The implications of these rulings are twofold. First, content moderation is exceedingly hard for platforms as big as Facebook and Instagram. As millions of posts flood the platforms and are evaluated by both human and algorithmic moderators, errors are bound to slip through the cracks. The Oversight Board has set a strong example of how a supervisory body can re-evaluate decisions with large impacts on users. In this light, perhaps the Board shouldn’t be seen as working against Meta, but rather as working with it to add a layer of accountability and, ultimately, create a better platform for users.
Second, the rulings shed light on issues in Meta’s current content moderation systems. In the Colombian case, an image that shouldn’t have been part of the image matching database not only made its way into the image bank but also remained there, even as posts containing the image were allowed back onto Facebook. In the Indian case, the lack of human moderators competent in a specific language created gaps in the appeals process that eventually forced the news organization to appeal to the Oversight Board. As the Board has now taken up two new cases, it remains to be seen how its rulings and recommendations will shape Meta’s content moderation systems and set an example for how this relationship can succeed in the long term.
Editor Comments
Daniel: I recall one of the early qualms about the Oversight Board was that it would offer Facebook/Meta a veneer of accountability while not actually having much power. The Board’s discourse seems critical enough that it won’t be confined to just PR: this year, the Board said Facebook should be far more transparent about its content moderation decisions. Meta also provided an additional $150 million in funding for the Board months ago. But as The Washington Post has noted, it’s unclear whether a board with no formal authority can force Facebook to follow its recommendations; furthermore, the Board is entirely dependent on Facebook for information, money, and the power to effect change.
Research Highlight: Symmetries and Loss Basins in Neural Networks
Summary
New work has shown that linearly interpolating between the weights of two trained neural networks often yields a continuum of similarly well-performing models, provided that a suitable permutation first aligns the weights of the two networks. This has implications for the mysterious loss landscapes of neural networks and for merging separately trained models, such as those trained on disjoint datasets.
Overview
The loss landscape of a neural network describes how its loss changes as its parameters change. Neural networks’ loss landscapes are high-dimensional, non-convex, and notoriously difficult to understand, but they are crucial to understanding the generalization, optimization, and overall performance of neural networks. Past work has shown that the optima these networks converge to within their loss landscapes are often connected by nonlinear paths in parameter space, meaning that between any two trained neural networks there is a continuous path of networks achieving similarly low loss.
A recent paper by Entezari et al. [3] hypothesizes that these optima are in fact connected by linear paths, up to a suitable permutation of the parameters. Permuting the neurons of a neural network does not change the function it computes, so such permutations do not change the network and instead reveal similarities between trained networks. Their work provides some supporting evidence for the conjecture, though their proposed simulated annealing algorithm cannot find suitable linear paths for more complicated networks like ResNets.
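To make this symmetry concrete, here is a minimal NumPy sketch (layer sizes, seed, and data are arbitrary placeholders) showing that permuting the hidden units of a two-layer ReLU network, and permuting the next layer’s weights to match, leaves the network’s outputs unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(64, 10)), rng.normal(size=64)  # input -> hidden
W2, b2 = rng.normal(size=(3, 64)), rng.normal(size=3)    # hidden -> output

def mlp(x, W1, b1, W2, b2):
    # Two-layer ReLU network: output = W2 @ relu(W1 @ x + b1) + b2
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

P = np.eye(64)[rng.permutation(64)]  # random 64x64 permutation matrix

x = rng.normal(size=10)
y_original = mlp(x, W1, b1, W2, b2)
# Permute the hidden units (rows of W1, entries of b1) and compensate by
# permuting the columns of W2 the same way.
y_permuted = mlp(x, P @ W1, P @ b1, W2 @ P.T, b2)

assert np.allclose(y_original, y_permuted)  # the function is unchanged
```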
Recently, two papers and follow-up experiments shared on Twitter have explored other ways to find suitable permutations and have reported promising new results. In Git Re-Basin [1], University of Washington researchers propose three methods to learn the permutations: matching the networks’ activations on the training set, directly matching weights with a greedy algorithm, and minimizing the interpolation loss directly with gradient-based methods (modified to handle the discrete nature of permutations). In another paper [2], ETH Zurich and University of Bristol researchers consider only activation matching.
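As a rough illustration of what activation matching looks like for a single hidden layer (a hedged sketch, not the papers’ exact procedure): collect both networks’ activations on shared data, score every pair of units by similarity, and solve the resulting assignment problem with the Hungarian algorithm. Here `acts_a` and `acts_b` are hypothetical `(n_samples, n_units)` activation matrices.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_units(acts_a, acts_b):
    """Find a one-to-one matching of model B's hidden units to model A's
    units based on activation similarity."""
    # Center each unit's activations, then score all pairs by dot product
    # (a correlation-like criterion; the papers use similar measures).
    a = acts_a - acts_a.mean(axis=0)
    b = acts_b - acts_b.mean(axis=0)
    similarity = a.T @ b  # (n_units, n_units) pairwise scores
    # Hungarian algorithm: maximize total similarity over all matchings.
    rows, cols = linear_sum_assignment(similarity, maximize=True)
    return cols  # cols[i] = index of B's unit matched to A's unit i

# Usage sketch: reorder B's hidden layer (rows of its first weight matrix,
# columns of the next) by the returned indices before interpolating with A.
```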
Loss barrier (the loss of linearly interpolated models minus the average loss of the original models) for ResNets on CIFAR-10. From [1].
Git Re-Basin is significantly more successful at finding permutations that lead to low-loss linear paths between trained networks. The authors find that wider networks are more likely to be linearly mode connected, and they show that their permutation-learning algorithms can find low-loss linear paths for 32x-width ResNets trained on CIFAR-10, the first result of this type. They further demonstrate that the learned permutations can be used to merge models trained on disjoint subsets of data: linear interpolation between two ResNets trained on disjoint, class-imbalanced subsets of CIFAR-10 achieves lower test loss than either of the original ResNets.
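For intuition, here is a hedged PyTorch sketch of the interpolation itself and of the loss barrier plotted in [1] (the worst interpolated loss minus the average of the two endpoint losses). `evaluate_loss` is an assumed helper that loads a state dict into a model and returns its loss on a fixed dataset, and `sd_a`, `sd_b` are state dicts that have already been permutation-aligned:

```python
import torch

def interpolate(sd_a, sd_b, lam):
    # Elementwise linear interpolation between two aligned state dicts.
    return {k: (1 - lam) * sd_a[k] + lam * sd_b[k] for k in sd_a}

def loss_barrier(sd_a, sd_b, evaluate_loss, n_points=11):
    lams = torch.linspace(0.0, 1.0, n_points)
    losses = [evaluate_loss(interpolate(sd_a, sd_b, lam.item())) for lam in lams]
    # Barrier: worst loss along the path, relative to the endpoints' average.
    return max(losses) - 0.5 * (losses[0] + losses[-1])

# Merging two aligned models is then just taking the midpoint, e.g.
# merged_sd = interpolate(sd_a, sd_b, 0.5).
```

Note that, as the negative results discussed below suggest, practical details such as recomputing batch norm statistics after interpolation matter here.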
Linear interpolation with a suitable permutation on randomly initialized networks achieves strong test accuracy. From [2].
Paper [2] similarly finds some success in linear interpolation after permutation, but finds that this success depends heavily on the learning rate of SGD as well as network width and depth. Notably, the authors find that a learned permutation can uncover strong networks on the linear segment between two randomly initialized networks. To demonstrate this, they randomly initialize networks A and B, both of which achieve the accuracy of random guessing (about 10% on Fashion-MNIST). They then train each network separately and find a permutation that matches their activations. Finally, they apply this permutation and interpolate between the weights of A and B at random initialization, finding a network that performs much better than chance (about 70% accuracy).
Why does it matter?
These new works provide more evidence for some versions of the hypothesis of [3], and they propose new methods to exploit cases where the hypothesis holds. However, several negative results also arose in these investigations. Git Re-Basin provides a simple counterexample: a neural network for which no permutation yields a low-loss linear interpolation path. Future work could thus try to understand why gradient-based training rarely converges to such counterexamples. Further, ML researchers on Twitter posted negative results showing that interpolation success depends on the choice of optimizer, the learning rate, and the handling of batch norm.
Still, in situations where such linear paths can be found, this behavior can be used to better understand and use neural networks. If many optima in the loss landscape are explained by permutation symmetries and other invariances of neural networks, then loss landscapes may have nicer properties than expected, which may render them easier to analyze theoretically and empirically. Such results can also lead to better model averaging, as is done to improve generalization in neural networks and in settings like federated learning. Additionally, the results suggest that further studying the interplay between weight initialization and symmetries may lead to better models through averaging or faster convergence.
Sources
[1] Git Re-Basin: Merging Models modulo Permutation Symmetries
[2] Random initialisations performing above chance and how to find them
[3] The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks
Editor Comments
Derek: This line of work has completely gripped my attention recently. There is such a nice mix of theory and empirical exploration that can be investigated in the future, and the possibility of improved model averaging has the potential to be extremely impactful for future AI workflows.
Daniel: As many people have already commented, if these results are replicable (work in progress), they might yield a number of important applications. The ability to “merge” models that perform well on disjoint datasets would be quite a valuable advance.
New from the Gradient
Joel Lehman: Open-Endedness and Evolution through Large Models
Andrew Feldman: Cerebras and AI Hardware
Other Things That Caught Our Eyes
News
The world is moving closer to a new cold war fought with authoritarian tech “At the Shanghai Cooperation Organization summit, Iran, Turkey, and Myanmar promised tighter trade relationships with Russia and China.”
Tesla is being sued over Autopilot and Elon Musk’s Full Self-Driving predictions “A lawsuit filed in San Francisco by a Tesla owner claims the automaker and its CEO / Technoking Elon Musk are ‘deceptively and misleadingly’ marketing the Autopilot and ‘Full Self-Driving’ advanced driver assistance features that are available as paid software add-ons.”
This artist is dominating AI-generated art. And he’s not happy about it. “Those cool AI-generated images you’ve seen across the internet? There’s a good chance they are based on the works of Greg Rutkowski.”
Papers
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation Transformers have revolutionized vision and natural language processing with their ability to scale with large datasets. But in robotic manipulation, data is both limited and expensive… PerAct encodes language goals and RGB-D voxel observations with a Perceiver Transformer, and outputs discretized actions by "detecting the next best voxel action". Unlike frameworks that operate on 2D images, the voxelized observation and action space provides a strong structural prior for efficiently learning 6-DoF policies. With this formulation, we train a single multi-task Transformer for 18 RLBench tasks (with 249 variations) and 7 real-world tasks (with 18 variations) from just a few demonstrations per task. Our results show that PerAct significantly outperforms unstructured image-to-action agents and 3D ConvNet baselines for a wide range of tabletop tasks.
Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective While reinforcement learning (RL) methods that learn an internal model of the environment have the potential to be more sample efficient than their model-free counterparts, learning to model raw observations from high dimensional sensors can be challenging… In this work, we propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent. This objective is a lower bound on expected returns… directly on the overall RL objective. We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods. While such sample efficient methods typically are computationally demanding, our method attains the performance of SAC in about 50% less wall-clock time.
Power to the People? Opportunities and Challenges for Participatory AI Participatory approaches to artificial intelligence (AI) and machine learning (ML) are gaining momentum: the increased attention comes partly with the view that participation opens the gateway to an inclusive, equitable, robust, responsible and trustworthy AI… However, there currently exists a lack of clarity on what meaningful participation entails and what it is expected to do. In this paper we first review participatory approaches as situated in historical contexts as well as participatory methods and practices within the AI and ML pipeline. We then introduce three case studies in participatory AI. Participation holds the potential for beneficial, emancipatory and empowering technology design, development and deployment while also being at risk for concerns such as cooptation and conflation with other activities. We lay out these limitations and concerns and argue that as participatory AI/ML becomes in vogue, a contextual and nuanced understanding of the term as well as consideration of who the primary beneficiaries of participatory activities ought to be constitute crucial factors to realizing the benefits and opportunities that participation brings.
Closing Thoughts
Have something to say about this edition’s topics? Shoot us an email at gradientpub@gmail.com and we will consider sharing the most interesting thoughts from readers in the next newsletter! For feedback, you can also reach Daniel directly at dbashir@hmc.edu or on Twitter. If you enjoyed this piece, consider donating to The Gradient via a Substack subscription, which helps keep this grad-student / volunteer-run project afloat. Thanks for reading the latest Update from the Gradient!