Update #80: Kamala Harris's AI Policy and End-To-End Causal Effect Estimation from Unstructured Natural Language Data
We look at Harris's stances on AI regulation; researchers introduce the NATURAL framework for causal effect estimation.
Welcome to the 80th update from the Gradient! If you’re new and like what you see, subscribe and follow us on Twitter. Our newsletters run long, so you’ll need to view this post on Substack to see everything!
As always, if you want to write with us, send a pitch using this form.
News Highlight: Kamala Harris and the Future of AI Policy
Summary
Kamala Harris has played a key role in the Biden administration's efforts around artificial intelligence, taking the lead in negotiating safety standards with major tech firms and pushing for federal regulations to mitigate AI's potential harms. Now the likely Democratic presidential nominee, Harris has made AI regulation a crucial part of her policy agenda. So what does Harris mean for the future of AI regulation and governance?
Overview
Recently, President Joe Biden announced that he would not seek reelection and endorsed Vice President Kamala Harris as the Democratic nominee. While Harris has expressed her determination to secure the nomination, it remains uncertain whether she will face competition from other Democrats.
Over the past three years, Vice President Kamala Harris has taken a leading role inside the White House on artificial intelligence (source). Specifically, Harris organized key meetings with leaders from top tech firms such as OpenAI, Anthropic, and Google, securing agreements on voluntary safety measures (source). She has spoken against the false dichotomy of choosing between public protection and technological advancement. Harris has argued that without strong government oversight, the tech industry might prioritize profits over public welfare, and she has framed voluntary commitments from companies as a step towards a safer AI future (source). While she advocated for Congress to pass regulations to safeguard against AI-related job losses and other potential harms, significant legislative progress and corporate compliance have been limited (source).
At the UK's AI Safety Summit, held in November last year, Kamala Harris "really brought it all back down to earth," in the words of Verity Harding, former global head of public policy at Google DeepMind (source). Discussions at the summit focused on the potential risk of a hypothetical "runaway AI" causing global harm. Harris steered the conversation towards the more immediate need for AI policy that protects the general public, pointing out the harm that biases and errors in AI tools can inflict on marginalized communities (source). Despite her influential role in these debates, Harris's entry into the presidential race has drawn mixed reactions on AI regulation. Activists focused on AI policy might feel energized by her candidacy, seeing it as an opportunity to push for more stringent regulations. Meanwhile, some within the tech industry and elsewhere might read her candidacy as a sign that the current, relatively lenient regulatory environment for AI companies in the U.S. could continue (source).
Additionally, Harris has long-standing ties to the tech industry from her days as San Francisco's district attorney and California's attorney general, with backing from influential Silicon Valley figures. Her early supporters included prominent venture capitalists John Doerr and Ron Conway, and as a presidential candidate she was promptly backed by LinkedIn co-founder Reid Hoffman (source). During her 2010 campaign for California Attorney General, Harris participated in a Q&A session at Google's headquarters in Mountain View, emphasizing the critical role the tech industry's expertise could play in improving government communication and modernizing its systems (source). While she has faced criticism for not being tougher on tech giants during her tenure as attorney general, she has consistently called for more regulation in the sector to protect consumer interests.
On other tech-related issues, Harris has addressed national-security concerns about TikTok's ownership, indicating no intention to ban the app while highlighting the need for action on how it is managed. Her stance on cryptocurrency has been less pronounced, but she is expected to support the Biden administration's regulatory approach in this area as well.
Our Take
As far as AI regulation is concerned, Kamala Harris has managed to set herself apart from the likes of Donald Trump, who has said little about artificial intelligence. Hailing from the Bay Area, she is no stranger to Silicon Valley, and she has used that familiarity to her advantage with AI policy activists. So far, she has struck a reasonable balance between being too vocal and being too defensive. However, she has not taken much concrete action on AI either. Her only major prior involvement was as California's attorney general, where she initiated a case against a major porn site operator and negotiated an agreement with leading tech companies to enhance user privacy protections. It remains to be seen whether the support and appreciation coming from AI policy activists reflects Harris's skill and record or simply the arrival of a new player in the race. The true measure of the upcoming policies, as always, will be their impact on working-class families.
-- Sharut
Research Highlight: End-To-End Causal Effect Estimation from Unstructured Natural Language Data
Summary
Estimating causal effects is one of the most common and difficult tasks for scientists and analysts. The gold standard for causal effect estimation is the randomized controlled trial, which is extremely time- and cost-prohibitive. Moreover, some causal effects of interest cannot be tested experimentally at all, such as whether cigarettes cause lung cancer (imagine assigning people to treatment and control groups here). Recently, researchers from the Vector Institute at the University of Toronto and Meta published a paper introducing the NATURAL framework for causal effect estimation. NATURAL ties together several large language models (LLMs) to mine large volumes of unstructured text (Reddit posts!) and model conditional distributions, which are then used as inputs for classical causal effect estimators. The authors demonstrated the methodology on 2 synthetic and 4 real observational datasets paired with ground truth from real phase 3/4 clinical trials, and found that NATURAL was able to parse through hundreds of thousands of Reddit posts discussing various treatments and estimate the experimentally observed causal effects to within 3% of the ground truth.
Overview
The researchers begin by posing a "simple" question: how can we use large language models to automate treatment effect estimation using freely available text data? They explore a pipeline of LLMs that ultimately infers conditional distributions, which classical estimators can then use to compute average treatment effects. The pipeline is best understood by walking through an example research question.
The researchers' running example compares the treatment effects of two different weight-loss drugs: Semaglutide (also sold as Ozempic or Wegovy) vs. Tirzepatide (sold as Zepbound). Their pipeline consisted of the following steps:
Data Gathering
Identifies 9 subreddits where users often share their experiences with weight-loss drugs and the effects associated with them
Relevance Filtering
Heuristic Filtering - Removes posts from "bot" accounts, posts that were deleted, posts that did not explicitly match relevant keywords, and posts that were too short
Relevancy Filtering - Uses an LLM (prompted with in-context examples) to further filter out posts that do not contain information relevant to the study
Treatment Outcome Filtering - Uses an LLM to identify posts that contain all the variables needed to measure treatment effects (starting weight, end weight, duration, dosage, etc.); the LLM then extracts this data and exports it as JSON for downstream use
Covariate Extracting & Inclusion Criteria
Uses an LLM to extract relevant covariates from the post. Some covariates for this trial included age, gender, starting weight, dosage, and treatment duration
Filters out samples with partially or totally missing covariate data
Keeps only samples whose covariates match the inclusion criteria (for example, dosages within expected treatment ranges)
Infer Conditional Distributions
Iterates over all potential treatment outcomes and combinations of covariates, prompting an LLM to estimate the conditional probability of each outcome for that particular combination, given the filtered reports
LLaMA-2-70B was chosen here for its ability to directly return log-probabilities
The probabilities are then re-normalized over all outcomes to form a proper probability distribution (a minimal code sketch of this step appears below)
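To make the last step more concrete, here is a minimal, hypothetical sketch of how an LLM's log-probabilities over outcomes could be re-normalized into conditional distributions and then plugged into a simple treatment effect estimate. The prompt template, outcome list, and `llm_logprob` helper are illustrative assumptions rather than the authors' actual implementation, and the sketch scores outcomes per extracted report rather than per covariate stratum, which simplifies the paper's procedure.

```python
import math

# Hypothetical helper: returns the LLM's log-probability of `completion`
# following `prompt` (e.g., from a model that exposes token log-probs).
# This is a placeholder, not part of the paper's code.
def llm_logprob(prompt: str, completion: str) -> float:
    raise NotImplementedError("wire up an LLM that exposes log-probabilities")

OUTCOMES = ["lost at least 10% of body weight", "lost less than 10% of body weight"]

def outcome_distribution(report: dict) -> list[float]:
    """Score every candidate outcome for one extracted report, then
    re-normalize the scores into a proper probability distribution."""
    prompt = (
        "Covariates and treatment extracted from a Reddit post:\n"
        f"{report}\n"
        "Most likely outcome:"
    )
    logps = [llm_logprob(prompt, outcome) for outcome in OUTCOMES]
    # Softmax over log-probabilities = the re-normalization step above.
    m = max(logps)
    weights = [math.exp(lp - m) for lp in logps]
    total = sum(weights)
    return [w / total for w in weights]

def estimate_effect(reports: list[dict]) -> float:
    """Naive plug-in contrast: average P(best outcome) in each treatment arm
    and take the difference, as a stand-in for the classical estimators
    that consume NATURAL's conditional distributions."""
    arms = {"semaglutide": [], "tirzepatide": []}
    for report in reports:
        arms[report["treatment"]].append(outcome_distribution(report)[0])
    mean = lambda xs: sum(xs) / len(xs)
    return mean(arms["tirzepatide"]) - mean(arms["semaglutide"])
```

The softmax corresponds to the re-normalization bullet above; in the paper, the resulting conditional distributions feed classical causal estimators rather than the naive difference in means used here.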
Our Take
Without rehashing the details of the paper's limitations and impact sections, there are a few additional limitations that aren't called out there but can be inferred from the rest of the paper. For the six trials examined (4 real, 2 synthetic), NATURAL was able to parse relevant Reddit posts and accurately estimate the causal effects observed in randomized clinical trials. While that does seem promising and exciting, the authors note that it took quite a bit of tuning before they could match the ground truth. I worry that in real-world settings without ground truth, there is little one can do to validate the quality of the numerous prompts and in-context examples required, or to understand how those choices impact results. We can see this in practice in the figure below from the ablation section.
The authors show that the root mean square error (RMSE) between the estimated and observed treatment effects converges to a minimum (for the 70B model) after roughly 1,000 reports (2^10). For data practitioners who often lack large labeled datasets, such a low volume seems incredibly promising. However, we argue this is a little misleading, since those 1,000+ reports were obtained only after filtering an initial 577K posts. Because the error declines as the number of reports increases, and the number of reports retained depends on the quality of the filtering and covariate extraction, the figure gives a good data-driven view of the impact that good (or poor) prompt tuning can have on results.
-- Justin
New from the Gradient
Manuel and Lenore Blum: The Conscious Turing Machine
Kevin Dorst: Against Irrationalist Narratives
Other Things That Caught Our Eyes
News
An Algorithm Told Police She Was Safe. Then Her Husband Killed Her.
Spain has relied on an algorithm called VioGén to assess the risk of domestic violence victims being abused again and to determine the level of protection they need. While the system has helped reduce repeat attacks in domestic violence cases, it has also failed some victims, who were attacked again, sometimes with fatal consequences. Spain currently has 92,000 active cases of gender violence victims evaluated by VioGén, with most classified as facing little risk of being hurt again. However, a significant number of women who were assessed as low or negligible risk have reported being harmed again, and at least 247 women have been killed by their current or former partners after being assessed by VioGén. The algorithm's flaws have raised concerns about relying on algorithms to make life-or-death decisions.
‘Google says I’m a dead physicist’: is the world’s biggest search engine broken?
Google's search engine has come under scrutiny recently, with users claiming that it is not working as well as it should. The article explores the history of Google and its rise to dominance in the search market, as well as its influence over politics, social attitudes, and businesses. Critics argue that Google search has deteriorated in quality, citing issues such as spam, SEO practices, and the clutter of information boxes within search results. However, others still find Google search to be effective. The article raises questions about Google's trustworthiness and its ability to prioritize user interests.
Trump allies draft AI order to launch ‘Manhattan Projects’ for defense
Former President Donald Trump's allies are working on a comprehensive AI executive order that would establish "Manhattan Projects" to develop military technology and review regulations. The order aims to create "industry-led" agencies to evaluate AI models and protect systems from foreign adversaries. This approach differs from the Biden administration's executive order, which focuses on safety testing for AI systems. The GOP has adopted a platform that includes repealing the Biden AI executive order, claiming it hinders innovation. The framework provides insight into potential Republican policies to replace the Biden order. The greater military investment in AI could benefit tech companies like Anduril, Palantir, and Scale, which already have contracts with the Pentagon. The conservative Heritage Foundation is also drafting AI policies as part of Project 2025. Tech executives and investors, including Elon Musk and Bill Ackman, have endorsed Trump, indicating a potential second Trump administration would have a friendlier relationship with the tech industry.
FTC is investigating how companies are using AI to base pricing on consumer behavior
The Federal Trade Commission (FTC) is investigating how companies are using AI to base pricing on consumer behavior. The agency has ordered eight companies, including Mastercard, JPMorgan Chase, and Accenture, to provide information about their AI-powered "surveillance pricing" services and their impact on privacy, competition, and consumer protection. This practice allows companies to charge different prices to different customers based on factors such as location and personal data. The FTC is concerned that this use of AI and personal data could put people's privacy at risk and wants to shed light on this "shadowy ecosystem of pricing middlemen." The investigation aims to understand the types of surveillance pricing services offered by these companies and how they affect consumer pricing.
Google Is the Only Search Engine That Works on Reddit Now Thanks to AI Deal
Google has become the exclusive search engine for Reddit, as other search engines like Bing and DuckDuckGo are no longer able to crawl Reddit and provide up-to-date results. This is due to Reddit's decision to lock down access to its site and prevent scraping for AI training data. Google's near monopoly on search is hindering competition and raises concerns about the quality of search results. This exclusivity is a result of a multi-million dollar deal that allows Google to scrape Reddit for data to train its AI products.
OpenAI training and inference costs could reach $7bn for 2024, AI startup set to lose $5bn - report
OpenAI is projected to spend nearly $4 billion this year on inference alone, and could face a shortfall of $5 billion. The company currently uses Microsoft's servers to run inference workloads for ChatGPT, with around 290,000 servers dedicated to this task. Training ChatGPT and new models could cost up to $3 billion this year. OpenAI benefits from discounted rates from Microsoft Azure, paying about $1.30 per A100 server per hour. The company employs around 1,500 people, which could cost $1.5 billion as it continues to grow. While OpenAI generates about $2 billion annually from ChatGPT, it may need to raise additional funds within the next year to cover its losses.
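As a rough back-of-the-envelope check (our arithmetic, not a figure from the report), the quoted server count and hourly rate land in the same ballpark as the reported inference spend:

```python
servers = 290_000        # A100 servers reportedly dedicated to ChatGPT inference
rate_per_hour = 1.30     # discounted Azure rate per server-hour cited in the report
hours_per_year = 24 * 365

annual_inference_cost = servers * rate_per_hour * hours_per_year
print(f"~${annual_inference_cost / 1e9:.1f}B per year")  # ~$3.3B, consistent with "nearly $4 billion"
```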
Apple takes on Meta with new open-source AI model — here's why it matters
Apple has released a new open-source AI model with 7 billion parameters, signaling its commitment to the wider AI ecosystem. The model, part of Apple's DCLM (DataComp for Language Models) project, outperforms similar-sized models from Meta and Google. It is fully open source, with all weights, training data, and processes publicly available. Despite its small size and limited context window, the model's open-source nature makes it one of the more significant AI releases of the year. Researchers and companies can use the model to create their own small AIs without per-token costs. This aligns with the goal of creating intelligence that is affordable and accessible.
China Is Closing the A.I. Gap With the United States
At the World Artificial Intelligence Conference in Shanghai, Chinese start-up founder Qu Dongqi showcased a video created using AI technology from Chinese internet company Kuaishou. The video, which brought an old photograph to life, demonstrated the advancements China has made in the field of artificial intelligence. This technology is similar to the video generator Sora, developed by OpenAI, but the Chinese version is already available to the general public. This highlights China's progress in closing the AI gap with the United States.
Video game performers will go on strike over artificial intelligence concerns
Video game performers, including voice actors and motion capture performers, are going on strike due to concerns over AI protections. The Screen Actors Guild-American Federation of Television and Radio Artists (SAG-AFTRA) has been negotiating with major game studios for nearly two years over a new interactive media agreement. While progress has been made on wages and job safety, the two sides remain divided on the regulation of generative AI. The union is concerned that without proper safeguards, game companies could use AI to replicate actors' voices or create digital replicas of their likeness without consent or fair compensation. The strike is a last resort after exhausting other possibilities. The global video game industry generates over $100 billion in profit annually, and the performers are demanding that their work be protected. The strike will not include games covered by separate contracts that have AI protections.
AI Video Generator Runway Trained on Thousands of YouTube Videos Without Permission
A recent investigation by 404 Media has revealed that the AI video generation tool developed by Runway, a multi-billion dollar company, was trained using scraped videos from YouTube creators, brands, and pirated films. The tool, initially known as Jupiter and later released as Gen-3, received widespread praise upon its launch in June. Runway, which raised $141 million in funding last year, has not provided specific details about the training data for Gen-3. This revelation raises concerns about the ethical use of AI technology and the potential infringement of copyright laws.
Anthropic’s crawler is ignoring websites’ anti-AI scraping policies
Anthropic's web crawler, ClaudeBot, has been scraping websites for AI training data without regard for the websites' anti-AI scraping policies. One of the affected websites, iFixit, noticed that ClaudeBot had accessed its content almost a million times in 24 hours, violating its Terms of Use. iFixit's CEO, Kyle Wiens, expressed concern about the unauthorized use of their content and the strain it put on their resources. iFixit eventually added a crawl-delay directive to its robots.txt file to stop the crawler, since Anthropic maintains that ClaudeBot can only be blocked through robots.txt. Other websites, such as Read the Docs and Freelancer.com, also reported being aggressively scraped by ClaudeBot. This is not the first time ClaudeBot's scraping has caused issues; previous incidents have been reported on Reddit and the Linux Mint web forum.
Elon Musk’s X under pressure from regulators over data harvesting for Grok AI
X (formerly Twitter) is facing pressure from data regulators over a default setting that allows users' posts to be used to train its AI chatbot, Grok. The UK and Irish data watchdogs have contacted X regarding the apparent attempt to obtain user consent for data harvesting without users' knowledge. Under the UK GDPR, consent collected through pre-ticked boxes or other default settings is not valid. The setting on X, which comes pre-ticked, allows users' posts and their interactions with Grok to be used for training. Data regulators have expressed concern and emphasized the need for transparency and user notification.
OpenAI’s Search Tool Has Already Made a Mistake
OpenAI recently announced the launch of SearchGPT, a prototype tool that uses AI to answer questions by searching the internet. However, even in the demo, SearchGPT made a mistake. When a user searched for music festivals in Boone, North Carolina in August, the top suggestion provided by SearchGPT was a fair that actually ends in July. This highlights a common issue with AI search tools, as they often exhibit errors and inaccuracies. While AI searchbots have the potential to revolutionize internet search by providing personalized answers, they still have a long way to go in terms of accuracy and reliability.
AMD claims its top-tier Ryzen AI chip is faster than Apple’s M3 Pro
AMD recently held an event to showcase its new Strix Point Ryzen AI chips, built on the Zen 5 architecture. AMD claims that these chips can outperform Apple's M3 and M3 Pro chips, as well as beat Qualcomm's and Intel's integrated graphics. The new Ryzen AI chips offer architectural improvements, with a 16% increase in instructions per clock cycle and a 19-32% boost in graphics performance per watt. However, AMD has yet to provide concrete evidence to support these claims. The company also did not provide specific details about battery life improvements. The first laptops with the new chips will be available on July 28th.
Papers
Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?
AI models collapse when trained on recursively generated data
Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
AI achieves silver-medal standard in solving IMO problems
And some good context from Timothy Gowers
Closing Thoughts
Have something to say about this edition's topics? Shoot us an email at editor@thegradient.pub and we will consider sharing the most interesting thoughts from readers in the next newsletter! For feedback, you can also reach Daniel directly at dbashir@hmc.edu or on Twitter. If you enjoyed this newsletter, consider donating to The Gradient via a Substack subscription, which helps keep this volunteer-run project afloat. Thanks for reading the latest Update from the Gradient!