New Technology, Old Problems: The Missing Voices in Natural Language Processing
Who and what is being represented in data and development of NLP models, and how unequal representation leads to unequal allocation of the benefits of NLP technology
New Technology, Old Problems: The Missing Voices in Natural Language Processing
Preview:
Artificial Intelligence has been experiencing a renaissance in the past decade, driven by technological advances and open sourced datasets. Much of this advancement has focused on areas like Computer Vision and Natural Language Processing (NLP). ImageNet made a corpus of 20,000 images with content labels publicly available in 2010. Google released the Trillion Word Corpus in 2006 along with the n-gram frequencies from a huge number of public webpages. The resulting evolution in NLP has led to massive improvements in the quality of machine translation, rapid expansion in uptake of digital assistants and statements like “AI is the new electricity” and “AI will replace doctors”.
…
There’s a number of possible explanations for the shortcomings of modern NLP. In this article, I will focus on issues in representation; who and what is being represented in data and development of NLP models, and how unequal representation leads to unequal allocation of the benefits of NLP technology.