Towards Coding Social Science Datasets with Language Models

Researchers often rely on humans to code (label, annotate, etc.) large sets of texts. Human coding is an important part of social science research, yet the process is both resource-intensive and highly variable from application to application. Some efforts to automate coding have achieved human-level accuracy, but they frequently rely on thousands of hand-labeled training examples, which makes them inapplicable to small-scale research studies and costly for large ones. Recent advances in one kind of artificial intelligence tool, language models (LMs), provide a solution to this problem. Work in computer science shows that LMs can classify text without the cost (in financial terms and human effort) of alternative methods. To demonstrate the possibilities of LMs in political science, we use GPT-3, one of the most advanced LMs, as a synthetic coder and compare it to human coders. We find that GPT-3 can match the performance of typical human coders and offers benefits over other machine learning methods of coding text. These findings hold across a variety of domains with very different coding procedures. This provides exciting evidence that language models can serve as a critical advance in the coding of open-ended texts in a variety of applications.
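As a rough illustration of what comparing a synthetic coder to human coders can involve, the sketch below builds a few-shot classification prompt and scores agreement between two sets of codes. It is not the procedure used in the paper: the prompt layout, the function names (`build_few_shot_prompt`, `percent_agreement`, `cohens_kappa`), and the choice of Cohen's kappa as the agreement statistic are all assumptions made for this example, and no language model is actually called.

```python
from collections import Counter


def build_few_shot_prompt(instructions, examples, text):
    """Assemble a few-shot coding prompt (hypothetical layout, not the paper's)."""
    lines = [instructions, ""]
    for example_text, label in examples:
        lines.append(f"Text: {example_text}")
        lines.append(f"Code: {label}")
        lines.append("")
    # The final item is left uncoded so the LM can complete the label.
    lines.append(f"Text: {text}")
    lines.append("Code:")
    return "\n".join(lines)


def percent_agreement(codes_a, codes_b):
    """Share of items on which two coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("coders must label the same non-empty set of items")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    # Chance agreement from each coder's marginal label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up codes for four texts: one human coder, one synthetic (LM) coder.
human_codes = ["positive", "negative", "positive", "negative"]
lm_codes = ["positive", "negative", "negative", "negative"]
agreement = percent_agreement(human_codes, lm_codes)
kappa = cohens_kappa(human_codes, lm_codes)
```

In practice the same agreement functions could be applied to pairs of human coders as well, which gives a baseline for judging whether the synthetic coder performs like a typical human coder.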


