Automated Annotation with Generative AI Requires Validation

05/31/2023
by Nicholas Pangakis, et al.

Generative large language models (LLMs) can be a powerful tool for augmenting text annotation procedures, but their performance varies across annotation tasks due to prompt quality, text data idiosyncrasies, and conceptual difficulty. Because these challenges will persist even as LLM technology improves, we argue that any automated annotation process using an LLM must validate the LLM's performance against labels generated by humans. To this end, we outline a workflow to harness the annotation potential of LLMs in a principled, efficient way. Using GPT-4, we validate this approach by replicating 27 annotation tasks across 11 datasets from recent social science articles in high-impact journals. We find that LLM performance for text annotation is promising but highly contingent on both the dataset and the type of annotation task, which reinforces the necessity to validate on a task-by-task basis. We make available easy-to-use software designed to implement our workflow and streamline the deployment of LLMs for automated annotation.
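The validation step at the core of this argument, comparing LLM-generated labels against human labels for each task, can be illustrated with a minimal sketch. It is not the authors' software: the function name `validate_annotations`, the binary label encoding, the choice of metrics, and the example labels are all assumptions made for illustration.

```python
from collections import Counter


def validate_annotations(human_labels, llm_labels, positive=1):
    """Score LLM labels against human labels for one binary annotation task.

    Hypothetical helper: human labels are treated as the reference.
    """
    if len(human_labels) != len(llm_labels):
        raise ValueError("label lists must have the same length")
    if not human_labels:
        raise ValueError("need at least one labeled example")

    # Tally the four confusion-matrix cells.
    counts = Counter()
    for human, llm in zip(human_labels, llm_labels):
        if llm == positive and human == positive:
            counts["tp"] += 1
        elif llm == positive:
            counts["fp"] += 1
        elif human == positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    accuracy = (tp + tn) / len(human_labels)
    # Guard against division by zero when a class never appears.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for a single task on a human-labeled validation subset.
human = [1, 0, 1, 1, 0, 0, 1, 0]
llm = [1, 0, 1, 0, 0, 1, 1, 0]
report = validate_annotations(human, llm)
print(report)
```

Running the same check separately for every annotation task, rather than pooling tasks together, matches the task-by-task validation the abstract calls for; what counts as acceptable agreement is left to the researcher.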


