Synthetically generated text for supervised text analysis

03/28/2023
by Andrew Halterman, et al.

Supervised text models are a valuable tool for political scientists but present several obstacles to their use, including the expense of hand-labeling documents, the difficulty of retrieving rare relevant documents for annotation, and copyright and privacy concerns involved in sharing annotated documents. This article proposes a partial solution to these three issues, in the form of controlled generation of synthetic text with large language models. I provide a conceptual overview of text generation, guidance on when researchers should prefer different techniques for generating synthetic text, a discussion of ethics, and a simple technique for improving the quality of synthetic text. I demonstrate the usefulness of synthetic text with three applications: generating synthetic tweets describing the fighting in Ukraine, synthetic news articles describing specified political events for training an event detection system, and a multilingual corpus of populist manifesto statements for training a sentence-level populism classifier.

