Synthetically generated text for supervised text analysis

by Andrew Halterman, et al.

Supervised text models are a valuable tool for political scientists but present several obstacles to their use, including the expense of hand-labeling documents, the difficulty of retrieving rare relevant documents for annotation, and copyright and privacy concerns involved in sharing annotated documents. This article proposes a partial solution to these three issues, in the form of controlled generation of synthetic text with large language models. I provide a conceptual overview of text generation, guidance on when researchers should prefer different techniques for generating synthetic text, a discussion of ethics, and a simple technique for improving the quality of synthetic text. I demonstrate the usefulness of synthetic text with three applications: generating synthetic tweets describing the fighting in Ukraine, synthetic news articles describing specified political events for training an event detection system, and a multilingual corpus of populist manifesto statements for training a sentence-level populism classifier.

