Few-Shot Upsampling for Protest Size Detection

by   Andrew Halterman, et al.

We propose a new task and dataset for a common problem in social science research: "upsampling" coarse document labels to fine-grained labels or spans. We pose the problem in a question answering format, with the answers providing the fine-grained labels. We provide a benchmark dataset and baselines on a socially impactful task: identifying the exact crowd size at protests and demonstrations in the United States given only order-of-magnitude information about protest attendance, a very small sample of fine-grained examples, and English-language news text. We evaluate several baseline models, including zero-shot results from rule-based and question-answering models, few-shot models fine-tuned on a small set of documents, and weakly supervised models using a larger set of coarsely-labeled documents. We find that our rule-based model initially outperforms a zero-shot pre-trained transformer language model but that further fine-tuning on a very small subset of 25 examples substantially improves out-of-sample performance. We also demonstrate a method for fine-tuning the transformer span on only the coarse labels that performs similarly to our rule-based approach. This work will contribute to social scientists' ability to generate data to understand the causes and successes of collective action.



There are no comments yet.


page 4


Unsupervised Neural Machine Translation with Generative Language Models Only

We show how to derive state-of-the-art unsupervised neural machine trans...

Prompt-Learning for Fine-Grained Entity Typing

As an effective approach to tune pre-trained language models (PLMs) for ...

Robust fine-tuning of zero-shot models

Large pre-trained models such as CLIP offer consistent accuracy across a...

AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models

Despite their success in a variety of NLP tasks, pre-trained language mo...

How Optimal is Greedy Decoding for Extractive Question Answering?

Fine-tuned language models use greedy decoding to answer reading compreh...

Small but Mighty: New Benchmarks for Split and Rephrase

Split and Rephrase is a text simplification task of rewriting a complex ...

Efficient Algorithms for Learning from Coarse Labels

For many learning problems one may not have access to fine grained label...

Code Repositories



view repo
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.