Few-Shot Upsampling for Protest Size Detection

05/24/2021
by Andrew Halterman, et al.

We propose a new task and dataset for a common problem in social science research: "upsampling" coarse document labels to fine-grained labels or spans. We pose the problem in a question answering format, with the answers providing the fine-grained labels. We provide a benchmark dataset and baselines on a socially impactful task: identifying the exact crowd size at protests and demonstrations in the United States given only order-of-magnitude information about protest attendance, a very small sample of fine-grained examples, and English-language news text. We evaluate several baseline models, including zero-shot results from rule-based and question-answering models, few-shot models fine-tuned on a small set of documents, and weakly supervised models using a larger set of coarsely-labeled documents. We find that our rule-based model initially outperforms a zero-shot pre-trained transformer language model, but that further fine-tuning on a very small subset of 25 examples substantially improves out-of-sample performance. We also demonstrate a method for fine-tuning the transformer span-extraction model using only the coarse labels that performs similarly to our rule-based approach. This work will contribute to social scientists' ability to generate data to understand the causes and successes of collective action.
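To make the "upsampling" idea concrete, the sketch below shows how a coarse order-of-magnitude label can constrain the choice of a fine-grained crowd size mentioned in news text. It is a hypothetical illustration, not the authors' rule-based baseline or their question-answering model: the helper names (`candidate_sizes`, `order_of_magnitude`, `upsample`), the regex, and the "thousand"/"million" multipliers are all assumptions. The coarse label is assumed to be the number of digits minus one (e.g. 3 for 1,000 to 9,999).

```python
import re
from typing import List, Optional

# Hypothetical multiplier words; the paper's actual rules are not reproduced here.
_MULTIPLIERS = {"thousand": 1_000, "million": 1_000_000}

# Matches "2,500", "300", or "1.5", optionally followed by "thousand"/"million".
_NUMBER = re.compile(
    r"(\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)(?:\s+(thousand|million))?",
    re.IGNORECASE,
)


def candidate_sizes(text: str) -> List[int]:
    """Return every number of at least 1 mentioned in the text, in order of appearance."""
    sizes = []
    for digits, word in _NUMBER.findall(text):
        value = float(digits.replace(",", ""))
        if word:  # findall returns "" when the optional multiplier group is absent
            value *= _MULTIPLIERS[word.lower()]
        if value >= 1:
            sizes.append(int(value))
    return sizes


def order_of_magnitude(n: int) -> int:
    """Number of digits minus one, e.g. 2500 -> 3 (assumes n >= 1)."""
    return len(str(n)) - 1


def upsample(text: str, magnitude: int) -> Optional[int]:
    """Return the first candidate consistent with the coarse label, or None."""
    for size in candidate_sizes(text):
        if order_of_magnitude(size) == magnitude:
            return size
    return None


text = "Organizers said about 3,000 people marched on May 1, though police estimated 1,200."
print(candidate_sizes(text))  # [3000, 1, 1200]
print(upsample(text, 3))      # 3000
```

In the example, the coarse label rules out the date ("May 1") but still leaves two plausible counts (3,000 and 1,200). A first-match heuristic cannot choose between competing estimates, which is the kind of ambiguity that motivates a learned span-extraction model over purely rule-based extraction.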
