Few-shot Mining of Naturally Occurring Inputs and Outputs

05/09/2022
by Mandar Joshi, et al.

Creating labeled natural language training data is expensive and requires significant human effort. We mine input-output examples from large corpora using a supervised mining function trained on a small seed set of only 100 examples. The mining consists of two stages: (1) a biencoder-based, recall-oriented dense search that pairs inputs with potential outputs, and (2) a crossencoder-based filter that re-ranks the output of the biencoder stage for better precision. Unlike model-generated data augmentation, our method mines naturally occurring, high-quality input-output pairs that mimic the style of the seed set for multiple tasks. On SQuAD-style reading comprehension, augmenting the seed set with the mined data improves F1 by 13 points over a BART-large baseline fine-tuned only on the seed set. Likewise, we see an improvement of 1.46 ROUGE-L on XSum abstractive summarization.
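
The two-stage shape described in the abstract (a recall-oriented search followed by a precision-oriented re-ranking filter) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function stands in for a trained biencoder, the Jaccard-based `joint_score` stands in for a trained crossencoder, and the names `retrieve`, `mine`, `k`, and `threshold` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in for a biencoder tower (assumption): a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, candidates, k):
    """Stage 1 (recall-oriented): keep the k candidates whose independently
    computed embeddings are most similar to the query embedding."""
    q = embed(query)
    ranked = sorted(candidates, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def joint_score(query, candidate):
    """Stand-in for a crossencoder (assumption): scores the pair jointly,
    here as the Jaccard overlap of the two token sets."""
    q = set(query.lower().split())
    c = set(candidate.lower().split())
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def mine(inputs, outputs, k=2, threshold=0.3):
    """Stage 2 (precision-oriented): re-rank each shortlist with the joint
    scorer and keep the top pair only if it clears the threshold."""
    pairs = []
    for x in inputs:
        shortlist = retrieve(x, outputs, k)
        if not shortlist:
            continue
        best = max(shortlist, key=lambda y: joint_score(x, y))
        if joint_score(x, best) >= threshold:
            pairs.append((x, best))
    return pairs


if __name__ == "__main__":
    corpus = [
        "the eiffel tower is in paris",
        "bananas are yellow",
        "the tower of london is old",
    ]
    print(mine(["where is the eiffel tower"], corpus))
```

The split reflects a common trade-off: the first stage embeds inputs and outputs independently, so it scales to a large corpus, while the second stage looks at each candidate pair jointly and can therefore afford to be stricter on a short list. In a real system both scorers would be learned models trained on the seed set rather than the token-overlap heuristics assumed here.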

research
04/04/2019

HoloDetect: Few-Shot Learning for Error Detection

We introduce a few-shot learning framework for error detection. We show ...
research
09/19/2023

QASnowball: An Iterative Bootstrapping Framework for High-Quality Question-Answering Data Generation

Recent years have witnessed the success of question answering (QA), espe...
research
02/07/2020

Snippext: Semi-supervised Opinion Mining with Augmented Data

Online services are interested in solutions to opinion mining, which is ...
research
06/14/2023

Tagged End-to-End Simultaneous Speech Translation Training using Simultaneous Interpretation Data

Simultaneous speech translation (SimulST) translates partial speech inpu...
research
11/14/2022

CST5: Data Augmentation for Code-Switched Semantic Parsing

Extending semantic parsers to code-switched input has been a challenging...
research
10/21/2019

Bayesian Optimization Allowing for Common Random Numbers

Bayesian optimization is a powerful tool for expensive stochastic black-...
research
08/20/2018

Multi-Perspective Context Aggregation for Semi-supervised Cloze-style Reading Comprehension

Cloze-style reading comprehension has been a popular task for measuring ...