ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback

10/22/2022
by Jiacheng Ye, et al.

Recently, dataset-generation-based zero-shot learning has shown promising results: a task-specific model is trained on a dataset synthesized by a large pre-trained language model (PLM). The final task-specific model often achieves comparable or even better performance than the PLM under the zero-shot setting, with orders of magnitude fewer parameters. However, synthetic datasets have their drawbacks: they have long suffered from quality issues such as low informativeness and redundancy. This explains why massive amounts of synthetic data do not lead to better performance, as we would expect with human-labeled data. To improve the quality of dataset synthesis, we propose a progressive zero-shot dataset generation framework, ProGen, which leverages feedback from the task-specific model to guide the generation of new training data via in-context examples. Extensive experiments on five text classification datasets demonstrate the effectiveness of the proposed approach. We also show that ProGen achieves on-par or superior performance with only 1% of the synthetic dataset size used by baseline methods without in-context feedback.
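
Since the abstract describes ProGen's feedback loop only at a high level, here is a minimal sketch of what such a progressive generation loop might look like. Everything below is an illustrative assumption, not the paper's implementation: the helpers generate_with_plm, train_task_model, and quality_score are hypothetical placeholders, and the random feedback score stands in for whatever informativeness measure the paper actually uses.

import random

# Hypothetical sketch of a ProGen-style progressive generation loop,
# reconstructed from the abstract alone.

def generate_with_plm(label, in_context):
    """Placeholder for prompting a large PLM zero-shot for one example of
    `label`, with the fed-back in-context examples prepended to the prompt."""
    return f"synthetic text for {label} (conditioned on {len(in_context)} examples)"

def train_task_model(dataset):
    """Placeholder for training a small task-specific classifier
    (e.g., fine-tuning a distilled LM) on the synthetic (text, label) pairs."""
    return {"num_train_examples": len(dataset)}

def quality_score(task_model, example):
    """Placeholder feedback signal; the abstract does not specify the metric,
    so we score randomly here."""
    return random.random()

def progen(labels, rounds=4, per_round=100, k_feedback=8):
    dataset, in_context, task_model = [], [], None
    for _ in range(rounds):
        # 1. Generate a new batch of data, guided by current in-context feedback.
        for label in labels:
            for _ in range(per_round // len(labels)):
                dataset.append((generate_with_plm(label, in_context), label))
        # 2. Retrain the task-specific model on all synthetic data so far.
        task_model = train_task_model(dataset)
        # 3. Select the highest-scoring examples as the next round's
        #    in-context feedback for the PLM.
        ranked = sorted(dataset, key=lambda ex: quality_score(task_model, ex),
                        reverse=True)
        in_context = ranked[:k_feedback]
    return dataset, task_model

if __name__ == "__main__":
    data, model = progen(labels=["positive", "negative"])
    print(len(data), "synthetic examples;", model)

The design choice the abstract points to is step 3: the small task-specific model, not the PLM, decides which generated examples are worth feeding back as in-context demonstrations for the next round.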


