ZeroGen^+: Self-Guided High-Quality Data Generation in Efficient Zero-Shot Learning

05/25/2022
by   Jiahui Gao, et al.

Owing to the superior capacity of large pre-trained language models (PLMs), PLM-based zero-shot learning has shown promising performance on various natural language processing tasks, and there is emerging interest in further exploring the zero-shot learning potential of PLMs. Among these efforts, ZeroGen uses a PLM alone to generate data and train a tiny model without relying on any task-specific annotation. Despite its remarkable results, we observe that the data synthesized by the PLM contains a significant portion of low-quality samples. Overfitting on such data greatly hampers the performance of the trained model and makes it unreliable for deployment. Since no gold data is accessible in the zero-shot scenario, it is hard to perform model or data selection to prevent overfitting to the low-quality data. To address this problem, we propose a noise-robust bi-level re-weighting framework that learns per-sample weights measuring data quality without requiring any gold data. With the learnt weights, clean subsets of different sizes can then be sampled to train the task model. We theoretically and empirically verify that our method is able to construct a synthetic dataset of good quality. Our method yields a 7.1% relative improvement over ZeroGen in average accuracy across five established text classification tasks.
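
The subset-sampling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the per-sample weights have already been learnt and are non-negative; the function name `sample_clean_subset`, the example weights, and the choice of weighted sampling without replacement (Efraimidis-Spirakis keys) are illustrative assumptions, not the authors' implementation, and the bi-level weight learning itself is not shown.

```python
import random


def sample_clean_subset(weights, k, seed=0):
    """Return k distinct sample indices, favouring high-weight samples.

    Hypothetical helper: weighted sampling without replacement using
    Efraimidis-Spirakis keys. Each sample i with weight w_i > 0 draws
    u_i ~ Uniform[0, 1) and receives the key u_i ** (1 / w_i); the k
    samples with the largest keys are kept. Zero-weight samples are
    never selected.
    """
    rng = random.Random(seed)
    candidates = [i for i, w in enumerate(weights) if w > 0]
    if k > len(candidates):
        raise ValueError("k exceeds the number of samples with positive weight")
    keys = {i: rng.random() ** (1.0 / weights[i]) for i in candidates}
    return sorted(candidates, key=keys.get, reverse=True)[:k]


# Made-up weights for six synthetic samples (higher = judged cleaner).
weights = [0.9, 0.05, 0.8, 0.0, 0.7, 0.1]

# Draw clean subsets of two different sizes to train the task model on.
subsets = {k: sample_clean_subset(weights, k) for k in (2, 4)}
```

A deterministic alternative would be to keep the top-k samples by weight; the randomized version is used here only because the abstract speaks of subsets being sampled.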

research
02/09/2022

Generating Training Data with Language Models: Towards Zero-Shot Language Understanding

Pretrained language models (PLMs) have demonstrated remarkable performan...
research
10/22/2022

ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback

Recently, dataset-generation-based zero-shot learning has shown promisin...
research
02/16/2022

ZeroGen: Efficient Zero-shot Learning via Dataset Generation

There is a growing interest in dataset generation recently due to the su...
research
07/13/2023

AutoHint: Automatic Prompt Optimization with Hint Generation

This paper presents AutoHint, a novel framework for automatic prompt eng...
research
07/15/2019

Mitigating the Hubness Problem for Zero-Shot Learning of 3D Objects

The development of advanced 3D sensors has enabled many objects to be ca...
research
09/29/2022

Prompt-guided Scene Generation for 3D Zero-Shot Learning

Zero-shot learning on 3D point cloud data is a related underexplored pro...
research
01/04/2022

ZeroBERTo – Leveraging Zero-Shot Text Classification by Topic Modeling

Traditional text classification approaches often require a good amount o...
