Self-augmented Data Selection for Few-shot Dialogue Generation

05/19/2022
by Wanyu Du, et al.

The natural language generation (NLG) module in task-oriented dialogue systems translates structured meaning representations (MRs) into text responses. As the human-machine interaction interface, it has a great impact on the user experience. In practice, however, developers often have only a few well-annotated examples and face a high data-collection cost when building the NLG module. In this work, we adopt the self-training framework to address the few-shot MR-to-Text generation problem. We leverage a pre-trained language model to self-augment a large amount of pseudo-labeled data. To prevent gradual drift from the target data distribution toward the noisy augmented data distribution, we propose a novel data selection strategy that selects the data our generation model is most uncertain about. Compared with existing data selection methods, ours is: (1) parameter-efficient, requiring no additional neural models to be trained, and (2) computation-efficient, needing only several stochastic forward passes of the model to estimate uncertainty. We conduct empirical experiments on two benchmark datasets, FewShotWOZ and FewShotSGD, and show that our proposed framework consistently outperforms other baselines in terms of BLEU and ERR.
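The selection idea in the abstract can be sketched in a few lines: run several stochastic forward passes (e.g. with dropout left active at inference), score each pseudo-labeled candidate on every pass, and keep the candidates whose scores vary the most. The sketch below is illustrative only; the function names, the toy per-token log-likelihoods, and the dropout simulation are assumptions standing in for the real generation model, not the authors' code.

```python
import random
import statistics

def stochastic_pass(token_logliks, dropout_rate, rng):
    # One simulated "stochastic forward pass": each token's
    # log-likelihood contribution is randomly dropped, mimicking
    # the noise that MC-dropout injects into a real model's score.
    return sum(w for w in token_logliks if rng.random() >= dropout_rate)

def mc_uncertainty(token_logliks, n_passes=20, dropout_rate=0.1, seed=0):
    # Uncertainty = variance of the example's score across passes.
    rng = random.Random(seed)
    scores = [stochastic_pass(token_logliks, dropout_rate, rng)
              for _ in range(n_passes)]
    return statistics.pvariance(scores)

def select_most_uncertain(pool, k, **kwargs):
    # pool: list of (example_id, token_logliks) for pseudo-labeled
    # candidates; return the k ids the model is least certain about.
    ranked = sorted(pool,
                    key=lambda ex: mc_uncertainty(ex[1], **kwargs),
                    reverse=True)
    return [ex_id for ex_id, _ in ranked[:k]]

pool = [("easy", [-0.1] * 10),
        ("hard", [-2.0] * 10),
        ("mid",  [-0.5] * 10)]
print(select_most_uncertain(pool, 2))  # → ['hard', 'mid']
```

Examples with large-magnitude (low-confidence) token log-likelihoods produce high score variance under dropout, so they rank first, matching the paper's intuition that the most uncertain pseudo-labeled data is the most informative to keep.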


Related research:

- Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation (06/06/2022): Despite the success of text-to-text pre-trained models in various natura...
- On Training Instance Selection for Few-Shot Neural Text Generation (07/07/2021): Large-scale pretrained language models have led to dramatic improvements...
- AUGNLG: Few-shot Natural Language Generation using Self-trained Data Augmentation (06/10/2021): Natural Language Generation (NLG) is a key component in a task-oriented ...
- In-Context Learning for Few-Shot Dialogue State Tracking (03/16/2022): Collecting and annotating task-oriented dialogues is time-consuming and ...
- Improving Compositional Generalization with Self-Training for Data-to-Text Generation (10/16/2021): Data-to-text generation focuses on generating fluent natural language re...
- Few-shot NLG with Pre-trained Language Model (04/21/2019): Natural language generation (NLG) from structured data or knowledge is e...
- Unsupervised Open-domain Keyphrase Generation (06/19/2023): In this work, we study the problem of unsupervised open-domain keyphrase...
