Dialogue Distillation: Open-domain Dialogue Augmentation Using Unpaired Data

09/20/2020
by Rongsheng Zhang, et al.

Recent advances in open-domain dialogue systems rely on neural models trained on large-scale data. However, collecting large-scale dialogue data is usually time-consuming and labor-intensive. To address this data dilemma, we propose a novel data augmentation method that uses unpaired data to train open-domain dialogue models. Specifically, a data-level distillation process first constructs augmented dialogues in which both the post and the response are retrieved from the unpaired data, and a ranking module filters out low-quality dialogues. A model-level distillation process then distills a teacher model trained on high-quality paired data to the augmented dialogue pairs, which prevents dialogue models from being affected by noise in the augmented data. Automatic and manual evaluations indicate that our method produces high-quality dialogue pairs with diverse content, and that the proposed data-level and model-level dialogue distillation improves the performance of competitive baselines.
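
The two distillation steps described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not specify how retrieval, ranking, or the distillation objective are defined, so everything here is an assumption made for illustration. In particular, the use of existing paired dialogues as retrieval anchors, the token-overlap similarity `jaccard`, the caller-supplied `score_fn` with its `threshold`, and the `alpha`-weighted mix of gold-token loss and KL divergence in `distill_loss` are all hypothetical stand-ins.

```python
import math
from itertools import product


def jaccard(a, b):
    """Token-overlap similarity; an assumed stand-in for a real retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(query, corpus, k=2):
    """Return the k unpaired sentences most similar to the query."""
    return sorted(corpus, key=lambda s: jaccard(query, s), reverse=True)[:k]


def augment(anchor_pairs, unpaired, score_fn, threshold=0.5, k=2):
    """Data-level step (assumed form): retrieve a post and a response from
    the unpaired sentences, then keep only pairs the ranker scores highly."""
    kept = []
    for post, response in anchor_pairs:
        posts = retrieve(post, unpaired, k)
        responses = retrieve(response, unpaired, k)
        for p, r in product(posts, responses):
            # score_fn plays the role of the ranking module (hypothetical).
            if p != r and score_fn(p, r) >= threshold:
                kept.append((p, r))
    return kept


def distill_loss(student_probs, teacher_probs, gold_index, alpha=0.5):
    """Model-level step (assumed form): mix the gold-token negative
    log-likelihood with KL(teacher || student) for one output position."""
    nll = -math.log(student_probs[gold_index])
    kl = sum(
        t * math.log(t / s)
        for t, s in zip(teacher_probs, student_probs)
        if t > 0
    )
    return alpha * nll + (1 - alpha) * kl


if __name__ == "__main__":
    anchors = [("do you like coffee", "yes i drink coffee every morning")]
    pool = [
        "i like coffee a lot",
        "i drink tea every morning",
        "the weather is nice",
    ]
    # A constant scorer accepts every candidate; a real ranker would not.
    print(augment(anchors, pool, score_fn=lambda p, r: 1.0, k=1))
    print(distill_loss([0.7, 0.3], [0.6, 0.4], gold_index=0))
```

In this sketch the ranker is passed in as a function so that the retrieval and filtering stages stay independent; a trained matching model could replace the constant scorer without changing `augment`.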

research
10/30/2022

Counterfactual Data Augmentation via Perspective Transition for Open-Domain Dialogues

The construction of open-domain dialogue systems requires high-quality d...
research
09/28/2020

Pchatbot: A Large-Scale Dataset for Personalized Chatbot

Natural language dialogue systems raise great attention recently. As man...
research
09/18/2023

Facilitating NSFW Text Detection in Open-Domain Dialogue Systems via Knowledge Distillation

NSFW (Not Safe for Work) content, in the context of a dialogue, can have...
research
07/28/2022

Persona-Knowledge Dialogue Multi-Context Retrieval and Enhanced Decoding Methods

Persona and Knowledge dual context open-domain chat is a novel dialogue ...
research
09/14/2021

Identifying Untrustworthy Samples: Data Filtering for Open-domain Dialogues with Bayesian Optimization

Being able to reply with a related, fluent, and informative response is ...
research
10/29/2020

Conversation Graph: Data Augmentation, Training and Evaluation for Non-Deterministic Dialogue Management

Task-oriented dialogue systems typically rely on large amounts of high-q...
research
04/29/2020

Utterance Pair Scoring for Noisy Dialogue Data Filtering

Filtering noisy training data is one of the key approaches to improving ...
