Utterance Pair Scoring for Noisy Dialogue Data Filtering

04/29/2020
by Reina Akama, et al.

Filtering noisy training data is one of the key approaches to improving the quality of neural network-based language generation. The dialogue research community in particular lacks training data that is both sufficiently large and low in noise. In this work, we propose a scoring function specifically designed to identify low-quality utterance–response pairs so that they can be filtered out of noisy training data. Our scoring function models two properties of a dialogue pair: the naturalness of the connection between utterance and response, and their content-relatedness. Both are motivated by previous findings in dialogue response generation and linguistics. We then demonstrate the effectiveness of our scoring function by confirming (i) the correlation between its automatic scores and human evaluation, and (ii) the performance of a dialogue response generator trained on the filtered data. Furthermore, we experimentally confirm that our scoring function potentially works as a language-independent method.
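The overall idea of score-based filtering can be illustrated with a minimal sketch. This is not the scoring function proposed in the paper: the abstract does not specify how the components are computed or combined, so the sketch assumes the component scores are simply summed and compared against a threshold. The names `jaccard_relatedness`, `score_pair`, and `filter_pairs` are hypothetical, and the token-overlap measure is only a toy stand-in for content-relatedness. A component for the naturalness of the connection would be passed in as another scorer.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]                 # (utterance, response)
Scorer = Callable[[str, str], float]   # hypothetical component scorer


def jaccard_relatedness(utterance: str, response: str) -> float:
    """Toy stand-in for content-relatedness: Jaccard overlap of token sets."""
    u = set(utterance.lower().split())
    r = set(response.lower().split())
    union = u | r
    return len(u & r) / len(union) if union else 0.0


def score_pair(pair: Pair, scorers: Sequence[Scorer]) -> float:
    """Sum the component scores for one pair (summation is an assumption)."""
    utterance, response = pair
    return sum(scorer(utterance, response) for scorer in scorers)


def filter_pairs(pairs: Sequence[Pair], scorers: Sequence[Scorer],
                 threshold: float) -> List[Pair]:
    """Keep only the pairs whose combined score reaches the threshold."""
    return [p for p in pairs if score_pair(p, scorers) >= threshold]


# Usage with made-up example pairs and an arbitrary threshold.
pairs = [
    ("do you like green tea", "yes i like green tea a lot"),
    ("do you like green tea", "the train leaves at noon"),
]
kept = filter_pairs(pairs, [jaccard_relatedness], threshold=0.3)
```

In this toy run the first pair shares three of nine distinct tokens and is kept, while the second shares none and is dropped. In practice the threshold would have to be tuned, for example against human quality judgments.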
