
Utterance Pair Scoring for Noisy Dialogue Data Filtering

by Reina Akama, et al.
Tohoku University

Filtering noisy training data is one of the key approaches to improving the quality of neural network-based language generation. The dialogue research community in particular lacks training data that is both sufficiently large and low in noise. In this work, we propose a scoring function specifically designed to identify low-quality utterance–response pairs so that they can be filtered out of noisy training data. Our scoring function models two properties of a dialogue pair: the naturalness of the connection between the utterance and the response, and their content-relatedness. Both are based on previous findings in dialogue response generation and linguistics. We then demonstrate the effectiveness of our scoring function by confirming (i) the correlation between automatic scoring by the proposed function and human evaluation, and (ii) the performance of a dialogue response generator trained on the filtered data. Furthermore, we experimentally confirm that our scoring function potentially works as a language-independent method.
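
The filtering step described above can be illustrated with a minimal sketch. The abstract does not give the scoring function itself, so `toy_relatedness` below is a hypothetical stand-in (token-set Jaccard overlap) and not the method proposed in the paper; `filter_pairs`, the threshold value, and the example pairs are likewise assumptions made only to show score-then-threshold filtering.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (utterance, response)


def toy_relatedness(utterance: str, response: str) -> float:
    """Hypothetical placeholder score: Jaccard overlap of lowercase tokens.

    This is NOT the paper's scoring function; it only stands in for
    "some function that assigns a quality score to a pair".
    """
    u = set(utterance.lower().split())
    r = set(response.lower().split())
    if not u or not r:
        return 0.0
    return len(u & r) / len(u | r)


def filter_pairs(
    pairs: List[Pair],
    score_fn: Callable[[str, str], float],
    threshold: float,
) -> List[Pair]:
    """Keep only the pairs whose score reaches the threshold."""
    return [(u, r) for (u, r) in pairs if score_fn(u, r) >= threshold]


# Made-up example data: one coherent pair and one unrelated pair.
pairs = [
    ("do you like green tea", "yes i like green tea a lot"),
    ("do you like green tea", "click here to subscribe"),
]

# The threshold of 0.2 is an arbitrary illustrative choice.
kept = filter_pairs(pairs, toy_relatedness, threshold=0.2)
print(kept)
```

In practice the placeholder would be swapped for whatever scoring function is being evaluated, and the threshold would be chosen empirically, for instance by checking how the scores line up with human judgments.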



