Utterance Pair Scoring for Noisy Dialogue Data Filtering

04/29/2020
by Reina Akama, et al.
Tohoku University

Filtering noisy training data is one of the key approaches to improving the quality of neural network-based language generation. The dialogue research community in particular lacks data that is both low in noise and sufficiently large. In this work, we propose a scoring function specifically designed to identify low-quality utterance–response pairs so that they can be filtered out of noisy training data. Our scoring function models two properties of a dialogue pair: the naturalness of the connection between the utterance and its response, and their content-relatedness. Both are grounded in previous findings in dialogue response generation and linguistics. We demonstrate the effectiveness of our scoring function by confirming (i) that automatic scores from the proposed function correlate with human evaluation, and (ii) that a dialogue response generator trained on the filtered data performs better. Furthermore, we experimentally confirm that our scoring function can potentially serve as a language-independent method.
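To make the filtering idea concrete, here is a minimal Python sketch of score-based filtering of utterance–response pairs. It is not the scoring function proposed in the paper, which the abstract does not specify. The `relatedness` function is a toy stand-in (bag-of-words cosine similarity) for the content-relatedness component only, the connection-naturalness component is omitted, and the names `relatedness`, `filter_pairs`, and `keep_ratio` are hypothetical.

```python
import math
from collections import Counter


def relatedness(utterance: str, response: str) -> float:
    """Toy stand-in score: cosine similarity of bag-of-words counts.

    Assumption: this only illustrates a content-relatedness signal;
    it is not the scoring function from the paper.
    """
    u = Counter(utterance.lower().split())
    r = Counter(response.lower().split())
    dot = sum(u[w] * r[w] for w in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in r.values())
    )
    return dot / norm if norm else 0.0


def filter_pairs(pairs, score_fn, keep_ratio=0.5):
    """Keep the highest-scoring fraction of (utterance, response) pairs."""
    scored = sorted(pairs, key=lambda p: score_fn(*p), reverse=True)
    n_keep = max(1, int(len(scored) * keep_ratio))
    return scored[:n_keep]


pairs = [
    ("do you like green tea", "yes i drink green tea every day"),
    ("do you like green tea", "the train leaves at noon"),
]
# The unrelated second pair scores 0.0 and is dropped.
kept = filter_pairs(pairs, relatedness, keep_ratio=0.5)
```

Any scoring function with the same `(utterance, response) -> float` shape could be passed as `score_fn`; the ranking-and-truncation step is independent of how the score is computed.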

