An Empirical Study on the Overlapping Problem of Open-Domain Dialogue Datasets

01/17/2022
by Yuqiao Wen, et al.

Open-domain dialogue systems aim to converse with humans through text, and research on them has relied heavily on benchmark datasets. In this work, we first identify the overlapping problem in DailyDialog and OpenSubtitles, two popular open-domain dialogue benchmark datasets. Our systematic analysis then shows that such overlapping can be exploited to obtain fake state-of-the-art performance. Finally, we address this issue by cleaning these datasets and setting up a proper data processing procedure for future research.
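
To make the overlapping problem concrete, the following is a minimal sketch, not the authors' actual procedure, of how one might measure the share of test samples that also occur in the training set and then drop those samples from training. The (context, response) sample representation, the lowercase and whitespace normalization, and the names `overlap_ratio` and `remove_overlap` are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical sample format: one (context, response) pair per dialogue turn.
Sample = Tuple[str, str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def sample_key(sample: Sample) -> Tuple[str, str]:
    """Build a comparison key from a normalized (context, response) pair."""
    context, response = sample
    return normalize(context), normalize(response)


def overlap_ratio(train: List[Sample], test: List[Sample]) -> float:
    """Fraction of test samples whose key also appears in the training set."""
    if not test:
        return 0.0
    train_keys: Set[Tuple[str, str]] = {sample_key(s) for s in train}
    hits = sum(1 for s in test if sample_key(s) in train_keys)
    return hits / len(test)


def remove_overlap(train: List[Sample], test: List[Sample]) -> List[Sample]:
    """Return the training samples whose key does not appear in the test set."""
    test_keys: Set[Tuple[str, str]] = {sample_key(s) for s in test}
    return [s for s in train if sample_key(s) not in test_keys]


if __name__ == "__main__":
    train = [("How are you?", "Fine, thanks."), ("hello", "hi")]
    test = [("how are  you?", "fine, thanks."), ("bye", "see you")]
    print(overlap_ratio(train, test))
    print(remove_overlap(train, test))
```

Exact matching after normalization is only one possible definition of overlap. Filtering the training set rather than the test set is likewise a design choice of this sketch, made so that the evaluation data stays fixed.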


