SuperDialseg: A Large-scale Dataset for Supervised Dialogue Segmentation

05/15/2023
by   Junfeng Jiang, et al.

Dialogue segmentation is a crucial task for dialogue systems, allowing a better understanding of conversational texts. Despite recent progress in unsupervised dialogue segmentation methods, their performance is limited by the lack of explicit supervised signals for training. Furthermore, the precise definition of segmentation points in conversations remains a challenging problem, which makes manual annotations harder to collect. In this paper, we provide a feasible definition of dialogue segmentation points with the help of document-grounded dialogues and release a large-scale supervised dataset called SuperDialseg, which contains 9K dialogues based on two prevalent document-grounded dialogue corpora and inherits their useful dialogue-related annotations. Moreover, we propose two models that exploit dialogue characteristics, achieving state-of-the-art performance on SuperDialseg and showing good generalization ability on out-of-domain datasets. Additionally, we provide a benchmark including 20 models across four categories for the dialogue segmentation task, with several appropriate evaluation metrics. Based on our empirical analysis, we also provide some insights into the task of dialogue segmentation. We believe our work is an important step forward in the field of dialogue segmentation.
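To make the idea of defining segmentation points through document grounding concrete, here is a minimal sketch of one plausible reading. It assumes that each utterance carries the identifier of the document section it is grounded in, and that a segment ends wherever that identifier changes. The abstract does not spell out this rule, so the rule itself, the function names `boundaries_from_grounding` and `split_segments`, and the label convention (1 marks the last utterance of a segment, with the final utterance labeled 0) are all illustrative assumptions rather than the authors' implementation.

```python
from typing import List, Sequence


def boundaries_from_grounding(section_ids: Sequence[str]) -> List[int]:
    """Label each utterance 1 if it ends a segment, else 0.

    Assumption: a segment ends at utterance i when utterance i + 1 is
    grounded in a different document section. The final utterance is
    labeled 0 by convention, since no utterance follows it.
    """
    labels = []
    last_index = len(section_ids) - 1
    for i, section in enumerate(section_ids):
        if i == last_index or section == section_ids[i + 1]:
            labels.append(0)
        else:
            labels.append(1)
    return labels


def split_segments(utterances: Sequence[str], labels: Sequence[int]) -> List[List[str]]:
    """Group utterances into segments using the boundary labels."""
    segments: List[List[str]] = []
    current: List[str] = []
    for utterance, label in zip(utterances, labels):
        current.append(utterance)
        if label == 1:
            # This utterance closes the current segment.
            segments.append(current)
            current = []
    if current:
        # Trailing utterances after the last boundary form the final segment.
        segments.append(current)
    return segments


# Hypothetical grounding annotations for a five-utterance dialogue.
sections = ["sec-a", "sec-a", "sec-b", "sec-b", "sec-c"]
utterances = ["u1", "u2", "u3", "u4", "u5"]
labels = boundaries_from_grounding(sections)
segments = split_segments(utterances, labels)
```

Under these assumptions, per-utterance boundary labels of this kind are what a supervised segmentation model would be trained to predict.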

research
05/30/2023

VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions

Video-grounded dialogue understanding is a challenging problem that requ...
research
11/12/2020

doc2dial: A Goal-Oriented Document-Grounded Dialogue Dataset

We introduce doc2dial, a new dataset of goal-oriented dialogues that are...
research
05/23/2023

MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems

Although automatic dialogue tutors hold great potential in making educat...
research
12/10/2021

Am I Me or You? State-of-the-Art Dialogue Models Cannot Maintain an Identity

State-of-the-art dialogue models still often stumble with regards to fac...
research
04/28/2017

Not All Dialogues are Created Equal: Instance Weighting for Neural Conversational Models

Neural conversational models require substantial amounts of dialogue dat...
research
04/17/2020

A Survey of Document Grounded Dialogue Systems (DGDS)

Dialogue system (DS) attracts great attention from industry and academia...
research
09/12/2023

Leveraging Large Language Models for Automated Dialogue Analysis

Developing high-performing dialogue systems benefits from the automatic ...
