Dialog Inpainting: Turning Documents into Dialogs

05/18/2022
by Zhuyun Dai, et al.

Many important questions (e.g., "How to eat healthier?") require conversation to establish context and explore in depth. However, conversational question answering (ConvQA) systems have long been stymied by scarce training data that is expensive to collect. To address this problem, we propose a new technique for synthetically generating diverse and high-quality dialog data: dialog inpainting. Our approach takes the text of any document and transforms it into a two-person dialog between the writer and an imagined reader: we treat sentences from the article as utterances spoken by the writer, and then use a dialog inpainter to predict what the imagined reader asked or said between each of the writer's utterances. By applying this approach to passages from Wikipedia and the web, we produce WikiDialog and WebDialog, two datasets totalling 19 million diverse information-seeking dialogs, 1,000x larger than the largest existing ConvQA dataset. Furthermore, human raters judge the answer adequacy and conversationality of WikiDialog to be as good as or better than those of existing manually collected datasets. Using our inpainted data to pre-train ConvQA retrieval systems, we significantly advance the state of the art across three benchmarks (QReCC, OR-QuAC, TREC CAsT), yielding up to 40% relative gains on standard evaluation metrics.
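The document-to-dialog transformation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `MASK` token, the regex sentence splitter, the left-to-right filling order, and the `toy_inpainter` stand-in are all assumptions made for illustration. In practice the inpainter would be a trained language model that predicts the masked reader turn.

```python
import re
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)
MASK = "<mask>"         # hypothetical placeholder for a missing reader turn


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter (assumption: ., !, ? end a sentence)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def inpaint_dialog(document: str,
                   inpainter: Callable[[List[Turn]], str]) -> List[Turn]:
    """Turn a document into an alternating reader/writer dialog.

    Each sentence becomes a writer utterance. Before each one, the
    inpainter sees the dialog so far plus a masked reader turn and
    returns the text that should fill the mask.
    """
    dialog: List[Turn] = []
    for sentence in split_sentences(document):
        partial = dialog + [("reader", MASK), ("writer", sentence)]
        reader_turn = inpainter(partial)
        dialog.append(("reader", reader_turn))
        dialog.append(("writer", sentence))
    return dialog


def toy_inpainter(partial: List[Turn]) -> str:
    """Stand-in for a learned model: asks about the upcoming answer."""
    answer = partial[-1][1]
    return f"What can you tell me about '{answer.split()[0]}'?"


if __name__ == "__main__":
    doc = "Vegetables are rich in fiber. Fiber supports digestion."
    for speaker, utterance in inpaint_dialog(doc, toy_inpainter):
        print(f"{speaker}: {utterance}")
```

Keeping the inpainter as a plain callable means the same loop could wrap any model that maps a partially masked dialog to a single utterance.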


