Molweni: A Challenge Multiparty Dialogues-based Machine Reading Comprehension Dataset with Discourse Structure

04/10/2020
by   Jiaqi Li, et al.

We present the Molweni dataset, a machine reading comprehension (MRC) dataset built over multiparty dialogues. Molweni is sampled from the Ubuntu Chat Corpus and includes 10,000 dialogues comprising 88,303 utterances. We annotate 32,700 questions on this corpus, including both answerable and unanswerable questions. Molweni also uniquely contributes discourse dependency annotations for its multiparty dialogues, bringing large-scale data (78,246 annotated discourse relations) to bear on the task of multiparty dialogue understanding. Our experiments show that Molweni is a challenging dataset for current MRC models: BERT-wwm, a current, strong SQuAD 2.0 performer, achieves only 67.7% F1 on Molweni's questions, a substantial drop compared against its SQuAD 2.0 performance.


