Evaluating Document Coherence Modelling

03/18/2021
by   Aili Shen, et al.

While pretrained language models (LMs) have driven impressive gains on morpho-syntactic and semantic tasks, their ability to model discourse and pragmatic phenomena is less clear. As a step towards a better understanding of their discourse modelling capabilities, we propose a sentence intrusion detection task, and examine the performance of a broad range of pretrained LMs on this task for English. Lacking a dataset for the task, we introduce INSteD, a novel intruder sentence detection dataset containing 170,000+ documents constructed from English Wikipedia and CNN news articles. Our experiments show that pretrained LMs perform impressively in in-domain evaluation, but experience a substantial drop in the cross-domain setting, indicating limited generalisation capacity. Further results over a novel linguistic probe dataset show that there is substantial room for improvement, especially in the cross-domain setting.
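To make the task concrete, here is a minimal sketch of how a single intruder-sentence instance could be built and scored. The abstract does not describe the construction procedure, so everything here is an assumption for illustration: the function names `make_intrusion_example` and `accuracy` are hypothetical, and picking the intruder uniformly at random from a second document is a simplification, not necessarily how INSteD was built.

```python
import random


def make_intrusion_example(doc, donor, rng):
    """Replace one sentence of `doc` with a sentence drawn from `donor`.

    Assumption: `doc` and `donor` are lists of sentence strings from two
    different documents. Returns the corrupted sentence list and the index
    of the intruder, which serves as the gold label.
    """
    if not doc or not donor:
        raise ValueError("both documents need at least one sentence")
    position = rng.randrange(len(doc))   # slot that will hold the intruder
    corrupted = list(doc)                # copy, so the input is not mutated
    corrupted[position] = rng.choice(donor)
    return corrupted, position


def accuracy(predicted, gold):
    """Fraction of instances where the predicted intruder index is correct."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    rng = random.Random(0)
    document = ["s1", "s2", "s3", "s4", "s5"]
    other = ["t1", "t2", "t3"]
    sentences, label = make_intrusion_example(document, other, rng)
    print(sentences, label)
```

A model under evaluation would receive `sentences` and predict an index; `accuracy` then compares those predictions with the stored labels. In-domain and cross-domain scores differ only in whether the training and test documents come from the same source.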


