Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling

07/16/2023
by Longyue Wang, et al.

Modeling discourse, the linguistic phenomena that go beyond individual sentences, is a fundamental yet challenging aspect of natural language processing (NLP). However, existing evaluation benchmarks primarily focus on intra-sentence properties and overlook critical discourse phenomena that cross sentences. To bridge this gap, we propose Disco-Bench, a benchmark that evaluates inter-sentence discourse properties across a diverse set of NLP tasks covering understanding, translation, and generation. Disco-Bench consists of 9 document-level test sets in the literature domain, which contain rich discourse phenomena (e.g., cohesion and coherence) in Chinese and/or English. For linguistic analysis, we also design a diagnostic test suite that examines whether the target models learn discourse knowledge. In total, we evaluate 20 general-domain, in-domain, and commercial models based on Transformer, advanced pretraining architectures, and large language models (LLMs). Our results show (1) the challenge and necessity of our evaluation benchmark, and (2) that fine-grained pretraining on literary document-level training data consistently improves the modeling of discourse information. We will release the datasets, pretrained models, and leaderboard, which we hope will significantly facilitate research in this field: https://github.com/longyuewangdcu/Disco-Bench.
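The abstract does not say how the diagnostic test suite scores a model. One common way to probe discourse knowledge is a contrastive test. Each item pairs a context with a correct continuation and a minimally perturbed one that breaks cohesion or coherence, and the model passes the item if it scores the correct candidate higher. The sketch below assumes that design and is not Disco-Bench's actual implementation. `ContrastivePair`, `contrastive_accuracy`, and `toy_score` are hypothetical names, and `toy_score` is only a word-overlap stand-in for a real model's log-probability.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ContrastivePair:
    # Hypothetical test item: a context plus two candidate continuations.
    context: str    # preceding discourse
    correct: str    # continuation that respects cohesion/coherence
    incorrect: str  # minimally perturbed continuation that breaks it


def contrastive_accuracy(pairs: Sequence[ContrastivePair],
                         score: Callable[[str, str], float]) -> float:
    """Fraction of pairs where the correct candidate gets a strictly higher score.

    Ties count as failures, so a scorer that cannot tell the candidates
    apart gets no credit.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    wins = sum(score(p.context, p.correct) > score(p.context, p.incorrect)
               for p in pairs)
    return wins / len(pairs)


def toy_score(context: str, candidate: str) -> float:
    # Stand-in scorer (assumption): counts candidate words that also occur
    # in the context. A real evaluation would use the model's log-probability.
    ctx = set(re.findall(r"\w+", context.lower()))
    return sum(w in ctx for w in re.findall(r"\w+", candidate.lower()))


pairs = [
    # Lexical cohesion: repeating "car" links the continuation to the context.
    ContrastivePair("Mary bought a red car.",
                    "She drove the car home.",
                    "He drove the bike home."),
    # Pronoun reference: word overlap cannot resolve "he" vs "she" here.
    ContrastivePair("Tom lost his keys.",
                    "He searched for them everywhere.",
                    "She searched for them everywhere."),
]

print(contrastive_accuracy(pairs, toy_score))  # 0.5
```

The toy scorer passes the lexical-cohesion item but ties on the pronoun item. This is the kind of gap a contrastive diagnostic is meant to expose: a surface-level signal can look competent on some phenomena while carrying no real discourse knowledge.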
