Can Transformer Models Measure Coherence In Text? Re-Thinking the Shuffle Test

07/07/2021
by Philippe Laban, et al.

The Shuffle Test is the most common task to evaluate whether NLP models can measure coherence in text. Most recent work uses direct supervision on the task; we show that by simply finetuning a RoBERTa model, we can achieve a near perfect accuracy of 97.8%. We argue that this outstanding performance is unlikely to lead to a good model of text coherence, and suggest that the Shuffle Test should be approached in a Zero-Shot setting: models should be evaluated without being trained on the task itself. We evaluate common models in this setting, such as Generative and Bi-directional Transformers, and find that larger architectures achieve high performance out-of-the-box. Finally, we suggest the k-Block Shuffle Test, a modification of the original that increases the size of the blocks shuffled. Even though human reader performance remains high (around 95%), model performance drops from 94% as block size increases, creating a conceptually simple challenge to benchmark NLP models. Code available: https://github.com/tingofurro/shuffle_test/
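
For illustration, below is a minimal sketch of how a k-Block shuffled example could be constructed: the document's sentences are grouped into consecutive blocks of k sentences, and the order of the blocks (not the sentences within them) is permuted. The function name `k_block_shuffle` and its details are assumptions for exposition, not the authors' reference implementation, which lives in the linked repository.

```python
import random

def k_block_shuffle(sentences, k, seed=None):
    """Group sentences into consecutive blocks of size k and shuffle the
    block order, keeping sentences within each block intact.
    With k=1 this reduces to the standard sentence-level Shuffle Test."""
    rng = random.Random(seed)
    blocks = [sentences[i:i + k] for i in range(0, len(sentences), k)]
    shuffled = blocks[:]
    # Re-shuffle until the block order actually differs from the original
    # (possible whenever there is more than one block).
    while shuffled == blocks and len(blocks) > 1:
        rng.shuffle(shuffled)
    return [sent for block in shuffled for sent in block]

# Example: a 6-sentence document shuffled in blocks of k=2.
doc = ["S1.", "S2.", "S3.", "S4.", "S5.", "S6."]
print(k_block_shuffle(doc, k=2, seed=0))
```

A model is then asked to tell original documents apart from their block-shuffled counterparts; larger k leaves more local structure intact, which is what makes the variant harder to solve than the original Shuffle Test.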

